CN111582464B - Neural network processing method, computer system and storage medium

Info

Publication number: CN111582464B
Application number: CN202010364385.5A
Authority: CN (China)
Other versions: CN111582464A (Chinese-language publication)
Inventor: name withheld at the inventor's request
Applicant and current assignee: Cambricon Technologies Corp Ltd
Legal status: Active (granted)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation using electronic means
    • G06N 3/08: Learning methods
    • G06N 20/00: Machine learning

Abstract

The invention provides a neural network processing method comprising the following steps: obtaining a model dataset and model structure parameters of an original network, wherein the model dataset comprises the network weights corresponding to each computing node in the original network, and the model structure parameters comprise the connection relationships of the computing nodes in the original network and the computation attributes of each computing node; running the original network according to its model dataset and model structure parameters to obtain the instructions corresponding to each computing node in the original network; and generating an offline model corresponding to the original network from the network weights and instructions of its computing nodes, and storing that offline model in a nonvolatile memory. The invention also provides a computer system and a storage medium. The neural network processing method, computer system and storage medium shorten the time a processor needs to run the same network repeatedly, improving the processor's processing speed and efficiency.

Description

Neural network processing method, computer system and storage medium
Technical Field
The present invention relates to the field of deep learning technologies, and in particular, to a neural network processing method, a computer system, and a storage medium.
Background
With the development of artificial intelligence technology, deep learning has become ubiquitous and indispensable, and many scalable deep learning systems, such as TensorFlow, MXNet, Caffe and PyTorch, have been developed alongside it; these systems can provide various neural network models capable of running on a processor such as a CPU or GPU. Generally, when a neural network model is run, for example a Caffe network model, each computing node in the model must be compiled and parsed separately, and the nodes are then executed in a certain order according to the structural form of the model. The neural network model and the network structure may be trained or untrained artificial neural network model data. This per-run compilation limits the processing speed of the processor and yields low processing efficiency.
Disclosure of Invention
In view of the low processing efficiency caused by the above network model processing approach, the invention aims to provide a neural network processing method, a computer system and a storage medium that improve the processing speed and efficiency of a device running a neural network.
In order to achieve the above purpose, the invention adopts the following technical scheme:
a method of processing a neural network, the method comprising the steps of:
obtaining a model data set and model structure parameters of an original network, wherein the model data set comprises network weights corresponding to all computing nodes in the original network, and the model structure parameters comprise connection relations of a plurality of computing nodes in the original network and computing attributes of all computing nodes;
operating the original network according to a model data set and model structure parameters of the original network to obtain instructions corresponding to all computing nodes in the original network;
and generating an offline model corresponding to the original network according to the network weight and the instruction corresponding to each computing node of the original network, and storing the offline model corresponding to the original network into a nonvolatile memory.
In one embodiment, the step of operating the original network according to the model dataset and the model structure parameters of the original network to obtain the instructions corresponding to each computing node in the original network includes:
obtaining the execution sequence of each computing node in the original network according to the model structure parameters of the original network;
And operating the original network according to the execution sequence of each computing node in the original network, and respectively obtaining the instructions corresponding to each computing node in the original network.
In one embodiment, the step of generating the offline model corresponding to the original network according to the network weights and the instructions corresponding to the computing nodes of the original network includes:
obtaining a memory allocation mode of the original network according to the model data set and the model structure parameters of the original network;
storing related data in the operation process of the original network into a first memory according to a memory allocation mode of the original network, wherein the related data in the operation process of the original network comprises network weights, instructions, input data and output data corresponding to all computing nodes of the original network;
and acquiring network weights and instructions corresponding to all computing nodes of the original network from the first memory, and storing the network weights and instructions corresponding to all computing nodes of the original network in a second memory to generate the offline model, wherein the second memory is a nonvolatile memory.
In one embodiment, the offline model further includes node interface data, where the node interface data is used to represent a connection relationship between each computing node of the original network.
In one embodiment, the step of operating the original network based on a model dataset and model structure parameters of the original network comprises:
a processor or virtual device of the computer system runs the original network according to a model dataset and model structure parameters of the original network.
In one embodiment, the method further comprises the steps of:
obtaining a model data set and model structure parameters of a new original network;
if the new original network has the corresponding offline model, acquiring the offline model corresponding to the new original network from the nonvolatile memory, and operating the new original network according to the offline model corresponding to the new original network;
and if the new original network does not have the corresponding offline model, operating the new original network according to the model data set and the model structure parameters of the new original network, and generating and storing the offline model corresponding to the new original network.
Meanwhile, the invention also provides another neural network processing method, which comprises the following steps:
obtaining model structure parameters of an original network, wherein the model structure parameters comprise connection relations of a plurality of computing nodes in the original network;
acquiring an offline model corresponding to the original network from a nonvolatile memory, wherein the offline model corresponding to the original network comprises network weights and instructions corresponding to all computing nodes of the original network;
and operating the original network according to the offline model corresponding to the original network and the model structure parameters of the original network.
Based on the same inventive concept, the invention also provides a computer system, which comprises a processor, a first memory and a second memory, wherein the first memory or the second memory stores a computer program, and the processor executes any one of the methods when executing the computer program.
In one embodiment, the processor is a central processing unit, a graphics processor, a digital signal processor, a field-programmable gate array, or a dedicated neural network processor.
Furthermore, the present invention provides a computer storage medium having a computer program stored therein, which, when executed by one or more processors, performs the method of any of the above.
The beneficial effects of the invention are as follows:
according to the neural network processing method, the computer system and the storage medium, the instructions generated for each computing node while the original network runs, together with the network weights of those computing nodes, are stored to obtain an offline model corresponding to the original network. When the original network is to be run again, the offline model can be run directly, without compiling the model dataset, model structure parameters and other related data of the original network again. This shortens the time the processor spends running the same network and improves the processing speed and efficiency of the processor.
Drawings
FIG. 1 is a system block diagram of a computer system of an embodiment;
FIG. 2 is a system block diagram of a computer system of another embodiment;
FIG. 3 is a flowchart of a method for processing a neural network according to an embodiment;
FIG. 4 is a flowchart of a method for processing a neural network according to another embodiment;
FIG. 5 is a flowchart of a method of processing a neural network according to yet another embodiment;
FIG. 6 is a flowchart of a method of processing a neural network according to yet another embodiment;
FIG. 7 is a network architecture diagram of a neural network of an embodiment;
FIG. 8 is a schematic diagram of an offline model generation process of the neural network of FIG. 7;
FIG. 9 is a system block diagram of a computer system of yet another embodiment;
FIG. 10 is a flowchart of a method for processing a neural network according to an embodiment;
FIG. 11 is a flowchart of a method for processing a neural network according to an embodiment;
FIG. 12 is a flowchart of a method for processing a neural network according to an embodiment;
FIG. 13 is a flowchart of a method for processing a neural network according to an embodiment;
FIG. 14 is a network structure diagram of a neural network of an embodiment and an equivalent network structure diagram;
FIG. 15 is a schematic diagram of a first offline model generation process of the neural network of FIG. 14;
FIG. 16 is a schematic diagram of a process for generating the first offline model and the second offline model of the neural network of FIG. 14.
Detailed Description
In order to make the technical scheme of the present invention clearer, the neural network processing method, the computer system and the storage medium of the present invention are described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
FIG. 1 is a block diagram of a computer system according to one embodiment, which may include a processor 110 and a first memory 120 and second memory 130 coupled to the processor 110. The processor 110 is configured to provide computing and control capabilities and may include an acquisition module 111, an operation module 113 and a control module 112, where the acquisition module 111 may be a hardware module such as an IO (Input/Output) interface, and the operation module 113 and control module 112 are likewise hardware modules. For example, the operation module 113 and the control module 112 may be digital circuits or analog circuits. Physical implementations of such hardware circuits include, but are not limited to, physical devices, which in turn include, but are not limited to, transistors, memristors and the like.
Alternatively, the processor may be a general-purpose processor, such as a CPU (Central Processing Unit), GPU (Graphics Processing Unit) or DSP (Digital Signal Processor), or a dedicated neural network processor such as an IPU (Intelligence Processing Unit). Of course, the processor may also be an instruction set processor, an associated chipset, or a special-purpose microprocessor (e.g., an Application Specific Integrated Circuit, ASIC), and may include an on-board memory used for caching purposes, or the like.
The first memory or the second memory may also store a computer program for implementing the neural network processing method provided in the embodiments of the present application. Specifically, the neural network processing method is used to generate an offline model corresponding to an original network received by the processor. The offline model may include the necessary network structure information, such as the network weight and the instruction of each computing node in the original network, where an instruction indicates what computing function its node performs and may specifically include information such as the computation attributes of each computing node and the connection relationships between the computing nodes. Thus, when the processor runs the original network again, it can run the corresponding offline model directly, without compiling the same original network again; this shortens the running time of the network on the processor and thereby improves the processing speed and efficiency of the processor.
Further, the first memory 120 may be an internal memory, such as a volatile memory like a cache, which may be used to store relevant data during operation of the neural network, such as network input data, network output data, network weights and instructions, and so on. The second memory 130 may be a nonvolatile memory such as an external memory, and the second memory may be used to store an offline model corresponding to the neural network. Therefore, when the computer system needs to compile the same original network again to run the original network, the offline model corresponding to the original network can be directly obtained from the second memory, so that the processing speed and the processing efficiency of the processor are improved.
Of course, in other embodiments, the computer system may also include a processor and a memory, as shown in FIG. 2, the computer system may include a processor 210 and a memory 220 coupled to the processor 210. The processor 210 may include an acquisition module 211, a control module 212, and an operation module 213, the specific structure of which may be found in the description of the processor 110 above. The memory 220 may include a first storage unit 221, a second storage unit 222, and a third storage unit 223, where the first storage unit 221 may be used to store a computer program for implementing the neural network processing method provided in the embodiment of the present application. The second storage unit 222 may be configured to store related data during operation of the original network, and the third storage unit 223 is configured to store an offline model corresponding to the original network. Further, the number of memory cells included in the memory may be greater than three, which is not particularly limited herein.
It should be clear that running the original network in this embodiment means that the processor runs some machine learning algorithm (such as a neural network algorithm) using the artificial neural network model data, and implements the algorithm's target application (such as an artificial intelligence application like speech recognition) by performing a forward operation. Directly running the offline model corresponding to the original network means running the machine learning algorithm (such as a neural network algorithm) corresponding to the original network using the offline model, and implementing the algorithm's target application (such as an artificial intelligence application like speech recognition) by performing a forward operation.
As shown in fig. 3, the neural network processing method according to an embodiment of the present invention generates and stores an offline model of an original network from the acquired related data of that network, so that when the processor runs the original network again it can run the corresponding offline model directly, without compiling the same original network again, thereby shortening the running time of the network on the processor and improving the processor's processing speed and efficiency. Specifically, the method comprises the following steps:
s100, acquiring a model data set and model structure parameters of an original network, wherein the model data set and the model structure parameters of the original network can be acquired through an acquisition module of a processor, and a network structure diagram of the original network can be acquired through the model data set and the model structure parameters of the original network. The model dataset includes data such as network weights corresponding to the computing nodes in the original network, and W1 to W6 in the neural network shown in fig. 7 are used to represent the network weights of the computing nodes. The model structure parameter includes connection relations of a plurality of computing nodes in the original network and computing attributes of the computing nodes, wherein the connection relations among the computing nodes are used for indicating whether data transmission exists among the computing nodes, for example, when data flow transmission exists among the computing nodes, the connection relations among the computing nodes can be described. Further, the connection relationships of the computing nodes may include input relationships and output relationships, and so on. As shown in fig. 7, when the output of the computing node F1 is input to the computing nodes F4 and F5, it can be explained that the computing node F1 and the computing node F4 have a connection relationship, and the computing node F1 and the computing node F4 have a connection relationship. For another example, if there is no data transfer between the computing node F1 and the computing node F2, it may be explained that there is no connection relationship between the computing node F1 and the computing node F2.
The computation attribute of each computing node may include the computation type and computation parameters of that node. The computation type refers to what computation the node performs; for example, it may be an addition, a subtraction or a convolution, so that the node is correspondingly a node implementing an addition, a subtraction or a convolution, and so on. The computation parameters of a node may be the parameters necessary to complete its computation type. For example, for a node implementing an addition, the computation parameters may be the addends of that addition; an addend may be acquired as input data by the acquisition module, or may be the output data of the node preceding this computing node, and so on.
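By way of illustration only, the following Python sketch shows one possible in-memory representation of such a model dataset and model structure parameters. Every name in it (ComputeNode, OriginalNetwork, the operation types, the pairing of W1 to W6 with nodes, and the assumed inputs of computing node F5) is a hypothetical choice made for the sketch, not something fixed by this disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class ComputeNode:
    """One computing node of the original network (illustrative)."""
    name: str                                      # e.g. "F1"
    op_type: str                                   # computation type: "add", "sub", "conv", ...
    op_params: dict = field(default_factory=dict)  # computation parameters for that type
    inputs: list = field(default_factory=list)     # input data sources (predecessor nodes)

@dataclass
class OriginalNetwork:
    weights: dict  # model dataset: node name -> network weights (W1..W6 in fig. 7)
    nodes: dict    # model structure parameters: node name -> ComputeNode

# The network of fig. 7; the pairing of W1..W6 with F1..F3 and the inputs of
# F5 are assumptions, since the figure is only partially described in the text.
fig7 = OriginalNetwork(
    weights={"F1": ["W1", "W2"], "F2": ["W3", "W4"], "F3": ["W5", "W6"]},
    nodes={
        "F1": ComputeNode("F1", "conv"),                     # start nodes read X1, X2
        "F2": ComputeNode("F2", "conv"),
        "F3": ComputeNode("F3", "conv"),
        "F4": ComputeNode("F4", "add", inputs=["F1", "F2"]),
        "F5": ComputeNode("F5", "add", inputs=["F1", "F3"]),
        "F6": ComputeNode("F6", "add", inputs=["F4", "F5"]),
    },
)
```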
Alternatively, the original network may be an artificial neural network built for a general-purpose processor such as a CPU, GPU or DSP on a deep learning system such as TensorFlow, MXNet, Caffe or PyTorch. The original network may also be an artificial neural network built for an intelligent processor such as an IPU. For example, when the original network is a neural network built on Caffe, the model dataset (caffemodel) and model structure parameters (prototxt) of that Caffe network may be obtained. The model dataset (caffemodel) includes data such as the network weights of the Caffe network, and the model structure parameters (prototxt) include the computation attributes of each computing node of the Caffe network, the connection relationships among the computing nodes, and so on.
And S200, running the original network according to the model dataset and model structure parameters of the original network to obtain the instructions corresponding to each computing node in the original network. Specifically, the operation module of the processor may run the original network according to its model dataset and model structure parameters and obtain the instruction corresponding to each computing node. Further, the acquisition module of the processor may also acquire the input data of the original network, and the operation module may then run the original network according to its input data, network model dataset and model structure parameters to obtain the instructions corresponding to each computing node. Still further, the above process of running the original network to obtain the instructions of the respective computing nodes is essentially a compilation process, which may be performed by a processor or a virtual device of the computer system; that is, the processor or a virtual device of the computer system runs the original network according to the model dataset and model structure parameters of the original network. Here, a virtual device is a segment of processor running space virtualized within the memory space of a storage device.
It should be clear that running the original network in this embodiment means that the processor runs some machine learning algorithm (such as a neural network algorithm) using the artificial neural network model data, and implements the algorithm's target application (such as an artificial intelligence application like speech recognition) by performing a forward operation.
S300, generating an offline model corresponding to the original network according to the network weights and instructions corresponding to each computing node of the original network, and storing the offline model in a nonvolatile memory (such as a database). Specifically, the control module of the processor may generate the offline model from the network weights and instructions of the computing nodes; for example, it may store the network weights and instructions corresponding to the computing nodes of the original network in the nonvolatile second memory, thereby generating and storing the offline model. The network weight and the instruction of each computing node are stored in one-to-one correspondence. In this way, when the original network is to be run again, the offline model corresponding to it can be obtained directly from the nonvolatile memory and the original network run according to that offline model, without compiling each computing node of the original network online to obtain instructions, which improves the running speed and efficiency of the system.
It should be clear that, in this embodiment, directly running the offline model corresponding to the original network refers to running the machine learning algorithm (such as the neural network algorithm) corresponding to the original network using the offline model, and implementing the target application (such as the artificial intelligence application of voice recognition, etc.) of the algorithm by executing the forward operation.
Optionally, as shown in fig. 4, the step S200 may include:
s210, according to the model structure parameters of the original network, the execution sequence of each computing node in the original network is obtained. Specifically, the operation module of the processor may obtain the execution sequence of each computing node in the original network according to the model structure parameters of the original network, and further, the operation module of the processor may obtain the execution sequence of each computing node in the original network according to the connection relationship of each computing node in the original network. For example, as shown in fig. 7, the input data of the computing node F4 is the output data of the computing node F1 and the output data of the computing node F2, and the input data of the computing node F6 is the output data of the computing node F4 and the output data of the computing node F5. Thus, the order of execution of the various compute nodes in the neural network shown in FIG. 7 may be F1-F2-F3-F4-F5-F6 or F1-F3-F2-F5-F4-F6, etc. Of course, the computing nodes F1, F2 and F3 may be executed in parallel, and the computing nodes F4 and F5 may be executed in parallel, which is merely illustrated herein and not particularly limited in order of execution.
S220, running the original network according to the execution order of its computing nodes to obtain the instructions corresponding to each computing node. Specifically, the operation module of the processor may run the original network in the execution order of its computing nodes to obtain the instruction corresponding to each node; that is, the processor compiles data such as the model dataset of the original network into an instruction per computing node, and from a node's instruction one can tell what computing function the node implements, i.e., its computation attributes such as computation type and computation parameters.
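Continuing the sketch, step S220 can be mimicked as below. A real toolchain would emit processor instructions at this point; an opaque dictionary merely stands in for them here, recording what computing function each node implements.

```python
def compile_network(net: OriginalNetwork) -> dict:
    """Walk the nodes in execution order and record one 'instruction' per
    computing node (a stand-in for real compilation)."""
    instructions = {}
    for name in execution_order(net.nodes):
        node = net.nodes[name]
        instructions[name] = {
            "type": node.op_type,      # computation type of the node
            "params": node.op_params,  # its computation parameters
            "inputs": node.inputs,     # where its input data comes from
        }
    return instructions
```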
Further, as shown in fig. 4, the step S300 further includes:
s310, obtaining the memory allocation mode of the original network according to the model data set and the model structure parameters of the original network. Specifically, the operation module of the processor may obtain the memory allocation manner of the original network according to the model dataset and the model structure parameter of the original network. Further, the processor may obtain an execution sequence of each computing node in the original network according to the model structure parameter of the original network, and determine a memory allocation manner of the current network according to the execution sequence of each computing node in the original network. For example, the relevant data of each computing node in the running process is saved into a stack according to the execution sequence of each computing node. The memory allocation method refers to determining storage locations of data (including input data, output data, network weight data, intermediate result data, etc.) related to each computing node in the original network on a memory space (such as the first memory). For example, a data table may be used to store mappings between data associated with each computing node (input data, output data, network weight data, intermediate result data, etc.) and memory space.
And S320, storing the related data of the original network's run into the first memory according to the memory allocation manner of the original network, where the related data include the network weights, instructions, input data, intermediate computation results, output data and the like corresponding to the computing nodes of the original network. For example, as shown in fig. 7, X1 and X2 represent the input data of the neural network and Y its output data; the processor may convert the output data of the neural network into control commands for a robot or for different digital interfaces. W1 to W6 represent the network weights corresponding to computing nodes F1, F2 and F3, and the output data of computing nodes F1 to F5 may serve as intermediate computation results. The processor may store the related data of the run into the first memory, such as an internal memory or a volatile memory like a cache, according to the determined memory allocation manner; the specific storage layout is shown in the left half of the storage space in fig. 8.
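The sketch below illustrates one conceivable memory allocation manner of the kind described (stack-like, in execution order); the byte sizes and the three data kinds are assumptions of the sketch, not the allocation scheme actually used by an embodiment.

```python
def allocate_first_memory(order: list, sizes: dict):
    """Lay out each node's run-time data contiguously in execution order.

    sizes maps (node name, data kind) -> byte size, e.g. ("F1", "weight") -> 64.
    Returns a data table mapping each datum to its (offset, size) slot in a
    simulated first memory, plus the total bytes consumed."""
    layout, offset = {}, 0
    for name in order:
        for kind in ("weight", "instruction", "output"):
            size = sizes.get((name, kind), 0)
            if size:
                layout[(name, kind)] = (offset, size)
                offset += size
    return layout, offset
```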
S330, acquiring the network weights and instructions corresponding to the computing nodes of the original network from the first memory, and storing them in the second memory to generate the offline model, where the second memory may be a nonvolatile memory such as an external memory. The generation process of the offline model is shown in fig. 8; the offline model stored in the right half of the storage space in fig. 8 is the offline model corresponding to the original network.
The offline model generation process above is walked through below with reference to figs. 7 and 8:
First, the processor may obtain the model dataset, model structure parameters and input data of the original network, so that a network structure diagram of the original network, as shown in fig. 7, can be obtained from the model dataset and model structure parameters.
Next, the processor may obtain the connection relationships of the computing nodes of the original network from its model structure parameters, and from those connection relationships obtain the execution order of the computing nodes and the memory allocation manner of the network at run time, and thus the storage locations of the network's run-time data. As shown in the left half of the storage space in fig. 8, the run-time data of the original network may be stored on a stack in the execution order of the computing nodes.
Finally, the processor may store the network weights and instructions corresponding to the computing nodes of the original network in the nonvolatile second memory to generate the offline model; its storage layout is shown in the right half of the storage space in fig. 8. Moreover, the offline model contains only the data necessary to run the original network, such as the network weights and instructions, and need not store the input data, output data or intermediate computation results of the run, which reduces the consumption of storage space in the second memory.
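A hedged sketch of step S330 in the same illustrative style: only the per-node weights and instructions are copied out of the volatile first memory and serialized, here to a file standing in for the nonvolatile second memory. The pickle format is an arbitrary choice of the sketch, not the offline-model format of the invention.

```python
import pickle

def save_offline_model(path: str, order: list, weights: dict, instructions: dict):
    """Persist only what is needed to re-run the network: the network weight
    and compiled instruction of each node, in one-to-one correspondence."""
    offline_model = {"nodes": [
        {"name": n, "weight": weights.get(n), "instruction": instructions[n]}
        for n in order
    ]}
    with open(path, "wb") as f:  # the file stands in for nonvolatile memory
        pickle.dump(offline_model, f)

def load_offline_model(path: str) -> dict:
    with open(path, "rb") as f:
        return pickle.load(f)
```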
As a further improvement, the offline model also includes node interface data, which represents the connection relationships of the computing nodes of the original network. Specifically, the node interface data may include the input data source and output data source of each computing node. For example, as shown in fig. 7, the node interface data may record that computing nodes F1, F2 and F3 are start computing nodes whose inputs are preset input data, that the output data of computing node F1 serves as input data to computing nodes F4 and F5, and so on. Thus, when the original network is run again, only the start computing nodes and the input data of the network need be obtained, after which the network can be executed according to its offline model.
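Under the same assumed format, the node interface data of this improvement could simply be carried alongside the weights and instructions, for example:

```python
import pickle

def save_offline_model_with_interfaces(path, order, weights, instructions, inputs):
    """Variant of save_offline_model that also records node interface data:
    each node's input data sources (an empty list marks a start node)."""
    offline_model = {"nodes": [
        {"name": n,
         "weight": weights.get(n),
         "instruction": instructions[n],
         "inputs": inputs[n]}          # node interface data
        for n in order
    ]}
    with open(path, "wb") as f:
        pickle.dump(offline_model, f)
```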
In one embodiment, the offline model may be used to perform operations immediately after it is generated, or it may be saved and reused whenever computation is needed. As shown in fig. 5, the method further includes the following steps:
s400, acquiring a new model data set and model structure parameters of the original network; specifically, the acquisition module of the processor acquires a model data set and model structure parameters of a new original network, and the network structure diagram of the new original network can be obtained through the model data set and the model structure parameters of the new original network.
S500, determining whether the new original network has a corresponding offline model. Specifically, it may be determined whether the model dataset of the new original network is the same as that of the original network and whether the model structure parameters of the new original network are the same as those of the original network. If both are the same, the new original network and the original network may be considered the same network, and it can then be concluded that the new original network has a corresponding offline model.
If the new original network has no offline model, the new original network is run according to its model dataset and model structure parameters, an offline model corresponding to it is generated, and that offline model is stored in the nonvolatile memory. Specifically, when the model dataset of the new original network differs from that of the original network and/or the model structure parameters differ, the original network and the new original network may be considered different networks, and the new original network has no offline model. In that case, steps S100 to S300 above may be performed; for the specific implementation, see the description above, which is not repeated here.
Further, whether the new original network has a corresponding offline model can be determined by traversing a data set containing multiple offline models, or by traversing multiple data sets containing records of multiple offline models.
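One conceivable way to implement this check, offered purely as an assumption rather than the disclosed mechanism, is to key a cache of stored offline models on a digest of the model dataset and model structure parameters; equal digests are then treated as "same network" (reusing load_offline_model from the sketch above):

```python
import hashlib
import pickle

offline_model_cache: dict = {}  # digest -> path of the stored offline model

def network_key(model_dataset, model_structure_params) -> str:
    """Digest the model dataset and structure parameters; two networks with
    equal digests are treated as the same network."""
    blob = pickle.dumps((model_dataset, model_structure_params))
    return hashlib.sha256(blob).hexdigest()

def get_or_build(model_dataset, structure_params, build_offline_model):
    """S500: reuse the stored offline model if one exists; otherwise perform
    steps S100-S300 (represented by build_offline_model, which returns the
    storage path) and cache the result."""
    key = network_key(model_dataset, structure_params)
    if key not in offline_model_cache:
        offline_model_cache[key] = build_offline_model(model_dataset, structure_params)
    return load_offline_model(offline_model_cache[key])
```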
If the new original network has the offline model, the offline model corresponding to the new original network can be obtained from the nonvolatile memory, and the new original network is operated according to the offline model corresponding to the new original network. Specifically, if the new original network has an offline model, the following steps may be performed:
s510, acquiring a new offline model corresponding to the original network; specifically, the obtaining module of the processor may read the offline model corresponding to the new original network from the second memory, that is, the obtaining module of the processor may read the network weights and the instructions corresponding to the computing nodes in the new original network from the second memory.
S520, according to the model structure parameters of the new original network, the execution sequence of each calculation node in the new original network is obtained; specifically, the operation module of the processor may obtain the execution sequence of each computing node in the new original network according to the connection relationship of each computing node in the new original network. For specific execution, reference is made to step S210.
And S530, according to the execution sequence of each computing node in the new original network, network weights and instructions corresponding to each computing node of the new original network are sequentially obtained from the offline model. Specifically, the obtaining module of the processor may sequentially obtain, according to the execution sequence of each computing node in the new original network, the network weights and instructions corresponding to each computing node in the new original network from the offline model.
S540, running the new original network according to the network weight and the instruction corresponding to each computing node of the new original network. Specifically, the operation module of the processor may directly operate the new original network according to the network weights and instructions corresponding to the computing nodes of the new original network, without repeatedly compiling the computing nodes.
For example, the execution order of the computing nodes in the neural network shown in fig. 7 may be F1-F2-F3-F4-F5-F6. When this neural network has an offline model, the network weight and instruction corresponding to computing node F1 may be obtained from the offline model first, and then the network weights and instructions of computing nodes F2 to F6 in turn, so that the computing nodes of the new original network are run in sequence without recompiling any node, improving the running speed and efficiency of the processor.
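A minimal sketch of steps S510 to S540 under the illustrative format above: the offline model is loaded (load_offline_model from the earlier sketch) and each node's stored instruction is executed with its stored weight, in the stored execution order, with no recompilation. execute_instruction is a deliberate placeholder for whatever the target processor does with a compiled instruction.

```python
def run_offline(path: str, network_input: dict) -> dict:
    model = load_offline_model(path)  # S510: read the weights and instructions
    values = dict(network_input)      # e.g. {"X1": ..., "X2": ...} for fig. 7
    for entry in model["nodes"]:      # S520/S530: nodes come in execution order
        # S540: run the node directly from its instruction and network weight
        values[entry["name"]] = execute_instruction(
            entry["instruction"], entry["weight"], values)
    return values                     # includes the network output

def execute_instruction(instruction, weight, values):
    raise NotImplementedError("placeholder for the target processor's execution")
```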
In one embodiment, the offline model further includes node interface data, where the node interface data is used to represent a connection relationship between each computing node of the original network, and for example, the node interface data may include an input data source and an output data source of each computing node. At this time, the step of running the new original network according to the offline model corresponding to the original network includes:
acquiring the offline model corresponding to the new original network; specifically, the obtaining module of the processor may read the offline model corresponding to the new original network from the second memory, where the offline model includes node interface data.
Obtaining a starting computing node of the new original network according to the model structure parameters of the new original network; specifically, the operation module of the processor may obtain the initial computing node of the new original network according to the model structure parameters of the new original network. The input data of the initial computing node is network input data, such as computing nodes F1, F2 and F3, and no other computing node exists before the initial computing node.
And according to the initial computing node of the new original network and node interface data in the offline model, network weights and instructions corresponding to all computing nodes of the new original network are sequentially obtained from the offline model. Specifically, the obtaining module of the processor may sequentially obtain, from the offline model, the network weights and instructions corresponding to the computing nodes of the new original network according to the initial computing node of the new original network and the node interface data in the offline model.
And operating the new original network according to the network weights and the instructions corresponding to the computing nodes of the new original network. Specifically, the operation module of the processor may operate the new original network according to the network weights and instructions corresponding to the computing nodes of the new original network.
For example, the start computing nodes of the neural network shown in fig. 7 are computing nodes F1, F2 and F3. When this neural network has an offline model, the instructions and network weights of the start computing nodes F1, F2 and F3 are obtained from the offline model first; then, from the node interface data in the offline model, the computing nodes F4 and F5 connected to the start nodes are found and their instructions and network weights obtained; finally, again from the node interface data, computing node F6 connected to nodes F4 and F5 is found and its instruction and network weight obtained. In this way, the computing nodes of the new original network are run in turn without recompiling any node, improving the running speed and efficiency of the processor.
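The interface-data-driven variant can be sketched as follows, assuming the format with per-node "inputs" shown earlier: starting from the start computing nodes, each node runs as soon as all of its input data sources are ready, mirroring the F1/F2/F3, then F4/F5, then F6 order just described (load_offline_model and execute_instruction as in the earlier sketches).

```python
from collections import deque

def run_offline_by_interfaces(path: str, network_input: dict) -> dict:
    model = load_offline_model(path)
    entries = {e["name"]: e for e in model["nodes"]}
    values = dict(network_input)
    # start computing nodes: their node interface data lists no input source
    ready = deque(n for n, e in entries.items() if not e["inputs"])
    done = set()
    while ready:
        name = ready.popleft()
        e = entries[name]
        values[name] = execute_instruction(e["instruction"], e["weight"], values)
        done.add(name)
        for succ, se in entries.items():  # enqueue nodes whose inputs are ready
            if (succ not in done and succ not in ready
                    and se["inputs"] and all(p in done for p in se["inputs"])):
                ready.append(succ)
    return values
```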
Further, when the offline model includes node interface data, the interface data not only includes connection relations between each computing node in the original network, but also includes information such as a starting computing node of the original network. At this time, the step of running the new original network according to the offline model corresponding to the original network includes:
Acquiring the offline model corresponding to the new original network; specifically, the obtaining module of the processor may read the offline model corresponding to the new original network from the second memory, where the offline model includes node interface data, from which the start computing node of the new original network and information such as its connection relationships with the other computing nodes can be obtained.
And according to the initial computing node of the new original network and node interface data in the offline model, network weights and instructions corresponding to all computing nodes of the new original network are sequentially obtained from the offline model. Specifically, the obtaining module of the processor may sequentially obtain, from the offline model, the network weights and instructions corresponding to the computing nodes of the new original network according to the initial computing node of the new original network and the node interface data in the offline model.
And operating the new original network according to the network weights and the instructions corresponding to the computing nodes of the new original network. Specifically, the operation module of the processor may operate the new original network according to the network weights and instructions corresponding to the computing nodes of the new original network.
In one embodiment, as shown in fig. 6, an embodiment of the present invention further provides a method for processing a neural network, where the method includes the following steps:
S610, obtaining model structure parameters of the original network, wherein the model structure parameters comprise connection relations of a plurality of computing nodes in the original network. Specifically, the connection relationship between the computing nodes is used to indicate whether there is data transfer between the computing nodes, for example, when there is data flow transfer between the computing nodes, it may be explained that there is a connection relationship between the computing nodes. Further, the connection relationships of the computing nodes may include input relationships and output relationships, and so on.
S620, acquiring the offline model corresponding to the original network from a nonvolatile memory, where the offline model includes the network weights and instructions corresponding to each computing node of the original network, stored in one-to-one correspondence per node. From each node's instruction, the processor can learn what computing function the node performs, that is, obtain computation attributes of the node such as its computation type and computation parameters.
And S630, operating the original network according to the offline model corresponding to the original network and the model structure parameters of the original network. Specifically, in this embodiment, directly running the offline model corresponding to the original network refers to running the machine learning algorithm (such as a neural network algorithm) corresponding to the original network using the offline model, and implementing the target application (such as an artificial intelligence application such as speech recognition) of the algorithm by executing the forward operation.
In one embodiment, the step S630 may be specifically implemented by steps S510 to S540 in fig. 5. Specifically, the step S630 may include the following steps:
obtaining the execution sequence of each computing node in the original network according to the model structure parameters of the original network; specifically, the operation module of the processor may obtain the execution sequence of each computing node in the original network according to the connection relationship of each computing node in the original network. For specific execution, reference is made to step S210.
And according to the execution sequence of each computing node in the original network, network weights and instructions corresponding to each computing node of the original network are sequentially obtained from the offline model. Specifically, the obtaining module of the processor may sequentially obtain, according to the execution sequence of each computing node in the original network, the network weights and instructions corresponding to each computing node in the original network from the offline model.
And operating the original network according to the network weight and the instruction corresponding to each computing node of the original network. Specifically, the operation module of the processor can directly operate the original network according to the network weight and the instruction corresponding to each calculation node of the original network, without repeatedly compiling each calculation node.
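In terms of the earlier sketches, the three steps above collapse into a single call; the file name and the input names X1 and X2 (fig. 7) are illustrative:

```python
x1, x2 = ..., ...  # preset network input data
outputs = run_offline("original_net.offline", {"X1": x1, "X2": x2})
y = outputs["F6"]  # the network output Y of fig. 7, produced by the final node
```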
In one embodiment, the offline model further includes node interface data, where the node interface data is used to represent a connection relationship between each computing node of the original network, and for example, the node interface data may include an input data source and an output data source of each computing node. At this time, the step of operating the original network according to the offline model corresponding to the original network and the model structure parameters of the original network includes:
obtaining an initial computing node of the original network according to the model structure parameters of the original network; specifically, the operation module of the processor may obtain the initial computing node of the original network according to the model structure parameters of the original network. The input data of the initial computing node is network input data, such as computing nodes F1, F2 and F3, and no other computing node exists before the initial computing node.
And according to the initial computing node of the original network and node interface data in the offline model, network weights and instructions corresponding to all computing nodes of the original network are sequentially obtained from the offline model. Specifically, the obtaining module of the processor may sequentially obtain, from the offline model, the network weights and instructions corresponding to the computing nodes of the original network according to the initial computing node of the original network and the node interface data in the offline model.
And operating the original network according to the network weight and the instruction corresponding to each computing node of the original network.
Further, when the offline model includes node interface data, the step of running the original network according to the offline model corresponding to the original network and the model structure parameters of the original network may include:
and according to the initial computing node of the original network and node interface data in the offline model, network weights and instructions corresponding to all computing nodes of the original network are sequentially obtained from the offline model. Specifically, the obtaining module of the processor may sequentially obtain, from the offline model, the network weights and instructions corresponding to the computing nodes of the original network according to the initial computing node of the original network and the node interface data in the offline model.
And operating the original network according to the network weight and the instruction corresponding to each computing node of the original network.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a computer-readable storage medium which, when executed, may perform the steps of the method embodiments above. Any reference to memory, storage, a database or another medium used in the embodiments provided herein may include nonvolatile and/or volatile memory. The nonvolatile memory may include Read-Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. The volatile memory may include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM), among others.
Meanwhile, an embodiment of the present invention further provides a computer system, including a processor, a first memory and a second memory, where the first memory or the second memory stores a computer program, and when the processor executes the computer program, the processor executes the method of any one of the embodiments above. Specifically, when the processor executes the above computer program, the following steps are specifically performed:
the method comprises the steps of obtaining a model data set and model structure parameters of an original network, specifically, obtaining the model data set and the model structure parameters of the original network through an obtaining module of a processor, and obtaining a network structure diagram of the original network through the model data set and the model structure parameters of the original network. The model data set includes data such as network weights corresponding to each computing node in the original network, the model structure parameters include connection relations of a plurality of computing nodes in the original network and computing attributes of each computing node, wherein the connection relations among the computing nodes are used for representing whether data transmission exists among the computing nodes, the computing attributes of each computing node can include computing types and computing parameters of the corresponding computing node, the computing types of the computing nodes refer to what computing the computing node is used for completing, for example, the computing types of the computing nodes can include addition operation, subtraction operation, convolution operation and the like, and correspondingly, the computing nodes can be computing nodes used for achieving addition operation, computing nodes used for achieving subtraction operation or computing nodes used for achieving convolution operation and the like. The computation parameters of a computation node may be necessary parameters required to complete the computation type corresponding to the computation node.
And running the original network according to the model dataset and model structure parameters of the original network to obtain the instructions corresponding to each computing node in the original network. Specifically, the operation module of the processor may run the original network according to its model dataset and model structure parameters and obtain the instruction corresponding to each computing node. Further, the acquisition module of the processor may also acquire the input data of the original network, and the operation module may run the original network according to the input data, network model dataset and model structure parameters to obtain the instructions corresponding to each computing node. Still further, the above process of running the original network to obtain the instructions of the respective computing nodes is essentially a compilation process, which may be performed by a processor or a virtual device of the computer system; that is, the processor or a virtual device of the computer system runs the original network according to the model dataset and model structure parameters of the original network. Here, a virtual device is a segment of processor running space virtualized within the memory space of a storage device.
And generating an offline model corresponding to the original network according to the network weights and instructions corresponding to each computing node of the original network, and storing the offline model in a nonvolatile memory. Specifically, the control module of the processor may generate the offline model from the network weights and instructions of the computing nodes; for example, it may store the network weights and instructions corresponding to the computing nodes of the original network in a nonvolatile memory such as the second memory, thereby generating and storing the offline model. In this way, when the original network is to be run again, it can be run directly according to its offline model, without compiling each computing node online to obtain instructions, which improves the running speed and efficiency of the system.
Further, the computer system may be the computer system shown in fig. 1 or fig. 2, and the processor of the computer system may be one or more of a central processing unit, a graphics processor, a digital signal processor, a field-programmable gate array, or an intelligent processor. It should be clear that the working principle of the computer system in this embodiment is substantially identical to the execution of each step in the above method; reference may be made to the above description, which is not repeated here.
Furthermore, an embodiment of the present invention provides a computer storage medium having a computer program stored therein, which, when executed by one or more processors, performs the method of any of the above embodiments. The computer storage medium may include nonvolatile and/or volatile memory. The nonvolatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. The volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
According to the neural network processing method, the computer system and the storage medium, the instructions generated for each computing node during operation of the original network are stored together with the network weights corresponding to those computing nodes, yielding the offline model corresponding to the original network. When the original network is to be run again, the offline model can be run directly, without recompiling related data such as the model dataset and model structure parameters of the original network. This shortens the time the processor takes to run the same network and improves the processing speed and efficiency of the processor.
In other embodiments of the present application, the computer system may be a multiprocessor system formed by a plurality of processors, where the processors may include a main processor and one or more coprocessors, and each processor may be provided with a corresponding memory. Specifically, as shown in fig. 9, the computer system 300 may include a first processor 310, a first memory 320 and a second memory 330 connected to the first processor 310, one or more second processors 340 connected to the first processor 310, and a third memory 350 corresponding to each second processor 340. The first processor 310 is configured to provide computing and control capabilities and may include a first obtaining module 311, a first operation module 313, a first control module 312, and the like, where the first obtaining module 311 may be a hardware module such as an IO (Input/Output) interface, and the first operation module 313 and the first control module 312 are both hardware modules; for example, they may be digital circuits or analog circuits. Physical implementations of these hardware circuits include, but are not limited to, physical devices such as transistors and memristors.
Alternatively, the first processor 310 may be the main processor, and may be a general-purpose processor such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), an FPGA (Field-Programmable Gate Array) or a DSP (Digital Signal Processor). The one or more second processors may serve as coprocessors; the second processor 340 may be a dedicated neural network processor such as an IPU (Intelligence Processing Unit), or may be a general-purpose processor. Further, the second processor 340 may include a second obtaining module, a second operation module, a second control module, and the like, where the second obtaining module may be a hardware module such as an IO (Input/Output) interface, and the second operation module and the second control module are both hardware modules; for example, they may be digital circuits or analog circuits. Physical implementations of these hardware circuits include, but are not limited to, physical devices such as transistors and memristors. The connection relationship among the second obtaining module, the second operation module and the second control module is similar to that of the corresponding modules in the first processor, and reference may be made to the description of the first processor.
The first memory 320 or the second memory 330 may also store a computer program for implementing the neural network processing method provided in the embodiments of the present application. Specifically, the neural network processing method is used for generating an offline model corresponding to the original network received by the first processor. The offline model may include a first offline model containing the network weights and instructions corresponding to all computing nodes having the first operation attribute in the original network, so that when the processor runs the original network again, the network weights and instructions of those computing nodes can be obtained directly from the first offline model, and the computing nodes having the first operation attribute in the same original network need not be compiled again. This shortens the running time of the network on the processor and thereby improves the processing speed and efficiency of the processor. Further, the offline model corresponding to the original network may also include a second offline model containing the network weights and instructions corresponding to all computing nodes having the second operation attribute in the original network.
Further, the first memory 320 may be an internal memory, such as a volatile memory like a cache, which may be used to store relevant data during operation of the neural network, such as network input data, network output data, network weights and instructions. The second memory 330 and the third memory 350 may be nonvolatile memories such as external memories. Therefore, when the computer system needs to run the same original network again, the network weights and instructions corresponding to each computing node in the original network can be obtained directly from the first offline model and the second offline model, without compiling the original network anew, which improves the processing speed and efficiency of the processor.
In one embodiment, the method for processing a neural network according to an embodiment of the present invention is used in the computer system shown in fig. 9 to generate an offline model corresponding to the original network received by the first processor, so as to improve the processing efficiency and speed of the computer system. Specifically, as shown in fig. 10, the above method includes the steps of:
S700, obtaining a model dataset and model structure parameters of an original network, where the model dataset includes the network weights corresponding to each computing node in the original network, and the model structure parameters include the connection relations of a plurality of computing nodes in the original network. Specifically, the first obtaining module of the first processor may obtain the model dataset and the model structure parameters of the original network, and a network structure diagram of the original network may be obtained from them. The model dataset includes data such as the network weights corresponding to the computing nodes in the original network; W1 to W6 in the neural network shown in fig. 14 represent the network weights of the computing nodes. The model structure parameters include the connection relations of a plurality of computing nodes in the original network and the computation attributes of the computing nodes, where the connection relations indicate whether data is transferred between computing nodes: when there is data flow between two computing nodes, a connection relationship exists between them. Further, the connection relationship of a computing node may include an input relationship, an output relationship, and so on. As shown in fig. 14, since the output of computing node C1 serves as input to computing nodes I1 and I2, computing node C1 has a connection relationship with computing node I1 and with computing node I2. Conversely, since there is no data transfer between computing node C1 and computing node C2, no connection relationship exists between them.
The computation attribute of each computing node may include the computation type and computation parameters of that node. The computation type refers to the kind of computation the node is used to complete; for example, it may include an addition operation, a subtraction operation, a convolution operation, and the like, and the node is accordingly a computing node for implementing an addition, subtraction, or convolution operation. The computation parameters of a computing node may be the necessary parameters required to complete the computation type corresponding to that node. For example, if the computation type of a computing node is an addition operation, its computation parameter may be the addend in the addition; this addend may be obtained as input data by the obtaining module, or may be the output data of the computing node preceding this computing node, and so on.
S710, obtaining the operation attribute of each computing node in the original network, where the operation attributes of the computing nodes include a first operation attribute and a second operation attribute. Specifically, the first obtaining module or the first operation module of the first processor may obtain the operation attribute of each computing node in the original network. The operation attribute of a computing node identifies which processor can execute the computing instruction corresponding to that node. In this embodiment, the first operation attribute indicates that the computing instruction corresponding to the node can be executed on a dedicated neural network processor such as an IPU, and the second operation attribute indicates that the computing instruction can be executed on a general-purpose processor such as a CPU, GPU or DSP.
Further, the operational attributes of the various computing nodes may be represented and saved by an enumeration method. For example, one enumeration variable device may be defined, which may include more than two enumeration values. If the operation attribute of the current computing node is the first operation attribute, the enumeration value of the enumeration variable may be 1; if the operation attribute of the current computing node is the second operation attribute, the enumeration value of the enumeration variable may be 0.
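A minimal sketch of this enumeration, assuming Python's IntEnum and the value assignment described above (1 for the first operation attribute, 0 for the second):

```python
from enum import IntEnum

class OperationAttribute(IntEnum):
    SECOND = 0  # executable on a general-purpose processor (CPU/GPU/DSP)
    FIRST = 1   # executable on a dedicated neural network processor (e.g. an IPU)

# one enumeration variable per computing node
node_attrs = {"C1": OperationAttribute.SECOND, "I1": OperationAttribute.FIRST}
```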
S720, running the original network according to the model dataset, the model structure parameters and the operation attributes of the computing nodes to obtain the instructions corresponding to each computing node in the original network. Specifically, the first operation module of the first processor may run the original network according to the model dataset, the model structure parameters and the operation attribute of each computing node, so as to obtain the instruction corresponding to each computing node. Further, the first obtaining module of the first processor may also obtain input data of the original network, and the first operation module may run the original network according to the input data, the network model dataset, the model structure parameters and the operation attributes of the computing nodes, so as to obtain the instructions corresponding to each computing node. Still further, the above process of running the original network to obtain the instructions of the respective computing nodes is essentially a compilation process, which may be implemented by a processor or a virtual device of the computer system; that is, the processor or virtual device runs the original network based on the model dataset and model structure parameters of the original network. Here, a virtual device refers to a section of processor running space virtualized in the memory space of the memory.
It should be clear that running the original network in this embodiment means that the processor runs some machine learning algorithm (such as a neural network algorithm) using the artificial neural network model data, and implements the target application of the algorithm (such as an artificial intelligence application like speech recognition) by performing a forward operation.
S730, if the operation attribute of the current computing node is the first operation attribute, storing the network weight and the instruction corresponding to the current computing node into a first nonvolatile memory to obtain a first offline model corresponding to the original network. Specifically, in this case the first control module of the first processor may store the network weight and instruction corresponding to the current computing node into the first nonvolatile memory, so as to obtain the first offline model corresponding to the original network. The first nonvolatile memory may be the third memory corresponding to a dedicated neural network processor such as an IPU. Further, for each computing node of the original network, the network weight and the instruction of that node are stored in one-to-one correspondence. Thus, when the original network is run again, the first offline model can be obtained directly from the first nonvolatile memory, and all computing nodes having the first operation attribute can be run according to it, without compiling those nodes online to obtain instructions, which improves the running speed and efficiency of the system.
It should be clear that, in this embodiment, directly running the first offline model corresponding to the original network means running the machine learning algorithm (such as a neural network algorithm) corresponding to the original network using the first offline model, and implementing the target application of the algorithm (such as an artificial intelligence application like speech recognition) by executing the forward operation.
Optionally, the operation attribute of each computing node may be pre-stored in the network structure parameter or model dataset of the original network, and at this time, in the process of reading the original network, the operation attribute of each computing node in the original network may be directly obtained. That is, the step S710 may specifically include the following steps:
the operation attribute of each calculation node in the original network is obtained from the model data set or the model structure parameter of the original network. In particular, the operational attributes of the individual computing nodes may be pre-stored in the network structure parameters or model dataset of the original network. The first obtaining module of the first processor obtains the operation attribute of each computing node in the original network in the process of obtaining the model data set or the model structure parameter of the original network.
Alternatively, the operation attribute of each computing node in the original network may be obtained in real time during the process of obtaining the original network by the first obtaining module of the first processor. The step S710 specifically includes the following steps:
Whether each computing node can be executed on the dedicated neural network processor is determined; specifically, the first operation module of the first processor may make this determination for each computing node.
If the current computing node can be executed on the special neural network processor, marking the current computing node as a first operation attribute; if the current computing node is capable of executing only on the general purpose processor, the current computing node is marked as a second operational attribute. Therefore, in the process of reading the original network by the first processor, the operation attribute of each computing node in the original network can be judged in real time. For example, if the operation attribute of the current computing node is the first operation attribute, the enumeration variable corresponding to the current computing node is marked as 1, otherwise, the enumeration variable corresponding to the current computing node is marked as 0.
Further, the first operation module of the first processor may query, through a preset function table, whether an equivalent computing node having the first operation attribute exists for the current computing node; if so, the operation attribute of the current computing node may be marked as the first operation attribute. If no equivalent computing node having the first operation attribute and matching the current computing node is found in the preset function table, the operation attribute of the current computing node is considered to be the second operation attribute. As shown in fig. 12, the step S710 further includes the following steps:
S711, inquiring whether an equivalent computing node exists in the current computing node through a preset function table, wherein the equivalent computing node is a computing node capable of being executed on a special neural network processor.
If an equivalent computing node exists for the current computing node, step S712 is performed to determine that the current computing node can be executed on the dedicated neural network processor. Step S713 may then be performed to mark the operation attribute of the current computing node as the first operation attribute. Specifically, if the computing instruction corresponding to the current computing node can be converted into a computing instruction of the dedicated neural network processor, the current computing node is considered to have an equivalent computing node with the first operation attribute, and its operation attribute may be marked as the first operation attribute.
If the current computing node does not have an equivalent computing node, step S714 may be executed to determine that the current computing node can only be executed on a general-purpose processor such as a CPU. At this time, step S715 may be performed to mark the operation attribute of the current computing node as the second operation attribute. Specifically, if the calculation instruction corresponding to the current calculation node cannot be converted into the calculation instruction corresponding to the special neural network processor, it is considered that the current calculation node does not have an equivalent calculation node with the first operation attribute, and the operation attribute of the current calculation node can be marked as the second operation attribute.
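A minimal sketch of the query of steps S711 to S715, under the assumption that the preset function table can be modeled as a set of computation types for which the dedicated neural network processor has an equivalent implementation; the table contents shown are illustrative:

```python
# Assumed contents of the preset function table: computation types for which
# an equivalent computing node exists on the dedicated processor.
IPU_FUNCTION_TABLE = {"conv", "add", "pool", "fc"}

def mark_operation_attribute(op_type: str) -> int:
    # S711: query whether an equivalent computing node exists
    if op_type in IPU_FUNCTION_TABLE:
        # S712/S713: executable on the dedicated processor -> first attribute
        return 1
    # S714/S715: executable only on a general-purpose processor -> second attribute
    return 0
```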
In one embodiment, as shown in fig. 11, the step S720 further includes:
S721, obtaining the execution order of the computing nodes in the original network according to the model structure parameters of the original network. Specifically, the first operation module of the first processor may obtain the execution order of the computing nodes according to the model structure parameters, and further according to the connection relations of the computing nodes in the original network. For example, as shown in fig. 14, the input data of computing node I1 is the output data of computing nodes C1 and C2, and the input data of computing node I3 is the output data of computing nodes I1 and I2. Thus, one possible execution order of the computing nodes in the neural network shown in fig. 14 is C1-C2-C3-I1-I2-I3-C4-C5-I4. Of course, computing nodes C1, C2 and C3 may be executed in parallel, as may computing nodes I1 and I2; the execution order here is merely illustrative and not limiting.
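By way of illustration, the execution order can be derived from the connection relations by a topological sort; the sketch below assumes the connection relations are given as a mapping from each node to its input nodes (source nodes map to empty lists), and serializes nodes that could in fact run in parallel:

```python
from collections import deque

def execution_order(inputs_of: dict) -> list:
    """inputs_of: node name -> list of its input (predecessor) node names;
    every node must appear as a key, sources with an empty list."""
    indegree = {n: len(preds) for n, preds in inputs_of.items()}
    successors = {n: [] for n in inputs_of}
    for n, preds in inputs_of.items():
        for p in preds:
            successors[p].append(n)
    ready = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for s in successors[n]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return order

# approximating the topology of fig. 14 (the exact edges are assumptions):
order = execution_order({
    "C1": [], "C2": [], "C3": [],
    "I1": ["C1", "C2"], "I2": ["C2", "C3"], "I3": ["I1", "I2"],
    "C4": ["I3"], "C5": ["I3"], "I4": ["C4", "C5"],
})  # one valid order: C1-C2-C3-I1-I2-I3-C4-C5-I4
```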
S722, obtaining the target processor corresponding to each computing node according to the operation attribute of each computing node. Specifically, the first operation module of the first processor may obtain the target processor of each computing node according to its operation attribute in the original network. For example, when the operation attribute of computing nodes C1 to C5 is the second operation attribute (e.g. a CPU operation attribute), their target processor is a CPU, which may be the first processor or a second processor serving as a coprocessor. When the operation attribute of computing nodes I1 to I4 is the first operation attribute, their target processor is a dedicated neural network processor such as an IPU.
S723, executing each computing node on its corresponding target processor according to the execution order of the computing nodes in the original network, so as to obtain the instruction corresponding to each computing node. Specifically, following the execution order, if the target processor of the current computing node is a dedicated neural network processor such as an IPU, the first control module of the first processor may control that coprocessor to execute the current computing node, thereby obtaining the corresponding instruction. If the target processor of the current computing node is a CPU, the CPU may be controlled to execute the current computing node to obtain the corresponding instruction. If the target processor of the current computing node is a GPU, the GPU may be controlled to execute it, or a general-purpose processor such as a CPU may be controlled to execute it.
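A hedged sketch of steps S722 and S723: the operation attribute selects the target processor, and each node is compiled on that processor in execution order. The compile_on_* functions are placeholders standing in for the processor-specific compilation, not real APIs:

```python
def compile_on_ipu(node: str) -> str:
    return f"ipu-instr({node})"  # placeholder for dedicated-processor compilation

def compile_on_cpu(node: str) -> str:
    return f"cpu-instr({node})"  # placeholder for general-purpose compilation

def compile_network(order: list, node_attrs: dict) -> dict:
    instructions = {}
    for node in order:
        if node_attrs[node] == 1:   # first operation attribute -> IPU
            instructions[node] = compile_on_ipu(node)
        else:                       # second operation attribute -> CPU/GPU/DSP
            instructions[node] = compile_on_cpu(node)
    return instructions
```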
Further, as shown in fig. 11, the step S730 further includes the following steps:
s731, obtaining a memory allocation mode of the original network according to the model data set and the model structure parameters of the original network; specifically, the first operation module of the first processor may obtain the memory allocation manner of the original network according to the model dataset and the model structure parameter of the original network. Further, the first processor may obtain an execution sequence of each computing node in the original network according to the model structure parameter of the original network, and determine a memory allocation manner of the current network according to the execution sequence of each computing node in the original network. For example, the relevant data of each computing node in the running process is saved into a stack according to the execution sequence of each computing node. The memory allocation method refers to determining storage locations of data (including input data, output data, network weight data, intermediate result data, etc.) related to each computing node in the original network on a memory space (such as the first memory). For example, a data table may be used to store mappings between data associated with each computing node (input data, output data, network weight data, intermediate result data, etc.) and memory space.
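The memory allocation data table mentioned above might, as one illustrative assumption, map each computing node's related data to an offset range in the first memory, assigned in execution order:

```python
def allocate(order: list, sizes: dict) -> dict:
    """sizes: node name -> bytes assumed for its related data (inputs,
    outputs, weights, intermediate results)."""
    table, offset = {}, 0
    for node in order:  # stack-like assignment in execution order
        table[node] = (offset, offset + sizes[node])
        offset += sizes[node]
    return table

layout = allocate(["C1", "C2", "I1"], {"C1": 64, "C2": 64, "I1": 256})
# layout maps each node to a (start, end) offset range in the first memory
```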
S732, storing the related data of the original network during operation into the first memory according to the memory allocation manner of the original network, where the related data includes the network weights, instructions, input data and output data corresponding to the computing nodes of the original network. For example, as shown in fig. 14, X1 and X2 represent the input data of the neural network, W1 to W6 represent the network weights corresponding to computing nodes C1, C2 and C3, and the output data of computing nodes I1 to I3 and of computing nodes C4 and C5 may serve as intermediate computation results. The first processor may store the related data into the first memory, such as an internal memory or a volatile memory like a cache, according to the determined memory allocation manner; see the storage space on the left in fig. 15 for the specific storage layout.
S733, obtaining from the first memory the network weights and instructions corresponding to the computing nodes having the first operation attribute in the original network, and storing them in the first nonvolatile memory to generate the first offline model. The first nonvolatile memory may be the third memory corresponding to the dedicated neural network processor. The generation of the first offline model is shown in fig. 15; the storage space on the right in fig. 15 stores the first offline model corresponding to the original network.
The offline model generation process described above is explained below with reference to figs. 14 and 15:
First, the first processor may obtain the model dataset, the model structure parameters and the input data of the original network. The first processor may also obtain the operation attributes of the computing nodes in the original network.
Next, the first processor can obtain the connection relations of the computing nodes of the original network according to its model structure parameters, and from these connection relations obtain the execution order of the computing nodes and the memory allocation manner of the original network during operation, so that the storage locations of the related data of the original network during operation can be determined. As shown in the storage space on the left of fig. 15, the related data of the original network during operation may be stored in a stack according to the execution order of the computing nodes.
Finally, the first processor may store the network weight and the instruction corresponding to the computing node having the first operation attribute in the original network in the first nonvolatile memory, so as to generate a first offline model, where a storage manner of the first offline model may be shown in a storage space on the right in fig. 15. In addition, the first offline model only contains data such as network weights and instructions necessary for operating the computing nodes with the first operation attribute in the original network, and input data, output data or intermediate computing results and the like in the operation process of the original network are not required to be stored, so that the consumption of storage space can be reduced.
Optionally, the method further comprises the following steps:
According to the connection relations of the plurality of computing nodes in the original network, all the first computing nodes between two or more sequentially executed second computing nodes are combined into one equivalent First Offline node (First Offline1), so that an equivalent network structure corresponding to the original network is obtained, as shown in fig. 14. A first computing node is a computing node having the first operation attribute, and a second computing node is a computing node having the second operation attribute. The first offline model further includes interface data between the First Offline node (First Offline1) and the second computing nodes; this interface data represents the connection relationships between the First Offline node (First Offline1) and the other second computing nodes in the equivalent network structure of the original network, and may include the input data sources and output data sources of the First Offline node.
More specifically, as shown in fig. 14, according to the connection relations between the computing nodes of the original network, all the computing nodes having the first operation attribute that lie between two adjacent computing nodes with CPU operation attributes are combined into one First Offline node (First Offline1), yielding the equivalent network of the original network. Since the First Offline node is the equivalent of a plurality of computing nodes having the first operation attribute, the operation attribute of the First Offline node (First Offline1) is the first operation attribute. Further, the specific connection relationship between the First Offline node (First Offline1) and the second computing nodes having the second operation attribute in the original network may be determined from its input or output data. For example, as shown in fig. 14, the specific connection relationships and network weights between the First Offline node (First Offline1) and the second computing nodes C1, C2 and C3 may be determined from the input data of the First Offline node (First Offline1), and the specific connection relationship between the First Offline node (First Offline1) and the Second Offline node (Second Offline1) may be determined from the output data of the First Offline node (First Offline1). Accordingly, the interface data may record that the input data of the First Offline node (First Offline1) is the output data of the second computing nodes C1, C2 and C3, and that the output data of the First Offline node is the input data of the second computing nodes C4 and C5.
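A simplified sketch of this equivalence step, operating on a linearized execution order rather than the full graph (an assumption of this sketch): contiguous runs of first-attribute nodes between second-attribute nodes collapse into numbered First Offline nodes:

```python
def build_equivalent_network(order: list, node_attrs: dict) -> list:
    """Collapse runs of first-attribute nodes (attr == 1) into First Offline
    nodes; second-attribute nodes (attr == 0) are kept as-is."""
    equivalent, run, k = [], [], 1
    for node in order:
        if node_attrs[node] == 1:
            run.append(node)
        else:
            if run:
                equivalent.append((f"First Offline{k}", tuple(run)))
                run, k = [], k + 1
            equivalent.append((node, (node,)))
    if run:
        equivalent.append((f"First Offline{k}", tuple(run)))
    return equivalent

# e.g. the order C1-C2-C3-I1-I2-I3-C4-C5-I4 collapses to
# C1-C2-C3-First Offline1-C4-C5-First Offline2
```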
As a further improvement, the offline model of the original network may further include network weights and instructions corresponding to other computing nodes having the second operation attribute in the original network. As shown in fig. 11, the above method further includes the steps of:
If the operation attribute of the current computing node is the second operation attribute, the network weight and instruction corresponding to the current computing node are stored into a second nonvolatile memory, so as to obtain a second offline model corresponding to the original network. That is, during operation of the original network, the network weights and instructions of each computing node having the second operation attribute are stored in a second nonvolatile memory (e.g. the second memory). The network weights and instructions of the plurality of computing nodes having the second operation attribute form the second offline model of the original network. Thus, when the original network needs to be run again, the instructions and corresponding network weights of the computing nodes having the second operation attribute can be obtained directly from the second nonvolatile memory. Specifically, the step S730 may further include the following steps:
S734, obtaining from the first memory the network weights and instructions corresponding to the computing nodes having the second operation attribute in the original network, and storing them in the second nonvolatile memory to generate the second offline model. The generation of the second offline model is shown in fig. 16; the second offline model is stored in the storage space on the left in fig. 16.
The first offline model stored in the first nonvolatile memory and the second offline model stored in the second nonvolatile memory form an offline model of the original network. Thus, when the offline model needs to be executed again, the network weight and the instruction of the computing node with the first operation attribute can be directly obtained from the first nonvolatile memory, and the network weight and the instruction of the computing node with the second operation attribute can be directly obtained from the second nonvolatile memory, so that the original network can be directly executed without compiling the original network again.
Further, the first processor may also combine all second computing nodes between two or more sequentially executed first computing nodes into one equivalent second offline node, according to the connection relations of the plurality of computing nodes in the original network; a first computing node is a computing node having the first operation attribute, and a second computing node is a computing node having the second operation attribute. The second offline model also includes interface data between the second offline node and the first computing nodes.
The offline model generation process described above is explained below with reference to figs. 14 and 16:
First, the first processor may obtain the model dataset, the model structure parameters and the input data of the original network, so that a network structure diagram of the original network may be obtained from the model dataset and the model structure parameters, as shown in fig. 14. The first processor may also obtain the operation attributes of the computing nodes in the original network.
Next, the first processor can obtain the connection relations of the computing nodes of the original network according to its model structure parameters, and from these connection relations obtain the execution order of the computing nodes and the memory allocation manner of the original network during operation, so that the storage locations of the related data of the original network during operation can be determined. As shown in the middle portion of the storage space of fig. 16, the related data of the original network during operation may be stored in a stack according to the execution order of the computing nodes.
Finally, the first processor may store the network weights and instructions corresponding to the computing nodes having the first operation attribute in the original network in the first nonvolatile memory, and generate a first offline model, where a storage manner of the first offline model may be shown in a storage space of a right half of fig. 16. Meanwhile, the first processor may store the network weight and the instruction corresponding to the computing node having the second operation attribute in the original network in the second nonvolatile memory, so as to generate a second offline model, where a storage manner of the second offline model may be shown in a storage space of a left half of fig. 16. In addition, the first offline model and the second offline model only contain data such as network weights and instructions necessary for running each computing node in the original network, and input data, output data or intermediate computing results and the like in the running process of the original network are not required to be stored, so that the consumption of storage space can be reduced.
Further, the general-purpose processor includes one or more of a central processing unit, a graphics processor, a digital signal processor, and a field-programmable gate array. Accordingly, the second operation attribute may include one or more of a CPU operation attribute, a GPU operation attribute, a DSP operation attribute, and an FPGA operation attribute. For example, when the computer system includes a first processor (e.g. a CPU) and a second processor (e.g. an IPU), the operation attribute of a computing node may be a CPU operation attribute, the first operation attribute, or a combination of the two. When the operation attribute of a computing node is the CPU operation attribute, the computing instruction corresponding to that node needs to be executed on the first processor (e.g. the CPU). When the operation attribute is the first operation attribute, the computing instruction needs to be executed on the second processor. When the operation attribute is a combination of the CPU operation attribute and the first operation attribute, the computing instruction may be executed on either the first or the second processor, and the operation attribute of the node may then be marked as the first operation attribute. Further, the operation attributes of the computing nodes may be identified and saved by an enumeration method. For example, the enumeration variable may include more than two enumeration values: if the operation attribute of the current computing node is the first operation attribute, the enumeration value may be 1; if it is the CPU operation attribute, the enumeration value may be 0.
For another example, when the computer system includes a first processor (e.g. a CPU) and two second processors (e.g. a GPU and an IPU), the operation attribute of a computing node may be one of the CPU operation attribute, the GPU operation attribute or the first operation attribute, or a combination thereof. When the operation attribute of a computing node is the CPU operation attribute, the computing instruction corresponding to that node needs to be executed on the first processor (e.g. the CPU). When the operation attribute is the GPU operation attribute, the computing instruction needs to be executed on the second processor serving as a GPU. When the operation attribute is the first operation attribute, the computing instruction needs to be executed on the second processor serving as an IPU (a dedicated neural network processor). When the operation attribute is a combination of the CPU operation attribute and the first operation attribute, the computing instruction may be executed on either the first processor or the second processor (e.g. the IPU), and the operation attribute of the node may then be marked as the first operation attribute. Further, the enumeration variable may also have three enumeration values: if the operation attribute of the current computing node is the first operation attribute, the enumeration value may be 1; if it is the CPU operation attribute, the enumeration value may be 0; and if it is the GPU operation attribute, the enumeration value may be 2. The operation attribute of the current computing node can thus be known from the value of the enumeration variable.
Further, the second offline model may include a plurality of second offline sub-models, for example, one of the second offline sub-models may include instructions and network weights corresponding to computing nodes of all CPU operation attributes, one of the second offline sub-models may include instructions and network weights corresponding to computing nodes of all GPU operation attributes, one of the second offline sub-models may include instructions and network weights corresponding to computing nodes of all DSP operation attributes, and so on.
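One illustrative layout of such a split second offline model, with sub-models keyed by operation attribute; the dictionary structure and keys are assumptions of this sketch:

```python
second_offline_model = {
    "CPU": {"C1": {"instruction": "...", "weight": "..."}},
    "GPU": {"G1": {"instruction": "...", "weight": "..."}},
    "DSP": {"D1": {"instruction": "...", "weight": "..."}},
}

def lookup_second(attr: str, node: str) -> dict:
    # fetch the stored instruction and weight of a second-attribute node
    # from the sub-model matching its operation attribute
    return second_offline_model[attr][node]
```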
In one embodiment, the offline model may be used to perform operations immediately after the offline model is generated, or the offline model may be saved and reused when calculation is needed. As shown in fig. 13, an embodiment of the present invention further provides a neural network processing method, where the method includes the following steps:
S800, obtaining a model dataset and model structure parameters of an original network, where the model dataset includes the network weights corresponding to each computing node in the original network, and the model structure parameters include the connection relations of a plurality of computing nodes in the original network. The specific implementation may refer to step S700 above and is not repeated here.
S810, according to the connection relations of the plurality of computing nodes in the original network, combining all first computing nodes between two or more sequentially executed second computing nodes into one equivalent first offline node, thereby obtaining an equivalent network corresponding to the original network; a first computing node is a computing node having the first operation attribute, and a second computing node is a computing node having the second operation attribute. Further, the execution order of the computing nodes in the equivalent network structure of the original network may be obtained.
For example, as shown in fig. 14, according to the connection relations between the computing nodes of the original network, all the computing nodes having the first operation attribute that lie between two adjacent computing nodes with CPU operation attributes are combined into one First Offline node (First Offline1), yielding the equivalent network of the original network. Since the First Offline node is the equivalent of a plurality of computing nodes having the first operation attribute, the operation attribute of the First Offline node (First Offline1) is the first operation attribute. Further, the specific connection relationship between the First Offline node (First Offline1) and the second computing nodes having the second operation attribute in the original network may be determined from its input or output data. The execution order of the computing nodes in the equivalent network corresponding to the original network may be C1-C2-C3-First Offline1-C4-C5-First Offline2, where the second computing nodes C1, C2 and C3 may be executed simultaneously, and the computing nodes C4 and C5 may also be executed simultaneously, so as to improve the processing efficiency of the computer system.
S820, if the current computing node in the equivalent network structure is a first offline node, a first offline model is obtained from the first nonvolatile memory, and the first offline node is executed according to the first offline model, wherein the first offline model comprises network weights and instructions corresponding to all the first computing nodes with the first operation attribute in the original network. Specifically, the first obtaining module of the first processor may obtain, according to an execution sequence of each computing node in the equivalent network of the original network, a network weight and an instruction corresponding to the computing node having the first operation attribute from the first offline model if the current computing node is the first offline node.
As shown in fig. 14, when the original network is operated again, according to the equivalent network of the original network, if the current computing node is the First Offline node First Offline1, the network weight and the instruction corresponding to each First computing node in the First Offline node First Offline1 can be obtained from the First nonvolatile memory, so that instruction compiling for each First computing node in the First Offline node First Offline1 is not required, and the processing efficiency of the First processor is improved. When the operation of the first offline node is completed, the first processor may continue to execute the second computing nodes C4 and C5 according to the equivalent network structure corresponding to the original network. Then, the First processor may obtain, from the First non-volatile memory, the network weights and instructions corresponding to the First computing nodes in the First Offline node First Offline2 according to the equivalent network structure corresponding to the original network, so that instruction compilation is not required for each First computing node in the First Offline node First Offline 2.
Further, the first offline model also includes interface data between the first offline node and the second computing nodes. Specifically, the first offline model includes interface data between each first offline node and the second computing nodes connected to it; for example, the interface data may record that the input data of the First Offline node First Offline1 is the output data of the second computing nodes C1 to C3, and that the output data of First Offline1 serves as the input data of the second computing nodes C4 and C5.
At this time, when the original network is run again, according to the equivalent network of the original network, if the current computing node is the First Offline node First Offline1, the network weights and instructions corresponding to each first computing node in First Offline1 can be obtained from the first nonvolatile memory, so that no instruction compilation is required for those nodes, which improves the processing efficiency of the first processor. Meanwhile, from the interface data of the second computing nodes connected to First Offline1 in the first offline model, the first processor can determine that the second computing nodes C4 and C5 should be executed after First Offline1 completes. After computing nodes C4 and C5 complete, the first processor may obtain, from the first nonvolatile memory and according to the equivalent network structure of the original network, the network weights and instructions corresponding to each first computing node in the First Offline node First Offline2, so that no instruction compilation is required for those nodes either.
It may be understood that the first offline model may include instructions and weights corresponding to a plurality of first offline nodes, and the execution sequence of the plurality of first offline nodes may be determined according to an equivalent network corresponding to the original network, and each first offline node may be labeled according to the execution sequence. When the network weight and the instruction corresponding to a certain first offline node need to be acquired from the first offline model, the network weight and the instruction only need to be searched according to the label of the first offline node. Of course, the network weight and the instruction corresponding to the first offline node can be directly read according to the storage address of each first offline node, so as to realize accurate searching.
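A minimal sketch of the label-based lookup described above; the list layout and field names are assumptions, and storage-address-based direct reads could replace the index arithmetic:

```python
first_offline_model = [
    {"label": 1, "nodes": ("I1", "I2", "I3"), "weights": "...", "instructions": "..."},
    {"label": 2, "nodes": ("I4",), "weights": "...", "instructions": "..."},
]

def fetch_first_offline(label: int) -> dict:
    # labels are assigned in execution order, so the lookup is an indexed read
    return first_offline_model[label - 1]
```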
Furthermore, the offline model of the original network may further include network weights and instructions corresponding to other computing nodes having the second operation attribute in the original network. The method further comprises the following steps:
If the current computing node in the equivalent network structure is not a first offline node, a second offline model is obtained from the second nonvolatile memory, and the current computing node in the equivalent network is executed according to the second offline model. Specifically, the first processor may also combine all second computing nodes between two or more sequentially executed first computing nodes into one equivalent second offline node, according to the connection relations of the plurality of computing nodes in the original network; a first computing node is a computing node having the first operation attribute, and a second computing node is a computing node having the second operation attribute. The second offline model also includes interface data between the second offline node and the first computing nodes.
At this time, when the original network is run again, according to the equivalent network of the original network, if the current computing node is the First Offline node First Offline1, the network weights and instructions corresponding to each first computing node in First Offline1 can be obtained from the first nonvolatile memory, so that no instruction compilation is required for those nodes, which improves the processing efficiency of the first processor. Meanwhile, from the interface data of the second computing nodes connected to First Offline1 in the first offline model, the first processor can determine that the second computing nodes C4 and C5 should be executed after First Offline1 completes. The first processor may then obtain the network weights and instructions corresponding to the second computing nodes C4 and C5 from the second offline model, and execute computing nodes C4 and C5 according to the second offline model corresponding to the Second Offline node Second Offline1. Meanwhile, from the interface data of the first computing nodes connected to Second Offline1 in the second offline model, the first processor can determine that the First Offline node First Offline2 should be executed after Second Offline1 completes. After Second Offline1 completes, the first processor may obtain from the first nonvolatile memory the network weights and instructions corresponding to each first computing node in First Offline2, so that no instruction compilation is required for those nodes.
It may be understood that the second offline model may include instructions and weights corresponding to a plurality of second offline nodes, and the execution sequence of the plurality of second offline nodes may be determined according to the equivalent network corresponding to the original network, and each second offline node may be labeled according to the execution sequence. When the network weight and the instruction corresponding to a certain second offline node are required to be acquired from the second offline model, the network weight and the instruction are only required to be searched according to the label of the second offline node. Of course, the network weight and the instruction corresponding to the second offline node can be directly read according to the storage address of each second offline node, so as to realize accurate searching.
As a further refinement, the second offline model may include a plurality of second offline sub-models, for example, one of the second offline sub-models may include instructions and network weights corresponding to computing nodes of all CPU operation attributes, one of the second offline sub-models may include instructions and network weights corresponding to computing nodes of all GPU operation attributes, one of the second offline sub-models may include instructions and network weights corresponding to computing nodes of all DSP operation attributes, and so on. At this time, when the original network needs to be operated again, according to the equivalent network structure corresponding to the original network, if the current computing node is a first offline node with a first operation attribute, the first processor may obtain, from the first nonvolatile memory, a network weight and an instruction corresponding to each first computing node in the first offline node, and directly execute the first offline node. If the current computing node is a computing node with the CPU operation attribute, the network weight and the instruction of the current computing node can be obtained from the second offline submodel corresponding to the current computing node, and the current computing node can be directly executed. If the current computing node is the computing node with the GPU operation attribute, the network weight and the instruction of the current computing node can be obtained from the second offline submodel corresponding to the current computing node, and the current computing node can be directly executed. If the current computing node is the computing node with the DSP operation attribute, the network weight and the instruction of the current computing node can be obtained from the second offline submodel corresponding to the current computing node, and the current computing node is directly executed.
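Tying the preceding sketches together, the following hedged sketch walks the equivalent network and serves each node from the appropriate offline model, with execute() left as a placeholder for dispatch to the corresponding processor; all structures reuse the illustrative forms assumed earlier:

```python
def execute(instruction, weight):
    pass  # placeholder: dispatch to the processor matching the instruction

def run_from_offline(equivalent, node_attrs, first_model, second_model):
    for name, _members in equivalent:
        if name.startswith("First Offline"):
            # first offline node: served from the first offline model
            label = int(name.rsplit("Offline", 1)[1])
            entry = first_model[label - 1]
            execute(entry["instructions"], entry["weights"])  # no recompilation
        else:
            # second-attribute node: served from the matching sub-model
            attr = node_attrs[name]  # e.g. "CPU", "GPU" or "DSP" (assumed keys)
            entry = second_model[attr][name]
            execute(entry["instruction"], entry["weight"])
```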
Those skilled in the art will appreciate that implementing all or part of the above-described methods according to the embodiments may be accomplished by a computer program stored on a computer-readable storage medium which, when executed, may include the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include nonvolatile and/or volatile memory. The nonvolatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. The volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
Meanwhile, the present invention also provides a computer system 300, which includes a first processor 310, and a first memory 320 and a second memory 330 disposed corresponding to the first processor 310; one or more second processors 340 and one or more third memories 350 provided corresponding to the second processors 340, each second processor 340 being connected to the first processor 310; the first memory 320 or the second memory 330 has stored therein a computer program which, when executed, performs the method of any of the embodiments described above. Specifically, the first processor 310, when executing the above-mentioned computer program, specifically performs the following steps:
Obtaining a model data set and model structure parameters of the original network, wherein the model data set includes the network weights corresponding to the computing nodes in the original network, and the model structure parameters include the connection relations of the plurality of computing nodes in the original network.
Obtaining the operation attribute of each computing node in the original network, wherein the operation attributes include a first operation attribute and a second operation attribute. Specifically, the first obtaining module or the first operation module of the first processor may obtain the operation attribute of each computing node in the original network. The operation attribute of a computing node identifies the processor on which the computing instruction corresponding to that node can be executed. In this embodiment, the first operation attribute indicates that the computing instruction corresponding to the computing node can be executed on a special-purpose neural network processor such as an IPU, and the second operation attribute indicates that the computing instruction corresponding to the computing node can be executed on a general-purpose processor such as a CPU, GPU, or DSP.
Further, the operation attribute of each computing node may be represented and saved by enumeration. For example, an enumeration variable device may be defined, and this enumeration variable may have more than two enumeration values. If the operation attribute of the current computing node is the first operation attribute, the enumeration value of the enumeration variable may be 1; if the operation attribute of the current computing node is the second operation attribute, the enumeration value may be 0. A minimal sketch of this encoding is given below.
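By way of illustration only, and assuming Python's standard enum module, such an encoding might look as follows; the class and member names are hypothetical and do not appear in the disclosure.

from enum import IntEnum

class Device(IntEnum):
    """Illustrative operation-attribute encoding, following the example above
    (1 = first operation attribute, 0 = second); the enumeration may carry
    further values, e.g. one per general-purpose processor type."""
    SECOND = 0  # executable on a general-purpose processor (CPU/GPU/DSP)
    FIRST = 1   # executable on a special-purpose neural network processor (IPU)

node_attr = Device.FIRST
assert node_attr == 1  # the attribute is stored and compared as a plain integer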
Running the original network according to the model data set, the model structure parameters, and the operation attribute of each computing node, so as to obtain the instruction corresponding to each computing node in the original network. Further, the first obtaining module of the first processor may also obtain input data of the original network, and the first operation module of the first processor may run the original network according to the input data of the original network, the network model data set, the model structure parameters, and the operation attribute of each computing node, so as to obtain the instruction corresponding to each computing node. Still further, this process of running the original network to obtain the instructions of the respective computing nodes is essentially a compilation process, which may be implemented by a processor or a virtual device of the computer system; that is, the processor or virtual device runs the original network according to the model data set and the model structure parameters of the original network. Here, a virtual device refers to a section of processor running space virtualized in the memory space of a storage device. A minimal sketch of such a compilation pass is given below.
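The sketch below is illustrative only and assumes hypothetical helper names (topological_order, compile_node) that are not part of the disclosure; it shows how "running" the network amounts to visiting each computing node in execution order and lowering it into an instruction.

from collections import defaultdict, deque

def topological_order(connections):
    """Kahn's algorithm over (source, target) connection pairs, mirroring the
    execution order implied by the model structure parameters."""
    indegree, successors, nodes = defaultdict(int), defaultdict(list), set()
    for src, dst in connections:
        successors[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return order

def compile_original_network(connections, weights, compile_node):
    """'Run' the original network as a compilation pass: visit each computing
    node in execution order and lower it into an instruction; compile_node
    stands in for the processor or virtual device doing the lowering."""
    return {node: compile_node(node, weights.get(node))
            for node in topological_order(connections)}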
It should be clear that running the original network in this embodiment means that the processor runs a machine learning algorithm (such as a neural network algorithm) using the artificial neural network model data and implements the target application of the algorithm (such as an artificial intelligence application like speech recognition) by performing forward operations.
If the operation attribute of the current computing node is the first operation attribute, the network weight and the instruction corresponding to the current computing node are stored in the first nonvolatile memory, so as to obtain the first offline model corresponding to the original network. The first nonvolatile memory may be the third memory corresponding to a special-purpose neural network processor such as an IPU. Further, for each computing node of the original network, the network weight and the instruction of that node are stored in one-to-one correspondence. In this way, when the original network is run again, the first offline model corresponding to the original network can be obtained directly from the first nonvolatile memory, and all computing nodes with the first operation attribute in the original network can be run according to the first offline model, without compiling those nodes online again to obtain instructions, thereby improving the running speed and efficiency of the system. An illustrative sketch of assembling such a model follows.
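For illustration only, here is a sketch of assembling the first offline model under the same hypothetical names as above; a real implementation would write the mapping into the first nonvolatile memory rather than return a Python dictionary.

def build_first_offline_model(nodes, weights, instructions):
    """Collect, one-to-one by node name, the network weight and instruction of
    every computing node whose operation attribute is the first attribute;
    persisting this mapping to the first nonvolatile memory yields the first
    offline model."""
    return {
        node.name: (weights[node.name], instructions[node.name])
        for node in nodes
        if node.attr == "IPU"  # first operation attribute
    }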
It should be clear that, in this embodiment, directly running the first offline model corresponding to the original network means running, by means of the first offline model, the machine learning algorithm (such as a neural network algorithm) corresponding to the original network and implementing the target application of the algorithm (such as an artificial intelligence application like speech recognition) by performing forward operations.
Further, the computer system may be the computer system shown in fig. 9, where the first processor of the computer system may be a central processor, a graphics processor, a digital signal processor, or a field programmable gate array, and the second processor may be a special-purpose neural network processor, a central processor, a graphics processor, a digital signal processor, or a field programmable gate array, and so on. It can be understood that the working principle of the computer system in this embodiment is consistent with the execution of the steps of the neural network processing method shown in figs. 10 to 13; the details are as described above and are not repeated here.
Furthermore, an embodiment of the present invention provides a computer storage medium having a computer program stored therein; when the computer program is executed by one or more first processors, the method of any of the above embodiments is performed. The computer storage medium may include nonvolatile and/or volatile memory, which may take any of the forms enumerated above.
According to the above neural network processing method, computer system, and storage medium, the instruction and network weight corresponding to each computing node with the first operation attribute are stored while the original network runs, yielding the first offline model corresponding to the original network. When the original network needs to run again, the instructions and network weights of the computing nodes with the first operation attribute can be obtained from that first offline model, with no need to recompile related data such as the model data set and model structure parameters of those nodes. This shortens the time the processor takes to run the same network and improves the processing speed and efficiency of the processor.
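This reuse path can be pictured with a small caching sketch; all names here are hypothetical, and pickle with a file path merely stands in for whatever serialization and nonvolatile storage an implementation actually uses.

import pickle
from pathlib import Path

def get_or_build_offline_model(cache_path, compile_network):
    """Return the cached offline model if one exists; otherwise compile the
    original network once, persist the result, and return it. compile_network
    stands in for the compilation pass sketched earlier."""
    cache = Path(cache_path)
    if cache.exists():
        return pickle.loads(cache.read_bytes())  # reuse: no recompilation
    offline_model = compile_network()
    cache.write_bytes(pickle.dumps(offline_model))
    return offline_model

The first call pays the compilation cost once; every later run of the same network loads the stored weights and instructions directly, which is exactly the saving the method aims at.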
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of these technical features have been described; nevertheless, any combination of them that involves no contradiction should be considered within the scope of this specification.
The foregoing examples represent only a few embodiments of the invention, and while they are described in some detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the concept of the invention, and all of these fall within the protection scope of the invention. Accordingly, the protection scope of the invention shall be subject to the appended claims.

Claims (16)

1. A method for processing a neural network, the method comprising the steps of:
acquiring an original neural network model;
compiling the original neural network model to obtain instructions corresponding to all computing nodes in the original neural network model;
storing the network weights corresponding to the computing nodes of the original neural network model and the compiled instructions in a memory to obtain an offline model corresponding to the original neural network model, wherein the offline model can be directly executed by a processor, and the offline model includes the network weights and the instructions corresponding to the computing nodes in the original neural network model.
2. The method of claim 1, wherein obtaining an original neural network model comprises:
obtaining a model data set and model structure parameters of the original neural network model to obtain the original neural network model, wherein the model data set includes the network weights corresponding to the computing nodes in the original network, and the model structure parameters include the connection relations of a plurality of computing nodes in the original network.
3. The method of claim 2, wherein compiling the original neural network model to obtain the instructions corresponding to the computing nodes in the original neural network model comprises:
obtaining the execution sequence of the computing nodes in the original neural network model according to the model structure parameters of the original neural network model;
compiling the original neural network model according to the execution sequence of the computing nodes in the original neural network model to obtain the instructions corresponding to the computing nodes in the original neural network model.
4. The method according to claim 2 or 3, wherein storing the network weights and the instructions corresponding to the computing nodes of the original neural network model in the memory to obtain the offline model corresponding to the original neural network model includes:
obtaining a memory allocation mode of the original neural network model according to a model data set and model structure parameters of the original neural network model;
storing related data in the operation process of the original neural network model into a first memory according to a memory allocation mode of the original neural network model, wherein the related data in the operation process of the original neural network model comprises network weights, instructions, input data and output data corresponding to computing nodes of the original neural network model;
acquiring the network weights and the instructions corresponding to the computing nodes of the original neural network model from the first memory, and storing the network weights and the instructions corresponding to the computing nodes of the original neural network model in an external memory to obtain the offline model corresponding to the original neural network model.
5. The method of claim 4, wherein the external memory is a non-volatile memory.
6. The method of claim 1, wherein the method is applied in a computer system comprising a neural network processor;
the offline model comprises a first offline model capable of being directly operated by the neural network processor, and the first offline model comprises network weights and instructions corresponding to computing nodes with first operation attributes in the original neural network model.
7. The method of claim 6, wherein the offline model further comprises a second offline model capable of being directly run by a general purpose processor in the computer system, the second offline model containing network weights and instructions corresponding to computing nodes in the original neural network model having a second operational attribute;
wherein the second offline model includes at least one second offline sub-model, each second offline sub-model corresponding to one type of operation attribute of the computing nodes executed on a second processor.
8. The method of claim 6, wherein the offline model further includes node interface data, and the node interface data is used to represent the connection relations between the computing nodes of the original network.
9. The method according to claim 1, characterized in that the method further comprises the steps of:
acquiring a new original neural network model;
if the new original neural network model has a corresponding offline model, acquiring the offline model corresponding to the new original neural network model from the memory, and operating the new original neural network model according to the offline model corresponding to the new original neural network model;
if the new original neural network model does not have the corresponding offline model, compiling the new original neural network model, generating the offline model corresponding to the new original neural network model, and storing the offline model corresponding to the new original neural network model into a memory.
10. A method for processing a neural network, the method comprising the steps of:
obtaining model structure parameters of an original neural network model, wherein the model structure parameters comprise connection relations of a plurality of computing nodes in the original network;
acquiring an offline model corresponding to the original neural network model from a memory, wherein the offline model corresponding to the original neural network model comprises a network weight corresponding to a computing node of the original neural network model and a compiled instruction;
and operating the original neural network model according to the offline model corresponding to the original neural network model and the model structure parameters of the original neural network model.
11. The method of claim 10, wherein the method is applied in a computer system comprising a neural network processor; the offline model includes a first offline model executable by the neural network processor; and operating the original neural network model according to the offline model corresponding to the original network and the model structure parameters of the original network includes:
the neural network processor operates a computing node with a first operation attribute in the original neural network model according to the first offline model and the model structure parameters of the original network, which are obtained from the memory;
wherein the computing node of the first operation attribute refers to a computing node executable on the neural network processor.
12. The method of claim 10, wherein the method is applied in a computer system comprising a neural network processor and a general purpose processor; the offline model includes a first offline model executable by the neural network processor and a second offline model executable by the general purpose processor; and operating the original network according to the offline model corresponding to the original network and the model structure parameters of the original network includes:
according to the model structure parameters of the original network, when the computing node is a computing node with a first operation attribute, the neural network processor operates the computing node with the first operation attribute in the original neural network model according to the first offline model acquired from the memory;
according to the model structure parameters of the original network, when the computing node is a computing node with a second operation attribute, the general purpose processor operates the computing node with the second operation attribute in the original neural network model according to the second offline model acquired from the memory;
wherein the computing node of the first operation attribute refers to a computing node executable on the neural network processor, and the computing node of the second operation attribute refers to a computing node executable on the general purpose processor.
13. The method of claim 10, wherein the offline model further comprises node interface data, wherein the node interface data is used to represent connection relationships between computing nodes of the original neural network model.
14. A computer system, comprising:
a first processor, and a first memory and a second memory arranged corresponding to the first processor;
one or more second processors and one or more third memories arranged corresponding to the second processors, wherein each second processor is connected to the first processor;
the first memory or the second memory having stored therein a computer program, which when executed by the first processor performs the method of any of claims 1-9 or any of claims 10-13.
15. The computer system of claim 14, wherein the first processor is a central processor; the second processor is a graphics processor, a digital signal processor, a field programmable gate array, or a neural network processor.
16. A computer storage medium having a computer program stored therein, which when executed by one or more processors performs the method of any of claims 1-9 or any of claims 10-13.
CN202010364385.5A 2017-12-29 2017-12-29 Neural network processing method, computer system and storage medium Active CN111582464B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010364385.5A CN111582464B (en) 2017-12-29 2017-12-29 Neural network processing method, computer system and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711487624.0A CN109993288B (en) 2017-12-29 2017-12-29 Neural network processing method, computer system, and storage medium
CN202010364385.5A CN111582464B (en) 2017-12-29 2017-12-29 Neural network processing method, computer system and storage medium

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201711487624.0A Division CN109993288B (en) 2017-12-29 2017-12-29 Neural network processing method, computer system, and storage medium

Publications (2)

Publication Number Publication Date
CN111582464A CN111582464A (en) 2020-08-25
CN111582464B true CN111582464B (en) 2023-09-29

Family

ID=67111021

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201711487624.0A Active CN109993288B (en) 2017-12-29 2017-12-29 Neural network processing method, computer system, and storage medium
CN202010364385.5A Active CN111582464B (en) 2017-12-29 2017-12-29 Neural network processing method, computer system and storage medium

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201711487624.0A Active CN109993288B (en) 2017-12-29 2017-12-29 Neural network processing method, computer system, and storage medium

Country Status (1)

Country Link
CN (2) CN109993288B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112016665B (en) * 2020-10-20 2021-04-06 深圳云天励飞技术股份有限公司 Method and device for calculating running time of neural network on processor

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529670A (en) * 2016-10-27 2017-03-22 中国科学院计算技术研究所 Neural network processor based on weight compression, design method, and chip
CN106845631A (en) * 2016-12-26 2017-06-13 上海寒武纪信息科技有限公司 One kind stream performs method and device
CN107103113A (en) * 2017-03-23 2017-08-29 中国科学院计算技术研究所 Towards the Automation Design method, device and the optimization method of neural network processor
US9779355B1 (en) * 2016-09-15 2017-10-03 International Business Machines Corporation Back propagation gates and storage capacitor for neural networks
CN107256424A (en) * 2017-05-08 2017-10-17 中国科学院计算技术研究所 Three value weight convolutional network processing systems and method
CN107272417A (en) * 2017-07-27 2017-10-20 青岛格莱瑞智能控制技术有限公司 A kind of Neural Network Based Nonlinear control method of imitative operant conditioned reflex
CN107341547A (en) * 2016-04-29 2017-11-10 北京中科寒武纪科技有限公司 A kind of apparatus and method for being used to perform convolutional neural networks training

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1331092C (en) * 2004-05-17 2007-08-08 中国科学院半导体研究所 Special purpose neural net computer system for pattern recognition and application method
US11244225B2 (en) * 2015-07-10 2022-02-08 Samsung Electronics Co., Ltd. Neural network processor configurable using macro instructions
CN105184366B (en) * 2015-09-15 2018-01-09 中国科学院计算技术研究所 A kind of time-multiplexed general neural network processor
US9754221B1 (en) * 2017-03-09 2017-09-05 Alphaics Corporation Processor for implementing reinforcement learning operations

Also Published As

Publication number Publication date
CN111582464A (en) 2020-08-25
CN109993288B (en) 2020-04-28
CN109993288A (en) 2019-07-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant