WO2022083536A1 - Neural network construction method and apparatus - Google Patents

Neural network construction method and apparatus Download PDF

Info

Publication number
WO2022083536A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
neural
network
target
neural networks
Prior art date
Application number
PCT/CN2021/124360
Other languages
French (fr)
Chinese (zh)
Inventor
韩凯
王云鹤
张秋林
张维
许春景
钱莉
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2022083536A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Abstract

A neural network construction method and apparatus in the field of artificial intelligence are used for accurately and efficiently constructing a target neural network better adapted to hardware under a given hardware constraint condition. The method comprises: performing sampling in a preset search space to obtain at least one set of first parameter combinations, wherein the search space comprises value ranges of various parameters used in constructing a neural network; constructing a plurality of first neural networks according to the at least one set of first parameter combinations; producing a mapping relationship, wherein the mapping relationship comprises relationships between the various parameters and evaluation results of the plurality of first neural networks; acquiring a constraint range, wherein the constraint range comprises a numerical range that identifies the computing capabilities of a computing device; obtaining a second parameter combination corresponding to the constraint range according to the mapping relationship; and obtaining a target neural network according to the second parameter combination.

Description

Neural Network Construction Method and Apparatus
This application claims priority to Chinese Patent Application No. 202011131423.9, entitled "Neural network construction method and apparatus", filed with the China Patent Office on October 21, 2020, which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of artificial intelligence, and in particular to a neural network construction method and apparatus.
Background
In the field of artificial intelligence, neural networks, and deep neural networks in particular, have achieved great success in computer vision in recent years. Benefiting from growing computing power and an ever-wider choice of building blocks, neural network architectures have become increasingly complex.
To adapt a neural network to run on different hardware devices, its depth, width, and other properties can be adjusted. However, if multiple aspects of the network must all be tuned manually to obtain a network that better matches the hardware device, the adjustment is inefficient. How to efficiently obtain a neural network adapted to a given hardware device has therefore become an urgent problem to be solved.
Summary of the Invention
The present application provides a neural network construction method and apparatus for accurately and efficiently constructing, under a given hardware constraint, a target neural network that is better adapted to the hardware.
In view of this, in a first aspect, the present application provides a neural network construction method, including: sampling a preset search space to obtain at least one set of first parameter combinations, where the search space includes value ranges of multiple parameters used in constructing a neural network, and each first parameter combination in the at least one set includes a value for each of the multiple parameters; constructing a plurality of first neural networks according to the at least one set of first parameter combinations; obtaining a constraint range, where the constraint range includes a numerical range identifying the computing capability of a computing device and may be determined from information about that computing capability; obtaining, according to a mapping relationship, a second parameter combination corresponding to the constraint range, where the mapping relationship includes relationships between the multiple parameters and evaluation results of the plurality of first neural networks, and an evaluation result is obtained by evaluating the structure of each of the first neural networks; and obtaining a target neural network according to the second parameter combination.
Therefore, in the embodiments of the present application, at least one set of parameter combinations can be obtained by searching the search space, a plurality of neural networks can be built from those combinations, and a mapping relationship can be generated from the parameter combinations and the evaluation results of the structures of those networks. A parameter combination corresponding to the constraint range of the computing device can then be obtained from the mapping relationship, yielding an optimal model adapted to the hardware. In other words, the model can be scaled with the computing capability of the hardware as a constraint to obtain an optimal model adapted to that hardware.
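The end-to-end flow described above (sample, construct, map, then select under a constraint) can be sketched as follows. All parameter names, ranges, and the FLOPs-style cost model are illustrative assumptions for this sketch, not values taken from the application; a real system would construct and profile actual networks.

```python
import random

# Hypothetical search space: value ranges for each architecture parameter.
SEARCH_SPACE = {
    "depth": range(4, 25),         # number of network layers
    "width": range(16, 257, 16),   # basic units per layer
    "resolution": (96, 128, 160, 192, 224),
    "kernel_size": (3, 5, 7),
}

def sample_combinations(n, seed=0):
    """Step 1: sample n first parameter combinations from the search space."""
    rng = random.Random(seed)
    return [{k: rng.choice(list(v)) for k, v in SEARCH_SPACE.items()}
            for _ in range(n)]

def evaluate(params):
    """Stand-in for evaluating a constructed network's structure
    (a rough FLOPs-like cost; a real system would build and measure)."""
    return (params["depth"] * params["width"] ** 2
            * params["kernel_size"] ** 2 * params["resolution"] ** 2)

def best_under_constraint(mapping, budget):
    """Step 2: pick the second parameter combination whose evaluation
    result fits the constraint range (here, an upper cost bound),
    choosing the largest feasible model as one plausible heuristic."""
    feasible = [(p, e) for p, e in mapping if e <= budget]
    return max(feasible, key=lambda pe: pe[1])[0] if feasible else None

combos = sample_combinations(50)
mapping = [(p, evaluate(p)) for p in combos]   # parameters -> evaluation
target_params = best_under_constraint(mapping, budget=5 * 10**11)
```

The target neural network would then be constructed from `target_params`, the second parameter combination that satisfies the device's constraint range.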
In a possible implementation, before the mapping relationship is generated, the method may further include: training the plurality of first neural networks with a preset first data set to obtain a plurality of trained first neural networks; and selecting at least one second neural network from the plurality of first neural networks according to the evaluation result of each trained first neural network or the output accuracy of each trained first neural network. The mapping relationship is then the relationship between the parameter combination corresponding to each second neural network and the evaluation result of each second neural network.
Therefore, in the embodiments of the present application, the plurality of first neural networks can be trained, and several better-performing second neural networks can be selected from them, so that the mapping relationship is subsequently derived from these better networks. As a result, the target neural network corresponding to a parameter combination selected according to the mapping relationship performs better in structure or output accuracy and is better adapted to the hardware.
In a possible implementation, the neural networks with the best evaluation results or the highest output accuracy can be selected from the plurality of first neural networks as the second neural networks, so that the mapping relationship is subsequently generated from second neural networks that perform better in structure or output accuracy. The target neural network corresponding to a parameter combination selected according to that mapping relationship then performs better in structure or output accuracy and is better adapted to the hardware.
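A minimal sketch of this screening step, using made-up accuracy values and a hypothetical dictionary representation of each trained first neural network:

```python
def select_second_networks(trained_nets, k):
    """Keep the k trained networks with the highest output accuracy
    as the second neural networks."""
    return sorted(trained_nets, key=lambda n: n["accuracy"], reverse=True)[:k]

# Illustrative candidates: id and (made-up) validation accuracy.
candidates = [{"id": i, "accuracy": a}
              for i, a in enumerate([0.71, 0.68, 0.74, 0.70, 0.66, 0.73])]
second_nets = select_second_networks(candidates, k=3)
# second_nets holds the networks with accuracies 0.74, 0.73, 0.71
```

The same function could instead sort on an evaluation-result field (e.g. FLOPs, ascending) when screening by structure rather than accuracy.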
In a possible implementation, the mapping relationship may be obtained by fitting the relationship between the parameter combination corresponding to each second neural network and the evaluation result of each second neural network.
Therefore, in the embodiments of the present application, the mapping relationship can be fitted from second neural networks that perform better in structure or output accuracy, so that the target neural network corresponding to a parameter combination selected according to the mapping relationship performs better in structure or output accuracy and is better adapted to the hardware.
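As one way such a fit could be carried out, the sketch below fits an ordinary least-squares line between a single architecture parameter and an evaluation result. The data points and the choice of a linear model are illustrative assumptions; the application does not prescribe a particular fitting method.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit y ≈ a*x + b, a stand-in for fitting
    the parameter -> evaluation-result mapping relationship."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Illustrative data: network width vs measured cost (in GFLOPs).
widths = [16, 32, 64, 128]
gflops = [0.5, 1.0, 2.0, 4.0]
a, b = fit_linear(widths, gflops)

def predict(width):
    """Predict the evaluation result for an unseen width."""
    return a * width + b
```

Given such a fitted curve, the second parameter combination can be found by inverting the prediction against the constraint range instead of exhaustively evaluating every candidate.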
In a possible implementation, the parameters included in the search space include one or more of the following: width, depth, resolution, or convolution kernel size, where the width is the number of basic units included in each layer of the neural network, the depth is the number of layers of the neural network, and the resolution is the resolution of the images input to the neural network.
Therefore, in the embodiments of the present application, a neural network can be constructed based on the depth, width, resolution, convolution kernel size, or convolution kernel group size, or a base neural network can be adjusted accordingly to obtain a new first neural network.
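For illustration, a sampled (depth, width, kernel size) combination could be expanded into a per-layer specification like this; the dictionary layout is a hypothetical stand-in for real network construction:

```python
def build_network_spec(depth, width, kernel_size):
    """Expand a parameter combination into a per-layer specification:
    depth gives the number of layers, width the basic units per layer."""
    return [{"layer": i, "units": width, "kernel": kernel_size}
            for i in range(depth)]

spec = build_network_spec(depth=4, width=32, kernel_size=3)
```

A framework-specific builder (e.g. one emitting convolution layers) would replace the dictionaries, but the parameter-to-structure expansion follows the same shape.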
In a possible implementation, the evaluation result includes one or more of: the total number of floating-point operations (FLOPs) of each first neural network, the forward-inference running time of each first neural network, the amount of memory occupied when running each first neural network, or the number of parameters of each first neural network.
Therefore, in the embodiments of the present application, the result of evaluating the structure of a first neural network may include FLOPs, running time, memory usage, or the number of parameters of the network. One or more of these can be used to measure the quality of the structure of the first neural network, so that a neural network better adapted to the hardware can be selected.
Specifically, if the evaluation result includes data such as running time or memory usage, the first neural network can be run on the computing device, which may also be a simulated device, to obtain that data; different structures generally correspond to different running times and memory footprints. If the evaluation result includes data such as FLOPs or the number of parameters, that data can be obtained directly from statistics over the structure of the first neural network.
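For example, FLOPs and parameter counts of a convolution layer can be computed analytically from the structure alone, without running the network. The formulas below are the standard closed forms for a stride-1, same-padding convolution with bias; they are an illustration, not text from the application:

```python
def conv2d_stats(c_in, c_out, k, h, w):
    """Analytic evaluation results for one convolution layer:
    - params: one k*k*c_in filter plus bias per output channel
    - flops:  each output position does c_in*k*k multiply-adds,
              counted here as 2 operations each"""
    params = c_out * (c_in * k * k + 1)
    flops = 2 * h * w * c_out * c_in * k * k
    return params, flops

# A typical first layer: 3-channel 224x224 input, 64 output channels, 3x3 kernel.
params, flops = conv2d_stats(c_in=3, c_out=64, k=3, h=224, w=224)
```

Summing these per-layer figures over a whole network yields the total FLOPs and parameter count used as structural evaluation results.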
In a possible implementation, the aforementioned obtaining of the constraint range may include: receiving user input data and obtaining the constraint range from the user input data. Thus, in the embodiments of the present application, the constraint range can be derived from data input by the user, allowing the target neural network to be built according to the user's needs and improving the user experience.
In a possible implementation, the aforementioned obtaining of the constraint range from the user input data may include: obtaining identification information of the computing device from the user input data, and obtaining the constraint range according to that identification information. Therefore, in the embodiments of the present application, the constraint range can be determined from the identifier of the computing device: the user only needs to provide the device identifier, without supplying further information, which improves the interactive experience.
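One way this device-ID lookup might work is a simple table from identifier to constraint range; the identifiers and budgets below are invented for illustration:

```python
# Hypothetical table mapping a device identifier to its constraint range,
# expressed here as a (min, max) compute budget in GFLOPs.
DEVICE_CONSTRAINTS = {
    "phone-npu-a": (0.0, 0.6),
    "edge-gpu-b": (0.0, 4.0),
    "server-gpu-c": (0.0, 32.0),
}

def constraint_from_user_input(user_input):
    """Resolve the constraint range from the device ID in the user input."""
    device_id = user_input["device_id"]
    return DEVICE_CONSTRAINTS[device_id]

lo, hi = constraint_from_user_input({"device_id": "edge-gpu-b"})
```

With such a table, the user supplies only the device identifier; the system resolves the numerical constraint range internally, as the implementation above suggests.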
In a possible implementation, the type of the target neural network may also be determined from the user input data. Therefore, in the embodiments of the present application, a target neural network that is adapted to the computing capability of the computing device and meets the user's needs can be constructed for the user, greatly improving the user experience.
In a possible implementation, the method may further include: training the target neural network with a preset second data set to obtain a trained target neural network. In the embodiments of the present application, the finally obtained neural network may be the trained target neural network, so that it can be deployed directly on the computing device to implement the corresponding functions.
In a possible implementation, the target neural network is used to perform at least one of feature extraction, semantic segmentation, classification, super-resolution, or object detection. Therefore, in the embodiments of the present application, the target neural network can implement one or more of these functions, adapting to more scenarios with strong generalization ability.
In a second aspect, the present application provides a neural network construction apparatus, including:
a sampling module, configured to sample a preset search space to obtain at least one set of first parameter combinations, where the search space includes value ranges of multiple parameters used in constructing a neural network, and each first parameter combination in the at least one set includes a value for each of the multiple parameters;
a construction module, configured to construct a plurality of first neural networks according to the at least one set of first parameter combinations;
an obtaining module, configured to obtain a constraint range, where the constraint range includes a numerical range identifying the computing capability of a computing device, may be determined from information about that computing capability, and the data type of the evaluation results includes the data type corresponding to the constraint range;
a computation module, configured to obtain, according to a mapping relationship, a second parameter combination corresponding to the constraint range, where the mapping relationship includes relationships between the at least one set of parameter combinations and the evaluation results of the plurality of first neural networks, and an evaluation result is obtained by evaluating the structure of each of the first neural networks;
the construction module being further configured to obtain a target neural network according to the second parameter combination.
For the effects of the second aspect and any implementation thereof, reference may be made to the foregoing first aspect; details are not repeated here.
In a possible implementation, the apparatus may further include:
a first training module, configured to train the plurality of first neural networks with a preset first data set to obtain a plurality of trained first neural networks;
a screening module, configured to select at least one second neural network from the plurality of first neural networks according to the evaluation result of each trained first neural network or the output accuracy of each trained first neural network; the mapping relationship may include the mapping between the parameter combination corresponding to each second neural network and the evaluation result of each second neural network.
In a possible implementation, the mapping relationship is obtained by fitting the relationship between the parameter combination corresponding to each second neural network and the evaluation result of each second neural network.
In a possible implementation, the parameters included in the search space include one or more of the following: width, depth, resolution, or convolution kernel size, where the width is the number of basic units included in each layer of the neural network, the depth is the number of layers of the neural network, and the resolution is the resolution of the images input to the neural network.
In a possible implementation, the evaluation result may include one or more of: the total number of floating-point operations (FLOPs) of each first neural network, the forward-inference running time of each first neural network, the amount of memory occupied when running each first neural network, or the number of parameters of each first neural network.
In a possible implementation, the obtaining module is specifically configured to receive user input data and obtain the constraint range from the user input data.
In a possible implementation, the obtaining module is specifically configured to: obtain identification information of the computing device from the user input data; and obtain the constraint range according to the identification information of the computing device.
In a possible implementation, the apparatus may further include a second training module, configured to train the target neural network with a preset second data set to obtain a trained target neural network.
In a possible implementation, the target neural network is used to perform at least one of feature extraction, semantic segmentation, classification, super-resolution, or object detection.
In a third aspect, an embodiment of the present application provides a neural network construction apparatus that has the function of implementing the neural network construction method of the first aspect. The function may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the function.
In a fourth aspect, an embodiment of the present application provides a neural network construction apparatus, including a processor and a memory interconnected by a line, where the processor invokes program code in the memory to perform the processing-related functions of the neural network construction method of any implementation of the first aspect. Optionally, the neural network construction apparatus may be a chip.
In a fifth aspect, an embodiment of the present application provides a neural network construction apparatus, which may also be called a digital processing chip or simply a chip. The chip includes a processing unit and a communication interface; the processing unit obtains program instructions through the communication interface, and the instructions are executed by the processing unit to perform the processing-related functions of the first aspect or any optional implementation thereof.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium including instructions that, when run on a computer, cause the computer to perform the method of the first aspect or any optional implementation thereof.
In a seventh aspect, an embodiment of the present application provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the method of the first aspect or any optional implementation thereof.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an artificial intelligence main framework to which the present application applies;
FIG. 2 is a schematic diagram of a system architecture provided by the present application;
FIG. 3 is a schematic structural diagram of a convolutional neural network provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of another system architecture provided by the present application;
FIG. 5 is a schematic flowchart of a neural network construction method provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart of another neural network construction method provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of an application scenario provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of another application scenario provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of another application scenario provided by an embodiment of the present application;
FIG. 10 is a schematic flowchart of another neural network construction method provided by an embodiment of the present application;
FIG. 11A is a schematic diagram of a mapping relationship provided by an embodiment of the present application;
FIG. 11B is a schematic diagram of another mapping relationship provided by an embodiment of the present application;
FIG. 11C is a schematic diagram of another mapping relationship provided by an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a neural network construction apparatus provided by an embodiment of the present application;
FIG. 13 is a schematic structural diagram of another neural network construction apparatus provided by an embodiment of the present application;
FIG. 14 is a schematic structural diagram of a chip provided by an embodiment of the present application.
Detailed Description of Embodiments
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
The overall workflow of an artificial intelligence system is described first. FIG. 1 shows a schematic structural diagram of an artificial intelligence main framework, which is explained below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The intelligent information chain reflects the sequence of processes from data acquisition to processing, for example the general flow of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output; in this process, data undergoes a refinement from "data" to "information" to "knowledge" to "wisdom". The IT value chain, spanning from the underlying infrastructure of artificial intelligence and information (technologies for providing and processing it) up to the industrial ecology of the system, reflects the value that artificial intelligence brings to the information technology industry.
(1) Infrastructure
The infrastructure provides computing-power support for the artificial intelligence system, enables communication with the outside world, and is supported by a base platform. Communication with the outside is performed through sensors. Computing power is provided by intelligent chips, that is, hardware acceleration chips such as central processing units (CPU), neural-network processing units (NPU), graphics processing units (GPU), application-specific integrated circuits (ASIC), or field-programmable gate arrays (FPGA). The base platform includes platform assurance and support related to distributed computing frameworks and networks, and may include cloud storage and computing, interconnection networks, and so on. For example, sensors communicate with the outside to acquire data, and the data is provided to the intelligent chips in the distributed computing system offered by the base platform for computation.
(2) Data
Data at the layer above the infrastructure represents the data sources of the artificial intelligence field. The data involves graphics, images, speech, and text, as well as Internet-of-Things data from conventional devices, including business data of existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing usually includes manners such as data training, machine learning, deep learning, searching, reasoning, and decision-making.
Machine learning and deep learning may perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on the data.
Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system: using formalized information to perform machine thinking and problem solving according to a reasoning control strategy. Typical functions are searching and matching.
Decision-making refers to the process of making decisions after intelligent information has been reasoned about, and usually provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the data undergoes the data processing mentioned above, some general capabilities may further be formed based on the results of the data processing, for example, an algorithm or a general-purpose system, such as translation, text analysis, computer vision processing, speech recognition, image recognition, and the like.
(5) Intelligent products and industry applications
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of overall artificial intelligence solutions, productizing intelligent-information decision-making and implementing practical applications. The application fields mainly include intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, smart cities, and the like.
The embodiments of this application involve a large number of neural-network-related applications. For a better understanding of the solutions of the embodiments of this application, related terms and concepts of neural networks that may be involved in the embodiments of this application are first introduced below.
(1) Neural network
A neural network may be composed of neural units. A neural unit may be an operation unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the operation unit may be as shown in formula (1-1):

$$h_{W,b}(x) = f(W^{T}x) = f\Big(\sum_{s=1}^{n} W_{s}x_{s} + b\Big) \qquad (1\text{-}1)$$

where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit. $f$ is the activation function of the neural unit, used to introduce a nonlinear characteristic into the neural network so as to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together; that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to a local receptive field of the previous layer to extract features of the local receptive field, where the local receptive field may be a region composed of several neural units.
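As an illustrative sketch only (the function name and the example values here are ours, not part of the application), a single neural unit computing formula (1-1) with a sigmoid activation can be written as:

```python
import math

def neural_unit(xs, weights, bias):
    """Output of one neural unit: f(sum_s W_s * x_s + b), as in formula (1-1),
    where the activation f is a sigmoid introducing the nonlinearity."""
    s = sum(w * x for w, x in zip(weights, xs)) + bias
    return 1.0 / (1.0 + math.exp(-s))  # sigmoid activation

# Example: two inputs with made-up weights and bias
out = neural_unit([1.0, 2.0], [0.5, -0.25], 0.1)
```

Because the sigmoid squashes its input, the output of the unit always lies strictly between 0 and 1.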
(2) Deep neural network
A deep neural network (DNN), also referred to as a multi-layer neural network, may be understood as a neural network having multiple intermediate layers. Dividing the DNN by the positions of its layers, the layers inside the DNN can be classified into three categories: the input layer, the intermediate layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are intermediate layers, also called hidden layers. The layers are fully connected; that is, any neuron of the i-th layer is necessarily connected to any neuron of the (i+1)-th layer.
Although a DNN looks complicated, each of its layers can be expressed as the linear relational expression:

$$\vec{y} = \alpha(W\vec{x} + \vec{b})$$

where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector (also called the bias parameter), $W$ is the weight matrix (also called the coefficients), and $\alpha(\cdot)$ is the activation function. Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because the DNN has many layers, there are also many coefficients $W$ and offset vectors $\vec{b}$. These parameters are defined in the DNN as follows, taking the coefficient $w$ as an example. Suppose that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^{3}_{24}$: the superscript 3 represents the layer in which the coefficient $W$ is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.

In summary, the coefficient from the k-th neuron of the (L-1)-th layer to the j-th neuron of the L-th layer is defined as $W^{L}_{jk}$.
It should be noted that the input layer has no parameter $W$. In a deep neural network, more intermediate layers make the network better able to describe complex situations in the real world. In theory, a model with more parameters has higher complexity and larger "capacity", which means it can accomplish more complex learning tasks. Training the deep neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors $W$ of the many layers).
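The layer-wise expression $\vec{y} = \alpha(W\vec{x} + \vec{b})$ can be sketched as a forward pass. The shapes and weights below are invented for illustration and are not part of the application:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(x, W, b):
    """One DNN layer: y_j = sigmoid(sum_k W[j][k] * x[k] + b[j])."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]

def forward(x, net):
    """Apply y = alpha(W x + b) layer by layer; the input layer has no W."""
    for W, b in net:
        x = layer(x, W, b)
    return x

# A three-layer DNN (made-up weights): 2 inputs -> 3 hidden units -> 1 output
net = [
    ([[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]], [0.0, 0.0, 0.0]),
    ([[0.5, -0.5, 0.25]], [0.1]),
]
y = forward([1.0, 2.0], net)
```

Training would then adjust every entry of the weight matrices and offset vectors; the forward pass itself stays this simple.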
(3) Convolutional neural network
A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor composed of convolutional layers and sub-sampling layers, and the feature extractor can be regarded as a filter. A convolutional layer is a layer of neurons in the convolutional neural network that performs convolution processing on the input signal. In a convolutional layer of a convolutional neural network, one neuron may be connected to only some neurons of the neighboring layers. A convolutional layer usually contains several feature planes, and each feature plane may be composed of some rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights here are the convolution kernel. Weight sharing can be understood as meaning that the manner of extracting image information is independent of position. The convolution kernel may be initialized in the form of a matrix of random size, and during training of the convolutional neural network the convolution kernel can obtain reasonable weights through learning. In addition, a direct benefit of weight sharing is reducing the connections between the layers of the convolutional neural network while also reducing the risk of overfitting.
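Weight sharing can be made concrete with a minimal sketch (our own toy example, not the application's implementation): one kernel, one weight matrix, slid over every position of the image with stride 1.

```python
def conv2d(image, kernel):
    """Slide one shared weight matrix (the convolution kernel) over the
    image with stride 1; the same weights are applied at every position."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]

# A 3x3 horizontal-edge kernel applied to a 4x4 image gives a 2x2 feature map
image = [[1, 1, 1, 1],
         [1, 1, 1, 1],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
kernel = [[1, 1, 1],
          [0, 0, 0],
          [-1, -1, -1]]
fmap = conv2d(image, kernel)
```

Each feature-map value is produced by the same nine weights, which is exactly the position-independent extraction described above.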
(4) Recurrent neural networks (RNNs), also referred to as recursive neural networks, are used to process sequence data. In a conventional neural network model, data flows from the input layer to the intermediate layers and then to the output layer; the layers are fully connected, while the nodes within each layer are unconnected. Although such an ordinary neural network has solved many difficult problems, it remains powerless for many others. For example, to predict the next word of a sentence, the preceding words are generally needed, because the words in a sentence are not independent of each other. The reason an RNN is called a recurrent neural network is that the current output of a sequence is also related to the previous outputs. Concretely, the network memorizes the preceding information and applies it to the computation of the current output; that is, the nodes within the intermediate layer are no longer unconnected but connected, and the input of the intermediate layer includes not only the output of the input layer but also the output of the intermediate layer at the previous moment. In theory, an RNN can process sequence data of any length. Training an RNN is the same as training a conventional CNN or DNN.
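The key point, that the intermediate layer's input includes both the current input and the previous moment's hidden output, can be sketched with a scalar recurrent step (the weights and sequence are invented for illustration):

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One recurrent step: the hidden state depends on the current input
    x_t AND on the hidden output h_prev of the previous time step."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

# Process a short sequence; the hidden state carries memory across steps
h = 0.0
for x_t in [1.0, 0.5, -1.0]:
    h = rnn_step(x_t, h, w_x=0.8, w_h=0.5, b=0.0)
```

The same weights w_x and w_h are reused at every time step, which is why an RNN can in principle process a sequence of any length.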
(5) Residual neural network (ResNet)
The residual neural network was proposed to solve the degradation problem that arises when a neural network has too many hidden layers. The degradation problem refers to the following: as the network's hidden layers increase, the accuracy of the network saturates and then degrades sharply, and this degradation is not caused by overfitting; rather, during backpropagation, the gradients are weakly correlated by the time they propagate to the bottom layers and the gradient updates are insufficient, so the accuracy of the predicted labels of the resulting model decreases. When the neural network degrades, a shallow network can achieve a better training effect than a deep one. In that case, if the low-layer features are passed to the high layers, the effect should be at least no worse than that of the shallow network, and this can be achieved through an identity mapping. Such an identity mapping is called a residual connection (shortcut), and optimizing this residual mapping is easier than optimizing the original mapping.
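A hedged sketch of the shortcut idea (our own minimal form, not the application's network): the block outputs layer_fn(x) + x, so when the learned residual is zero the block reduces exactly to the identity mapping.

```python
def residual_block(x, layer_fn):
    """y = layer_fn(x) + x: the shortcut passes x through unchanged, so the
    block only needs to learn the residual mapping layer_fn."""
    return [f + xi for f, xi in zip(layer_fn(x), x)]

# If the learned residual is zero, the block is an identity mapping, so
# stacking it can never make a shallower network's features worse
identity_like = residual_block([1.0, 2.0], lambda v: [0.0 for _ in v])
```

This is why optimizing the residual mapping is easier: pushing layer_fn toward zero is enough to recover the shallow network's behavior.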
(6) Loss function
In the process of training a deep neural network, because it is hoped that the output of the deep neural network is as close as possible to the value that is actually desired to be predicted, the predicted value of the current network can be compared with the actually desired target value, and the weight vector of each layer of the neural network can then be updated according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the deep neural network can predict the actually desired target value or a value very close to it. Therefore, "how to compare the difference between the predicted value and the target value" needs to be predefined. This is the loss function or objective function: an important equation used to measure the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes the process of reducing this loss as much as possible.
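A common concrete choice, shown here purely as an example (the application does not prescribe this particular loss), is the mean squared error: predictions far from the targets give a large loss, close predictions give a small one.

```python
def mse_loss(predictions, targets):
    """Mean squared error: one example of a loss function measuring how far
    the network's predictions are from the desired target values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

loss_far = mse_loss([0.9, 0.1], [0.0, 1.0])   # predictions far from targets
loss_near = mse_loss([0.1, 0.9], [0.0, 1.0])  # predictions close to targets
```

Training then amounts to adjusting the weights so that this number shrinks.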
(7) Backpropagation algorithm
A neural network may use the error backpropagation (BP) algorithm to correct the values of the parameters of the initial neural network model during training, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward propagation of the input signal up to the output produces an error loss, and the parameters of the initial neural network model are updated by backpropagating the error-loss information, so that the error loss converges. The backpropagation algorithm is a backpropagation movement dominated by the error loss, aiming to obtain the optimal parameters of the neural network model, for example, the weight matrices.
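The forward-then-backward cycle can be sketched on the smallest possible model, y = w*x + b with a squared-error loss (a toy of ours; the gradient formulas are the hand-derived backprop derivatives for this one-parameter-pair case):

```python
def train_step(w, b, x, target, lr=0.1):
    """One forward pass plus one gradient update for y = w*x + b with a
    squared-error loss; the gradients are the backprop formulas."""
    y = w * x + b                 # forward propagation
    loss = (y - target) ** 2
    grad_y = 2 * (y - target)     # dL/dy, propagated backwards
    w -= lr * grad_y * x          # dL/dw = dL/dy * x
    b -= lr * grad_y              # dL/db = dL/dy
    return w, b, loss

w, b = 0.0, 0.0
losses = []
for _ in range(50):
    w, b, loss = train_step(w, b, x=1.0, target=2.0)
    losses.append(loss)
```

Repeating the step makes the error loss converge toward zero, which is exactly the movement the BP algorithm performs on every weight matrix of a real network.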
A CNN is a commonly used neural network. For ease of understanding, the structure of a convolutional neural network is introduced below by way of example.
The structure of a CNN is described in detail below by way of example with reference to Figure 2. As described in the introduction to basic concepts above, a convolutional neural network is a deep neural network with a convolutional structure and is a deep learning architecture; a deep learning architecture refers to performing multiple levels of learning at different levels of abstraction through machine learning algorithms. As a deep learning architecture, a CNN is a feed-forward artificial neural network in which each neuron can respond to the image input into it.
As shown in Figure 2, a convolutional neural network (CNN) 200 may include an input layer 210, a convolutional layer/pooling layer 220 (where the pooling layer is optional), and a neural network layer 230. In the following embodiments of this application, for ease of understanding, each layer is referred to as a stage. These layers are described in detail below.
Convolutional layer/pooling layer 220:
Convolutional layer:
As shown in Figure 2, the convolutional layer/pooling layer 220 may include, for example, layers 221 to 226. In one implementation, layer 221 is a convolutional layer, layer 222 is a pooling layer, layer 223 is a convolutional layer, layer 224 is a pooling layer, layer 225 is a convolutional layer, and layer 226 is a pooling layer. In another implementation, layers 221 and 222 are convolutional layers, layer 223 is a pooling layer, layers 224 and 225 are convolutional layers, and layer 226 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
The following takes the convolutional layer 221 as an example to introduce the internal working principle of one convolutional layer.
The convolutional layer 221 may include many convolution operators. A convolution operator, also called a kernel, plays a role in image processing equivalent to a filter that extracts specific information from the input image matrix. A convolution operator may essentially be a weight matrix, and this weight matrix is usually predefined. In the process of performing a convolution operation on an image, the weight matrix is usually processed over the input image along the horizontal direction one pixel after another (or two pixels after two pixels, depending on the value of the stride), so as to complete the work of extracting a specific feature from the image. The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends over the entire depth of the input image. Therefore, convolving with a single weight matrix produces a convolutional output of a single depth dimension. In most cases, however, a single weight matrix is not used; instead, multiple weight matrices of the same size (rows × columns), that is, multiple matrices of the same shape, are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolutional image, where this dimension can be understood as being determined by the "multiple" described above. Different weight matrices can be used to extract different features of the image; for example, one weight matrix is used to extract image edge information, another weight matrix is used to extract a specific color of the image, and yet another weight matrix is used to blur unwanted noise in the image. The multiple weight matrices have the same size (rows × columns), the feature maps extracted by these weight matrices of the same size also have the same size, and the multiple extracted feature maps of the same size are then combined to form the output of the convolution operation.
In practical applications, the weight values in these weight matrices need to be obtained through a large amount of training, and each weight matrix formed by the weight values obtained through training can be used to extract information from the input image, so that the convolutional neural network 200 makes correct predictions.
When the convolutional neural network 200 has multiple convolutional layers, the initial convolutional layers (for example, 221) often extract more general features, which may also be called low-level features. As the depth of the convolutional neural network 200 increases, the features extracted by the later convolutional layers (for example, 226) become more and more complex, for example, features with high-level semantics; features with higher-level semantics are more applicable to the problem to be solved.
Pooling layer:
Since it is often necessary to reduce the number of training parameters, a pooling layer often needs to be introduced periodically after a convolutional layer; a pooling layer may also be called a down-sampling layer. In the layers 221 to 226 of the part 220 exemplified in Figure 2, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. In image processing, the sole purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a max pooling operator, for sampling the input image to obtain an image of a smaller size. The average pooling operator may compute the pixel values of the image within a specific range to produce an average value as the result of average pooling. The max pooling operator may take, within a specific range, the pixel with the largest value in that range as the result of max pooling. In addition, just as the size of the weight matrix in a convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel of the image output by the pooling layer represents the average or maximum value of the corresponding sub-region of the image input to the pooling layer.
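Both operators can be sketched in a few lines (our toy example with an invented 4x4 input): each output pixel summarizes one sub-region of the input, so the output is smaller than the input.

```python
def pool2d(image, size, op=max):
    """Downsample by taking the max (or average) of each size x size
    sub-region; the output image is smaller than the input image."""
    out = []
    for r in range(0, len(image), size):
        row = []
        for c in range(0, len(image[0]), size):
            window = [image[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(op(window))
        out.append(row)
    return out

image = [[1, 2, 5, 6],
         [3, 4, 7, 8],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
max_pooled = pool2d(image, 2)                                # max pooling
avg_pooled = pool2d(image, 2, op=lambda w: sum(w) / len(w))  # average pooling
```

A 4x4 input with 2x2 pooling yields a 2x2 output, reducing the spatial size by a factor of four while keeping no trainable parameters.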
Neural network layer 230:
After processing by the convolutional layer/pooling layer 220, the convolutional neural network 200 is not yet sufficient to output the required output information. As described above, the convolutional layer/pooling layer 220 only extracts features and reduces the parameters brought by the input image. To generate the final output information (the required class information or other relevant information), however, the convolutional neural network 200 needs to use the neural network layer 230 to generate one output, or a group of outputs whose number equals the number of required classes. Therefore, the neural network layer 230 may include multiple intermediate layers (231, 232 to 23n as shown in Figure 2) and an output layer 240; the output layer may also be called a fully connected (FC) layer. The parameters contained in the multiple intermediate layers may be obtained through pre-training according to relevant training data of a specific task type; for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and the like.
After the multiple intermediate layers of the neural network layer 230, that is, as the last layer of the entire convolutional neural network 200, comes the output layer 240. The output layer 240 has a loss function similar to categorical cross-entropy, which is specifically used to compute the prediction error. Once the forward propagation of the entire convolutional neural network 200 is completed (in Figure 2, propagation in the direction from 210 to 240 is forward propagation), backpropagation (in Figure 2, propagation in the direction from 240 to 210 is backpropagation) begins to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 200 and the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
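For illustration of what "a loss function similar to categorical cross-entropy" computes (our own minimal sketch, with invented logits): the output layer's scores are turned into class probabilities and the loss is the negative log-probability of the correct class.

```python
import math

def softmax(logits):
    """Turn raw output-layer scores into class probabilities."""
    exps = [math.exp(z - max(logits)) for z in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, true_class):
    """Categorical cross-entropy: -log of the probability assigned
    to the correct class."""
    return -math.log(softmax(logits)[true_class])

good = cross_entropy([4.0, 0.0, 0.0], true_class=0)  # confident and correct
bad = cross_entropy([0.0, 4.0, 0.0], true_class=0)   # confident and wrong
```

A confident correct prediction yields a small loss and a confident wrong one a large loss, which is the error signal that backpropagation then pushes from layer 240 back toward layer 210.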
It should be noted that the convolutional neural network 200 shown in Figure 2 is only an example of a convolutional neural network; in specific applications, the convolutional neural network may also exist in the form of other network models.
In this application, the convolutional neural network 200 shown in Figure 2 may be used to process an image to be processed so as to obtain a classification result of the image. As shown in Figure 2, the image to be processed is processed by the input layer 210, the convolutional layer/pooling layer 220, and the neural network layer 230, and the classification result of the image is output.
The deep learning training method for a computing device provided by the embodiments of this application may be executed on a server, and may also be executed on a terminal device. The terminal device may be a mobile phone with an image processing function, a tablet personal computer (TPC), a media player, a smart TV, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, a video camera, a smart watch, a wearable device (WD), an autonomous vehicle, or the like, which is not limited in the embodiments of this application.
Referring to Figure 3, an embodiment of this application provides a system architecture 300. The system architecture includes a database 330 and a client device 340. A data collection device 360 is used to collect data and store it in the database 330, and a building module 302 generates a target model/rule 301 based on the data maintained in the database 330. How the building module 302 obtains the target model/rule 301 based on the data is described in more detail below; the target model/rule 301 is the neural network constructed in the following embodiments of this application. For details, refer to the relevant descriptions of Figures 5 to 11C below.
The computing module may include the building module 302, and the target model/rule obtained by the building module 302 may be applied in different systems or devices. In Figure 3, the execution device 310 is configured with a transceiver 312, which may be a wireless transceiver, an optical transceiver, a wired interface (such as an I/O interface), or the like, for data interaction with external devices. A "user" may input data to the transceiver 312 through the client device 340. For example, in the following embodiments of this application, the client device 340 may send a base network, constraint conditions, and the like to the execution device 310, requesting the execution device to construct a target neural network based on the base network under the constraints of the constraint conditions. Optionally, the client device 340 may also send to the execution device 310 a database used for constructing the target neural network, that is, the data set mentioned below in this application, which is not described again here.
The execution device 310 may invoke data, code, and the like in a data storage system 350, and may also store data, instructions, and the like into the data storage system 350.
The computation module 311 processes the input data. Specifically, the computation module 311 is configured to: sample at least one group of first parameter combinations from a preset search space, where the search space includes the value ranges of multiple parameters used in constructing a neural network, and each first parameter combination in the at least one group of first parameter combinations includes a value of each of the multiple parameters; construct multiple first neural networks according to the at least one group of first parameter combinations; generate a mapping relationship, where the mapping relationship includes the relationship between the multiple parameters and the evaluation results of the multiple first neural networks, and an evaluation result is a result obtained by evaluating the structure of each first neural network of the multiple first neural networks; obtain a constraint range, where the constraint range includes a numerical range determined according to information about the computing capability of a computing apparatus and may include a numerical range identifying the computing capability of the computing apparatus; obtain, according to the mapping relationship, a second parameter combination corresponding to the constraint range; and obtain a target neural network according to the second parameter combination, where the target neural network is the target model/rule 301 shown in Figure 3.
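The flow of the computation module 311 (sample parameter combinations from a search space, evaluate the resulting candidate networks, then pick a combination that fits the constraint range) can be sketched as follows. This is a heavily simplified toy of ours: the search space, the cost proxy, and the score proxy are all invented stand-ins, not the application's actual evaluation procedure.

```python
import random

def build_and_select(search_space, constraint, n_samples=20, seed=0):
    """Sample first parameter combinations, 'evaluate' each candidate
    network with toy proxies, and return the best-scoring combination
    whose cost falls within the constraint range."""
    rng = random.Random(seed)
    mapping = []  # (parameter combination, evaluation result, cost)
    for _ in range(n_samples):
        combo = {name: rng.choice(values)
                 for name, values in search_space.items()}
        cost = combo["depth"] * combo["width"]          # proxy for compute cost
        score = combo["depth"] + 0.01 * combo["width"]  # proxy for accuracy
        mapping.append((combo, score, cost))
    feasible = [m for m in mapping if m[2] <= constraint]
    return max(feasible, key=lambda m: m[1])[0] if feasible else None

# Toy search space: value ranges of parameters used to build the network
space = {"depth": [4, 8, 16], "width": [32, 64, 128], "kernel": [3, 5, 7]}
best = build_and_select(space, constraint=1024)
```

The returned combination plays the role of the "second parameter combination" from which the target neural network would then be built.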
The associated function module 312 and the associated function module 314 are optional modules that may be used to construct networks, other than the backbone network, that are associated with the target neural network, such as a region proposal network (RPN) and a feature pyramid network (FPN).
Finally, the transceiver 312 returns the constructed target neural network to the client device 340, so that the target neural network can be deployed in the client device 340 or another device.
At a deeper level, the building module 302 can obtain corresponding target models/rules 301 based on different candidate sets for different target tasks, so as to provide users with better results.
In the case shown in FIG. 3, the data input into the execution device 310 may be determined according to input data of the user; for example, the user may operate on an interface provided by the transceiver 312. In another case, the client device 340 may automatically input data to the transceiver 312 and obtain the result; if automatic input by the client device 340 requires the user's authorization, the user may set corresponding permissions on the client device 340. The user may view the result output by the execution device 310 on the client device 340, and the specific presentation form may be display, sound, action, or the like. The client device 340 may also serve as a data collection terminal and store collected data in the database 330.
It should be noted that FIG. 3 is merely an exemplary schematic diagram of a system architecture provided by an embodiment of this application, and the positional relationships among the devices, components, modules, and the like shown in the figure do not constitute any limitation. For example, in FIG. 3 the data storage system 350 is an external memory relative to the execution device 310; in other scenarios, the data storage system 350 may instead be placed inside the execution device 310.
The target model/rule 101 constructed by the building module 302 may be applied to different systems or devices, such as a mobile phone, a tablet computer, a laptop computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, and may also be a server, a cloud device, or the like.
In the embodiments of this application, the target model/rule 101 may be the target neural network of this application. Specifically, the target neural network provided by the embodiments of this application may be a CNN, a deep convolutional neural network (DCNN), a recurrent neural network (RNN), or the like.
Referring to FIG. 4, an embodiment of this application further provides a system architecture 400. The execution device 310 is implemented by one or more servers and, optionally, cooperates with other computing devices such as data storage, routers, and load balancers; the execution device 310 may be arranged on one physical site or distributed across multiple physical sites. The execution device 310 may use data in the data storage system 350, or call program code in the data storage system 350, to implement the steps of the deep learning training method for a computing device corresponding to FIG. 6 below in this application.
A user may operate respective user devices (for example, the local device 401 and the local device 402) to interact with the execution device 310. Each local device may represent any computing device, such as a personal computer, a computer workstation, a smartphone, a tablet computer, a smart camera, a smart car or other type of cellular phone, a media consumption device, a wearable device, a set-top box, or a game console.
Each user's local device may interact with the execution device 310 through a communication network of any communication mechanism/standard. The communication network may be a wide area network, a local area network, a point-to-point connection, or any combination thereof. Specifically, the communication network may include a wireless network, a wired network, or a combination of the two. The wireless network includes, but is not limited to, any one or a combination of: a fifth-generation (5G) mobile communication system, a long term evolution (LTE) system, a global system for mobile communication (GSM), a code division multiple access (CDMA) network, a wideband code division multiple access (WCDMA) network, wireless fidelity (WiFi), Bluetooth, Zigbee, radio frequency identification (RFID), long range (Lora) wireless communication, and near field communication (NFC). The wired network may include an optical fiber communication network, a network composed of coaxial cables, or the like.
In another implementation, one or more aspects of the execution device 310 may be implemented by each local device; for example, the local device 401 may provide local data to, or feed back calculation results to, the execution device 310. The local device may also be referred to as a computing device.
It should be noted that all functions of the execution device 310 may also be implemented by a local device. For example, the local device 401 may implement the functions of the execution device 310 and provide services to its own users, or provide services to users of the local device 402.
Neural networks have achieved great success in many visual tasks such as image recognition and object detection. Advances in neural network architecture have greatly improved the performance of network models and promoted the effective deployment and development of neural networks in practical applications with different requirements.
To apply large neural networks on different hardware devices, parameters such as the network depth, the network width, the input image resolution, the convolution kernel size, and the number of convolution kernel groups can be adjusted to reduce the memory footprint and running latency of the model. However, tuning these parameters manually requires considerable manpower, is an inefficient way to obtain an optimal model, and may fail to produce a neural network with balanced performance. Therefore, this application provides a neural network construction method that builds a mapping relationship between the parameters of a neural network and hardware optimization indicators (such as the amount of computation, the number of parameters, or the amount of memory occupied), so that, given a hardware constraint range, suitable parameters can be obtained and a better neural network can be obtained efficiently.
The neural network construction method provided by this application is described in detail below with reference to the aforementioned neural networks and system architectures.
Referring to FIG. 5, a schematic flowchart of a neural network construction method provided by this application is described as follows.
501. Sample at least one set of first parameter combinations from a preset search space.
The search space may include value ranges of a plurality of parameters used in constructing a neural network. Specifically, the search space may include one or more of: width, depth, resolution, convolution kernel size, or the size of a group of convolution kernels. The width is the number of basic units in each layer of the neural network, the depth is the number of network layers of the neural network, and the resolution is the resolution of the image input to the neural network. Each set of first parameter combinations may include one or more of these parameters.
Sampling in the search space may be random, or may follow a distribution, and may be adjusted according to the actual application scenario; this is not limited in this application. For example, m sets of parameter combinations may be randomly sampled from the search space, where each set may include values of one or more of depth, width, resolution, convolution kernel size, or the size of a group of convolution kernels. As another example, taking the sampling of width, a sampling probability may be determined according to the distribution of widths: width ranges with a denser distribution have a correspondingly higher sampling probability, and width ranges with a sparser distribution have a correspondingly lower sampling probability.
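As an illustrative sketch of the random-sampling case, the snippet below draws m first parameter combinations from a small search space; the parameter names and value ranges here are hypothetical examples, not values taken from the embodiments.

```python
import random

# Hypothetical search space: value ranges of the parameters used when
# constructing a neural network (depth, width, input resolution, kernel size).
SEARCH_SPACE = {
    "depth": list(range(4, 25)),             # number of network layers
    "width": list(range(8, 65)),             # basic units per layer
    "resolution": [96, 128, 160, 192, 224],  # input image resolution
    "kernel_size": [3, 5, 7],                # convolution kernel size
}

def sample_first_parameter_combinations(m, seed=0):
    """Randomly sample m first parameter combinations, one value per parameter."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        for _ in range(m)
    ]

combos = sample_first_parameter_combinations(5)
print(len(combos))  # 5 sampled combinations
```

Distribution-aware sampling would replace `rng.choice` with a weighted choice whose weights follow the observed distribution of each parameter.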
502. Construct a plurality of first neural networks according to the at least one set of first parameter combinations.
After the at least one set of first parameter combinations is obtained by sampling, a plurality of first neural networks are constructed based on the parameters included in the first parameter combinations.
Specifically, based on the parameters included in a first parameter combination, preset basic units may be stacked to obtain a first neural network, or a given base network may be adjusted based on those parameters.
For example, a first parameter combination may include values of depth, width, and resolution, where the depth is 10, the width is 20, and the resolution is 100*200. A first neural network is then constructed with 10 network layers, each layer including 20 basic units. The resolution may be the size of the input image that the first neural network can process, and the size of the pooling layer may be constructed according to the resolution.
As another example, a base network may be given in advance, which may be a CNN, a ResNet, an RNN, or the like. Taking a CNN as an example: after a first parameter combination is obtained by sampling, the number of network layers of the CNN and the number of basic units in each layer are adjusted to the values in the first parameter combination according to the depth and width included in that combination, yielding an adjusted CNN.
Usually, the number of first neural networks is not less than the number of first parameter combinations. In some scenarios, the number of first parameter combinations equals the number of first neural networks; for example, if 5 sets of parameter combinations are obtained by sampling, 5 first neural networks can be constructed accordingly. In other scenarios, the number of first neural networks may exceed the number of first parameter combinations; for example, if 2 sets of first parameter combinations are obtained by sampling, different basic units or operators may be used when constructing the first neural networks according to the depth and width included in those combinations, so that, for the same depth and width, 2 or more first neural networks with different structures can be obtained.
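The stacking of preset basic units can be sketched as follows; the network is represented only as a structural description (a stand-in for a framework-level model), and all field names are illustrative assumptions.

```python
def build_first_network(params):
    """Construct a first-neural-network description by stacking a preset
    basic unit: `depth` layers, each containing `width` basic units."""
    layers = [
        {"layer": i, "units": params["width"], "kernel": params.get("kernel_size", 3)}
        for i in range(params["depth"])
    ]
    return {"layers": layers, "input_resolution": params["resolution"]}

# One first parameter combination, using the values from the example above:
# depth 10, width 20, input resolution 100*200.
net = build_first_network({"depth": 10, "width": 20, "resolution": (100, 200)})
print(len(net["layers"]))  # 10 layers, each with 20 basic units
```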
503. Generate a mapping relationship, where the mapping relationship is a mapping relationship between the plurality of parameters and the evaluation results of the plurality of first neural networks.
After the plurality of first neural networks are obtained, their structures may be evaluated to obtain an evaluation result for each first neural network, and a mapping relationship between the parameters and the evaluation results is generated according to the parameter combination and evaluation result of each first neural network; usually, neural networks with different structures may have different evaluation results. For example, the parameter combination of a first neural network may include depth and width, and the evaluation result may include the flops of the first neural network, in which case the mapping relationship may include the relationship between depth, width, and flops.
In a possible implementation, the evaluation result may include one or more of the following: the flops of each first neural network, the running duration of one forward inference of each first neural network, the amount of memory occupied by running each first neural network, or the number of parameters of each first neural network. The flops, forward-inference running duration, or occupied memory of each first neural network may be obtained by running the first neural network on a computing device. For example, if the computing device is a terminal device, the first neural network may be run on that terminal device, and the process and results of running it may be measured, so as to obtain the flops of the first neural network when running on the terminal device, the running duration of one forward inference, the amount of terminal-device memory occupied, and so on. The number of parameters of the first neural network, or other quantities directly related to its structure, can be obtained by directly evaluating the first neural network, for example by directly counting the parameters it includes.
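For structure-derived evaluation results such as the parameter count and flops, a rough cost model can be computed directly from the network description. The plain conv-stack model below is a deliberate simplification for illustration, not the evaluation procedure of the embodiments.

```python
def evaluate_structure(depth, width, resolution, kernel_size=3):
    """Rough static evaluation of a plain convolutional stack: each of
    `depth` layers maps `width` channels to `width` channels with a
    kernel_size x kernel_size kernel, so one layer holds
    width * width * k * k weights and performs that many multiply-accumulates
    per output pixel (counted as 2 flops per multiply-accumulate)."""
    weights_per_layer = width * width * kernel_size * kernel_size
    n_params = depth * weights_per_layer
    flops = 2 * n_params * resolution * resolution
    return {"params": n_params, "flops": flops}

result = evaluate_structure(depth=10, width=20, resolution=128)
print(result["params"])  # 36000
```

Runtime-dependent results (running duration, occupied memory) cannot be derived this way; as the text notes, they are measured by actually running the network on the computing device.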
In a possible implementation, the plurality of first neural networks may also be trained using a preset first data set to obtain a plurality of trained first neural networks. The results of the trained first neural networks may then be evaluated to obtain an evaluation result for each trained first neural network; alternatively, a data set may be used as the input of each trained first neural network to calculate its output accuracy. At least one second neural network is then selected from the plurality of first neural networks according to the evaluation result or output accuracy of each trained first neural network, and the mapping relationship is generated according to the parameter combination and evaluation result corresponding to each second neural network.
Specifically, the relationship between the parameter combination corresponding to each second neural network in the at least one second neural network and the evaluation result of that second neural network may be fitted to obtain the mapping relationship. The mapping relationship may be linear or non-linear.
For example, after m parameter combinations are obtained, m first neural networks are constructed based on them (m ≥ 2), and the m first neural networks are trained using a data set to obtain m trained first neural networks. From these m first neural networks, the n second neural networks with the best evaluation results or the best output accuracy may be selected (n ≤ m). The relationship between the parameter combination and the evaluation result of each of the n second neural networks is then fitted to obtain a curve relationship; alternatively, a mapping table between the parameter combination and the evaluation result of each second neural network may be generated.
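A minimal fitting sketch, assuming a linear relationship between a single structural parameter (depth) and a measured evaluation result (running duration); the measurement values below are made up for illustration.

```python
def fit_linear_mapping(xs, ys):
    """Ordinary least squares for y ≈ a*x + b, in pure Python."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Hypothetical (depth, measured running duration in ms) pairs for the
# selected second neural networks.
depths = [4, 8, 12, 16, 20]
latency_ms = [2.1, 4.0, 6.2, 7.9, 10.1]

a, b = fit_linear_mapping(depths, latency_ms)
# Inverting the fitted mapping answers: what depth still fits an 8 ms budget?
max_depth_for_8ms = (8.0 - b) / a
```

A non-linear mapping would replace the closed-form line fit with, for example, a polynomial or log-linear fit over the same data.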
504. Obtain a constraint range corresponding to the computing device.
The constraint range may include a numerical range that identifies the computing capability of the computing device, in other words, a value range of an indicator associated with the computing capability of the computing device; the constraint range may affect the structure of the neural network that the computing device can carry. The constraint range may be a range related to the data types included in the aforementioned evaluation results, that is, data for an indicator representing the computing capability of the computing device. For example, if the constraint range is a numerical range not exceeding the maximum running duration of a neural network that the computing device can carry, the aforementioned evaluation results may include the running duration of each first neural network. As another example, if the constraint range is a flops range, the aforementioned evaluation results may include the flops value of each first neural network.
It should be noted that the computing device mentioned in this application may be a physical device (also referred to as hardware), or a virtual device, such as a device obtained through simulation. For ease of understanding, the following takes hardware as an example of a computing device; the hardware mentioned below may equally be replaced by a computing device or a virtual device, which will not be repeated below.
The computing capability of hardware can be measured by a variety of parameters, such as the amount of available memory, the running duration, the supportable amount of computation, or the supportable number of parameters. For example, when a target neural network needs to be constructed for certain hardware, the constraint range may be the value range of the amount of computation of the target neural network that the hardware can carry, the value range of the running duration, the amount of available memory, or the like, where the amount of computation can be measured in flops or multiply-accumulate (MACC) operations.
In a possible implementation, if the neural network construction method provided by this application is executed by a cloud-side device or a device-side device, that device may extract information about its own hardware to learn its own computing capability, and thereby obtain the constraint range for the neural network that needs to run on it. For example, a cloud-side device may extract the flops range supported by its own hardware, the amount of available memory, and so on, and use these as the hardware constraint range.
In a possible implementation, if the neural network construction method provided by this application is executed by a cloud-side device, the server may receive user input data sent by a device-side device, and then obtain the constraint range based on that user input data. The user input data may be information entered by the user through the interactive interface of the device-side device, requesting the cloud-side device to construct, according to the entered information, a target neural network adapted to the computing capability of the device-side device or of another device. For example, the user may enter, in the interactive interface provided by the device-side device, one or more ranges within the computing capability of the hardware, such as a flops range or an amount of occupied memory, so that the cloud-side device uses these ranges as the constraint range for the hardware. Alternatively, the user may directly enter hardware identification information such as a hardware model, a hardware identification number, or a hardware name, so that the cloud-side device can identify the computing capability of the hardware according to that identification information and use the range of that computing capability as the constraint range.
In addition, if the constraint information is obtained according to user input data, the type of target neural network to be constructed may also be determined according to that user input; for example, the user may request, through the user input data, construction of a neural network for image classification or object detection.
Specifically, for example, the user input information may directly include the constraint range, or may include the identification information of the hardware; after receiving the user input data, the cloud-side device identifies the computing capability of the hardware according to the identification information and determines the corresponding constraint range according to that computing capability. For example, the user may enter, in the interactive interface of the device-side device, the model of the central processing unit (CPU) of the terminal on which the target neural network needs to run, and this is sent to the cloud-side device through the device-side device. After receiving the user input information, the cloud-side device extracts, from a local database, the flops range corresponding to that CPU model, that is, the constraint range.
It should also be noted that step 504 in this embodiment of the application may be performed before or after step 501, and may be adjusted according to the actual application scenario; this is not limited in this application. When step 501 is performed first, a mapping relationship can be constructed between the parameters that affect the structure of the neural network, such as depth, width, or the resolution of the input image, and the parameters related to hardware performance, such as running duration, memory occupancy, or flops; the constraint range can then be any range of those hardware-performance-related parameters, such as a range of running durations, a range of memory occupancy, or a range of flops. This allows the mapping relationship to be reused whenever a neural network that runs on hardware subsequently needs to be constructed, improving the efficiency of constructing neural networks. If step 504 is performed first, a mapping relationship can be constructed only between the parameters that affect the structure of the neural network and the parameters corresponding to the constraint range. For example, if the constraint range is a constraint range of running duration, a mapping relationship is constructed between the structural parameters (depth, width, or the resolution of the input image) and the running duration; if the constraint range is a range of flops, a mapping relationship is constructed between the structural parameters and the flops. Mapping relationships between additional parameters then need not be constructed, reducing the workload.
505. Obtain, according to the mapping relationship, a second parameter combination corresponding to the constraint range.
After the mapping relationship and the constraint range are determined, the parameter combination corresponding to the constraint range is calculated through the mapping relationship. For example, the mapping relationship may be the relationship between parameters such as depth, width, and the resolution of the image input to the first neural network, and flops; the constraint range may include the flops range of the terminal set by the user. Substituting the terminal's flops range into the mapping relationship yields one or more sets of parameter combinations, including parameters such as depth, width, and the resolution of the image input to the first neural network, corresponding to that flops range.
For ease of understanding, the mapping relationship can be understood as a relationship between independent and dependent variables. The independent variables may include parameters such as depth, width, and the resolution of the image input to the first neural network, and the dependent variables may include quantities such as flops, running duration, or occupied memory. After the constraint range is obtained, it can be regarded as the dependent variable; substituting it into the mapping relationship allows the independent variables to be derived in reverse, that is, a parameter combination including parameters such as depth, width, and the resolution of the image input to the first neural network. Of course, flops, running duration, occupied memory, and the like may instead serve as independent variables, with depth, width, and the resolution of the input image as dependent variables; this can be adjusted according to the actual application scenario, and the above is merely an exemplary illustration, not a limitation.
In a possible implementation, the mapping relationship may also be a mapping table, for example a table that records the mapping of parameters such as depth, width, and the resolution of the image input to the first neural network to values such as flops, running duration, or occupied memory. After the constraint range is determined, one or more sets of parameter combinations corresponding to the constraint range can be looked up in the mapping table.
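The mapping-table form of the lookup can be sketched as follows; the table entries and the flops budget are hypothetical values for illustration.

```python
# Hypothetical mapping table: parameter combination -> evaluated flops.
MAPPING_TABLE = [
    ({"depth": 8,  "width": 16, "resolution": 128}, 0.4e9),
    ({"depth": 12, "width": 24, "resolution": 160}, 1.6e9),
    ({"depth": 16, "width": 32, "resolution": 192}, 4.8e9),
    ({"depth": 20, "width": 48, "resolution": 224}, 12.5e9),
]

def lookup_second_parameter_combinations(max_flops):
    """Return every parameter combination whose evaluated flops fall within
    the constraint range (here: at most max_flops)."""
    return [params for params, flops in MAPPING_TABLE if flops <= max_flops]

within_budget = lookup_second_parameter_combinations(max_flops=2e9)
print(len(within_budget))  # the first two entries satisfy the constraint
```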
506、根据第二参数组合得到目标神经网络。506. Obtain the target neural network according to the second parameter combination.
在得到一组或者多组第二参数组合之后,即可基于该一组或者多组第二参数组合来构建神经网络,得到目标神经网络。After one or more sets of second parameter combinations are obtained, a neural network can be constructed based on the one or more sets of second parameter combinations to obtain a target neural network.
具体地,第二参数组合中可以包括深度、宽度、输入至第一神经网络的图像的分辨率、卷积核的大小或者卷积核的group的数量等,可以基于该第二参数组合中所包括的参数,以及给定的基础单元,构建得到目标神经网络。Specifically, the second parameter combination may include depth, width, the resolution of the image input to the first neural network, the size of the convolution kernel, or the number of groups of the convolution kernel, etc., which may be based on the second parameter combination. The parameters included, and given the base unit, are constructed to obtain the target neural network.
In one scenario, multiple third neural networks can be constructed from one or more sets of second parameter combinations, and all of them can be trained using a preset second data set to obtain trained third neural networks. The optimal neural network can then be selected from the trained third neural networks as the target neural network. For example, the neural network with the highest output accuracy may be selected as the target neural network, or a neural network whose ratio of output accuracy to computation amount falls within a certain range may be selected, and so on.
In a possible scenario, multiple third neural networks can be constructed from one or more sets of second parameter combinations, and the optimal neural network can be selected directly from them as the target neural network. For example, the neural network with the lowest FLOPs may be selected, or the neural network with the shortest running time for a single forward inference, or the neural network that occupies the least memory, and so on.
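The selection criteria described in the two scenarios above can be sketched as follows; the candidate records and their numbers are illustrative assumptions:

```python
# Each candidate third neural network is summarized by its evaluation results.
# All numbers are invented for illustration.
candidates = [
    {"name": "net_a", "accuracy": 0.91, "flops": 600e6, "latency_ms": 30.0},
    {"name": "net_b", "accuracy": 0.88, "flops": 300e6, "latency_ms": 14.0},
    {"name": "net_c", "accuracy": 0.93, "flops": 900e6, "latency_ms": 52.0},
]

# Criterion 1: highest output accuracy among the trained candidates.
best_by_accuracy = max(candidates, key=lambda c: c["accuracy"])

# Criterion 2: lowest FLOPs, selected directly without training.
best_by_flops = min(candidates, key=lambda c: c["flops"])

# Criterion 3: shortest running time for a single forward inference.
best_by_latency = min(candidates, key=lambda c: c["latency_ms"])
```

Which criterion applies depends on the deployment goal: criterion 1 favors quality, while criteria 2 and 3 favor the hardware budget.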
In another scenario, if only one set of second parameter combinations is obtained and a single neural network is constructed, that neural network can be used directly as the target neural network.
Optionally, when the target neural network is obtained, if it has not yet been trained, it can also be trained using a preset second data set to obtain a trained target neural network.
More specifically, the target neural network mentioned in this application can be used for one or more of feature extraction, semantic segmentation, classification, super-resolution, or object detection. For example, after the backbone network is obtained by scaling the basic network, different headers can be added to implement different functions, so that the resulting neural network can provide one or more functions.
In a possible implementation, if a basic network is preset, the second parameter combination can be used to adjust that basic network to obtain the final target neural network. For example, the user may specify a basic network through input data, such as a CNN, an RNN, or a ResNet; the second parameter combination may include parameters such as the depth, the width, or the input image resolution, and the width, the depth, or the pooling-layer size of the CNN, RNN, or ResNet is adjusted according to those parameters to obtain the adjusted network, that is, the target neural network. Therefore, in the embodiments of this application, the model can be scaled with the computing capability of the hardware as a constraint to obtain an optimal model adapted to the hardware.
Therefore, in the embodiments of this application, at least one set of parameter combinations can be obtained by searching the search space, multiple neural networks can be obtained based on those parameter combinations, and a mapping relationship can be generated based on the parameter combinations and the evaluation results of the structures of the multiple neural networks. A parameter combination corresponding to the hardware constraint range can then be obtained from the mapping relationship, so that an optimal model adapted to the hardware is obtained from that parameter combination. In other words, the model can be scaled with the computing capability of the hardware as a constraint to obtain an optimal model adapted to the hardware.
The foregoing describes the flow of the neural network construction method provided by this application. The method is described in more detail below with reference to the flow in FIG. 5 and specific application scenarios.
In a possible implementation, the neural network construction method provided by this application may be executed by a server, a terminal, or another device. Taking a server as an example, if the server itself needs to build a target neural network to be deployed on the server, it can obtain its own hardware information, such as the CPU model, the amount of available memory, or the required running time, generate a hardware constraint range from that information, and build a model within that constraint range, or adjust the basic network, to obtain a target neural network adapted to the hardware. As another example, a terminal-side device may adjust the model based on the terminal-side hardware constraints and use the adjusted model on the terminal side, so that hardware resources are used more reasonably. After obtaining the basic network and the corresponding parameter information, the terminal-side device obtains the relationship between FLOPs and the hyperparameters for that family of basic networks, and also obtains the local hardware information; according to the hardware model, it retrieves a lookup table from a database to map the latency requirement to target FLOPs. Based on the basic model and the target FLOPs value, the hyperparameter values, such as the width, the depth, and the input resolution, are determined. Based on the basic model and the hyperparameter values, the basic network is scaled to obtain a model adapted to the local hardware, which is then trained and used locally.
In other possible scenarios, the neural network construction method provided by this application may be executed by a cloud-side device. The constraint range for the target neural network may be sent by the terminal-side device to the cloud-side device; the cloud-side device then generates the mapping relationship from the data sent by the terminal-side device, computes the optimal parameter combination, and constructs a neural network adapted to the hardware according to that combination.
For ease of understanding, referring to FIG. 6, the flow of the neural network construction method provided by this application in this scenario is described by way of example. Parts similar to the flow in FIG. 5 above are not repeated; only the additional details are explained below.
601. The terminal-side device sends user input information to the cloud-side device.
The terminal-side device may be a mobile phone, a tablet personal computer (TPC), a media player, a smart TV, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, a video camera, a smart watch, a wearable device (WD), a self-driving vehicle, or the like.
The terminal-side device may generate the user input information from its own hardware information, or may provide an interactive interface through which the user enters the input data. For example, if the terminal-side device needs a neural network for classifying a photo album, it can generate the user input information based on the type of neural network to be built and its own hardware information, such as the CPU model, the NPU model, the amount of available memory, or the required running time of the neural network, and then send the information to the cloud-side device over a wired or wireless network. Alternatively, the user may enter, in the interactive interface of the terminal-side device, the hardware information of another terminal device on which the network is to be deployed, together with the type of neural network. For example, the terminal-side device may be the user's personal computer, and the user may enter the hardware information of a mobile phone in its interactive interface to request that a neural network adapted to the phone's hardware be built for classifying the phone's photo album.
602. The cloud-side device extracts the constraint range.
The cloud-side device may include a server, a personal computer, a computer workstation, or the like. After receiving the user input data, the cloud-side device extracts information from it to obtain the constraint range. The constraint range lies within the computing capability of the hardware; generally, the higher the computing capability of the hardware, the higher the upper limit of the constraint range, that is, the upper limit of the constraint range is positively correlated with the computing capability of the hardware. The computing capability of the hardware can be measured by FLOPs, the supported running time of a neural network, the amount of available memory, the number of neural network parameters that can be carried, and so on.
Specifically, the user input information may directly include a value range, which can be used directly as the constraint range. For example, the user input information may directly include ranges of hardware-related parameters such as a FLOPs range, a running-time range, or a memory-occupancy range, and the cloud-side device can extract the constraint range directly from the user input data.
The user input information may also include hardware identification information, such as the model of the CPU or the NPU of the terminal-side device. After receiving the user input information, the cloud-side device can extract, from locally stored data and according to the hardware identification information, the constraint range for that hardware, such as the FLOPs range the CPU can support and the supported memory occupancy, that is, information representing the computing capability of the CPU.
603. The cloud-side device samples the preset search space to obtain m groups of first parameter combinations.
After extracting the hardware constraint from the user input information, the cloud-side device samples the preset search space to obtain m groups of first parameter combinations. Each group includes parameters that affect the structure of the neural network, such as the depth, the width, or the resolution of the input image, so that the first neural networks can be constructed subsequently.
For example, if the search space includes the numerical range [5, 59], a set of width, depth, and resolution values, such as [15.2, 34.5, 55.6], can be sampled from that range.
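A minimal sketch of this sampling step, following the example range above; the use of independent uniform sampling is an assumption:

```python
import random

SEARCH_RANGE = (5.0, 59.0)   # value range of the search space, as in the example
PARAMS = ("width", "depth", "resolution")

def sample_combination(rng):
    """Sample one first parameter combination, e.g. [15.2, 34.5, 55.6]."""
    return {p: round(rng.uniform(*SEARCH_RANGE), 1) for p in PARAMS}

def sample_m_combinations(m, seed=0):
    """Draw m parameter combinations from the search space."""
    rng = random.Random(seed)
    return [sample_combination(rng) for _ in range(m)]

combos = sample_m_combinations(4)
```

In practice each parameter could have its own range inside the search space; a single shared range is used here only to keep the sketch short.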
604. The cloud-side device constructs m first neural networks.
After the m groups of first parameter combinations are obtained, at least m first neural networks can be constructed.
Specifically, if a basic network exists, it can be adjusted directly on the basis of that network to obtain the m first neural networks. If there is no basic network, the basic units can be used directly for construction to obtain the m first neural networks. For details, refer to step 502 above, which is not repeated here.
605. The cloud-side device selects n second neural networks from the m first neural networks.
After the m first neural networks are obtained, they can be trained, for example using a large amount of collected data or the ImageNet data set, to obtain m trained first neural networks. The structures or output accuracies of the trained networks are then evaluated, and the optimal n second neural networks are selected. For example, the n neural networks with the highest output accuracy may be selected from the m first neural networks, or the n neural networks with the optimal structure, or the n neural networks with the smallest ratio of FLOPs to output accuracy, and so on.
606. The cloud-side device fits the relationship between the parameter combinations of the n second neural networks and the evaluation results to obtain the mapping relationship.
After the cloud-side device selects the optimal n second neural networks from the m first neural networks, it fits the relationship between the parameter combinations of those n networks and the evaluation results, thereby obtaining the mapping relationship.
Specifically, a linear relationship may be used for fitting, or Gaussian fitting may be performed, among other options; this can be adjusted according to the actual application scenario and is not limited in this application.
For ease of understanding, the embodiments of this application can be understood as follows: several different parameter combinations are sampled from the model parameters included in the search space, and the parameters of the basic model are changed according to those combinations to construct several new models. These models are then trained and tested on a data set to obtain indicators such as output accuracy or computation amount. According to those indicators, the models that meet the requirements (for example, high accuracy with low computation) are selected, giving the relationship between the model parameters and the computation amount. This relationship is fitted with a formula (for example, Gaussian process regression) to obtain a fitted curve.
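A minimal sketch of this select-then-fit step. The evaluation numbers and the requirement threshold are invented, and a quadratic least-squares fit stands in for the Gaussian process regression mentioned above:

```python
import numpy as np

# Evaluation results for the sampled models (all numbers invented):
# each row is (parameter value, FLOPs in millions, validation accuracy).
models = np.array([
    [0.50, 150.0, 0.70],
    [0.75, 260.0, 0.74],
    [1.00, 400.0, 0.77],
    [1.25, 580.0, 0.79],
    [1.50, 800.0, 0.66],  # over-scaled model: high cost, poor accuracy
])

# Select models that meet the requirement "high accuracy, low computation":
# here, accuracy per unit of computation above a hypothetical threshold.
threshold = 0.75 / 580.0
keep = models[models[:, 2] / models[:, 1] > threshold]

# Fit the relationship between the kept models' parameter value and
# computation amount. The application mentions e.g. Gaussian process
# regression; a quadratic least-squares fit stands in for it in this sketch.
coeffs = np.polyfit(keep[:, 1], keep[:, 0], deg=2)

def param_for_flops(flops):
    """Fitted curve: input a computation budget, output a parameter value."""
    return float(np.polyval(coeffs, flops))
```

Once fitted, the curve is queried in the opposite direction of the evaluation: a computation budget goes in, a structural parameter value comes out.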
Generally, some common neural network structures, such as ResNet and MobileNet, can be added to a basic model library, and the corresponding curves can be fitted in advance through the flow provided by this application. When the basic model input by the user is already in the basic model library, the fitted curve can be retrieved directly without repeating the above flow, which accelerates the service.
607. The cloud-side device obtains the parameter combination corresponding to the constraint range according to the mapping relationship, and constructs the target neural network.
After the fitted curve between the parameters and the computation amount is obtained, the computation amount required for building the model (that is, the target neural network) is input and the parameter values are output. According to the hardware resource limits of the actual application (such as the model's computation budget), the computation amount is fed into the above formula, which outputs one or more parameter values; the basic model is then scaled according to those values, or the model is rebuilt, to obtain the target neural network.
608. The cloud-side device sends the target neural network to the terminal-side device.
After the cloud-side device obtains the target neural network, the network can be deployed on the terminal-side device. For example, the terminal-side device can store the target neural network in a storage medium, so that it can use the target network to perform related tasks.
Specifically, the cloud-side device can send the hyperparameters and weight parameters of the target neural network to the terminal-side device, so that the terminal-side device can construct the target neural network locally from those hyperparameters and weight parameters, completing the deployment of the target neural network and enabling the terminal-side device to run a target neural network adapted to the computing capability of its hardware.
Optionally, the target neural network may be trained by the cloud-side device, and the trained target neural network deployed on the terminal-side device, so that the terminal-side device can use it directly without training again, improving the working efficiency of the terminal-side device.
Therefore, this application provides a model scaling service adapted to the user's hardware constraints. The user only needs to input the required model function (or provide a basic model), the hardware model, and the speed requirement, and a new model satisfying the hardware constraints is output; a model better adapted to the hardware can thus be obtained efficiently, improving the user experience. For example, the user equipment sends the model function (or a specified model), the hardware model, or the latency requirement to a cloud-side device such as a cloud server. The cloud server determines the basic model from the input, for example by selecting it from a model library. According to the hardware model (which selects the computation range), a lookup table is retrieved from the database, the latency requirement is mapped to a specific FLOPs interval, and the upper limit or the mean of the interval is taken as the input FLOPs value. Based on the basic model and the FLOPs value, parameter values such as the depth, the width, or the resolution of the input image, also called hyperparameter values, are determined. Based on the basic model and the hyperparameter values, an optimized model is obtained through training and fed back to the user equipment.
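The lookup-table step that maps a latency requirement to a FLOPs interval for a given hardware model might look like the following sketch; the hardware identifier and the table entries are invented for illustration:

```python
# Hypothetical per-hardware lookup tables: latency bound (ms) -> FLOPs interval.
# Intervals are (lower, upper) in FLOPs; entries are sorted by latency bound.
LOOKUP_TABLES = {
    "soc_x1": [  # hypothetical hardware model identifier
        (10.0, (50e6, 150e6)),
        (25.0, (150e6, 400e6)),
        (60.0, (400e6, 900e6)),
    ],
}

def latency_to_flops(hardware_model, latency_ms, use_mean=False):
    """Map a latency requirement to an input FLOPs value: the upper limit of
    the matched interval, or the mean of the interval."""
    for max_latency, (lo, hi) in LOOKUP_TABLES[hardware_model]:
        if latency_ms <= max_latency:
            return (lo + hi) / 2 if use_mean else hi
    raise ValueError("latency requirement exceeds table range")

flops_budget = latency_to_flops("soc_x1", 20.0)       # upper limit: 400e6
flops_mean = latency_to_flops("soc_x1", 20.0, True)   # interval mean: 275e6
```

The resulting FLOPs value is what the fitted curve consumes in the next step to produce the hyperparameter values.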
The neural network construction method provided by this application can be applied to a variety of scenarios, some of which are described below by way of example.
FIG. 7 is a schematic diagram of an application scenario of the neural network construction method provided by this application.
A basic network and hardware constraints can be determined. The basic network may be a neural network provided by the user, or a neural network selected from a variety of neural networks, such as a CNN, a ResNet, or an RNN.
The hardware constraints may be determined according to the computing capability of the hardware. For example, a hardware constraint may be the FLOPs range of the hardware, the running time of the neural network it can carry, or the amount of available memory.
The basic network and the hardware constraints are used as the input of a model scaling server, which outputs one or more models adapted to the hardware. The model scaling server can be used to execute the flow in FIG. 5 above to obtain the final model or models adapted to the hardware. For example, the neural network models of different sizes obtained by the model scaling server can be deployed, according to hardware resource constraints, on mobile terminals, control devices, self-driving vehicles, and so on.
Specifically, the model scaling server can sample multiple groups of parameter combinations from the search space and then deform the basic network based on those combinations, for example by adjusting its depth, its width, or the number or size of its pooling layers, to obtain multiple neural networks. These neural networks can then be run on the hardware, and their execution evaluated to obtain their running time on the hardware, the amount of memory they occupy, their FLOPs, and so on. A mapping relationship is then built between the parameters in the combinations, such as the depth, the width, the resolution of the input image, or the size of the convolution kernel, and the running time, the memory occupancy, the FLOPs, and so on. The hardware constraints may include one or more of running time, memory occupancy, FLOPs, and the like; substituting the hardware constraints into the mapping relationship yields one or more groups of parameter combinations, which may include values for parameters such as the depth, the width, the resolution, or the size of the convolution kernel. The basic network is then scaled based on those groups of parameter combinations to obtain one or more neural networks.
If multiple neural networks are obtained by scaling the basic network, the optimal neural network can be selected from them as the optimal model. For example, the neural network with the highest output accuracy may be selected as the optimal model, or, at the same output accuracy, the neural network with less computation or a shorter running time may be selected. Therefore, in this application scenario, the hardware constraints and the basic network can be input and a model adapted to the hardware is output, completing the model scaling quickly and accurately and efficiently obtaining the optimal model adapted to the hardware.
In another specific scenario, the obtained optimal model can be used for image recognition, for example to classify images. The optimal model can be applied to a terminal. As shown in FIG. 8, the terminal used by the user may contain multiple images, and a model deployed on the terminal and adapted to its hardware can be used to classify those images and arrange them by category. As shown in FIG. 8, the images of dolphins and the images of cats are classified separately and displayed in the display interface by category, so that the user can quickly find images in the album. For example, users store a large number of pictures on mobile phones and cloud disks, and classified management of the photo album improves the user experience. As shown in FIG. 8, by using, from the series of convolutional neural network models provided by this application, the network model that matches the computing resources of the current mobile phone, the phone can classify and manage the different categories of pictures in its album through image recognition, making searching convenient for the user, saving the user's management time, and improving the efficiency of album management.
In another specific application scenario, as shown in FIG. 9, the optimal model obtained by this application may include a feature extraction network. Mask RCNN can be regarded as an instance segmentation architecture, into which the feature extraction network can be embedded for feature extraction. The architecture may also include a region proposal network (RPN) and a pooling layer (the region-of-interest pooling layer RoIPool shown in FIG. 9). The purpose of RoIPool is to derive smaller feature maps from the regions of interest (ROIs) determined by the RPN. After an input image is fed into the Mask RCNN, the targets in the image can be detected; for example, the animals in the input image shown in FIG. 9 can be recognized efficiently. Therefore, through the method provided by this application, a feature extraction network adapted to the hardware can be obtained, so that feature extraction is performed efficiently within the carrying capacity of the hardware.
More specifically, taking a specific scenario as an example, a more detailed flow of the neural network construction method provided by this application may be as shown in FIG. 10.
First, the search space 1001 may include the value ranges of parameters such as the depth, the width, the resolution of the input image, the size of the convolution kernel, or the number of groups of the convolution kernel.
Sampling is performed from the search space 1001 to obtain m groups of parameter combinations. Each group may include values for parameters such as the depth, the width, the resolution of the input image, the size of the convolution kernel, or the number of groups of the convolution kernel.
After the m groups of parameter combinations are obtained, if a basic network exists, it can be adjusted based on those combinations, changing its depth, its width, the size of its pooling layers, the size of its convolution kernels, or the number of its convolution-kernel groups, to obtain m first neural networks 1003. For example, if the initial depth of the basic network, that is, the number of layers, is 10 and the depth value in a parameter combination is 15, 5 layers can be added to the basic network to obtain a neural network with 15 layers. Or, if the width of each layer of the basic network is 8, that is, each layer includes 8 basic units, and the width in the parameter combination is 16, 8 basic units can be added to each layer to obtain a basic network with a width of 16. If there is no basic network, the m first neural networks 1003 can be constructed directly.
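The adjustments in the example above (depth 10 → 15 adds 5 layers; width 8 → 16 adds 8 basic units per layer) can be sketched as follows; the dict-based description of a network's structure is an assumption made for illustration:

```python
# Structural description of the basic network (hypothetical representation).
base_network = {"depth": 10, "width": 8, "pool_size": 2}

def scale_network(base, combo):
    """Return a copy of the basic network with the sampled combination applied.
    E.g. depth 10 -> 15 adds 5 layers; width 8 -> 16 adds 8 units per layer."""
    scaled = dict(base)
    scaled.update({k: v for k, v in combo.items() if k in scaled})
    return scaled

# Each sampled parameter combination yields one first neural network.
combos = [{"depth": 15, "width": 16}, {"depth": 12, "width": 8}]
first_networks = [scale_network(base_network, c) for c in combos]
```

Parameters not present in a combination (here, the pooling-layer size) are kept from the basic network unchanged.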
After the m first neural networks are obtained, they can be trained using the preset data set 1004 to obtain the m trained neural networks 1005.
Then, from the m trained neural networks 1005, the n second neural networks with the best computational performance are selected. The best performance may mean high output accuracy with low computation, high output accuracy with a short running time, and so on.
Gaussian regression fitting is then performed on the relationship between the parameters of the n neural network structures and the evaluation results, such as the computation amount, the running time, or the memory occupancy, to obtain the mapping relationship 1007.
Then, after the constraint range 1008 is obtained, one or more groups of parameter combinations 1009 are computed based on the mapping relationship 1007, and the target neural network 1010 is constructed based on the one or more groups of parameter combinations 1009.
Optionally, after the target neural network 1010 is obtained, it can also be trained to obtain the trained target neural network, which is then deployed on the hardware.
For example, GhostNet, whose computation amount is 591M FLOPs, can be used as the basic network for an image classification task. The search space includes the width w (number of channels), the depth d (number of layers), and the input image resolution r of the neural network. w, d, and r are randomly sampled from the interval [0.25, 4] to obtain m parameter combinations, such as (w=0.33, d=0.78, r=2.33). The width, the depth, and the input resolution of the basic model are changed according to each parameter combination to obtain m new models, and the computation amount of each new model is calculated at the same time. Model training is then performed on the ImageNet data set or a subset of it; after training, the recognition accuracy is tested on the validation set, giving the accuracies corresponding to the m new models.
According to the computation cost and the accuracy, m models on the Pareto front (m ≤ n) are selected from the n new models; these m models are the relatively strong ones, and the relationship between their (w, d, r) parameters and their computation cost is fitted from them. For example, the computation cost can be measured in FLOPs, and the relationships between w, d, r and FLOPs can be constructed separately. Illustratively, the relationship between width and FLOPs may be as shown in FIG. 11A, the relationship between depth and FLOPs may be as shown in FIG. 11B, and the relationship between resolution and FLOPs may be as shown in FIG. 11C. Gaussian process regression can be used to fit the above relationships between w, d, r and FLOPs. Taking the resolution r as an example, the horizontal and vertical coordinates of the m points in the figure are the FLOPs values $c = (c_1, \ldots, c_m)$ and the resolutions $r = (r_1, \ldots, r_m)$, respectively. These m points serve as the training data, and the joint distribution of the training data and a test point $c_*$ is the Gaussian distribution

$$\begin{bmatrix} r \\ r_* \end{bmatrix} \sim \mathcal{N}\!\left(\mathbf{0},\ \begin{bmatrix} K(c, c) + \sigma^2 I & K(c, c_*) \\ K(c_*, c) & K(c_*, c_*) \end{bmatrix}\right)$$

where $K(\cdot, \cdot)$ is a kernel function (such as an inner-product function), $K(c, c)$ is the $m \times m$ kernel matrix over the training inputs, $K(c_*, c) = K(c, c_*)^{\top}$ is the vector of kernel values between the test point and the training inputs, and $\sigma$ is the standard deviation of r. By derivation, the predicted value $r_*$, i.e. the fitted curve formula, is obtained:

$$r_* = K(c_*, c)\left[K(c, c) + \sigma^2 I\right]^{-1} r$$

with the corresponding predictive variance

$$\operatorname{var}(r_*) = K(c_*, c_*) - K(c_*, c)\left[K(c, c) + \sigma^2 I\right]^{-1} K(c, c_*).$$

The relationships of width and depth to FLOPs are fitted in the same way as for the resolution r, and the details are not repeated here.
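A minimal numerical sketch of this fit is shown below, assuming a squared-exponential kernel for K(·,·) (the text only says the kernel may be, for example, an inner-product function) and synthetic (FLOPs, resolution) training pairs; the posterior-mean line implements r* = K(c*, c)[K(c, c) + σ²I]⁻¹ r:

```python
import numpy as np

def rbf(a, b, length=1.0):
    # Squared-exponential kernel between two 1-D input arrays (an assumption;
    # the patent does not fix the kernel).
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def gp_predict(c_train, r_train, c_test, sigma=0.1):
    # Posterior mean: K(c*, c) [K(c, c) + sigma^2 I]^(-1) r
    K = rbf(c_train, c_train) + sigma ** 2 * np.eye(len(c_train))
    k_star = rbf(c_test, c_train)
    return k_star @ np.linalg.solve(K, r_train)

c = np.linspace(0.5, 4.0, 20)   # synthetic normalized-FLOPs training inputs
r = 0.5 * c + 0.1               # synthetic resolution multipliers (targets)
r_star = gp_predict(c, r, np.array([2.0]))
```

With dense, low-noise training data the posterior mean closely interpolates the underlying curve, which is what allows a user-supplied FLOPs budget to be mapped back to a resolution multiplier.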
After the curve formulas fitted in the above manner are obtained, the user only needs to input a computation cost c* to obtain the predicted values r*, d*, and w*. The width, depth, and input resolution of the base model GhostNet-A are changed according to (r*, d*, w*) to obtain a new model whose computation cost is c*. After being trained, the new model can be deployed and used on different hardware devices, for example on mobile phones, control devices, and the like.
For ease of understanding, the output results on the ImageNet dataset of the new model obtained in this embodiment by scaling the GhostNet base network are compared below with the output results of some commonly used neural networks (such as EfficientNet and MobileNetV3).
Table 1
As can be seen from Table 1, when the widths and FLOPs of the compared neural networks are close or similar, the new model obtained by scaling the GhostNet model as provided in this application achieves better output accuracy. Therefore, on the basis of being constructed efficiently, the neural network obtained by the neural network construction method provided in this application achieves a better balance between structure and output accuracy and can better fit the hardware.
In addition, the neural network construction method provided in this application can also be used to scale an arbitrary network. For example, EfficientNet-B0 can be used as the base model to obtain a series of smaller models. Illustratively, the output results on the ImageNet dataset of the new models obtained by scaling with EfficientNet-B0 as the base model are compared with the output results of some commonly used neural networks, and the comparison may be as shown in Table 2, where RA denotes random data augmentation (RandAugment).
Table 2
Similarly to the output results of the new model obtained by scaling the GhostNet model described above, it can be seen from Table 2 that when the widths and FLOPs of the compared neural networks are close or similar, the output accuracy is better. Therefore, on the basis of being constructed efficiently, the neural network obtained by the neural network construction method provided in this application achieves a better balance between structure and output accuracy and can better fit the hardware.
The flow of the method provided in this application has been explained in detail above; the apparatus provided in this application is described in detail below.
Referring to FIG. 12, this application provides a neural network construction apparatus, including:
a sampling module 1201, configured to sample at least one set of first parameter combinations from a preset search space, where the search space includes the value ranges of multiple parameters used when constructing a neural network, and each first parameter combination in the at least one set of first parameter combinations includes a value for each of the multiple parameters;
a construction module 1202, configured to construct multiple first neural networks according to the at least one set of first parameter combinations;
an obtaining module 1204, configured to obtain a constraint range, where the constraint range includes a numerical range identifying the computing capability of a computing device and may be a numerical range determined according to information on the computing capability of the computing device;
a calculation module 1205, configured to obtain, according to a mapping relationship, a second parameter combination corresponding to the constraint range, where the mapping relationship includes the relationship between the at least one parameter combination and the evaluation results of the multiple first neural networks, and an evaluation result is a result obtained by evaluating the structure of each first neural network in the multiple first neural networks;
the construction module 1202 is further configured to obtain a target neural network according to the second parameter combination.
In a possible implementation, the apparatus may further include:
a mapping module 1203, configured to generate the mapping relationship according to the relationship between the at least one parameter combination and the evaluation results of the multiple first neural networks.
In a possible implementation, the apparatus may further include:
a first training module 1206, configured to train the multiple first neural networks using a preset first data set to obtain multiple trained first neural networks;
a screening module 1207, configured to screen out at least one second neural network from the multiple first neural networks according to the evaluation result of each trained first neural network or the output accuracy of each trained first neural network;
where the mapping relationship may be generated according to the relationship between the parameter combination corresponding to each second neural network in the at least one second neural network and the evaluation result of that second neural network.
In a possible implementation, the mapping module 1203 is specifically configured to fit the relationship between the parameter combination corresponding to each second neural network in the at least one second neural network and the evaluation result of that second neural network, to obtain the mapping relationship.
In a possible implementation, the parameters included in the search space include one or more of the following: width, depth, resolution, or convolution kernel size, where the width is the number of basic units included in each layer of the neural network, the depth is the number of network layers of the neural network, and the resolution is the resolution of an image input to the neural network.
In a possible implementation, the evaluation result may include one or more of: the total number of floating-point operations (FLOPs) of each first neural network, the running time of forward inference of each first neural network, the amount of memory occupied by running each first neural network, or the number of parameters of each first neural network.
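As an illustration of such evaluation results, the sketch below computes a parameter count, a multiply-add count standing in for FLOPs, and a forward-inference runtime for a small made-up fully connected network; none of the layer sizes come from the application:

```python
import time
import numpy as np

# Hypothetical tiny network: three fully connected layers with ReLU.
layers = [(224, 128), (128, 64), (64, 10)]
weights = [np.random.randn(i, o) for i, o in layers]

def evaluate(weights, x):
    """Return parameter count, multiply-add count, and forward runtime."""
    params = sum(w.size for w in weights)
    flops = sum(w.shape[0] * w.shape[1] * 2 for w in weights)  # mul + add per weight
    start = time.perf_counter()
    for w in weights:
        x = np.maximum(x @ w, 0.0)      # linear layer followed by ReLU
    runtime = time.perf_counter() - start
    return {"params": params, "flops": flops, "runtime_s": runtime}

result = evaluate(weights, np.random.randn(1, 224))
```

In the method of this application, any one of these quantities (or their combination) could serve as the evaluation result that the mapping relationship is fitted against.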
In a possible implementation, the obtaining module 1204 is specifically configured to receive user input data and obtain the constraint range according to the user input data.
In a possible implementation, the obtaining module 1204 is specifically configured to: obtain identification information of the computing device from the user input data; and obtain the constraint range according to the identification information of the computing device.
In a possible implementation, the apparatus may further include a second training module 1208, configured to train the target neural network using a preset second data set to obtain a trained target neural network.
In a possible implementation, the target neural network is used to perform at least one of feature extraction, semantic segmentation, classification, super-resolution, or object detection.
Referring to FIG. 13, a schematic structural diagram of another neural network construction apparatus provided in this application is described below.
The neural network construction apparatus may include a processor 1301 and a memory 1302, which are interconnected by a line; the memory 1302 stores program instructions and data.
The memory 1302 stores the program instructions and data corresponding to the steps in the aforementioned FIG. 5 to FIG. 11C.
The processor 1301 is configured to perform the method steps performed by the neural network construction apparatus shown in any of the embodiments of FIG. 5 to FIG. 11C.
Optionally, the neural network construction apparatus may further include a transceiver 1303 for receiving or sending data.
An embodiment of this application further provides a computer-readable storage medium storing a program that, when run on a computer, causes the computer to perform the steps in the methods described in the embodiments shown in FIG. 5 to FIG. 11C.
Optionally, the aforementioned neural network construction apparatus shown in FIG. 13 is a chip.
An embodiment of this application further provides a neural network construction apparatus, which may also be referred to as a digital processing chip or simply a chip. The chip includes a processing unit and a communication interface; the processing unit obtains program instructions through the communication interface, the program instructions are executed by the processing unit, and the processing unit is configured to perform the method steps performed by the neural network construction apparatus shown in any of the embodiments of FIG. 5 to FIG. 11C.
An embodiment of this application further provides a digital processing chip. Circuits implementing the processor 1301, or the functions of the processor 1301, and one or more interfaces are integrated in the digital processing chip. When a memory is integrated in the digital processing chip, the chip can perform the method steps of any one or more of the foregoing embodiments. When no memory is integrated in the digital processing chip, it can be connected to an external memory through a communication interface, and it implements, according to the program code stored in the external memory, the actions performed by the neural network construction apparatus in the foregoing embodiments.
An embodiment of this application further provides a computer program product that, when run on a computer, causes the computer to perform the steps performed by the neural network construction apparatus in the methods described in the embodiments shown in FIG. 5 to FIG. 11C.
The neural network construction apparatus provided in the embodiments of this application may be a chip, including a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit can execute the computer-executable instructions stored in a storage unit, so that the chip in the server performs the neural network construction method described in the embodiments shown in FIG. 5 to FIG. 11C. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the radio access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
Specifically, the aforementioned processing unit or processor may be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor or any conventional processor.
Illustratively, referring to FIG. 14, FIG. 14 is a schematic structural diagram of a chip provided in an embodiment of this application. The chip may be embodied as a neural-network processing unit NPU 140. The NPU 140 is mounted on a host CPU as a coprocessor, and the host CPU allocates tasks. The core part of the NPU is the arithmetic circuit 1403; the controller 1404 controls the arithmetic circuit 1403 to fetch matrix data from memory and perform multiplication operations.
In some implementations, the arithmetic circuit 1403 internally includes multiple processing engines (PEs). In some implementations, the arithmetic circuit 1403 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1403 is a general-purpose matrix processor.
For example, suppose there are an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 1402 and caches it on each PE in the arithmetic circuit. The arithmetic circuit fetches the data of matrix A from the input memory 1401, performs the matrix operation with matrix B, and stores the partial or final result of the matrix in the accumulator 1408.
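The accumulate-partial-results behavior described for the arithmetic circuit can be mimicked (not cycle-accurately implemented) with a tiled matrix multiply; the tile size and matrix shapes below are arbitrary:

```python
import numpy as np

def tiled_matmul(A, B, tile=4):
    """Multiply A @ B tile by tile, summing partial results into an accumulator."""
    m, k = A.shape
    _, n = B.shape
    acc = np.zeros((m, n))                   # plays the role of the accumulator
    for start in range(0, k, tile):
        a_tile = A[:, start:start + tile]    # streamed slice of input matrix A
        b_tile = B[start:start + tile, :]    # cached slice of weight matrix B
        acc += a_tile @ b_tile               # partial result accumulates
    return acc

A = np.random.randn(5, 10)
B = np.random.randn(10, 3)
C = tiled_matmul(A, B)
```

Summing tile-level partial products into a single buffer is the same decomposition a systolic array exploits: each pass touches only a slice of the operands, yet the accumulator ends up holding the full product.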
The unified memory 1406 is used to store input data and output data. The weight data is transferred to the weight memory 1402 directly through the direct memory access controller (DMAC) 1405; the input data is also transferred to the unified memory 1406 through the DMAC.
A bus interface unit (BIU) 1410 is used for the interaction of the AXI bus with the DMAC and with the instruction fetch buffer (IFB) 1409.
The bus interface unit 1410 is used by the instruction fetch buffer 1409 to obtain instructions from the external memory, and is also used by the direct memory access controller 1405 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1406, to transfer the weight data to the weight memory 1402, or to transfer the input data to the input memory 1401.
The vector calculation unit 1407 includes multiple arithmetic processing units and, where needed, performs further processing on the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. It is mainly used for computation of non-convolution/non-fully-connected layers in neural networks, such as batch normalization, pixel-wise summation, and upsampling of feature maps.
In some implementations, the vector calculation unit 1407 can store a vector of processed outputs to the unified memory 1406. For example, the vector calculation unit 1407 may apply a linear function and/or a nonlinear function to the output of the arithmetic circuit 1403, for example performing linear interpolation on the feature maps extracted by a convolutional layer or, as another example, on a vector of accumulated values, so as to generate activation values. In some implementations, the vector calculation unit 1407 generates normalized values, pixel-wise summed values, or both. In some implementations, the vector of processed outputs can be used as an activation input to the arithmetic circuit 1403, for example for use in a subsequent layer of the neural network.
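One of the non-convolution operations listed for the vector calculation unit, batch normalization, can be written out in plain numpy for illustration (the γ, β, and ε values are the usual defaults, not taken from the application):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch axis, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.random.randn(32, 16) * 3.0 + 5.0   # a batch of 32 feature vectors
y = batch_norm(x)
```

After normalization each feature has near-zero mean and near-unit variance over the batch, which is exactly the element-wise pattern (subtract, divide, scale, shift) a vector unit handles well.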
The instruction fetch buffer 1409 connected to the controller 1404 is used to store the instructions used by the controller 1404.
The unified memory 1406, the input memory 1401, the weight memory 1402, and the instruction fetch buffer 1409 are all on-chip memories; the external memory is private to this NPU hardware architecture.
The operations of the layers of a recurrent neural network may be performed by the arithmetic circuit 1403 or the vector calculation unit 1407.
The processor mentioned in any of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the programs of the methods of FIG. 5 to FIG. 11C.
It should additionally be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions of the embodiments. In addition, in the drawings of the apparatus embodiments provided in this application, the connection relationship between modules indicates that they have a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
From the description of the above implementations, a person skilled in the art can clearly understand that this application can be implemented by software plus the necessary general-purpose hardware, and certainly also by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function completed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function can also be diverse, such as analog circuits, digital circuits, or dedicated circuits. For this application, however, a software program implementation is in most cases the better implementation. Based on such an understanding, the technical solutions of this application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of this application.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state drive (SSD)).
The terms "first", "second", "third", "fourth", and the like (if any) in the specification, claims, and the above drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments described here can be implemented in an order other than that illustrated or described here. Furthermore, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to those steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
Finally, it should be noted that the above are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any change or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (21)

  1. A neural network construction method, characterized by comprising:
    sampling at least one set of first parameter combinations from a preset search space, the search space comprising value ranges of multiple parameters used in constructing a neural network, each first parameter combination in the at least one set of first parameter combinations comprising a value of each of the multiple parameters;
    constructing a plurality of first neural networks according to the at least one set of first parameter combinations;
    obtaining a constraint range, the constraint range comprising a numerical range identifying a computing capability of a computing device;
    obtaining, according to a mapping relationship, a second parameter combination corresponding to the constraint range, the mapping relationship comprising a relationship between the at least one set of parameter combinations and evaluation results of the plurality of first neural networks, an evaluation result being a result obtained by evaluating a structure of each first neural network in the plurality of first neural networks;
    obtaining a target neural network according to the second parameter combination.
  2. The method according to claim 1, further comprising:
    training the plurality of first neural networks using a preset first data set to obtain a plurality of trained first neural networks;
    screening out at least one second neural network from the plurality of first neural networks according to the evaluation result or the output accuracy of each trained first neural network;
    wherein the mapping relationship includes a relationship between the parameter combination corresponding to each second neural network in the at least one second neural network and the evaluation result of that second neural network.
  3. The method according to claim 2, wherein:
    the mapping relationship is obtained by fitting the relationship between the parameter combination corresponding to each second neural network in the at least one second neural network and the evaluation result of that second neural network.
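Claim 3 states only that the mapping relationship is obtained by fitting. One conventional way to realize such a fit — sketched here as an assumption, with made-up measurements — is ordinary least squares in log space, since cost metrics tend to scale multiplicatively with width, depth, and resolution:

```python
import numpy as np

# Hypothetical records for screened second neural networks:
# (width, depth, resolution) -> measured evaluation result (e.g., FLOPs).
params = np.array([
    [16, 4, 96],
    [32, 8, 128],
    [64, 12, 160],
    [32, 4, 96],
    [16, 8, 128],
], dtype=float)
results = np.array([0.6e6, 4.2e6, 19.7e6, 1.2e6, 2.1e6])

# Fit log(result) = a*log(w) + b*log(d) + c*log(r) + e by least squares.
X = np.hstack([np.log(params), np.ones((len(params), 1))])
coef, *_ = np.linalg.lstsq(X, np.log(results), rcond=None)

def predict_result(width, depth, resolution):
    """Evaluate the fitted mapping for an unseen parameter combination."""
    x = np.array([np.log(width), np.log(depth), np.log(resolution), 1.0])
    return float(np.exp(x @ coef))
```

Inverting such a fitted mapping — searching for the combination whose predicted result lands inside the constraint range — yields a second parameter combination without building every candidate network.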
  4. The method according to any one of claims 1-3, wherein the parameters included in the search space include one or more of the following:
    width, depth, resolution, or convolution kernel size, wherein the width is the number of basic units included in each layer of the neural network, the depth is the number of network layers of the neural network, and the resolution is the resolution of images input to the neural network.
  5. The method according to any one of claims 1-4, wherein the evaluation result includes one or more of:
    the total number of floating-point operations (FLOPs) of each first neural network, the running time of forward inference of each first neural network, the amount of memory occupied by running each first neural network, or the number of parameters of each first neural network.
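Most of the metrics in claim 5 can be obtained without training: FLOPs and parameter count follow from layer shapes alone. A sketch for a plain convolutional stack, where the three-layer configuration is an invented example rather than anything from the disclosure:

```python
def conv_stats(c_in, c_out, k, h, w, stride=1):
    """MACs and parameter count of one k x k conv layer without bias.

    Assumes 'same' padding, so the spatial size shrinks only by the stride.
    FLOPs are conventionally reported as roughly 2x the MAC count.
    """
    h_out, w_out = h // stride, w // stride
    params = c_in * c_out * k * k
    macs = params * h_out * w_out        # one MAC per weight per output position
    return macs, params, (c_out, h_out, w_out)

def network_stats(layers, in_shape):
    """Total MACs and parameters of a sequential conv network."""
    c, h, w = in_shape
    total_macs = total_params = 0
    for c_out, k, stride in layers:
        macs, params, (c, h, w) = conv_stats(c, c_out, k, h, w, stride)
        total_macs += macs
        total_params += params
    return total_macs, total_params

# Invented 3-layer network evaluated on a 3 x 32 x 32 input.
layers = [(16, 3, 1), (32, 3, 2), (64, 3, 2)]  # (out_channels, kernel, stride)
macs, params = network_stats(layers, (3, 32, 32))
```

Runtime and peak memory, by contrast, generally have to be measured on the target computing device, which is why the claimed evaluation results naturally form a per-device mapping.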
  6. The method according to any one of claims 1-5, wherein obtaining the constraint range comprises:
    receiving user input data, and obtaining the constraint range according to the user input data.
  7. The method according to claim 6, wherein obtaining the constraint range according to the user input data comprises:
    obtaining identification information of the computing device from the user input data;
    obtaining the constraint range according to the identification information of the computing device.
  8. The method according to any one of claims 1-7, further comprising:
    training the target neural network using a preset second data set to obtain a trained target neural network.
  9. The method according to any one of claims 1-8, wherein the target neural network is used to perform at least one of feature extraction, semantic segmentation, classification, super-resolution, or object detection.
  10. A neural network construction apparatus, comprising:
    a sampling module, configured to sample at least one set of first parameter combinations from a preset search space, wherein the search space includes value ranges of a plurality of parameters used in constructing a neural network, and each first parameter combination in the at least one set of first parameter combinations includes a value for each of the plurality of parameters;
    a construction module, configured to construct a plurality of first neural networks according to the at least one set of first parameter combinations;
    an obtaining module, configured to obtain a constraint range, wherein the constraint range includes a numerical range that identifies the computing capability of a computing device;
    a computation module, configured to obtain, according to a mapping relationship, a second parameter combination corresponding to the constraint range, wherein the mapping relationship includes a relationship between the at least one set of parameter combinations and evaluation results of the plurality of first neural networks, and each evaluation result is obtained by evaluating the structure of a corresponding first neural network;
    the construction module being further configured to obtain a target neural network according to the second parameter combination.
  11. The apparatus according to claim 10, further comprising:
    a first training module, configured to train the plurality of first neural networks using a preset first data set to obtain a plurality of trained first neural networks;
    a screening module, configured to screen out at least one second neural network from the plurality of first neural networks according to the evaluation result or the output accuracy of each trained first neural network;
    wherein the mapping relationship includes a relationship between the parameter combination corresponding to each second neural network in the at least one second neural network and the evaluation result of that second neural network.
  12. The apparatus according to claim 11, wherein:
    the mapping relationship is obtained by fitting the relationship between the parameter combination corresponding to each second neural network in the at least one second neural network and the evaluation result of that second neural network.
  13. The apparatus according to any one of claims 10-12, wherein the parameters included in the search space include one or more of the following:
    width, depth, resolution, or convolution kernel size, wherein the width is the number of basic units included in each layer of the neural network, the depth is the number of network layers of the neural network, and the resolution is the resolution of images input to the neural network.
  14. The apparatus according to any one of claims 10-13, wherein the evaluation result includes one or more of:
    the total number of floating-point operations (FLOPs) of each first neural network, the running time of forward inference of each first neural network, the amount of memory occupied by running each first neural network, or the number of parameters of each first neural network.
  15. The apparatus according to any one of claims 10-14, wherein:
    the obtaining module is specifically configured to receive user input data and obtain the constraint range according to the user input data.
  16. The apparatus according to claim 15, wherein the obtaining module is specifically configured to:
    obtain identification information of the computing device from the user input data;
    obtain the constraint range according to the identification information of the computing device.
  17. The apparatus according to any one of claims 10-16, further comprising:
    a second training module, configured to train the target neural network using a preset second data set to obtain a trained target neural network.
  18. The apparatus according to any one of claims 10-17, wherein the target neural network is used to perform at least one of feature extraction, semantic segmentation, classification, super-resolution, or object detection.
  19. A neural network construction apparatus, comprising a processor coupled to a memory, wherein the memory stores a program, and the program instructions stored in the memory, when executed by the processor, implement the method according to any one of claims 1 to 9.
  20. A computer-readable storage medium comprising a program which, when executed by a processing unit, performs the method according to any one of claims 1 to 9.
  21. A neural network construction apparatus, comprising a processing unit and a communication interface, wherein the processing unit obtains program instructions through the communication interface, and the program instructions, when executed by the processing unit, implement the method according to any one of claims 1 to 9.
PCT/CN2021/124360 2020-10-21 2021-10-18 Neural network construction method and apparatus WO2022083536A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011131423.9 2020-10-21
CN202011131423.9A CN112418392A (en) 2020-10-21 2020-10-21 Neural network construction method and device

Publications (1)

Publication Number Publication Date
WO2022083536A1 (en)

Family

ID=74841631

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/124360 WO2022083536A1 (en) 2020-10-21 2021-10-18 Neural network construction method and apparatus

Country Status (2)

Country Link
CN (1) CN112418392A (en)
WO (1) WO2022083536A1 (en)


Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418392A (en) * 2020-10-21 2021-02-26 华为技术有限公司 Neural network construction method and device
CN113128680B (en) * 2021-03-12 2022-06-10 山东英信计算机技术有限公司 Neural network training method, system, device and medium
CN114969636B (en) * 2021-03-23 2023-10-03 华为技术有限公司 Model recommendation method and device and computer equipment
CN113128682B (en) * 2021-04-14 2022-10-21 北京航空航天大学 Automatic neural network model adaptation method and device
CN113240109B (en) * 2021-05-17 2023-06-30 北京达佳互联信息技术有限公司 Data processing method and device for network training, electronic equipment and storage medium
US20230064692A1 (en) * 2021-08-20 2023-03-02 Mediatek Inc. Network Space Search for Pareto-Efficient Spaces
CN113485848B (en) * 2021-09-08 2021-12-17 深圳思谋信息科技有限公司 Deep neural network deployment method and device, computer equipment and storage medium
CN113902099B (en) * 2021-10-08 2023-06-02 电子科技大学 Neural network design and optimization method based on software and hardware joint learning
CN116090512A (en) * 2021-10-29 2023-05-09 华为技术有限公司 Neural network construction method and device
CN116560731A (en) * 2022-01-29 2023-08-08 华为技术有限公司 Data processing method and related device thereof
CN114548384A (en) * 2022-04-28 2022-05-27 之江实验室 Method and device for constructing impulse neural network model with abstract resource constraint
CN117035018A (en) * 2022-04-29 2023-11-10 中兴通讯股份有限公司 Beam measurement parameter feedback method and receiving method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062587A (en) * 2017-12-15 2018-05-22 清华大学 The hyper parameter automatic optimization method and system of a kind of unsupervised machine learning
CN110991658A (en) * 2019-11-28 2020-04-10 重庆紫光华山智安科技有限公司 Model training method and device, electronic equipment and computer readable storage medium
US20200125945A1 (en) * 2018-10-18 2020-04-23 Drvision Technologies Llc Automated hyper-parameterization for image-based deep model learning
CN111768004A (en) * 2020-06-10 2020-10-13 中国人民解放军军事科学院国防科技创新研究院 Model self-adaption method and system based on intelligent computing framework
CN112418392A (en) * 2020-10-21 2021-02-26 华为技术有限公司 Neural network construction method and device


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024016739A1 (en) * 2022-07-20 2024-01-25 华为技术有限公司 Method for training neural network model, electronic device, cloud, cluster, and medium
WO2024046460A1 (en) * 2022-09-02 2024-03-07 深圳忆海原识科技有限公司 Port model, construction method, system, and neural network construction platform
CN116707851A (en) * 2022-11-21 2023-09-05 荣耀终端有限公司 Data reporting method and terminal equipment
CN116707851B (en) * 2022-11-21 2024-04-23 荣耀终端有限公司 Data reporting method and terminal equipment
CN116523045A (en) * 2023-03-13 2023-08-01 之江实验室 Deep learning reasoning simulator oriented to multi-core chip
CN116523045B (en) * 2023-03-13 2023-11-07 之江实验室 Deep learning reasoning simulator oriented to multi-core chip
CN117235464A (en) * 2023-11-14 2023-12-15 华东交通大学 Fourier near infrared interference signal virtual generation evaluation method and system
CN117235464B (en) * 2023-11-14 2024-02-23 华东交通大学 Fourier near infrared interference signal virtual generation evaluation method and system

Also Published As

Publication number Publication date
CN112418392A (en) 2021-02-26

Similar Documents

Publication Publication Date Title
WO2022083536A1 (en) Neural network construction method and apparatus
WO2021238366A1 (en) Neural network construction method and apparatus
WO2020221200A1 (en) Neural network construction method, image processing method and devices
WO2022042713A1 (en) Deep learning training method and apparatus for use in computing device
WO2021120719A1 (en) Neural network model update method, and image processing method and device
WO2022116933A1 (en) Model training method, data processing method and apparatus
US20220215227A1 (en) Neural Architecture Search Method, Image Processing Method And Apparatus, And Storage Medium
WO2022001805A1 (en) Neural network distillation method and device
US20230082597A1 (en) Neural Network Construction Method and System
CN113705769A (en) Neural network training method and device
WO2021218517A1 (en) Method for acquiring neural network model, and image processing method and apparatus
CN110222718B (en) Image processing method and device
WO2022111617A1 (en) Model training method and apparatus
CN112529146B (en) Neural network model training method and device
WO2021218470A1 (en) Neural network optimization method and device
WO2022012668A1 (en) Training set processing method and apparatus
CN111797992A (en) Machine learning optimization method and device
CN113807399A (en) Neural network training method, neural network detection method and neural network detection device
CN115081588A (en) Neural network parameter quantification method and device
CN113536970A (en) Training method of video classification model and related device
CN113627163A (en) Attention model, feature extraction method and related device
CN115018039A (en) Neural network distillation method, target detection method and device
WO2022156475A1 (en) Neural network model training method and apparatus, and data processing method and apparatus
CN113128285A (en) Method and device for processing video
US20220130142A1 (en) Neural architecture search method and image processing method and apparatus

Legal Events

Date Code Title Description
121  EP: The EPO has been informed by WIPO that EP was designated in this application (ref document number: 21881953; country of ref document: EP; kind code of ref document: A1)
NENP  Non-entry into the national phase (ref country code: DE)
122  EP: PCT application non-entry in European phase (ref document number: 21881953; country of ref document: EP; kind code of ref document: A1)