CN115953839B - Real-time 2D gesture estimation method based on loop architecture and key point regression - Google Patents
- Publication number
- CN115953839B
- Authority
- CN
- China
- Prior art keywords
- module
- model
- feature map
- regression
- training
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Image Analysis (AREA)
Abstract
The invention provides a real-time 2D gesture estimation method based on a loop (recurrent) architecture and key point regression, in the technical field of real-time 2D gesture estimation. The core modules of the method are an image acquisition module, a lightweight neural network algorithm module, a loop architecture module and a key point regression module. Key point (coordinate) regression takes little time and few resources, and allows real-time operation and end-to-end fully differentiable training on mobile, embedded or low-cost hardware platforms. The loop architecture module improves the model's estimation of dynamic gestures in video. The method achieves real-time, high-precision detection on mobile, embedded or low-cost hardware, effectively alleviates the loss of detection performance caused by motion blur and self-occlusion in video, and enables rapid product deployment.
Description
Technical Field
The invention belongs to the technical field of real-time 2D gesture estimation, and particularly relates to a real-time 2D gesture estimation method based on a loop architecture and key point regression.
Background
2D gesture estimation mainly detects the 21 key points of the hand, through which the information expressed by different gestures can be described. Hand 2D keypoint detection is one of the basic algorithms of computer vision and plays an important role in research in related fields of computer vision. At present, the main hardware carriers of the metaverse are AR/VR devices and the like, which capture images through a camera and give corresponding feedback by analyzing the information expressed by the user's gestures.
Compared with body pose estimation, 2D gesture estimation is a very challenging task: hand joints are more flexible and more sensitive to motion, and are affected by self-occlusion, all of which can degrade estimation. At present, Gaussian heatmap-based methods are the mainstream approach and deliver comparable recognition performance. In the industrial Internet era, combining embedded systems with artificial intelligence is an inevitable trend; however, 2D gesture estimation algorithms based on Gaussian heatmaps usually cannot run in real time on mobile, embedded or low-cost hardware platforms, and their detection performance on motion blur caused by dynamic gestures and on self-occlusion is unsatisfactory. Because heatmap-based methods consume a lot of memory and infer slowly, running them on low-cost hardware introduces large latency, which often degrades the user experience of the whole product.
Therefore, it is necessary to invent a real-time 2D gesture estimation method based on a loop architecture and key point regression.
Disclosure of Invention
To solve the above problems, the present invention provides a real-time 2D gesture estimation method based on a loop architecture and key point regression. The core modules of the method are an image acquisition module, a lightweight neural network algorithm module, a loop architecture module and a key point regression module, where the image acquisition module is a monocular camera;
the lightweight neural network algorithm module uses MobileNetV3 as a lightweight backbone model for feature extraction; it consists of several stages, each built from multiple groups of depthwise separable convolutions;
the loop architecture module receives the feature information extracted by the MobileNetV3 backbone network. Its recurrent mechanism learns by itself which information in a continuous video stream should be retained, and adaptively preserves both long-term and short-term temporal information, which suits our requirements;
the key point regression module takes the feature map output by the loop architecture module as its input and passes it through two FC layers: FC1 outputs the coordinates of the 2D skeleton key points, and FC2 outputs their scores. Because the regression results need to be supervised, a normalizing flow module is added for auxiliary training.
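A minimal NumPy sketch of the two FC heads described above. The source does not state how the feature map is flattened before the FC layers, the feature dimension, or how scores are normalized; this sketch assumes global average pooling, a hypothetical 96-channel input and a sigmoid on the FC2 output. The names `regression_head` and `NUM_KEYPOINTS` are illustrative only.

```python
import numpy as np

NUM_KEYPOINTS = 21  # the 21 hand key points mentioned in the background section


def regression_head(feature_map, w1, b1, w2, b2):
    """Hypothetical FC1/FC2 heads on a (C, H, W) feature map from the loop module."""
    pooled = feature_map.mean(axis=(1, 2))                   # (C,) global average pooling (assumption)
    coords = (w1 @ pooled + b1).reshape(NUM_KEYPOINTS, 2)    # FC1: (x, y) per key point
    scores = 1.0 / (1.0 + np.exp(-(w2 @ pooled + b2)))       # FC2: one score per key point, sigmoid (assumption)
    return coords, scores


rng = np.random.default_rng(0)
C = 96  # hypothetical channel count
fm = rng.standard_normal((C, 8, 8))
w1, b1 = rng.standard_normal((2 * NUM_KEYPOINTS, C)) * 0.01, np.zeros(2 * NUM_KEYPOINTS)
w2, b2 = rng.standard_normal((NUM_KEYPOINTS, C)) * 0.01, np.zeros(NUM_KEYPOINTS)
coords, scores = regression_head(fm, w1, b1, w2, b2)
print(coords.shape, scores.shape)  # (21, 2) (21,)
```

Because both heads end in plain FC layers, every step from image to coordinates stays differentiable, which is the property the method relies on.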
Preferably, in the lightweight neural network algorithm module, the depthwise separable convolution consists of two steps: depthwise (channel-by-channel) convolution and pointwise convolution. In the depthwise convolution, each kernel is responsible for exactly one channel and each channel is convolved by exactly one kernel, so the generated feature map has exactly as many channels as the input. The pointwise convolution uses 1x1 kernels to take a weighted combination of the depthwise outputs along the depth (channel) direction, producing a new feature map;
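The two steps of a depthwise separable convolution, and why they save parameters, can be sketched in NumPy as follows. This is an illustrative sketch (stride 1, no padding, no bias, cross-correlation as in common CNN libraries), not the MobileNetV3 implementation; the function names and tensor sizes are hypothetical.

```python
import numpy as np


def depthwise_conv2d(x, dw):
    """Channel-by-channel convolution: x (C, H, W), dw (C, k, k).

    Each channel is convolved by exactly one kernel, so output channels == input channels.
    """
    c, h, w = x.shape
    k = dw.shape[-1]
    out = np.zeros((c, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            out[:, i, j] = np.sum(x[:, i:i + k, j:j + k] * dw, axis=(1, 2))
    return out


def pointwise_conv2d(x, pw):
    """1x1 convolution: weighted combination along the channel axis, pw (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', pw, x)


def separable_params(c_in, c_out, k):
    return c_in * k * k + c_in * c_out   # depthwise + pointwise weights (no bias)


def standard_params(c_in, c_out, k):
    return c_in * c_out * k * k          # one k x k kernel per (input, output) channel pair


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 10, 10))
y = pointwise_conv2d(depthwise_conv2d(x, rng.standard_normal((16, 3, 3))),
                     rng.standard_normal((32, 16)))
print(y.shape)                                                  # (32, 8, 8)
print(separable_params(16, 32, 3), standard_params(16, 32, 3))  # 656 4608
```

For a 3x3 layer mapping 16 to 32 channels, the separable form needs 656 weights against 4,608 for a standard convolution, which is where much of the backbone's low cost comes from.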
preferably, an SE (squeeze-and-excitation) structure module is added to obtain a new feature matrix, and a shortcut connection is used when the stride is 1 and the input and output feature matrices have the same size. After the MobileNetV3 backbone outputs its feature map, an LR-ASPP module is added to enlarge the receptive field and improve the accuracy of the whole model: the input feature map is split into two branches; the left branch outputs feature map P1 through a 1x1 convolution, the right branch outputs feature map P2 after a global average pooling layer, a 1x1 convolution and a Sigmoid module, and P1 and P2 are multiplied to output a new feature map;
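A NumPy sketch of the two-branch fusion described above, following only the steps named in the text (1x1 convolution; global average pooling, 1x1 convolution and sigmoid; element-wise product). Normalization and activation layers that practical LR-ASPP implementations typically include are omitted, and all shapes and names are hypothetical.

```python
import numpy as np


def conv1x1(x, w, b):
    """1x1 convolution: x (C_in, H, W), w (C_out, C_in), b (C_out,)."""
    return np.einsum('oc,chw->ohw', w, x) + b[:, None, None]


def lr_aspp_fuse(x, w_left, b_left, w_right, b_right):
    p1 = conv1x1(x, w_left, b_left)                                   # left branch -> P1, (C_out, H, W)
    pooled = x.mean(axis=(1, 2), keepdims=True)                       # global average pooling -> (C_in, 1, 1)
    p2 = 1.0 / (1.0 + np.exp(-conv1x1(pooled, w_right, b_right)))    # 1x1 conv + sigmoid -> P2, (C_out, 1, 1)
    return p1 * p2                                                    # broadcast: P2 re-weights each channel of P1


rng = np.random.default_rng(1)
x = rng.standard_normal((40, 7, 7))   # hypothetical backbone output
out = lr_aspp_fuse(x, rng.standard_normal((16, 40)), np.zeros(16),
                   rng.standard_normal((16, 40)), np.zeros(16))
print(out.shape)  # (16, 7, 7)
```

The right branch produces one weight per channel, so the product acts as a cheap, image-wide attention over the channels of P1.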
preferably, in the last stage of the model, all original activation functions are replaced by the SiLU activation function. The activation a_k of the k-th SiLU for input z_k is computed by multiplying the input by its sigmoid:

$$a_k(z_k) = z_k\,\sigma(z_k) \qquad (1)$$

where σ is the sigmoid function. For large values of z_k, SiLU behaves essentially like ReLU; unlike ReLU, however, SiLU is not monotonically increasing and has a global minimum of about -0.28 at z_k ≈ -1.28. SiLU is self-stabilizing: the global minimum, where the derivative is zero, acts as a buffer on the weights and serves as an implicit regularizer that inhibits the learning of large weights. In practical tests this improved model performance, and replacing the activations in all stages gave the same effect as replacing them only in the last stage, so SiLU is used only in the last stage.
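Equation (1) and the location of the SiLU global minimum can be checked numerically. This sketch finds the zero of the derivative σ(z)(1 + z(1 − σ(z))) by bisection on [-3, 0], using only the Python standard library; the function names are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def silu(z):
    """Equation (1): a(z) = z * sigmoid(z)."""
    return z * sigmoid(z)


def silu_grad(z):
    """d/dz [z * sigmoid(z)] = sigmoid(z) * (1 + z * (1 - sigmoid(z)))."""
    s = sigmoid(z)
    return s * (1.0 + z * (1.0 - s))


def silu_argmin(lo=-3.0, hi=0.0, iters=60):
    """Bisection on the derivative, which is negative at lo and positive at hi."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if silu_grad(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


z_min = silu_argmin()
print(round(z_min, 2), round(silu(z_min), 2))   # -1.28 -0.28
print(round(silu(10.0), 3))                      # close to ReLU(10) = 10
```

The curve dips below zero and then rises again on the negative side, which is the non-monotonic behaviour the text contrasts with ReLU.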
Preferably, in the loop architecture module, the channels of the input feature map are split equally into feature map P3 and feature map P4. P4 is passed through a ConvGRU, which outputs feature map P5 and the memory-unit (hidden-state) feature map h_t, and the result is concatenated with P3 to output feature map P6.
Preferably, in the key point regression module, the normalizing flow module can transform a simple base distribution into an arbitrarily complex distribution; in theory, if the transformation is complex enough, any target distribution can be fitted. In practice the transformations are implemented with a neural network, which can in theory approximate any function, so the series of complex transformations in the normalizing flow model can be realized by stacking FC layers. During training, the regression module fits the output of a simple distribution, and the normalizing flow module transforms the fitted result so that it moves closer to the target distribution P.
Preferably, training is divided into four stages: stage 1, stage 2, stage 3 and stage 4. In stage 1, the model is trained on standalone image data sets without the loop architecture module to obtain a suitable pre-trained model. Practical comparisons showed that, compared with using the MobileNetV3 classification pre-trained model as the pre-trained model of the key point model, using this key point pre-trained model brings faster loss convergence in later training and improves model performance to some extent.
Preferably, stage 2 trains on video stream data with a short sequence length of T = 15 frames so that the network can be updated quickly; stage 3 increases T to 50 frames, halves the learning rate and otherwise keeps the hyperparameters of stage 1, which lets the model see longer sequences and learn long-range dependencies.
Preferably, stage 4 trains for a small number of iterations on a combination of video stream data and standalone image data; each standalone image is treated as a video sequence of only one frame, which forces the model to remain robust even without repeated or continuous information.
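The four-stage schedule can be summarized as a small configuration. Values the text does not give (the base learning rate, the stage 1 and stage 4 learning rates, and the clip length used for video in stage 4) are left as `None`; learning rates are expressed relative to stage 2, which is an assumption about what "half of the original" refers to. `Stage`, `SCHEDULE` and `as_clip` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    name: str
    use_loop_module: bool
    data: str                   # "images", "video", or "video+images"
    seq_len: Optional[int]      # frames per training clip T; None = not specified / not applicable
    lr_scale: Optional[float]   # learning rate relative to stage 2; None = not specified


SCHEDULE = [
    Stage("stage 1", False, "images", None, None),       # pre-training without the loop module
    Stage("stage 2", True, "video", 15, 1.0),            # short clips, T = 15
    Stage("stage 3", True, "video", 50, 0.5),            # T = 50, learning rate halved
    Stage("stage 4", True, "video+images", None, None),  # few iterations; images become 1-frame clips
]


def as_clip(image):
    """Stage 4: treat a standalone image as a video sequence of exactly one frame."""
    return [image]


for stage in SCHEDULE:
    print(stage.name, stage.data, stage.seq_len, stage.lr_scale)
```

Keeping the schedule as data rather than code makes it easy to see that only T, the learning rate and the data mix change between stages.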
Here, robustness means strength and resilience: the ability of a system to keep working under abnormal and dangerous conditions. For example, computer software is robust if it does not hang or crash under input errors, disk failures, network overload or deliberate attack. Robustness also refers to the ability of a control system to maintain certain other properties when some of its parameters are perturbed.
Compared with the prior art, the invention has the following beneficial effects:
first, the key point regression-based 2D gesture estimation method of the invention takes little computation time and few resources, and supports real-time operation and end-to-end fully differentiable training on mobile, embedded or low-cost hardware platforms. A Gaussian heatmap-based method is not an end-to-end differentiable model from image input to coordinates: coordinates must be obtained from the heatmap via argmax, and this step is not differentiable. Coordinate regression, on the other hand, converts the positional information in the output of a fully convolutional network into coordinate values; this dimension conversion is highly nonlinear, which makes the model hard to converge during training. The normalizing flow module solves this problem, giving fast and high-precision results on embedded devices;
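The argument about argmax can be illustrated with a toy Gaussian heatmap: the decoded coordinate is an integer index, so a small change to the heatmap leaves it unchanged, and its gradient with respect to the heatmap is zero almost everywhere. The 64x64 resolution and the helper names are hypothetical; the sketch only demonstrates the decoding step, not any particular heatmap model.

```python
import numpy as np


def gaussian_heatmap(h, w, cx, cy, sigma=2.0):
    """Toy heatmap with a Gaussian peak at column cx, row cy."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def argmax_decode(heatmaps):
    """(K, H, W) heatmaps -> (K, 2) integer (x, y) peak locations via argmax."""
    k, h, w = heatmaps.shape
    ys, xs = np.unravel_index(heatmaps.reshape(k, -1).argmax(axis=1), (h, w))
    return np.stack([xs, ys], axis=1)


hm = gaussian_heatmap(64, 64, cx=20, cy=30)[None]       # one key point, shape (1, 64, 64)
coords = argmax_decode(hm)
noise = np.random.default_rng(0).uniform(-1e-3, 1e-3, hm.shape)
coords_perturbed = argmax_decode(hm + noise)
# The decoded coordinates do not move under a small perturbation, so a
# finite-difference gradient of the coordinates w.r.t. the heatmap is zero.
print(coords.tolist(), coords_perturbed.tolist())        # [[20, 30]] [[20, 30]]
```

At this resolution, 21 heatmaps hold 21 × 64 × 64 = 86,016 values per frame, whereas direct regression outputs 42 coordinates and 21 scores, which is consistent with the memory argument made above.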
second, regarding motion blur and self-occlusion caused by dynamic gestures in video: although many methods are designed for video applications, they treat each frame as an independent image and ignore the temporal information that is abundant in video. The loop architecture module is therefore used to improve the model's estimation of dynamic gestures in video. Because the model has already seen the previous frames when predicting the current frame, it can refer to the better key point predictions of earlier frames when a single frame is blurred, which greatly improves its precision. The method applies to any video without auxiliary input; with the training strategy set out above, a high-precision model can be generated effectively, and the motion blur and self-occlusion problems caused by dynamic gestures are largely resolved;
therefore, the real-time 2D gesture estimation method based on a loop architecture and key point regression achieves real-time, high-precision detection on mobile, embedded or low-cost hardware, effectively alleviates the degradation of detection performance caused by motion blur and self-occlusion in video, and enables rapid product deployment.
Drawings
Fig. 1 is a block diagram of the modules of the present invention.
Fig. 2 is a flow block diagram of the lightweight neural network algorithm module of the present invention.
Fig. 3 is a flow block diagram of the overall model of the present invention.
Fig. 4 is a flow block diagram of the training strategy of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
Embodiment:
As shown in Figs. 1 to 4:
The invention provides a real-time 2D gesture estimation method based on a loop architecture and key point regression. Its core modules are an image acquisition module, a lightweight neural network algorithm module, a loop architecture module and a key point regression module, the image acquisition module being a monocular camera. The lightweight neural network algorithm module uses MobileNetV3 as a lightweight backbone model for feature extraction; it consists of several stages, each built from multiple groups of depthwise separable convolutions. The loop architecture module receives the feature information extracted by the MobileNetV3 backbone network; its recurrent mechanism learns by itself which information in a continuous video stream should be retained, and adaptively preserves both long-term and short-term temporal information, which suits our requirements. The key point regression module takes the feature map output by the loop architecture module as its input and passes it through two FC layers: FC1 outputs the coordinates of the 2D skeleton key points, and FC2 outputs their scores. Because the regression results need to be supervised, a normalizing flow module is added for auxiliary training.
In this embodiment, in the lightweight neural network algorithm module, the depthwise separable convolution consists of two steps: depthwise (channel-by-channel) convolution and pointwise convolution. In the depthwise convolution, each kernel is responsible for exactly one channel and each channel is convolved by exactly one kernel, so the generated feature map has exactly as many channels as the input. The pointwise convolution uses 1x1 kernels to take a weighted combination of the depthwise outputs along the depth direction, producing a new feature map. An SE structure module is added to obtain a new feature matrix, and a shortcut connection is used when the stride is 1 and the input and output feature matrices have the same size. After the MobileNetV3 backbone outputs its feature map, an LR-ASPP module is added to enlarge the receptive field and improve the accuracy of the whole model: the input feature map is split into two branches; the left branch outputs feature map P1 through a 1x1 convolution, the right branch outputs feature map P2 after a global average pooling layer, a 1x1 convolution and a Sigmoid module, and P1 and P2 are multiplied to output a new feature map;
in the last stage of the model, all original activation functions are replaced by the SiLU activation function. The activation a_k of the k-th SiLU for input z_k is computed by multiplying the input by its sigmoid:

$$a_k(z_k) = z_k\,\sigma(z_k) \qquad (1)$$

where σ is the sigmoid function. For large values of z_k, SiLU behaves essentially like ReLU; unlike ReLU, however, SiLU is not monotonically increasing and has a global minimum of about -0.28 at z_k ≈ -1.28. SiLU is self-stabilizing: the global minimum, where the derivative is zero, acts as a buffer on the weights and serves as an implicit regularizer that inhibits the learning of large weights. In practical tests this improved model performance, and replacing the activations in all stages gave the same effect as replacing them only in the last stage, so SiLU is used only in the last stage.
In this embodiment, in the loop architecture module, the channels of the input feature map are split equally into feature map P3 and feature map P4. P4 is passed through a ConvGRU, which outputs feature map P5 and the memory-unit feature map h_t, and the result is concatenated with P3 to output feature map P6. Formally, the ConvGRU is defined as follows:

$$z_t = \sigma(w_{zx} * x_t + w_{zh} * h_{t-1} + b_z)$$

$$r_t = \sigma(w_{rx} * x_t + w_{rh} * h_{t-1} + b_r)$$

where * and ⊙ denote convolution and element-wise product respectively, tanh and σ denote the hyperbolic tangent and sigmoid functions, and w and b are convolution kernels and bias terms. The hidden state h_t is used both as the output and as the recurrent state h_{t-1} for the next time step; the initial recurrent state h_0 is an all-zero tensor.
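A NumPy sketch of the loop architecture module running over a sequence of frames. It assumes the standard ConvGRU formulation: the update gate z_t and reset gate r_t as given above, plus a tanh candidate o_t = tanh(w_ox * x_t + w_oh * (r_t ⊙ h_{t-1}) + b_o) and h_t = z_t ⊙ h_{t-1} + (1 − z_t) ⊙ o_t. It also reads P6 as the concatenation of P3 with the ConvGRU output P5 = h_t. Both are assumptions, since the remaining equations are not reproduced here; the kernel size, channel counts, random initialization and all names are hypothetical.

```python
import numpy as np


def conv2d_same(x, w, b):
    """Cross-correlation with zero 'same' padding, stride 1.
    x: (C_in, H, W), w: (C_out, C_in, k, k) with odd k, b: (C_out,)."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.empty((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            out[:, i, j] = np.tensordot(w, xp[:, i:i + k, j:j + k], axes=([1, 2, 3], [0, 1, 2]))
    return out + b[:, None, None]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ConvGRU:
    """Standard ConvGRU cell (assumed). Convolving the concatenation [x_t, h_{t-1}]
    with one kernel is equivalent to w_x * x_t + w_h * h_{t-1}."""

    def __init__(self, channels, k=3, seed=0):
        rng = np.random.default_rng(seed)
        shape = (channels, 2 * channels, k, k)
        self.w_z, self.w_r, self.w_o = (rng.standard_normal(shape) * 0.1 for _ in range(3))
        self.b_z, self.b_r, self.b_o = (np.zeros(channels) for _ in range(3))

    def step(self, x, h_prev):
        xh = np.concatenate([x, h_prev])
        z = sigmoid(conv2d_same(xh, self.w_z, self.b_z))       # update gate z_t
        r = sigmoid(conv2d_same(xh, self.w_r, self.b_r))       # reset gate r_t
        o = np.tanh(conv2d_same(np.concatenate([x, r * h_prev]), self.w_o, self.b_o))  # candidate o_t
        return z * h_prev + (1.0 - z) * o                       # h_t: output and next recurrent state


def loop_module(frames, gru):
    """frames: list of (2C, H, W) backbone feature maps, one per video frame."""
    h, outputs = None, []
    for f in frames:
        c = f.shape[0] // 2
        p3, p4 = f[:c], f[c:]                    # split channels equally
        if h is None:
            h = np.zeros_like(p4)                # h_0 is an all-zero tensor
        h = gru.step(p4, h)                      # P5 = h_t
        outputs.append(np.concatenate([p3, h]))  # P6 = concat(P3, P5) (assumed reading)
    return outputs, h


rng = np.random.default_rng(42)
frames = [rng.standard_normal((8, 6, 6)) for _ in range(3)]
outs, h_last = loop_module(frames, ConvGRU(channels=4))
```

Routing only half of the channels through the ConvGRU keeps the recurrent cost low while the other half passes the current frame's features through unchanged.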
In the key point regression module, the normalizing flow module can transform a simple base distribution into an arbitrarily complex distribution; in theory, if the transformation is complex enough, any target distribution can be fitted. In practice the transformations are implemented with a neural network, which can in theory approximate any function, so the series of complex transformations in the normalizing flow model can be realized by stacking FC layers. During training, the regression module fits the output of a simple distribution, and the normalizing flow module transforms the fitted result so that it moves closer to the target distribution P. The loss function L_mle of the normalizing flow module can then be set as follows:
where φ is the learnable parameter of the normalizing flow model, μ_g is the ground-truth skeletal key point coordinates from the data, and the loss also uses the skeletal key point coordinates and the skeletal key point scores predicted by the regression module.
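The formula for L_mle is not reproduced in the text. As a hedged illustration only, the sketch below assumes a residual log-likelihood form in which the ground truth is normalized by the prediction, x̄ = (μ_g − μ̂)/σ̂, and the loss is −log p_φ(x̄) + log σ̂, where μ̂ and σ̂ stand for the regression module's predicted coordinates and scale; treating the predicted score as the scale σ̂ is itself an assumption. A single element-wise affine transform of a standard Gaussian stands in for the stacked FC layers of the flow, and φ = (`log_scale`, `shift`) are hypothetical parameters.

```python
import numpy as np

LOG_2PI = np.log(2.0 * np.pi)


def flow_log_prob(x, log_scale, shift):
    """Log-density of x under a toy flow x = exp(log_scale) * eps + shift, eps ~ N(0, 1).

    Change of variables: log p(x) = log N(eps; 0, 1) - log_scale.
    (Stand-in for the stacked FC layers of the normalizing flow module.)
    """
    eps = (x - shift) * np.exp(-log_scale)
    return -0.5 * (eps ** 2 + LOG_2PI) - log_scale


def flow_mle_loss(mu_g, mu_hat, sigma_hat, log_scale=0.0, shift=0.0):
    """Assumed residual log-likelihood loss, averaged over all key point coordinates.

    mu_g, mu_hat: (K, 2) ground-truth and predicted coordinates.
    sigma_hat: positive predicted scale, shape (K, 1) or (K, 2) (broadcast).
    """
    residual = (mu_g - mu_hat) / sigma_hat
    nll = -flow_log_prob(residual, log_scale, shift) + np.log(sigma_hat)
    return float(nll.mean())


mu_g = np.zeros((21, 2))
loss_exact = flow_mle_loss(mu_g, mu_g, np.ones((21, 1)))
loss_off = flow_mle_loss(mu_g, mu_g + 0.5, np.ones((21, 1)))
print(loss_exact < loss_off)  # True
```

With an identity flow (`log_scale = 0`, `shift = 0`) and σ̂ = 1, the loss reduces to the Gaussian negative log-likelihood, which makes a convenient sanity check before swapping in a learned flow.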
In this embodiment, training is divided into four stages: stage 1, stage 2, stage 3 and stage 4. In stage 1, the model is trained on standalone image data sets without the loop architecture module to obtain a suitable pre-trained model; practical comparisons showed that, compared with using the MobileNetV3 classification pre-trained model as the pre-trained model of the key point model, using this key point pre-trained model brings faster loss convergence in later training and improves model performance to some extent. Stage 2 trains on video stream data with a short sequence length of T = 15 frames so that the network can be updated quickly. Stage 3 increases T to 50 frames, halves the learning rate and otherwise keeps the hyperparameters of stage 1, so that the model sees longer sequences and learns long-range dependencies. Stage 4 trains for a small number of iterations on a combination of video stream data and standalone image data; each standalone image is treated as a video sequence of only one frame, which forces the model to remain robust even without repeated or continuous information.
In the invention, first, the key point regression-based 2D gesture estimation method takes little computation time and few resources, and supports real-time operation and end-to-end fully differentiable training on mobile, embedded or low-cost hardware platforms. A Gaussian heatmap-based method is not an end-to-end differentiable model from image input to coordinates: coordinates must be obtained from the heatmap via argmax, and this step is not differentiable. Coordinate regression, on the other hand, converts the positional information in the output of a fully convolutional network into coordinate values; this dimension conversion is highly nonlinear, which makes the model hard to converge during training. The normalizing flow module solves this problem, giving fast and high-precision results on embedded devices;
second, regarding motion blur and self-occlusion caused by dynamic gestures in video: although many methods are designed for video applications, they treat each frame as an independent image and ignore the temporal information that is abundant in video. The loop architecture module is therefore used to improve the model's estimation of dynamic gestures in video. Because the model has already seen the previous frames when predicting the current frame, it can refer to the better key point predictions of earlier frames when a single frame is blurred, which greatly improves its precision. The method applies to any video without auxiliary input; with the training strategy set out above, a high-precision model can be generated effectively, and the motion blur and self-occlusion problems caused by dynamic gestures are largely resolved;
therefore, the real-time 2D gesture estimation method based on a loop architecture and key point regression achieves real-time, high-precision detection on mobile, embedded or low-cost hardware, effectively alleviates the degradation of detection performance caused by motion blur and self-occlusion in video, and enables rapid product deployment.
Any similar technical solution that a person skilled in the art designs using, or inspired by, the technical solution of the invention and that achieves the above technical effects falls within the protection scope of the invention.
Claims (5)
1. A real-time 2D gesture estimation method based on a loop architecture and key point regression, characterized in that: the core modules comprise an image acquisition module, a lightweight neural network algorithm module, a loop architecture module and a key point regression module, the image acquisition module being a monocular camera;
the lightweight neural network algorithm module uses MobileNetV3 as a lightweight backbone model for feature extraction and consists of several stages, each built from multiple groups of depthwise separable convolutions;
the loop architecture module receives the feature information extracted by the MobileNetV3 backbone network; its recurrent mechanism learns by itself which information in a continuous video stream should be retained, and adaptively preserves both long-term and short-term temporal information, which suits our requirements; in the loop architecture module, the channels of the input feature map are split equally into feature map P3 and feature map P4, P4 is passed through a ConvGRU that outputs feature map P5 and the memory-unit feature map h_t, and the result is concatenated with P3 to output feature map P6;
the key point regression module takes the feature map output by the loop architecture module as its input and passes it through two FC layers: FC1 outputs the coordinates of the 2D skeleton key points, and FC2 outputs their scores; because the regression results need to be supervised, a normalizing flow module is added for auxiliary training; the normalizing flow module can transform a simple base distribution into an arbitrarily complex distribution, and the loss function L_mle of the normalizing flow module can be set as follows:
where φ is the learnable parameter of the normalizing flow model, μ_g is the ground-truth skeletal key point coordinates from the data, and the loss also uses the skeletal key point coordinates and the skeletal key point scores predicted by the regression module;
in the lightweight neural network algorithm module, the depthwise separable convolution consists of two steps, depthwise (channel-by-channel) convolution and pointwise convolution; in the depthwise convolution, each kernel is responsible for exactly one channel and each channel is convolved by exactly one kernel, so the generated feature map has exactly as many channels as the input; the pointwise convolution uses 1x1 kernels to take a weighted combination of the depthwise outputs along the depth direction, producing a new feature map; the lightweight neural network algorithm module obtains a new feature matrix by adding an SE structure module, and a shortcut connection is used when the stride is 1 and the input and output feature matrices have the same size; an LR-ASPP module is added after the MobileNetV3 backbone outputs its feature map to enlarge the receptive field and improve the accuracy of the whole model: the input feature map is split into two branches, the left branch outputs feature map P1 through a 1x1 convolution, the right branch outputs feature map P2 through a global average pooling layer, a 1x1 convolution and a Sigmoid module, and P1 and P2 are multiplied to output a new feature map.
2. The real-time 2D gesture estimation method based on a loop architecture and key point regression according to claim 1, wherein: in the last stage of the lightweight neural network algorithm module that uses MobileNetV3 as the lightweight backbone model, all original activation functions are replaced by the SiLU activation function; the activation a_k of the k-th SiLU for input z_k is computed by multiplying the input by its sigmoid:

$$a_k(z_k) = z_k\,\sigma(z_k) \qquad (1)$$

where σ is the sigmoid function; for large values of z_k, SiLU behaves essentially like ReLU, but unlike ReLU it is not monotonically increasing and has a global minimum of about -0.28 at z_k ≈ -1.28; SiLU is self-stabilizing: the global minimum, where the derivative is zero, acts as a buffer on the weights and serves as an implicit regularizer that inhibits the learning of large weights; in practical tests this improved model performance, and replacing the activations in all stages gave the same effect as replacing them only in the last stage, so SiLU is used only in the last stage.
3. The real-time 2D gesture estimation method based on a loop architecture and key point regression according to claim 1, wherein: the training strategy of the model is divided into four stages, stage 1, stage 2, stage 3 and stage 4; in stage 1, the model is trained on standalone image data sets without the loop architecture module to obtain a suitable pre-trained model; practical comparisons showed that, compared with using the MobileNetV3 classification pre-trained model as the pre-trained model of the key point model, using this key point pre-trained model brings faster loss convergence in later training and improves model performance to some extent.
4. The real-time 2D gesture estimation method based on loop architecture and keypoint regression of claim 3, wherein: stage 2 trains on video stream data with a short sequence length of T = 15 frames so that the network can be updated quickly. Stage 3 increases T to 50 frames, halves the learning rate and otherwise keeps the hyperparameters of stage 1. This lets the model see longer sequences and learn the dependencies within long sequences.
5. The real-time 2D gesture estimation method based on loop architecture and keypoint regression of claim 3, wherein: stage 4 jointly trains on video stream data and sporadic data for a small number of iterations. Each sporadic image is treated as a video sequence of only 1 frame, which forces the model to remain robust even when no repeated or continuous information is available.
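The depthwise separable convolution, the shortcut rule and the LR-ASPP head described in claim 1 can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the patented implementation. Feature maps are single images of shape (C, H, W) with no batch dimension, and all weights are random. The function names are hypothetical. The SE module and the normalization/activation layers of a real MobileNet V3 block are omitted. The LR-ASPP left branch is only the 1x1 convolution that the claim names.

```python
import numpy as np


def depthwise_conv3x3(x, kernels):
    """Channel-by-channel (depthwise) 3x3 convolution, stride 1, zero padding 1.

    x: (C, H, W) feature map; kernels: (C, 3, 3), one kernel per channel,
    so the output keeps exactly C channels.
    """
    c, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero padding by default
    out = np.zeros((c, h, w), dtype=np.float64)
    for ch in range(c):
        for i in range(3):
            for j in range(3):
                # Cross-correlation, as in common deep-learning frameworks.
                out[ch] += kernels[ch, i, j] * padded[ch, i:i + h, j:j + w]
    return out


def pointwise_conv(x, weights):
    """1x1 (point-by-point) convolution: weights (C_out, C_in) mix channels."""
    return np.einsum("oc,chw->ohw", weights, x)


def shortcut(x, y, stride):
    """Add the residual shortcut only when stride is 1 and shapes match."""
    return x + y if stride == 1 and x.shape == y.shape else y


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lr_aspp(x, w_left, w_right):
    """Simplified LR-ASPP head (hypothetical helper): output = P1 * P2."""
    p1 = pointwise_conv(x, w_left)                 # left branch: (C_out, H, W)
    pooled = x.mean(axis=(1, 2))                   # global average pooling: (C_in,)
    p2 = sigmoid(w_right @ pooled)[:, None, None]  # right branch: (C_out, 1, 1)
    return p1 * p2                                 # per-channel gating by broadcasting


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))
y = pointwise_conv(depthwise_conv3x3(x, rng.standard_normal((8, 3, 3))),
                   rng.standard_normal((8, 8)))
y = shortcut(x, y, stride=1)
out = lr_aspp(y, rng.standard_normal((4, 8)), rng.standard_normal((4, 8)))
print(out.shape)  # (4, 16, 16)
```

The depthwise step needs 9·C multiply-accumulates per pixel. The pointwise step adds C_in·C_out. This split is why the combination is cheaper than a full 3x3 convolution with 9·C_in·C_out.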
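Claim 2's statements about SiLU can be checked numerically. The sketch below is plain Python, and its function names are hypothetical. It implements Formula 1 and its derivative. A bisection search then finds where the derivative vanishes. This recovers the global minimum of about -0.28 near z = -1.28 and shows the ReLU-like behavior for large inputs.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def silu(z: float) -> float:
    """SiLU activation, Formula 1: a(z) = z * sigmoid(z)."""
    return z * sigmoid(z)


def silu_grad(z: float) -> float:
    """d/dz [z * sigmoid(z)] = sigmoid(z) * (1 + z * (1 - sigmoid(z)))."""
    s = sigmoid(z)
    return s * (1.0 + z * (1.0 - s))


def silu_minimum(lo: float = -3.0, hi: float = 0.0, iters: int = 60):
    """Bisect on the derivative to find its zero crossing (the global minimum)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if silu_grad(mid) < 0:
            lo = mid  # still descending: the minimum lies to the right
        else:
            hi = mid
    z = 0.5 * (lo + hi)
    return z, silu(z)


z_min, a_min = silu_minimum()
print(f"min at z = {z_min:.3f}, SiLU = {a_min:.3f}")  # min at z = -1.278, SiLU = -0.278
print(silu(10.0), max(0.0, 10.0))  # for large z, SiLU is close to ReLU
```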
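The sequence-length schedule of claims 3 and 4 amounts to cutting each training video into clips of T frames. T and the learning rate then change between stages. The sketch below makes three assumptions. Clips do not overlap. `BASE_LR` is a hypothetical base learning rate, since the claims give none. "Half of the original" is read as half of the stage-2 rate. Stage 1 (image pre-training without the loop architecture module) and stage 4 are not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    name: str
    seq_len: int   # T: frames per training clip
    lr: float      # learning rate


BASE_LR = 1e-3  # hypothetical value; the claims do not state a learning rate

SCHEDULE = [
    StageConfig("stage2", seq_len=15, lr=BASE_LR),      # short clips, fast updates
    StageConfig("stage3", seq_len=50, lr=BASE_LR / 2),  # longer clips, halved lr
]


def split_into_clips(frames, seq_len):
    """Cut one video into consecutive, non-overlapping clips of seq_len frames.

    A trailing remainder shorter than seq_len is dropped (an assumption).
    """
    return [frames[i:i + seq_len]
            for i in range(0, len(frames) - seq_len + 1, seq_len)]


video = list(range(100))  # stand-in for 100 decoded frames
for stage in SCHEDULE:
    clips = split_into_clips(video, stage.seq_len)
    print(stage.name, stage.lr, len(clips), "clips of", stage.seq_len, "frames")
```

Longer clips give a recurrent model more steps to carry state across. They also produce fewer, heavier updates per epoch, which is consistent with lowering the learning rate in stage 3.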
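Claim 5 treats each sporadic image as a 1-frame video sequence, which can be sketched as a data-mixing step. Several choices here are assumptions rather than details from the claim: equal-length sequences share a batch so that each batch can be stacked into one tensor, the batch size is arbitrary, and batches are shuffled. The helper name `build_stage4_batches` is hypothetical. The claim does not say how batches are formed or how many iterations stage 4 runs.

```python
import random
from collections import defaultdict


def build_stage4_batches(video_clips, images, batch_size, seed=0):
    """Hypothetical helper: mix multi-frame video clips with sporadic images
    treated as 1-frame sequences, batching equal-length sequences together."""
    sequences = [list(clip) for clip in video_clips] + [[img] for img in images]
    rng = random.Random(seed)
    rng.shuffle(sequences)

    by_len = defaultdict(list)  # equal-length sequences can be stacked into one tensor
    for seq in sequences:
        by_len[len(seq)].append(seq)

    batches = []
    for seqs in by_len.values():
        batches.extend(seqs[i:i + batch_size] for i in range(0, len(seqs), batch_size))
    rng.shuffle(batches)  # interleave video batches and single-image batches
    return batches


clips = [[f"v{k}_f{t}" for t in range(15)] for k in range(4)]
images = [f"img{k}" for k in range(5)]
for batch in build_stage4_batches(clips, images, batch_size=2):
    print(len(batch), "sequences of length", len(batch[0]))
```

A 1-frame sequence gives a recurrent model no previous state to rely on. Mixing such sequences in keeps the per-frame keypoint regression from depending entirely on temporal context.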
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211675766.0A CN115953839B (en) | 2022-12-26 | 2022-12-26 | Real-time 2D gesture estimation method based on loop architecture and key point regression |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211675766.0A CN115953839B (en) | 2022-12-26 | 2022-12-26 | Real-time 2D gesture estimation method based on loop architecture and key point regression |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115953839A (en) | 2023-04-11 |
CN115953839B (en) | 2024-04-12 |
Family
ID=87296332
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211675766.0A Active CN115953839B (en) | 2022-12-26 | 2022-12-26 | Real-time 2D gesture estimation method based on loop architecture and key point regression |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115953839B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111191627A (en) * | 2020-01-06 | 2020-05-22 | Zhejiang University of Technology | Method for improving accuracy of dynamic gesture motion recognition under multiple viewpoints |
CN112507898A (en) * | 2020-12-14 | 2021-03-16 | Chongqing University of Posts and Telecommunications | Multi-modal dynamic gesture recognition method based on lightweight 3D residual error network and TCN |
CN113095262A (en) * | 2021-04-21 | 2021-07-09 | Dalian University of Technology | Three-dimensional voxel gesture attitude estimation method based on multitask information complementation |
CN114519868A (en) * | 2022-02-22 | 2022-05-20 | Guangdong Xinwangpai Intelligent Information Technology Co., Ltd. | Real-time bone key point identification method and system based on coordinate system regression |
CN114882524A (en) * | 2022-04-15 | 2022-08-09 | South China University of Technology | Monocular three-dimensional gesture estimation method based on full convolution neural network |
CN114882493A (en) * | 2021-01-22 | 2022-08-09 | Beihang University | Three-dimensional hand posture estimation and recognition method based on image sequence |
CN115171149A (en) * | 2022-06-09 | 2022-10-11 | Guangzhou Ziweiyun Technology Co., Ltd. | Monocular RGB image regression-based real-time human body 2D/3D bone key point identification method |
WO2022262878A1 (en) * | 2021-06-16 | 2022-12-22 | South China University of Technology | Ltc-dnn-based visual inertial navigation combined navigation system and self-learning method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111739078B (en) * | 2020-06-15 | 2022-11-18 | Dalian University of Technology | Monocular unsupervised depth estimation method based on context attention mechanism |
- 2022-12-26 CN CN202211675766.0A (CN115953839B), Active
Non-Patent Citations (2)
Title |
---|
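Zhou Quan; Gan Yi; He Weiming; Sun Fujia; Yang Lihong. Research on a hand gesture pose estimation method based on the LHPN algorithm. 软件 (Software), 2020, No. 07, full text. * |
Lu Hao; Shi Min; Li Hao; Zhu Dengming. Deep-learning-based camera pose estimation method for dynamic scenes. 高技术通讯 (Chinese High Technology Letters), 2020, No. 01, full text. * |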
Also Published As
Publication number | Publication date |
---|---|
CN115953839A (en) | 2023-04-11 |
Similar Documents
Publication | Title |
---|---|
Qin et al. | U2-Net: Going deeper with nested U-structure for salient object detection |
CN111079646B (en) | Weak supervision video time sequence action positioning method and system based on deep learning |
CN108133188B (en) | Behavior identification method based on motion history image and convolutional neural network |
CN111507993A (en) | Image segmentation method and device based on generation countermeasure network and storage medium |
Geng et al. | Combining CNN and MRF for road detection |
CN113313173B (en) | Human body analysis method based on graph representation and improved transducer |
Ma et al. | Meta PID attention network for flexible and efficient real-world noisy image denoising |
CN117499658A (en) | Generating video frames using neural networks |
CN111696110A (en) | Scene segmentation method and system |
Dai et al. | Mdrnet: A lightweight network for real-time semantic segmentation in street scenes |
CN116704431A (en) | On-line monitoring system and method for water pollution |
CN115423739A (en) | SimpleBaseline-based method for detecting key points of teleoperation mechanical arm |
Yan et al. | RoboSeg: Real-time semantic segmentation on computationally constrained robots |
Zhang et al. | Dnanet: De-normalized attention based multi-resolution network for human pose estimation |
Yi et al. | Elanet: effective lightweight attention-guided network for real-time semantic segmentation |
Ben Mahjoub et al. | An efficient end-to-end deep learning architecture for activity classification |
CN114359554A (en) | Image semantic segmentation method based on multi-receptive-field context semantic information |
US10963775B2 (en) | Neural network device and method of operating neural network device |
Wang et al. | Bilateral attention network for semantic segmentation |
CN115953839B (en) | Real-time 2D gesture estimation method based on loop architecture and key point regression |
Hua et al. | Dynamic scene deblurring with continuous cross-layer attention transmission |
CN112819044A (en) | Method for training neural network for target operation task compensation of target object |
CN116468894A (en) | Distance self-adaptive mask generation method for supervised learning of lithium battery pole piece |
Run-Hua et al. | SCAM-YOLOv5: Improved YOLOv5 based on spatial and channel attention module |
Rueckauer et al. | Contraction of dynamically masked deep neural networks for efficient video processing |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |