CN115953839B - Real-time 2D gesture estimation method based on loop architecture and key point regression - Google Patents


Info

Publication number
CN115953839B
CN115953839B
Authority
CN
China
Prior art keywords
module
model
feature map
regression
training
Prior art date
Legal status
Active
Application number
CN202211675766.0A
Other languages
Chinese (zh)
Other versions
CN115953839A (en)
Inventor
Li Guanxi
Zhang Lei
Liang Zhuohua
Current Assignee
Guangzhou Ziweiyun Technology Co., Ltd.
Original Assignee
Guangzhou Ziweiyun Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Guangzhou Ziweiyun Technology Co., Ltd.
Priority to CN202211675766.0A
Publication of CN115953839A
Application granted
Publication of CN115953839B




Abstract

The invention provides a real-time 2D gesture estimation method based on a loop architecture and key point regression, belonging to the technical field of real-time 2D gesture estimation. The core of the method comprises an image acquisition module, a lightweight neural network algorithm module, a loop architecture module and a key point regression module. The key point regression approach is fast and light on resources, so real-time operation and end-to-end fully differentiable training can be realized on mobile, embedded or other low-cost hardware platforms. A loop architecture module is used to enhance the model's performance on dynamic gesture estimation in video. The method therefore achieves real-time, high-precision detection on mobile, embedded or low-cost hardware, effectively alleviates the degradation of detection performance caused by motion blur and self-occlusion in video, and enables rapid productization.

Description

Real-time 2D gesture estimation method based on loop architecture and key point regression
Technical Field
The invention belongs to the technical field of real-time 2D gesture estimation, and particularly relates to a real-time 2D gesture estimation method based on a loop architecture and key point regression.
Background
The 2D gesture estimation technology mainly detects 21 key points of the hand; the information expressed by different gestures can be described through these key points. Hand 2D key point detection is one of the basic algorithms of computer vision and plays an important role in research in other related fields of computer vision. At present, the main hardware carriers of the metaverse are AR/VR devices and the like: images are acquired through a camera, and corresponding feedback is obtained by analyzing the information expressed by the user's gestures.
Compared with body pose estimation, 2D gesture estimation is in fact a very challenging task: hand joints are more flexible, more sensitive to motion and more affected by self-occlusion, all of which can degrade estimation quality. Gaussian-heat-map-based methods are currently the mainstream of the technology and achieve good recognition accuracy. However, in the industrial Internet age the combination of embedded systems and artificial intelligence is an inevitable trend, and heat-map-based 2D gesture estimation usually cannot reach real-time performance on mobile, embedded or low-cost hardware platforms; its detection quality under motion blur caused by dynamic gestures and under self-occlusion is also unsatisfactory. Because heat-map-based methods consume much memory and infer slowly, running them on low-cost hardware introduces large latency, which often harms the experience of the whole product.
Therefore, it is necessary to invent a real-time 2D gesture estimation method based on a loop architecture and key point regression.
Disclosure of Invention
In order to solve the above-mentioned problems, the present invention provides a real-time 2D gesture estimation method based on a loop architecture and key point regression. The core of the method comprises an image acquisition module, a lightweight neural network algorithm module, a loop architecture module and a key point regression module, wherein the image acquisition module is a monocular camera;
the lightweight neural network algorithm module adopts MobileNet V3 as a lightweight backbone model to extract characteristics, and consists of a plurality of stages, wherein a plurality of groups of deep separable convolutions are formed;
the cyclic architecture module acquires characteristic information through a MobileNet V3 backbone network and passes through a cyclic architecture module; the circulation mechanism can learn which information should be reserved in the continuous video stream by itself, and the long-term and short-term time information capability is reserved while self-adapting, so that the circulation mechanism is suitable for our requirements;
the key point regression module outputs the obtained feature map through the circulation architecture module as the input of the key point regression module, and respectively passes through 2 FC layers; FC1 outputs coordinate information of the 2D skeleton key points, and FC2 outputs score information of the 2D skeleton key points; the regression results need to be supervised, so that a standardized flow module is added for auxiliary training.
Preferably, in the lightweight neural network algorithm module, the depthwise separable convolution consists of two steps: channel-by-channel (depthwise) convolution and point-by-point (pointwise) convolution. Each kernel of the channel-by-channel convolution is responsible for exactly one channel and each channel is convolved by only one kernel, so the feature map produced in this step has exactly as many channels as the input. The point-by-point convolution uses 1x1 convolutions to weight and combine the channel-by-channel outputs along the depth direction, producing a new feature map.
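By way of illustration, a minimal PyTorch sketch of this two-step structure follows; it is an illustrative sketch, not the patent's implementation, and the module and parameter names are our own:

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Channel-by-channel (depthwise) convolution followed by a
    point-by-point (pointwise) 1x1 convolution."""
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        # groups=in_ch: one kernel per channel, so the intermediate feature
        # map has exactly as many channels as the input.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        # 1x1 convolution weights and combines the depthwise outputs along
        # the depth direction to produce the new feature map.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 16, 64, 64)
print(DepthwiseSeparableConv(16, 32)(x).shape)  # torch.Size([1, 32, 64, 64])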
preferably, adding an SE structure module to obtain a new feature matrix; when the step length is 1, and the input characteristic matrix and the output characteristic matrix are the same in size, carrying out shortcut connection; after the MobileNet V3 trunk model outputs the feature map, an LR-ASPP module is added to increase the receptive field, the accuracy of the whole model is improved, the feature map of the input channel is divided into two branches, and the left branch outputs a feature map P through a convolution kernel of 1x1 1 The right branch outputs a characteristic diagram P after passing through a global tie pool layer, a 1x1 convolution kernel and a Sigmod module 2 And for the characteristic diagram P 1 And feature map P 2 After multiplication, a new feature map is output;
preferably, in the last stage of the model, the SiLu activation function is used instead of all the original activation functions; input Z k Activation a of kth SiLU of (2) k The a is calculated by multiplying the sigmoid function by its input k (z k )=z k σ(z k ) Equation 1, in which the sigmoid function, is for a larger Z k The value, siLU activation is substantially equal to the function of ReLU, but different ReLUs, siLU activation is not monotonically increasing, but instead for Z k Approximately 1.28, its global minimum is-0.28; the SiLU has the characteristic of self-stabilization, the global minimum value with the derivative of zero plays a role of buffering the weights, the global minimum value serves as an implicit regularizer to inhibit learning of a large number of weights, the model performance is improved in actual tests, and the effect of replacing the SiLU function by all stages is equivalent, so that the model can be used only in the last Stage.
Preferably, in the loop architecture module, when the feature map is input, its channels are split evenly into a feature map P3 and a feature map P4; the feature map P4 is passed through a ConvGRU, which outputs a feature map P5 that also serves as the memory-cell feature map h_t; the feature map P3 is then concatenated with P5 to output a feature map P6.
Preferably, in the key point regression module, the normalizing flow module can transform a basic simple distribution into an arbitrarily complex one; in theory, if the transformation is complex enough, any target distribution can be fitted. In the actual training process a neural network is used, which can in theory approximate any function, so the series of complex transformations in the normalizing flow model can be realized by stacking FC layers. During model training, the regression module fits the output values of a simple distribution, and the normalizing flow module transforms the fitted values so that the transformed result is closer to the target distribution P.
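A minimal sketch of the two-FC regression head follows, assuming the 21 hand key points mentioned in the background; the layer names are ours and the normalizing flow itself is omitted, only the two FC outputs are shown:

import torch
import torch.nn as nn

class KeypointRegressionHead(nn.Module):
    def __init__(self, feat_dim: int, num_kpts: int = 21):
        super().__init__()
        self.fc1 = nn.Linear(feat_dim, num_kpts * 2)  # FC1: coordinates (x, y)
        self.fc2 = nn.Linear(feat_dim, num_kpts)      # FC2: per-keypoint scores

    def forward(self, feat: torch.Tensor):
        n = feat.shape[0]
        coords = self.fc1(feat).reshape(n, -1, 2)     # predicted coordinates
        scores = torch.sigmoid(self.fc2(feat))        # predicted scores in (0, 1)
        return coords, scores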
Preferably, the training is divided into four stages, namely stage 1, stage 2, stage 3 and stage 4. Stage 1 trains the model on scattered single-image data sets without the loop architecture module to obtain a suitable pre-training model; in actual test comparisons it was found that, although the classification pre-trained MobileNetV3 model can be used to initialize the key point model, initializing from the key point pre-training model instead brings faster loss convergence and a certain improvement in model performance during later training.
Preferably, stage 2 trains on video stream data with a short sequence length of T = 15 frames, so that the network can be updated quickly; stage 3 increases T to 50 frames, reduces the learning rate to half of its original value, and retains the other hyper-parameter settings of stage 1, which allows our model to see longer sequence information and learn the dependencies between long sequences.
Preferably, stage 4 uses video stream data and scattered single-image data together for a small number of integrated training iterations; we treat each scattered image as a video sequence of only 1 frame, which forces the model to remain robust even without repeated or continuous information. The four-stage schedule is summarized below.
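The schedule can be expressed as configuration data; the sketch below is illustrative only, and the field names and the learning-rate values (other than the stated halving from stage 2 to stage 3) are assumptions:

TRAINING_STAGES = [
    # stage 1: scattered single-image data, no loop architecture module
    {"stage": 1, "data": "scattered images", "seq_len": 1,  "loop_module": False, "lr": 1e-3},
    # stage 2: video stream data, short sequences for fast updates
    {"stage": 2, "data": "video streams",    "seq_len": 15, "loop_module": True,  "lr": 1e-3},
    # stage 3: longer sequences, learning rate halved
    {"stage": 3, "data": "video streams",    "seq_len": 50, "loop_module": True,  "lr": 5e-4},
    # stage 4: video plus scattered images treated as 1-frame sequences
    {"stage": 4, "data": "video + images",   "seq_len": "mixed", "loop_module": True, "lr": 5e-4},
]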
Here, robustness means strength and resilience: the ability of a system to survive abnormal and dangerous conditions. For example, computer software is robust if it does not hang or crash under input errors, disk failures, network overload or deliberate attack. Robustness also refers to the ability of a control system to maintain certain properties under perturbation of certain parameters.
Compared with the prior art, the invention has the following beneficial effects:
in the invention, firstly, the 2D gesture estimation method based on the key point regression has the advantages of short algorithm consumption time and less resources, and can realize real-time operation and full differential training from end to end on a mobile end, embedded or low-cost hardware cost platform. The Gaussian heat map-based method is not an end-to-end differentiable model from image input to coordinate regression, the Gaussian heat map to coordinate points need to be obtained in an argmax mode, and the process is not conductive; however, the position information is converted into the coordinate value based on the result of full convolution in the coordinate regression mode, and for the dim information conversion, the nonlinearity is very strong, and the model is not easy to converge in training, so that the problem is solved by using the standardized flow module, and the effect of rapidness and high precision at the embedded end is realized;
for motion blur and self-occlusion problems with dynamic gestures in video, although many are designed for video applications, a single frame is treated as an independent image, but the most widely existing temporal information in video is ignored; therefore, a loop architecture module is used for enhancing the effect of the model on dynamic gesture estimation in the video. Because in the video, the model can know the previous frame and predict the current frame, and under the condition that a single frame is possibly blurred, the model can refer to better prediction key points of the previous frame, so that the definition of the model is greatly improved; the method can be applied to all videos without any auxiliary input; according to the model training strategy set by the user, a high-precision model can be effectively generated; the problems of motion blurring and self-shielding caused by dynamic gestures can be solved to a great extent;
therefore, the real-time 2D gesture estimation method based on the loop architecture and the key point regression can realize the real-time and high-precision detection effect of mobile terminal, embedded or low-cost hardware, can effectively alleviate the problem of model detection performance reduction caused by motion blurring and self-shielding in video, and realizes quick landing of products.
Drawings
Fig. 1 is a block diagram of the modules of the present invention.
FIG. 2 is a block flow diagram of a lightweight neural network algorithm module of the present invention.
FIG. 3 is a block flow diagram of the overall model of the present invention.
Fig. 4 is a block flow diagram of the training strategy of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
Embodiment:
As shown in figs. 1 to 4:
The invention provides a real-time 2D gesture estimation method based on a loop architecture and key point regression. The core modules comprise an image acquisition module, a lightweight neural network algorithm module, a loop architecture module and a key point regression module, and the image acquisition module is a monocular camera. The lightweight neural network algorithm module adopts MobileNetV3 as a lightweight backbone model to extract features and consists of several stages, each containing groups of depthwise separable convolutions. The loop architecture module receives the feature information extracted by the MobileNetV3 backbone network and passes it through a recurrent mechanism; the recurrent mechanism can learn by itself which information should be retained from the continuous video stream, adaptively preserving long- and short-term temporal information, which suits our requirements. The key point regression module takes the feature map output by the loop architecture module as its input and passes it through 2 FC layers: FC1 outputs the coordinate information of the 2D skeleton key points, and FC2 outputs the score information of the 2D skeleton key points. Because the regression results need to be supervised, a normalizing flow module is added for auxiliary training.
In this embodiment, in the lightweight neural network algorithm module, the depthwise separable convolution consists of two steps: channel-by-channel convolution and point-by-point convolution. Each kernel of the channel-by-channel convolution is responsible for exactly one channel and each channel is convolved by only one kernel, so the feature map produced in this step has exactly as many channels as the input; the point-by-point convolution uses 1x1 convolutions to weight and combine the channel-by-channel outputs along the depth direction, producing a new feature map. An SE structure module is added to obtain a new feature matrix; when the stride is 1 and the input and output feature matrices have the same size, a shortcut connection is made. After the MobileNetV3 backbone model outputs the feature map, an LR-ASPP module is added to enlarge the receptive field and improve the accuracy of the whole model: the input feature map is split into two branches; the left branch outputs a feature map P1 through a 1x1 convolution kernel, the right branch outputs a feature map P2 after a global average pooling layer, a 1x1 convolution kernel and a Sigmoid module, and the feature maps P1 and P2 are multiplied to output a new feature map.
in the last stage of the model, replacing all original activation functions, and using SiLu activation functions; input Z k Activation a of kth SiLU of (2) k The a is calculated by multiplying the sigmoid function by its input k (z k )=z k σ(z k ) Equation 1, in which the sigmoid function, is for a larger Z k The value, siLU activation is substantially equal to the function of ReLU, but different ReLUs, siLU activation is not monotonically increasing, but instead for Z k Approximately 1.28, its global minimum is-0.28; the SiLU has the self-stabilization characteristic, the global minimum value with the derivative of zero plays a role in buffering weights, the global minimum value serves as an implicit regularizer to inhibit learning of a large number of weights, the model performance is improved in actual tests, and the effect of replacing SiLU functions by all stages is equivalent, so that the model can be obtained by only using the last Stage:
in this embodiment, in the cyclic architecture module, when the feature map is input into the cyclic architecture module, the channel for inputting the feature map is equally divided into feature maps P 3 And feature map P 4 For the characteristic map P 4 Output profile P output by ConvGRU 5 And memory cell feature map h t Splice feature map P 3 And feature map P 4 Output of the feature map P 6 The method comprises the steps of carrying out a first treatment on the surface of the Formally convglu is defined as follows:
z_t = σ(w_zx * x_t + w_zh * h_(t-1) + b_z)
r_t = σ(w_rx * x_t + w_rh * h_(t-1) + b_r)
h̃_t = tanh(w_hx * x_t + w_hh * (r_t ⊙ h_(t-1)) + b_h)
h_t = (1 − z_t) ⊙ h_(t-1) + z_t ⊙ h̃_t
where * denotes convolution and ⊙ denotes the element-wise product at corresponding locations; tanh and σ denote the hyperbolic tangent and sigmoid functions; the w are convolution kernels and the b are bias terms. The hidden layer h_t serves both as the output and as the recurrent state h_(t-1) for the next time step; the initial recurrent state h_0 is an all-zero tensor.
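A hedged PyTorch sketch of this loop architecture module follows. The gate equations above are implemented here with a single convolution over the concatenated [x_t, h_(t-1)], which is mathematically equivalent to the separate w*x + w*h form; all class and variable names are ours:

import torch
import torch.nn as nn

class ConvGRU(nn.Module):
    def __init__(self, ch: int, k: int = 3):
        super().__init__()
        self.gates = nn.Conv2d(2 * ch, 2 * ch, k, padding=k // 2)  # z_t, r_t
        self.cand = nn.Conv2d(2 * ch, ch, k, padding=k // 2)       # candidate state

    def forward(self, x, h=None):
        if h is None:                       # initial state h_0: all zeros
            h = torch.zeros_like(x)
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], 1))).chunk(2, 1)
        h_cand = torch.tanh(self.cand(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * h_cand     # h_t: output and next recurrent state

class LoopModule(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.gru = ConvGRU(ch // 2)

    def forward(self, x, h=None):
        p3, p4 = x.chunk(2, dim=1)          # split channels into P3 and P4
        p5 = self.gru(p4, h)                # P5 doubles as memory h_t
        return torch.cat([p3, p5], dim=1), p5   # P6 and the carried state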
In the key point regression module, the normalizing flow module can transform a basic simple distribution into an arbitrarily complex one; in theory, if the transformation is complex enough, any target distribution can be fitted. In the actual training process a neural network is used, which can in theory approximate any function, so the series of complex transformations in the normalizing flow model can be realized by stacking FC layers. During model training, the regression module fits the output values of a simple distribution, and the normalizing flow module transforms the fitted values so that the transformed result is closer to the target distribution P. The loss function L_mle of the normalizing flow module can then be set as a maximum-likelihood (negative log-likelihood) loss, where φ are the learnable parameters of the normalizing flow model, μ_g are the ground-truth skeleton key point coordinates, μ̂ are the skeleton key point coordinates predicted by the regression module, and σ̂ are the skeleton key point scores predicted by the regression module.
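Since the printed formula is not reproduced here, the following is a heavily hedged sketch of one common maximum-likelihood form consistent with the description (in the spirit of residual log-likelihood estimation); the patent's exact formula may differ, and flow_log_prob stands in for the stacked-FC normalizing flow:

import torch

def mle_loss(mu_g, mu_hat, sigma_hat, flow_log_prob):
    # mu_g:      ground-truth keypoint coordinates, shape (N, D)
    # mu_hat:    coordinates predicted by the regression module, shape (N, D)
    # sigma_hat: scores/scales predicted by the regression module, shape (N, D)
    # flow_log_prob: callable giving log P_phi(x_bar) under the flow, shape (N,)
    x_bar = (mu_g - mu_hat) / sigma_hat                  # normalized residual
    # change of variables: log P(mu_g) = log P_phi(x_bar) - sum(log sigma_hat)
    log_p = flow_log_prob(x_bar) - torch.log(sigma_hat).sum(dim=1)
    return -log_p.mean()                                 # negative log-likelihood

# usage with a stand-in base density (identity flow over a standard normal):
normal = torch.distributions.Normal(0.0, 1.0)
loss = mle_loss(torch.zeros(4, 42), torch.zeros(4, 42),
                torch.ones(4, 42), lambda x: normal.log_prob(x).sum(dim=1))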
In this embodiment, the training is divided into four stages, namely stage 1, stage 2, stage 3 and stage 4. Stage 1 trains the model on scattered single-image data sets without the loop architecture module to obtain a suitable pre-training model; in actual test comparisons it was found that, although the classification pre-trained MobileNetV3 model can be used to initialize the key point model, initializing from the key point pre-training model instead brings faster loss convergence and a certain improvement in model performance during later training. Stage 2 trains on video stream data with a short sequence length of T = 15 frames, so that the network can be updated quickly. Stage 3 increases T to 50 frames, reduces the learning rate to half of its original value, and retains the other hyper-parameter settings of stage 1, which allows the model to see longer sequence information and learn the dependencies between long sequences. Stage 4 uses video stream data and scattered single-image data together for a small number of integrated training iterations; the scattered data are treated as video sequences of only 1 frame, which forces the model to remain robust even without repeated or continuous information.
In the invention, firstly, the 2D gesture estimation method based on key point regression is fast and light on resources, and real-time operation and end-to-end fully differentiable training can be realized on mobile, embedded or low-cost hardware platforms. A Gaussian-heat-map-based method is not an end-to-end differentiable model from image input to coordinate regression: the coordinate points must be extracted from the Gaussian heat map via argmax, and this process is non-differentiable. The coordinate regression approach, by contrast, converts the fully convolutional result directly into coordinate values; this information conversion is highly nonlinear and the model does not converge easily during training, so the normalizing flow module is used to solve this problem, achieving fast and high-precision results on the embedded end;
for motion blur and self-occlusion problems with dynamic gestures in video, although many are designed for video applications, a single frame is treated as an independent image, but the most widely existing temporal information in video is ignored; therefore, a loop architecture module is used for enhancing the effect of the model on dynamic gesture estimation in the video. Because in the video, the model can know the previous frame and predict the current frame, and under the condition that a single frame is possibly blurred, the model can refer to better prediction key points of the previous frame, so that the definition of the model is greatly improved. The method can be applied to all videos without any auxiliary input; according to the model training strategy set by the user, a high-precision model can be effectively generated; the problems of motion blurring and self-shielding caused by dynamic gestures can be solved to a great extent;
therefore, the real-time 2D gesture estimation method based on the loop architecture and the key point regression can realize the real-time and high-precision detection effect of mobile terminal, embedded or low-cost hardware, can effectively alleviate the problem of model detection performance reduction caused by motion blurring and self-shielding in video, and realizes quick landing of products.
Any similar technical solution designed by a person skilled in the art by using, or under the inspiration of, the technical scheme of the present invention and achieving the above technical effects falls within the protection scope of the present invention.

Claims (5)

1. A real-time 2D gesture estimation method based on a loop architecture and key point regression, characterized in that: the core modules comprise an image acquisition module, a lightweight neural network algorithm module, a loop architecture module and a key point regression module, wherein the image acquisition module is a monocular camera;
the lightweight neural network algorithm module adopts MobileNetV3 as a lightweight backbone model to extract features; it consists of several stages, each containing groups of depthwise separable convolutions;
the loop architecture module receives the feature information extracted by the MobileNetV3 backbone network and passes it through a recurrent mechanism; the recurrent mechanism can learn by itself which information should be retained from the continuous video stream, adaptively preserving long- and short-term temporal information, which suits our requirements; in the loop architecture module, when the feature map is input, its channels are split evenly into a feature map P3 and a feature map P4; the feature map P4 is passed through a ConvGRU, which outputs a feature map P5 serving as the memory-cell feature map h_t, and the feature map P3 is concatenated with P5 to output a feature map P6;
the key point regression module is output by the circulation architecture module to obtainThe feature map of (2) is used as the input of the key point regression module and respectively passes through 2 FC layers; FC1 outputs coordinate information of the 2D skeleton key points, and FC2 outputs score information of the 2D skeleton key points; because the regression result needs to be supervised, a standardized flow module is added for auxiliary training; the normalized flow module can convert some basic simple distribution into arbitrary complex distribution, and the loss function L of the normalized flow module mle The following may be set:
wherein phi is a learnable parameter of the normalized flow model, mu g Is the skeletal key point coordinates of the data,bone key point coordinates predicted for regression module, < +.>Bone key points predicted by the regression module are scored;
in the lightweight neural network algorithm module, the depthwise separable convolution consists of two steps, channel-by-channel convolution and point-by-point convolution; each kernel of the channel-by-channel convolution is responsible for exactly one channel and each channel is convolved by only one kernel, so the feature map produced in this step has exactly as many channels as the input; the point-by-point convolution uses 1x1 convolutions to weight and combine the channel-by-channel outputs along the depth direction, producing a new feature map; the lightweight neural network algorithm module obtains a new feature matrix by adding an SE structure module; when the stride is 1 and the input and output feature matrices have the same size, a shortcut connection is made; an LR-ASPP module is added after the MobileNetV3 backbone model outputs the feature map, enlarging the receptive field and improving the accuracy of the whole model; the input feature map is split into two branches: the left branch outputs a feature map P1 through a 1x1 convolution kernel, the right branch outputs a feature map P2 after a global average pooling layer, a 1x1 convolution kernel and a Sigmoid module, and the feature maps P1 and P2 are multiplied to output a new feature map.
2. The real-time 2D gesture estimation method based on a loop architecture and key point regression of claim 1, characterized in that: in the last stage of the lightweight neural network algorithm module that uses MobileNetV3 as the lightweight backbone model, all original activation functions are replaced with the SiLU activation function; for an input z_k, the activation a_k of the k-th SiLU is computed by multiplying the input by the sigmoid function: a_k(z_k) = z_k·σ(z_k) (equation 1), where σ is the sigmoid function; for larger z_k values the SiLU activation is approximately equal to ReLU, but unlike ReLU the SiLU is not monotonically increasing: at z_k ≈ −1.28 it attains its global minimum of about −0.28; the SiLU has a self-stabilizing property, the global minimum with zero derivative acts as a buffer on the weights and serves as an implicit regularizer that inhibits the learning of excessively large weights; in actual tests this improves model performance, and replacing the activation in all stages has an equivalent effect, so it suffices to use the SiLU only in the last stage.
3. The real-time 2D gesture estimation method based on a loop architecture and key point regression of claim 1, characterized in that: the training strategy of the model is divided into four stages, namely stage 1, stage 2, stage 3 and stage 4; stage 1 trains the model on scattered single-image data sets without the loop architecture module to obtain a suitable pre-training model; in actual test comparisons it was found that, although the classification pre-trained MobileNetV3 model can be used to initialize the key point model, initializing from the key point pre-training model instead brings faster loss convergence and a certain improvement in model performance during later training.
4. The real-time 2D gesture estimation method based on a loop architecture and key point regression of claim 3, characterized in that: stage 2 trains on video stream data with a short sequence length of T = 15 frames, so that the network can be updated quickly; stage 3 increases T to 50 frames, reduces the learning rate to half of its original value, and retains the other hyper-parameter settings of stage 1, which allows the model to see longer sequence information and learn the dependencies between long sequences.
5. The real-time 2D gesture estimation method based on a loop architecture and key point regression of claim 3, characterized in that: stage 4 uses video stream data and scattered single-image data together for a small number of integrated training iterations; the scattered data are treated as video sequences of only 1 frame, which forces the model to remain robust even without repeated or continuous information.
CN202211675766.0A 2022-12-26 2022-12-26 Real-time 2D gesture estimation method based on loop architecture and key point regression Active CN115953839B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211675766.0A CN115953839B (en) 2022-12-26 2022-12-26 Real-time 2D gesture estimation method based on loop architecture and key point regression


Publications (2)

Publication Number Publication Date
CN115953839A CN115953839A (en) 2023-04-11
CN115953839B 2024-04-12

Family

ID=87296332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211675766.0A Active CN115953839B (en) 2022-12-26 2022-12-26 Real-time 2D gesture estimation method based on loop architecture and key point regression

Country Status (1)

Country Link
CN (1) CN115953839B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111739078B (en) * 2020-06-15 2022-11-18 大连理工大学 Monocular unsupervised depth estimation method based on context attention mechanism

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111191627A (en) * 2020-01-06 2020-05-22 浙江工业大学 Method for improving accuracy of dynamic gesture motion recognition under multiple viewpoints
CN112507898A (en) * 2020-12-14 2021-03-16 重庆邮电大学 Multi-modal dynamic gesture recognition method based on lightweight 3D residual error network and TCN
CN114882493A (en) * 2021-01-22 2022-08-09 北京航空航天大学 Three-dimensional hand posture estimation and recognition method based on image sequence
CN113095262A (en) * 2021-04-21 2021-07-09 大连理工大学 Three-dimensional voxel gesture attitude estimation method based on multitask information complementation
WO2022262878A1 (en) * 2021-06-16 2022-12-22 华南理工大学 Ltc-dnn-based visual inertial navigation combined navigation system and self-learning method
CN114519868A (en) * 2022-02-22 2022-05-20 广东新王牌智能信息技术有限公司 Real-time bone key point identification method and system based on coordinate system regression
CN114882524A (en) * 2022-04-15 2022-08-09 华南理工大学 Monocular three-dimensional gesture estimation method based on full convolution neural network
CN115171149A (en) * 2022-06-09 2022-10-11 广州紫为云科技有限公司 Monocular RGB image regression-based real-time human body 2D/3D bone key point identification method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhou Quan; Gan Yi; He Weiming; Sun Fujia; Yang Lihong. Research on gesture pose estimation based on the LHPN algorithm. Software. 2020, (07), full text. *
Lu Hao; Shi Min; Li Hao; Zhu Dengming. Deep-learning-based camera pose estimation method for dynamic scenes. High Technology Letters. 2020, (01), full text. *

Also Published As

Publication number Publication date
CN115953839A (en) 2023-04-11


Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant