CN107657233A - Real-time static sign language recognition method based on an improved single-shot multi-target detector - Google Patents

Real-time static sign language recognition method based on an improved single-shot multi-target detector

Info

Publication number
CN107657233A
CN107657233A (application CN201710899126.0A)
Authority
CN
China
Prior art keywords
sign language
network
image
static sign
detection device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710899126.0A
Other languages
Chinese (zh)
Inventor
张勋
陈亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Donghua University
Original Assignee
Donghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Donghua University
Priority to CN201710899126.0A
Publication of CN107657233A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 - Static hand or arm
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a real-time static sign language recognition method based on an improved single-shot multi-target detector, comprising the following steps: preprocessing static sign language sample images; building and augmenting a static sign language image data set; constructing a deep learning network based on the improved single-shot multi-target detector, the network being divided into a base network layer and extra convolutional feature layers, where the base network layer performs feature extraction and converts the input image into a multi-dimensional feature representation, and the extra convolutional layers act as a feature selection strategy that uses small convolutional filters to predict the class scores and position offsets of a fixed set of default bounding boxes on the feature maps, while predictions at different scales are produced from feature maps of different scales; training the network on the static sign language data set, and feeding sign language video captured by a camera in real time into the trained network to realize real-time static sign language recognition. The present invention greatly improves recognition speed while maintaining recognition accuracy.

Description

Real-time static sign language recognition method based on an improved single-shot multi-target detector
Technical field
The present invention relates to the technical field of sign language recognition, and more particularly to a real-time static sign language recognition method based on an improved single-shot multi-target detector.
Background technology
Sign language is an effective means by which deaf people communicate with gestures instead of speech. Research on sign language recognition can help communication between deaf people who have not received a good education, as well as communication between deaf people and hearing people. Sign language recognition is also a convenient mode of human-computer interaction, and research on it can advance fields such as intelligent machine operation, mobile terminal operation, access control systems and remote control. Furthermore, studying sign language recognition can aid computer understanding of human language.
Sign language recognition based on monocular vision uses an ordinary camera for information input and a computer algorithm for recognition. Compared with methods that feed information from sensors or other digital devices into a computer, it has the advantages of low equipment requirements, convenient operation and low cost, and it has attracted increasing interest from researchers. In the field of sign language recognition, a traditional complete recognition method usually comprises three stages: segmentation, feature extraction and recognition. 1) Segmentation: common methods include models based on motion information, motion templates and skin color information. 2) Feature extraction: common methods include feature extraction based on histograms of oriented gradients (HOG), on local binary pattern (LBP) textures, and on convolutional neural network (CNN) features. 3) Gesture recognition: common methods include the multilayer perceptron (MLP) based on artificial neural networks and the support vector machine (SVM) based on supervised learning models.
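For orientation only, a minimal sketch of such a traditional HOG plus SVM pipeline (not the method claimed by this patent) could look as follows in Python; the feature and classifier parameters are illustrative assumptions:

```python
# Illustrative sketch of a traditional HOG + SVM sign classifier.
# All parameter values here are assumptions, not taken from the patent.
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def extract_hog(gray_image):
    # gray_image: 2-D numpy array, already segmented and resized (e.g. 64x64)
    return hog(gray_image, orientations=9,
               pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def train_traditional_classifier(images, labels):
    # images: list of equally sized grayscale hand images; labels: class ids
    features = np.array([extract_hog(img) for img in images])
    clf = SVC(kernel="rbf", C=10.0)   # hyperparameters are assumptions
    clf.fit(features, labels)
    return clf
```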
Although static sign language recognition has been studied for a long time, the human hand skeleton is not uniform, hand shapes are highly variable and the sign language vocabulary is large, so the characteristic information is difficult to obtain flexibly. Describing sign language features with hand-designed descriptors is cumbersome and cannot mine deep-level feature information, which results in models with poor adaptability that have difficulty achieving good real-time performance and accurate recognition in vision-based sign language recognition.
Deep learning methods address exactly these pain points. Deep learning models are regarded as a disruptive technology in the field of machine learning; they realize supervised and unsupervised feature extraction and transformation through combinations of multiple nonlinear layers, so as to achieve pattern analysis and classification. Researchers at a large number of scientific institutions and enterprises have conducted extensive research on deep learning technology and its applications and have achieved remarkable results in fields such as speech and images. Deeper network structures can learn more numerous and more complex features, and these abstract representations can describe the variation of images more flexibly and more accurately.
In order to achieve both good real-time performance and high accuracy, researchers have made various efforts. Ross B. Girshick et al. proposed the region-based convolutional neural network (R-CNN), which performs convolutional feature extraction on candidate regions generated from the image and then classifies them to obtain bounding boxes, converting object detection into a classification problem. Although this was a breakthrough for object detection, training the feature extraction network and the classifier network separately is quite time-consuming, so real-time performance cannot be guaranteed. Ross B. Girshick improved the R-CNN network by merging feature extraction and classification into a single network together with a selective search algorithm, publishing the fast region-based convolutional neural network (Fast R-CNN), which further improved training speed and detection accuracy. Later, a region proposal network (RPN) was proposed to optimize the generation of candidate regions, further improving speed, and the accelerated region-based convolutional neural network (Faster R-CNN) was published.
The above methods have become milestones in the field of detection and recognition, but although their accuracy is good, their computational cost is too high for embedded systems and they are too slow for real-time or near-real-time applications even on high-end hardware, or they sacrifice detection accuracy in exchange for speed.
Summary of the invention
The technical problem to be solved by the present invention is to provide a real-time static sign language recognition method based on an improved single-shot multi-target detector that greatly improves recognition speed while maintaining recognition accuracy, so as to meet real-time requirements.
The technical solution adopted by the present invention to solve this technical problem is to provide a real-time static sign language recognition method based on an improved single-shot multi-target detector, comprising the following steps:
(1) preprocessing static sign language sample images;
(2) building and augmenting a static sign language image data set;
(3) constructing a deep learning network based on the improved single-shot multi-target detector, the deep learning network being divided into a base network layer and extra convolutional feature layers; the base network layer performs feature extraction and converts the input image into a multi-dimensional feature representation; the extra convolutional layers act as a feature selection strategy that uses small convolutional filters to predict the class scores and position offsets of a fixed set of default bounding boxes on the feature maps, while predictions at different scales are produced from feature maps of different scales;
(4) training the network on the static sign language data set, and feeding sign language video captured by a camera in real time into the trained network to realize real-time static sign language recognition.
Step (1) is specifically: recording static sign language video, extracting frames from the video as images, manually removing images with severe motion blur or severe occlusion, and enhancing the images with a high-pass filtering method.
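As an illustration only, the frame extraction and high-pass enhancement described in step (1) could be sketched in Python with OpenCV as follows; the frame step and kernel size are assumptions, not values given in the patent:

```python
# Illustrative sketch: extract frames from a recorded sign language video and
# apply a simple high-pass style sharpening. Frame step and kernel are assumptions.
import cv2

def extract_and_enhance(video_path, step=10):
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            # High-pass enhancement: add the high-frequency residual back to the image.
            blurred = cv2.GaussianBlur(frame, (5, 5), 0)
            high_pass = cv2.subtract(frame, blurred)
            frames.append(cv2.add(frame, high_pass))
        idx += 1
    cap.release()
    return frames
```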
In step (2), the constructed static sign language data set includes the original sample images and the label images obtained by manually annotating the original sample images, and the image annotation boxes recorded in the annotation information correspond one-to-one with the original images; the data set is augmented by mirroring the original images and re-annotating the corresponding images.
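A minimal sketch of this mirror augmentation with bounding-box re-annotation, assuming boxes stored as (xmin, ymin, xmax, ymax) pixel coordinates, might look as follows:

```python
# Illustrative sketch: horizontal mirror augmentation with box re-annotation.
# Boxes are assumed to be (xmin, ymin, xmax, ymax) in pixel coordinates.
import cv2

def mirror_sample(image, boxes):
    height, width = image.shape[:2]
    flipped = cv2.flip(image, 1)  # flip around the vertical axis
    flipped_boxes = [(width - xmax, ymin, width - xmin, ymax)
                     for (xmin, ymin, xmax, ymax) in boxes]
    return flipped, flipped_boxes
```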
The base network layer in step (3) is an AlexNet network with the fully connected layers removed, five layers in total, and max pooling is used for pooling; the extra convolutional network has nine layers, of which eight are convolutional layers and one is an average pooling layer.
The extra convolutional feature layers are added to the end of the truncated base network and decrease in size layer by layer, yielding predictions at multiple detection scales; a set of convolutional filters is applied to each added feature layer to produce a prediction set, giving class scores or coordinate offsets relative to the default boxes; the coordinate offsets are measured relative to the default boxes, while the default box positions are relative to the feature map.
The rule for producing the prediction set is: for a feature layer of size m*n with p channels, a 3*3*p convolution kernel is applied to produce a class score or a coordinate offset relative to a default box, and one output value is produced at each of the m*n positions where the kernel is applied.
The training of the deep learning network of step (3) comprises the following steps: (31) matching strategy: during training, the correspondence between ground-truth labels and default boxes must be established, and a default box is matched when its overlap with a ground-truth label is higher than a certain threshold; (32) training objective: the objective function is derived from MultiBox, and the overall loss function is the weighted sum of the localization loss and the confidence loss, where the localization loss is the Smooth L1 loss between the predicted box and ground-truth box parameters, and the confidence loss is a softmax loss over the multi-class confidences, with the weight term set to 1 by cross-validation; (33) selection of default box scales and aspect ratios: predictions from several feature maps over default boxes of different sizes and aspect ratios at all positions are combined to cover a variety of input object sizes and shapes.
Step (4) is specifically: acquiring sign language images in real time with a monocular camera, feeding the images into the deep learning network of the improved single-shot multi-target detector, obtaining the classification and detection results, and thereby realizing real-time static sign language recognition.
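A minimal sketch of such a real-time recognition loop, assuming a hypothetical trained model assd_model whose forward pass returns boxes, class labels and scores in frame coordinates, might be:

```python
# Illustrative sketch of real-time recognition from a monocular camera.
# `assd_model`, its input size and its output format are assumptions.
import cv2
import torch

def run_realtime(assd_model, class_names, input_size=300):
    assd_model.eval()
    cap = cv2.VideoCapture(0)  # monocular webcam
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        blob = cv2.resize(frame, (input_size, input_size)).astype("float32") / 255.0
        tensor = torch.from_numpy(blob).permute(2, 0, 1).unsqueeze(0)
        with torch.no_grad():
            boxes, labels, scores = assd_model(tensor)  # assumed output format
        for box, label, score in zip(boxes, labels, scores):
            if score > 0.5:
                x1, y1, x2, y2 = [int(v) for v in box]
                cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
                cv2.putText(frame, f"{class_names[int(label)]} {score:.2f}",
                            (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        cv2.imshow("static sign language recognition", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()
```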
Beneficial effect
Owing to the above technical scheme, the present invention has the following advantages and positive effects compared with the prior art: the present invention does not need hand-designed descriptors of static sign language features, and the convolutional neural network used can obtain deeper-level feature information, so the model adapts well; moreover, small convolutional filters are used to predict the class scores and position offsets of a fixed set of default bounding boxes on the feature maps, while predictions at different scales are produced from feature maps of different scales, which greatly improves recognition speed and meets the real-time requirement of static sign language recognition.
Brief description of the drawings
Fig. 1 is the flow chart of the present invention;
Fig. 2 is the overall structure diagram of ASSD, the deep learning network of the improved single-shot multi-target detector;
Fig. 3 is a schematic diagram of the base network layer of the ASSD network of the invention;
Fig. 4 shows experimental results of static sign language recognition according to the present invention.
Embodiment
The present invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments are only intended to illustrate the present invention and not to limit its scope. In addition, it should be understood that, after reading the teachings of the present invention, those skilled in the art can make various changes or modifications to the present invention, and such equivalent forms likewise fall within the scope defined by the claims appended to this application.
The embodiment of the present invention relates to a real-time static sign language recognition method based on an improved single-shot multi-target detector. As shown in Fig. 1, it comprises the following steps: first, the static sign language images are manually labeled to obtain the label maps corresponding to the sign language images; then the deep learning network of the improved single-shot multi-target detector is built, and the training images together with their label maps are fed into the constructed network for iterative learning to obtain the model parameters of the network; then a test image is input, and the deep learning network of the improved single-shot multi-target detector processes the test image according to the model parameters trained above; finally, the class labels of the static signs appearing in the test image and the corresponding probability values are obtained. The details are as follows:
Step 1: preprocess the static sign language sample images. The experimental data were collected with a high-definition monocular camera. Five representative letters were chosen from the 26 letter signs for static sign language recognition in the experiment, namely A, B, C, D and E. The experimental data were recorded by 8 people, each person recording a video for each letter; frames were then extracted with a Matlab frame-extraction program, images with severe motion blur or severe occlusion were removed manually, and images with poor display quality were enhanced with a high-pass filtering method to facilitate target recognition, giving a preliminary data set with an image size of 640*480.
Step 2: build and augment the static sign language image data set. The constructed static sign language data set includes the original sample images and the label images obtained by manually annotating the original sample images; the image annotation boxes recorded in the annotation information correspond one-to-one with the original images. The data set is augmented by mirroring the original images and re-annotating the corresponding images. The final data set is shown in Table 1: the training set contains 2311 images for letter A, 2606 for letter B, 2581 for letter C, 2667 for letter D and 2659 for letter E, 13024 in total; the test set contains 500 images for each of the letters A, B, C, D and E, 2500 in total. The LabelImg program was used for manual annotation to obtain the ground-truth label files.
Table 1. Static sign language data set
Step 3: build ASSD, the deep learning network based on the improved single-shot multi-target detector. The static sign language image data set obtained in step 2 is used to train this network. As shown in Fig. 2, the network structure consists of two parts: the base network layer, i.e. the feature extraction layer, which is a 5-layer network, and the extra convolutional feature layers, which form a 9-layer network. The role of the base network layer is to extract the features of the original image through a series of convolution, activation and pooling operations, thereby obtaining feature maps; the role of the extra convolutional layers is to predict positions and obtain confidences.
As shown in Fig. 3, the role of the base network layer is feature extraction, and the feature extraction network f can be regarded as a series of convolution, activation and pooling operations. AlexNet with the fully connected layers removed is used as the convolutional network, so the convolutional network of the present invention has five layers. Assuming the base network is f with parameters θ, the mathematical expression of f is:

f(X; θ) = W_L · H_{L-1}

H_l = pool(relu(W_l · H_{l-1} + b_l))

where X is the input image of the convolutional neural network, H_l is the output of the hidden units of layer l, b_l is the bias of layer l, W_l is the weight of layer l, b_l and W_l together form the trainable parameters θ, pool(·) denotes the pooling operation and relu(·) denotes the activation operation. The pooling operation aggregates the feature points in a small neighborhood into new features, so the features are reduced, the number of parameters is reduced, and the pooling unit has translation invariance. Pooling methods mainly include average pooling and max pooling; the present invention mainly uses max pooling.
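For illustration, a base network of this kind (AlexNet with the fully connected layers removed, five convolutional layers, max pooling) can be obtained in PyTorch roughly as follows; the use of torchvision and the 300*300 input size are assumptions, not details given in the patent:

```python
# Illustrative sketch: AlexNet base network with the fully connected layers removed.
# torchvision's AlexNet feature stack has exactly five convolutional layers and uses max pooling.
import torch
import torchvision

def build_base_network():
    alexnet = torchvision.models.alexnet(weights=None)   # torchvision >= 0.13 API
    return alexnet.features  # conv/ReLU/max-pool stack only, no fully connected layers

if __name__ == "__main__":
    base = build_base_network()
    dummy = torch.randn(1, 3, 300, 300)   # assumed input size
    print(base(dummy).shape)              # multi-dimensional feature representation
```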
The extra convolutional layers are added at the end of the AlexNet base network from which the fully connected layers were removed; they form a 9-layer network, of which 8 are convolutional layers and 1 is an average pooling layer, and they have the following characteristics:
(1) Multi-scale feature map detection: extra convolutional feature layers are added to the end of the truncated base network, and these layers decrease in size progressively, yielding predictions at multiple detection scales. The convolutional detection model is different for each feature layer.
(2) Convolutional predictors for detection: in each added feature layer (as well as the existing feature layer of the base network), a set of convolutional filters can produce a prediction set, as shown in Fig. 2. The rule for producing the prediction set is: for a feature layer of size m*n with p channels, a 3*3*p convolution kernel is applied to produce a class score or a coordinate offset relative to a default box, and one output value is produced at each of the m*n positions where the kernel is applied. The bounding box offset output values are measured relative to the default boxes, while the default box positions are relative to the feature map.
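As a sketch of such a convolutional predictor, a 3*3 convolution attached to one feature layer can output, at every feature map position, the class scores and the four box offsets for each default box; the channel counts and class count below are assumed values:

```python
# Illustrative sketch: a 3x3 convolutional predictor attached to one feature layer.
# For k default boxes per position and C classes, it outputs k*C class scores and
# k*4 coordinate offsets at every m*n location of the p-channel feature map.
import torch
import torch.nn as nn

class ConvPredictor(nn.Module):
    def __init__(self, in_channels, num_default_boxes, num_classes):
        super().__init__()
        self.cls_head = nn.Conv2d(in_channels, num_default_boxes * num_classes,
                                  kernel_size=3, padding=1)
        self.loc_head = nn.Conv2d(in_channels, num_default_boxes * 4,
                                  kernel_size=3, padding=1)

    def forward(self, feature_map):   # feature_map: (batch, p, m, n)
        return self.cls_head(feature_map), self.loc_head(feature_map)

# Example with assumed sizes: p = 256 channels, 6 default boxes, 6 classes (5 letters + background).
predictor = ConvPredictor(256, 6, 6)
scores, offsets = predictor(torch.randn(1, 256, 8, 8))
```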
(3) Default boxes and aspect ratios: a set of default bounding boxes is associated with each feature map cell of the overlaid network. The default boxes tile the feature map in a convolutional manner, so that the position of each box instance relative to its corresponding cell is fixed. In each feature map cell, the offsets relative to the default box shapes in the cell and the per-class scores of the instance in each box are predicted.
The network structure ASSD of the present invention is shown in Fig. 2; the layers Conv5, Conv7 (originally the 7th, fully connected, layer), Conv8_2, Conv9_2, Conv10_2 and pool11 are used to predict positions and compute confidences. The present invention uses the "xavier" method to initialize the parameters of all newly added convolutional layers. Because the size of Conv4_3 is relatively large (38 × 38), only 3 default boxes are placed on it, comprising a box of scale 0.1 and boxes with aspect ratios 0.5 and 2. For all other layers, 6 default boxes are set. Since Conv4_3 has a feature scale different from the other layers, the present invention uses the L2 normalization technique to scale the feature norm at each position of the feature map to 20, and this scale is learned during back-propagation. The present invention uses a learning rate of 10^-3 for 40k iterations of learning, and then learning rates of 10^-4 and 10^-5 for a further 10k iterations.
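A minimal sketch of the L2 normalization with a learnable scale initialized to 20, as described above, might be (the layer name and framework are assumptions):

```python
# Illustrative sketch: L2 normalization of a feature map with a learnable
# per-channel scale, initialized to 20 as described above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class L2Norm(nn.Module):
    def __init__(self, num_channels, initial_scale=20.0):
        super().__init__()
        self.scale = nn.Parameter(torch.full((num_channels,), initial_scale))

    def forward(self, x):                  # x: (batch, channels, h, w)
        x = F.normalize(x, p=2, dim=1)     # unit L2 norm along the channel axis
        return x * self.scale.view(1, -1, 1, 1)
```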
The key to training the ASSD network of the present invention is to assign the ground-truth labels in the training images to those fixed output default boxes; this has the following characteristics:
(1) Matching strategy: during training, the correspondence between ground-truth labels and default boxes must be established, and a default box is matched when its Jaccard overlap with a ground-truth label is higher than a certain threshold (0.5).
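For illustration, the Jaccard overlap computation and threshold matching could be sketched as follows; the (xmin, ymin, xmax, ymax) box format is an assumption:

```python
# Illustrative sketch: match default boxes to ground-truth labels by Jaccard overlap.
# Boxes are assumed to be (xmin, ymin, xmax, ymax).

def jaccard(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def match_default_boxes(default_boxes, ground_truths, threshold=0.5):
    # Returns, for every default box, the index of the matched ground truth or -1.
    matches = []
    for d in default_boxes:
        overlaps = [jaccard(d, g) for g in ground_truths]
        best = max(range(len(overlaps)), key=lambda i: overlaps[i]) if overlaps else -1
        matches.append(best if best >= 0 and overlaps[best] > threshold else -1)
    return matches
```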
(2) Training objective: the objective function of ASSD training is derived from MultiBox. Let x_ij^p = 1 indicate that the i-th default box is matched to the j-th ground-truth box of class p, and x_ij^p = 0 otherwise; according to the matching strategy there is necessarily Σ_i x_ij^p ≥ 1, which means that the j-th ground truth may be matched to more than one default box. The overall loss function L(x, c, l, g) is the weighted sum of the localization loss L_loc and the confidence loss L_conf, as shown below:

L(x, c, l, g) = (1/N) (L_conf(x, c) + α · L_loc(x, l, g))

where N is the number of matched default boxes and x is the indicator variable defined over the input image.

The localization loss L_loc is the Smooth L1 loss between the predicted box l and the ground-truth box g parameters, regressing the offsets of the center (cx, cy) and the width w and height h of the bounding box d:

L_loc(x, l, g) = Σ_{i∈Pos} Σ_{m∈{cx, cy, w, h}} x_ij^k · smooth_L1(l_i^m − ĝ_j^m)

The confidence loss L_conf is a softmax loss over the multi-class confidences c, and the weight term α is set to 1 by cross-validation, as shown below:

L_conf(x, c) = − Σ_{i∈Pos} x_ij^p · log(ĉ_i^p) − Σ_{i∈Neg} log(ĉ_i^0), where ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p).
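A compact PyTorch-style sketch of this combined objective (tensor shapes are assumptions and the hard-negative handling of the confidence loss is simplified away) might be:

```python
# Illustrative sketch: MultiBox-style loss = SmoothL1 localization loss + softmax
# confidence loss, weighted by alpha = 1 and normalized by the number of matches N.
import torch
import torch.nn.functional as F

def multibox_loss(pred_offsets, pred_logits, gt_offsets, gt_labels, alpha=1.0):
    # pred_offsets: (num_boxes, 4); pred_logits: (num_boxes, num_classes)
    # gt_offsets:   (num_boxes, 4); gt_labels:   (num_boxes,) long tensor, 0 = background
    pos = gt_labels > 0
    num_matched = pos.sum().clamp(min=1).float()

    loc_loss = F.smooth_l1_loss(pred_offsets[pos], gt_offsets[pos], reduction="sum")
    conf_loss = F.cross_entropy(pred_logits, gt_labels, reduction="sum")  # softmax loss

    return (conf_loss + alpha * loc_loss) / num_matched
```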
(3) Selection of default box scales and aspect ratios: making predictions on the feature maps of different layers within a single network and sharing parameters across all object scales reduces computation and memory requirements; moreover, in this network a specific position on a feature map is responsible for a specific region of the image and for specific object sizes, so the default boxes do not need to correspond to the receptive field in each layer. By combining the predictions of several feature maps over default boxes of all the different sizes and aspect ratios at all positions, a diversified set of predictions is obtained that covers a variety of input object sizes and shapes.
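For illustration only, default boxes for one feature map can be generated per cell from a scale and a set of aspect ratios; the scales and aspect ratios below are assumptions rather than the exact configuration used by the patent:

```python
# Illustrative sketch: generate default boxes (cx, cy, w, h), normalized to [0, 1],
# for an m x n feature map with a given scale and aspect ratios.
import math

def default_boxes_for_layer(m, n, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    boxes = []
    for i in range(m):
        for j in range(n):
            cy = (i + 0.5) / m
            cx = (j + 0.5) / n
            for ar in aspect_ratios:
                boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
    return boxes

# Example with assumed sizes: a coarse 8x8 map with larger boxes, a fine 38x38 map with smaller ones.
coarse = default_boxes_for_layer(8, 8, scale=0.6)
fine = default_boxes_for_layer(38, 38, scale=0.1)
```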
Step 4: train the network with the static sign language data set obtained in step 2. The present invention fine-tunes the model with SGD (stochastic gradient descent), using the initial learning rate given above, a momentum of 0.9, a weight decay of 0.0005, a batch size of 32, and so on. A model parameter set with good performance is selected for the experimental tests.
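A minimal PyTorch sketch of this fine-tuning configuration (the model, dataset, loss function and the 10^-3 initial learning rate are assumptions based on the values mentioned above) might be:

```python
# Illustrative sketch: SGD fine-tuning with momentum 0.9, weight decay 0.0005,
# batch size 32 and an assumed initial learning rate of 1e-3.
import torch
from torch.utils.data import DataLoader

def fine_tune(model, train_dataset, loss_fn, num_iterations=40000):
    loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                                momentum=0.9, weight_decay=0.0005)
    model.train()
    it = 0
    while it < num_iterations:
        for images, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), targets)   # assumed call signature
            loss.backward()
            optimizer.step()
            it += 1
            if it >= num_iterations:
                break
    return model
```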
Fig. 4 shows the experimental results of static sign language recognition. The recognition results for the 5 manual letters A, B, C, D and E are shown, including the class label and probability of the sign to be recognized and the current recognition frame rate. It can be seen that the deep learning network of the improved single-shot multi-target detector of this embodiment performs very well on static sign language, and the recognition speed is also fast, meeting the requirement of real-time recognition.

Claims (8)

  1. A real-time static sign language recognition method based on an improved single-shot multi-target detector, characterized by comprising the following steps:
    (1) preprocessing static sign language sample images;
    (2) building and augmenting a static sign language image data set;
    (3) constructing a deep learning network based on the improved single-shot multi-target detector, the deep learning network being divided into a base network layer and extra convolutional feature layers; wherein the base network layer performs feature extraction and converts the input image into a multi-dimensional feature representation; and the extra convolutional layers act as a feature selection strategy that uses small convolutional filters to predict the class scores and position offsets of a fixed set of default bounding boxes on the feature maps, while predictions at different scales are produced from feature maps of different scales;
    (4) training the network on the static sign language data set, and feeding sign language video captured by a camera in real time into the trained network to realize real-time static sign language recognition.
  2. The real-time static sign language recognition method based on an improved single-shot multi-target detector according to claim 1, characterized in that step (1) is specifically: recording static sign language video, extracting frames from the video as images, manually removing images with severe motion blur or severe occlusion, and enhancing the images with a high-pass filtering method.
  3. The real-time static sign language recognition method based on an improved single-shot multi-target detector according to claim 1, characterized in that in step (2) the constructed static sign language data set includes the original sample images and the label images obtained by manually annotating the original sample images, the image annotation boxes recorded in the annotation information corresponding one-to-one with the original images; and the data set is augmented by mirroring the original images and re-annotating the corresponding images.
  4. The real-time static sign language recognition method based on an improved single-shot multi-target detector according to claim 1, characterized in that the base network layer in step (3) is an AlexNet network with the fully connected layers removed, five layers in total, with max pooling used for pooling; and the extra convolutional network has nine layers, of which eight are convolutional layers and one is an average pooling layer.
  5. The real-time static sign language recognition method based on an improved single-shot multi-target detector according to claim 4, characterized in that the extra convolutional feature layers are added to the end of the truncated base network and decrease in size layer by layer, yielding predictions at multiple detection scales; a set of convolutional filters is applied to each added feature layer to produce a prediction set, giving class scores or coordinate offsets relative to the default boxes; and the coordinate offsets are measured relative to the default boxes, while the default box positions are relative to the feature map.
  6. The real-time static sign language recognition method based on an improved single-shot multi-target detector according to claim 5, characterized in that the rule for producing the prediction set is: for a feature layer of size m*n with p channels, a 3*3*p convolution kernel is applied to produce a class score or a coordinate offset relative to a default box, and one output value is produced at each of the m*n positions where the kernel is applied.
  7. The real-time static sign language recognition method based on an improved single-shot multi-target detector according to claim 4, characterized in that the training of the deep learning network of step (3) comprises the following steps: (31) matching strategy: during training, the correspondence between ground-truth labels and default boxes must be established, and a default box is matched when its overlap with a ground-truth label is higher than a certain threshold; (32) training objective: the objective function is derived from MultiBox, and the overall loss function is the weighted sum of the localization loss and the confidence loss, wherein the localization loss is the Smooth L1 loss between the predicted box and ground-truth box parameters, and the confidence loss is a softmax loss over the multi-class confidences, with the weight term set to 1 by cross-validation; (33) selection of default box scales and aspect ratios: predictions from several feature maps over default boxes of different sizes and aspect ratios at all positions are combined to cover a variety of input object sizes and shapes.
  8. The real-time static sign language recognition method based on an improved single-shot multi-target detector according to claim 1, characterized in that step (4) is specifically: acquiring sign language images in real time with a monocular camera, feeding the images into the deep learning network of the improved single-shot multi-target detector, obtaining classification and detection results, and realizing real-time static sign language recognition.
CN201710899126.0A 2017-09-28 2017-09-28 Static sign language real-time identification method based on modified single multi-target detection device Pending CN107657233A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710899126.0A CN107657233A (en) 2017-09-28 2017-09-28 Static sign language real-time identification method based on modified single multi-target detection device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710899126.0A CN107657233A (en) 2017-09-28 2017-09-28 Static sign language real-time identification method based on modified single multi-target detection device

Publications (1)

Publication Number Publication Date
CN107657233A 2018-02-02

Family

ID=61116880

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710899126.0A Pending CN107657233A (en) 2017-09-28 2017-09-28 Static sign language real-time identification method based on modified single multi-target detection device

Country Status (1)

Country Link
CN (1) CN107657233A (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846826A (en) * 2018-04-24 2018-11-20 深圳大学 Object detecting method, device, image processing equipment and storage medium
CN108875537A (en) * 2018-02-28 2018-11-23 北京旷视科技有限公司 Method for checking object, device and system and storage medium
CN109271901A (en) * 2018-08-31 2019-01-25 武汉大学 A kind of sign Language Recognition Method based on Multi-source Information Fusion
CN109376571A (en) * 2018-08-03 2019-02-22 西安电子科技大学 Estimation method of human posture based on deformation convolution
CN109508715A (en) * 2018-10-30 2019-03-22 南昌大学 A kind of License Plate and recognition methods based on deep learning
CN109635750A (en) * 2018-12-14 2019-04-16 广西师范大学 A kind of compound convolutional neural networks images of gestures recognition methods under complex background
CN110008848A (en) * 2019-03-13 2019-07-12 华南理工大学 A kind of travelable area recognizing method of the road based on binocular stereo vision
CN110032980A (en) * 2019-04-18 2019-07-19 天津工业大学 A kind of organ detection and recognition positioning method based on deep learning
CN110288030A (en) * 2019-06-27 2019-09-27 重庆大学 Image-recognizing method, device and equipment based on lightweight network model
CN110399850A (en) * 2019-07-30 2019-11-01 西安工业大学 A kind of continuous sign language recognition method based on deep neural network
CN110555371A (en) * 2019-07-19 2019-12-10 华瑞新智科技(北京)有限公司 Wild animal information acquisition method and device based on unmanned aerial vehicle
CN110717422A (en) * 2019-09-25 2020-01-21 北京影谱科技股份有限公司 Method and system for identifying interactive action based on convolutional neural network
CN111523530A (en) * 2020-04-13 2020-08-11 南京行者易智能交通科技有限公司 Mapping method of score map in target detection and target detection method
CN111562815A (en) * 2020-05-04 2020-08-21 北京花兰德科技咨询服务有限公司 Wireless head-mounted device and language translation system
CN112614121A (en) * 2020-12-29 2021-04-06 国网青海省电力公司海南供电公司 Multi-scale small-target equipment defect identification and monitoring method
CN117830859A (en) * 2024-03-05 2024-04-05 农业农村部南京农业机械化研究所 Automatic fruit tree target recognition method and system based on image processing

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608432A (en) * 2015-12-21 2016-05-25 浙江大学 Instantaneous myoelectricity image based gesture identification method
CN106598226A (en) * 2016-11-16 2017-04-26 天津大学 UAV (Unmanned Aerial Vehicle) man-machine interaction method based on binocular vision and deep learning
US20170206405A1 (en) * 2016-01-14 2017-07-20 Nvidia Corporation Online detection and classification of dynamic gestures with recurrent convolutional neural networks
CN106980365A (en) * 2017-02-21 2017-07-25 华南理工大学 The first visual angle dynamic gesture identification method based on depth convolutional neural networks framework

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608432A (en) * 2015-12-21 2016-05-25 浙江大学 Instantaneous myoelectricity image based gesture identification method
US20170206405A1 (en) * 2016-01-14 2017-07-20 Nvidia Corporation Online detection and classification of dynamic gestures with recurrent convolutional neural networks
CN106598226A (en) * 2016-11-16 2017-04-26 天津大学 UAV (Unmanned Aerial Vehicle) man-machine interaction method based on binocular vision and deep learning
CN106980365A (en) * 2017-02-21 2017-07-25 华南理工大学 The first visual angle dynamic gesture identification method based on depth convolutional neural networks framework

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wei Liu et al.: "SSD: Single Shot MultiBox Detector", arXiv *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875537A (en) * 2018-02-28 2018-11-23 北京旷视科技有限公司 Method for checking object, device and system and storage medium
CN108875537B (en) * 2018-02-28 2022-11-08 北京旷视科技有限公司 Object detection method, device and system and storage medium
CN108846826A (en) * 2018-04-24 2018-11-20 深圳大学 Object detecting method, device, image processing equipment and storage medium
CN109376571A (en) * 2018-08-03 2019-02-22 西安电子科技大学 Estimation method of human posture based on deformation convolution
CN109271901A (en) * 2018-08-31 2019-01-25 武汉大学 A kind of sign Language Recognition Method based on Multi-source Information Fusion
CN109508715A (en) * 2018-10-30 2019-03-22 南昌大学 A kind of License Plate and recognition methods based on deep learning
CN109508715B (en) * 2018-10-30 2022-11-08 南昌大学 License plate positioning and identifying method based on deep learning
CN109635750A (en) * 2018-12-14 2019-04-16 广西师范大学 A kind of compound convolutional neural networks images of gestures recognition methods under complex background
CN110008848A (en) * 2019-03-13 2019-07-12 华南理工大学 A kind of travelable area recognizing method of the road based on binocular stereo vision
CN110032980B (en) * 2019-04-18 2023-04-25 天津工业大学 Organ detection and identification positioning method based on deep learning
CN110032980A (en) * 2019-04-18 2019-07-19 天津工业大学 A kind of organ detection and recognition positioning method based on deep learning
CN110288030A (en) * 2019-06-27 2019-09-27 重庆大学 Image-recognizing method, device and equipment based on lightweight network model
CN110288030B (en) * 2019-06-27 2023-04-07 重庆大学 Image identification method, device and equipment based on lightweight network model
CN110555371A (en) * 2019-07-19 2019-12-10 华瑞新智科技(北京)有限公司 Wild animal information acquisition method and device based on unmanned aerial vehicle
CN110399850A (en) * 2019-07-30 2019-11-01 西安工业大学 A kind of continuous sign language recognition method based on deep neural network
CN110717422A (en) * 2019-09-25 2020-01-21 北京影谱科技股份有限公司 Method and system for identifying interactive action based on convolutional neural network
CN111523530B (en) * 2020-04-13 2021-04-02 南京行者易智能交通科技有限公司 Mapping method of score map in target detection and target detection method
CN111523530A (en) * 2020-04-13 2020-08-11 南京行者易智能交通科技有限公司 Mapping method of score map in target detection and target detection method
CN111562815B (en) * 2020-05-04 2021-07-13 北京花兰德科技咨询服务有限公司 Wireless head-mounted device and language translation system
CN111562815A (en) * 2020-05-04 2020-08-21 北京花兰德科技咨询服务有限公司 Wireless head-mounted device and language translation system
CN112614121A (en) * 2020-12-29 2021-04-06 国网青海省电力公司海南供电公司 Multi-scale small-target equipment defect identification and monitoring method
CN117830859A (en) * 2024-03-05 2024-04-05 农业农村部南京农业机械化研究所 Automatic fruit tree target recognition method and system based on image processing
CN117830859B (en) * 2024-03-05 2024-05-03 农业农村部南京农业机械化研究所 Automatic fruit tree target recognition method and system based on image processing

Similar Documents

Publication Publication Date Title
CN107657233A (en) Static sign language real-time identification method based on modified single multi-target detection device
CN108509839A (en) One kind being based on the efficient gestures detection recognition methods of region convolutional neural networks
CN104537393B (en) A kind of traffic sign recognition method based on multiresolution convolutional neural networks
CN103605972B (en) Non-restricted environment face verification method based on block depth neural network
CN110287960A (en) The detection recognition method of curve text in natural scene image
Hoque et al. Real time bangladeshi sign language detection using faster r-cnn
CN106960206A (en) Character identifying method and character recognition system
Huang et al. Development and validation of a deep learning algorithm for the recognition of plant disease
CN107133616A (en) A kind of non-division character locating and recognition methods based on deep learning
CN106845430A (en) Pedestrian detection and tracking based on acceleration region convolutional neural networks
CN106845499A (en) A kind of image object detection method semantic based on natural language
CN105160310A (en) 3D (three-dimensional) convolutional neural network based human body behavior recognition method
CN106096557A (en) A kind of semi-supervised learning facial expression recognizing method based on fuzzy training sample
CN106372581A (en) Method for constructing and training human face identification feature extraction network
CN107529650A (en) The structure and closed loop detection method of network model, related device and computer equipment
CN107239762A (en) Patronage statistical method in a kind of bus of view-based access control model
CN112784763A (en) Expression recognition method and system based on local and overall feature adaptive fusion
Hao Multimedia English teaching analysis based on deep learning speech enhancement algorithm and robust expression positioning
CN112069900A (en) Bill character recognition method and system based on convolutional neural network
CN110503090A (en) Character machining network training method, character detection method and character machining device based on limited attention model
Wang et al. Scene text recognition algorithm based on faster RCNN
Shinde et al. Math accessibility for blind people in society using machine learning
CN109284752A (en) A kind of rapid detection method of vehicle
Patel et al. Multiresolution technique to handwritten English character recognition using learning rule and Euclidean distance metric
US11521427B1 (en) Ear detection method with deep learning pairwise model based on contextual information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180202