YIN ETAL:CAMK:CAMERA-BASED KEYSTROKE DETECTION AND LOCALIZATION FOR SMALL MOBILE DEVICES 2241 0(0,0) (xA,yi)】 (x/2,yi2)】 B (比,y2 D ∠K3/∠K4y (x4ya】 (t3y) (a)Candidate keys (b)Locating a fingertip (a)Keys around the fingertip (b)Keys containing the fin- Fig.7.Candidate keys and Candidate fingertips. gertip Therefore,we can use the location variation of the candidate fingertip to detect a possible keystroke.In Frame i,we use P (i)and P(r,r)to represent the candidate finger- tips in the left hand and right hand,respectively.If the can- didate fingertips in frame [i-1,i]satisfy Eq.(2)in left hand or Eq.(3)in right hand,the corresponding fingertip will be treated as static,i.e.,a keystroke probably happens.Based (c)Visually obstructed key (d)Vertical distance with re- on extensive experiments,we set Ar =5 empirically maining fingertips Fig.8.Candidate fingertips/keys in each step. V(,-x4-1)P+(--1尸≤△r (2) a fingertip P is not located in the keyboard region,CamK V(r:-x-)》2+(--1)2≤△r (3) eliminates it from the candidate fingertips Crip. Selecting the Nearest Candidate Keys.For each candidate fingertip Pi,we first search the candidate keys which are 4.3.3 Keystroke Localization by Correlating the probably pressed by P.As shown in Fig.7a,although the Fingertip with the Pressed Key real fingertip is Pi,the detected fingertip is P.We use p to After detecting a possible keystroke,we correlate the candi- search the candidate keys.We use Kej(ej,to represent date fingertip and the pressed key to locate the keystroke, the centroid of key Kj.Then we get two rows of keys near- based on the observations of Section 3.1.In regard to the est the location P(,)(i.e.,the rows with two smallest candidate fingertips,we treat the thumb as a special case, lyj-).For each row,we select the two nearest keys (i.e., and also select it as a candidate fingertip at first.Then,we the keys with two smallest -)In Fig.7a,the candi- get the candidate fingertip set Cp=(PP,left thumb date key set C is consisted of K1,K2,K3,K.Fig.8a shows in frame i,right thumb in frame i.After that,we can locate the candidate keys of each fingertip. the keystroke by using Algorithm 1. Retaining Candidate Keys Containing the Candidate Finger- tip.If a key is pressed by the user,the fingertip will be Algorithm 1.Keystroke Localization located in that key.Thus we use the location of the fingertip Pi(,)to verify whether a candidate key contains the fin- Input:Candidate fingertip set Cup in frame i. Remove fingertips out of the keyboard from Crip. gertip,to remove the invalid candidate keys.As shown in Fig.7a,there exists a small deviation between the real finger- forP∈Cup do Obtain candidate key set Cey around P. tip and the detected fingertip.Therefore,we extend the range forK;∈Ckey do of the detected fingertip to Ri,as shown in Fig.7a.If any point if P is located in Ki then P(k,y)in the range Ri is located in a candidate key Ki,P is Calculate the coverage ratio P;of Kj. considered to be located in Kj.Ri is calculated as [PE if Pe;<Pi then RV(在:-xk)2+(i-)2≤△rh.We set△r=5 empirically. Remove Ki from Ciey. As shown in Fig.7b,a key is represented as a quadrangle else Remove Ki from Ckey. ABCD.If a point is located in ABCD,when we traverse if Ckey≠0then ABCD clockwise,the point will be located in the right side Select Kj with largest P&,from Ce. of each edge in ABCD.As shown in Fig.2a,the origin of <P,Kj>forms a possible keystroke. coordinates is located in the top left corner of the image. else Remove P;from Ctip. Therefore,if the fingertip PE Ri satisfies Eq.(4),it is located if Cp =0 then No keystroke occurs,return in the key.CamK will keep it as a candidate key.Otherwise, if ICipl =1 then Return the pressed key. CamK removes the key from the candidate key set Ckey.In Select P,Ki>with largest ratio P&,in each hand. Fig.7a,K1,K2 are the remaining candidate keys.The candi- Obtain P,K>(<P,K >in left (right)hand. date keys contain the fingertip in Fig.8a is shown in Fig.8b Calculate relative distance d(d)in left(right)hand. if d>d,then Return Ki.else Return Kr. AB×AP≥0,BC×B≥0, (4) Output:The pressed key. C元×C2≥0,DA×D2≥0. Eliminating Impossible Fingertips.For convenience,we use Calculating the Coverage Ratios of Candidate Keys.When a P to represent the fingertip in Crip,i.e.,P Ctip,i[1,4].If key is pressed,it is visually obstructed by the fingertip,as
Therefore, we can use the location variation of the candidate fingertip to detect a possible keystroke. In Frame i, we use Pliðxli ; yliÞ and Priðxri ; yriÞ to represent the candidate fingertips in the left hand and right hand, respectively. If the candidate fingertips in frame ½i 1; i satisfy Eq. (2) in left hand or Eq. (3) in right hand, the corresponding fingertip will be treated as static, i.e., a keystroke probably happens. Based on extensive experiments, we set Dr ¼ 5 empirically ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ðxli xli1 Þ 2 þ ðyli yli1 Þ 2 q Dr; (2) ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ðxri xri1 Þ 2 þ ðyri yri1 Þ 2 q Dr: (3) 4.3.3 Keystroke Localization by Correlating the Fingertip with the Pressed Key After detecting a possible keystroke, we correlate the candidate fingertip and the pressed key to locate the keystroke, based on the observations of Section 3.1. In regard to the candidate fingertips, we treat the thumb as a special case, and also select it as a candidate fingertip at first. Then, we get the candidate fingertip set Ctip ¼ fPli ; Pri ; left thumb in frame i;right thumb in frame ig. After that, we can locate the keystroke by using Algorithm 1. Algorithm 1. Keystroke Localization Input: Candidate fingertip set Ctip in frame i. Remove fingertips out of the keyboard from Ctip . for Pi 2 Ctip do Obtain candidate key set Ckey around Pi. for Kj 2 Ckey do if Pi is located in Kj then Calculate the coverage ratio rkj of Kj. if rkj < rl then Remove Kj from Ckey. else Remove Kj from Ckey. if Ckey 6¼ ; then Select Kj with largest rkj from Ckey. < Pi; Kj > forms a possible keystroke. else Remove Pi from Ctip. if Ctip ¼ ; then No keystroke occurs, return. if jCtipj ¼ 1 then Return the pressed key. Select < Pi; Kj > with largest ratio rkj in each hand. Obtain < Pl; Kl > ( < Pr; Kr > ) in left (right) hand. Calculate relative distance dl (dr) in left (right) hand. if dl > dr then Return Kl. else Return Kr. Output: The pressed key. Eliminating Impossible Fingertips. For convenience, we use Pi to represent the fingertip in Ctip, i.e., Pi 2 Ctip; i 2 ½1; 4. If a fingertip Pi is not located in the keyboard region, CamK eliminates it from the candidate fingertips Ctip. Selecting the Nearest Candidate Keys. For each candidate fingertip Pi, we first search the candidate keys which are probably pressed by Pi. As shown in Fig. 7a, although the real fingertip is Pi, the detected fingertip is P^ i. We use P^i to search the candidate keys. We use Kcjðxcj; ycjÞ to represent the centroid of key Kj. Then we get two rows of keys nearest the location P^iðx^i; y^iÞ (i.e., the rows with two smallest jycj y^ij). For each row, we select the two nearest keys (i.e., the keys with two smallest jxcj x^ij). In Fig. 7a, the candidate key set Ckey is consisted of K1; K2; K3; K4. Fig. 8a shows the candidate keys of each fingertip. Retaining Candidate Keys Containing the Candidate Fingertip. If a key is pressed by the user, the fingertip will be located in that key. Thus we use the location of the fingertip P^iðx^i; y^iÞ to verify whether a candidate key contains the fingertip, to remove the invalid candidate keys. As shown in Fig. 7a, there exists a small deviation between the real fingertip and the detected fingertip. Therefore, we extend the range of the detected fingertip to Ri, as shown in Fig. 7a. If any point Pkðxk; ykÞ in the range Ri is located in a candidate key Kj, P^ i is considered to be located in Kj. Ri is calculated as fPk 2 Rij ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ðx^i xkÞ 2 þ ðy^i ykÞ 2 q Drg. We set Dr ¼ 5 empirically. As shown in Fig. 7b, a key is represented as a quadrangle ABCD. If a point is located in ABCD, when we traverse ABCD clockwise, the point will be located in the right side of each edge in ABCD. As shown in Fig. 2a, the origin of coordinates is located in the top left corner of the image. Therefore, if the fingertip P 2 Ri satisfies Eq. (4), it is located in the key. CamK will keep it as a candidate key. Otherwise, CamK removes the key from the candidate key set Ckey. In Fig. 7a, K1; K2 are the remaining candidate keys. The candidate keys contain the fingertip in Fig. 8a is shown in Fig. 8b AB! AP! 0; BC! BP! 0; CD! CP! 0; DA! DP! 0: (4) Calculating the Coverage Ratios of Candidate Keys. When a key is pressed, it is visually obstructed by the fingertip, as Fig. 7. Candidate keys and Candidate fingertips. Fig. 8. Candidate fingertips/keys in each step. YIN ET AL.: CAMK: CAMERA-BASED KEYSTROKE DETECTION AND LOCALIZATION FOR SMALL MOBILE DEVICES 2241
2242 IEEE TRANSACTIONS ON MOBILE COMPUTING,VOL.17,NO.10.OCTOBER 2018 the dashed area of key Ki shown in Fig.7a.We use the cov- 5.1 Initial Training erage ratio to measure the visually obstructed area of a can- Optimal Parameters for Image Processing.For key segmenta- didate key,in order to remove wrong candidate keys.For a tion (see Section 4.1.2),sy is used for tolerating the change candidate key Ki,whose area is Sk,the visually obstructed of Y caused by the environment.Initially,y=50.CamK area is D,and its coverage ratio is Pk=s Du.For a larger updates y=+1,when the number of extracted key (e.g.,the space key),we update pe;by multiplying a key keys decreases,it stops.Then,CamK sets y to 50 and size factor方rie,P%,=min().where方=S/风. updates y=-1-1,when the number of extracted keys decreases,it stops.In the process,when CamK gets maxi- Here,5 means the average area of a key,i.e,Sa=So/Natg mum number of keys,the corresponding value Ey is If P,>pL,the key Kj is still a candidate key.Otherwise, selected as the optimal value for y. CamK removes it from the candidate key set Ckey.We set In hand segmentation,CamK uses erosion and dilation Pr=0.25 by default.For each hand,if there is more than operations,which respectively use a kernel B[32]to process one candidate key,we will keep the key with largest cover- images.To get a suitable size of B,the user first puts his/her age ratio as the final candidate key.For a candidate fingertip, hands on the home row of the keyboard (see Fig.5a).For sim- if there is no candidate key associated with it,the fingertip plicity,we set the kernel sizes for erosion and dilation to be will be eliminated.Fig.8c shows each candidate fingertip equal.The initial kernel size is 2o=0.Then,CamK updates and its associated key. =-1+1.When CamK can localize each fingertip in the correct key with zi,then CamK sets the kernel size as z=z. 4.3.4 Vertical Distance with Remaining Fingertips In initial training,the user puts on the hands based on the on-screen instructions,it usually spends less than 10s. Until now,there is one candidate fingertip in each hand at Frame Rate Selection.CamK sets the initial/default frame most.If there are no candidate fingertips,then no keystroke rate of the camera to be fo =30fps (frames per second), is detected.If there is only one candidate fingertip,then the which is usually the maximal possible frame rate.We use no; fingertip is the StrokeTip while the associated key is Stroke- to represent the number of frames containing the ith key- Key,they represent the keystroke.However,if there are two stroke.When the user has pressed u keys,we can get candidate fingertips,we will utilize the vertical distance the average number of frames during a keystroke as between the candidate fingertip and the remaining fingertips to choose the most probable StrokeTip,as shown in Fig.2a. o=士·∑ng,In fact,,而reflects the duration of a key- We use P(x,y)and P(r,yr)to represent the candidate stroke.When the frame rate f changes,the number of frames fingertips in the left hand and right hand,respectively in a keystroke nf changes.Intuitively,a smaller value of nf Then,we calculate the distance d between P and the can reduce the image processing time,while a larger value of remaining fingertips in the left hand,and the distance d, nf can improve the accuracy of keystroke localization.Based between P,and the remaining fingertips in the right hand. on extensive experiments (see Section 7.3),we set rif=3, Here,d=l-7·∑ih,j≠4,while d,=l-a·∑0 husf=[6· yj,jr.Here,yj represents the vertical coordinate of fin- gertip j.If d>d,we choose P as the StrokeTip.Otherwise, 5.2 Online Calibration we choose P as the StrokeTip.The associated key for the Removing False Positive Keystrokes.Under certain conditions, StrokeTip is the pressed key StrokeKey.In Fig.8d,we choose the user does not type any key while keeping the fingers sta- fingertip 3 in the left hand as StrokeTip.However,consider- tionary,CamK may misclassify the non-keystroke as a key- ing the effect of camera's view,sometimes di(d,)may fail to stroke.Thus we introduce a femporary character to mitigate locate the keystroke accurately.Therefore,for the unse- this problem. lected candidate fingertip (e.g.,fingertip 8 in Fig.8d),we In the process of pressing a key,the SfrokeTip moves will not discard its associated key directly.Specifically,we towards the key,stays at that key,and then moves away. sort the previous candidate keys which contain the candi- The vertical coordinate of the StrokeTip first increases,then date fingertip based on the coverage ratio in descending pauses,then decreases.If CamK has detected a keystroke in order.Finally,we select top four candidates keys and show nf consecutive frames,it displays the current character on them on the screen.The user can press the candidate key for the screen as a temporary character.In the next frame(s),if text input(see Fig.1),to tolerate the localization error. the position of the StrokeTip does not satisfy the features of a keystroke,CamK will cancel the temporary character.This 5 OPTIMIZATIONS FOR KEYSTROKE LOCALIZATION does not have much impact on the user's experience, because of the short time between two consecutive frames. AND IMAGE PROCESSING Besides,CamK also displays the candidate keys around the Considering the deviation caused by image processing,the StrokeTip,the user can choose them for text input. influence of light conditions,and other factors,we introduce Movement of Smartphone or Keyboard.CamK presumes that the initial training to select the suitable values of parameters the smartphone and the keyboard do not move while in use. for image processing and utilize online calibration to For best results,we recommend the user to tape the paper improve the performance of keystroke detection and locali- keyboard on a flat surface.Nevertheless,to alleviate the zation.In addition,considering the limited resources of effect caused by the movements of the mobile device or the small mobile devices,we also introduce multiple optimiza- keyboard,we offer a simple solution.If the user continu- tion techniques to reduce the time latency and energy cost ously uses the Delete key on the screen multiple times(e.g., in CamK. larger than 3 times),CamK will inform the user to move
the dashed area of key K1 shown in Fig. 7a. We use the coverage ratio to measure the visually obstructed area of a candidate key, in order to remove wrong candidate keys. For a candidate key Kj, whose area is Skj , the visually obstructed area is Dkj , and its coverage ratio is rkj ¼ Dkj Skj . For a larger key (e.g., the space key), we update rkj by multiplying a key size factor fj, i.e., rkj ¼ minð Dkj Skj fj; 1Þ, where fj ¼ Skj=S k. Here, S k means the average area of a key, i.e, S k ¼ Sb=Navg. If rkj rl, the key Kj is still a candidate key. Otherwise, CamK removes it from the candidate key set Ckey. We set rl ¼ 0:25 by default. For each hand, if there is more than one candidate key, we will keep the key with largest coverage ratio as the final candidate key. For a candidate fingertip, if there is no candidate key associated with it, the fingertip will be eliminated. Fig. 8c shows each candidate fingertip and its associated key. 4.3.4 Vertical Distance with Remaining Fingertips Until now, there is one candidate fingertip in each hand at most. If there are no candidate fingertips, then no keystroke is detected. If there is only one candidate fingertip, then the fingertip is the StrokeTip while the associated key is StrokeKey, they represent the keystroke. However, if there are two candidate fingertips, we will utilize the vertical distance between the candidate fingertip and the remaining fingertips to choose the most probable StrokeTip, as shown in Fig. 2a. We use Plðxl; ylÞ and Prðxr; yrÞ to represent the candidate fingertips in the left hand and right hand, respectively. Then, we calculate the distance dl between Pl and the remaining fingertips in the left hand, and the distance dr between Pr and the remaining fingertips in the right hand. Here, dl ¼ jyl 1 4 Pj¼5 j¼1 yj; j 6¼ lj, while dr ¼ jyr 1 4 Pj¼10 j¼6 yj; j 6¼ rj. Here, yj represents the vertical coordinate of fingertip j. If dl > dr, we choose Pl as the StrokeTip. Otherwise, we choose Pr as the StrokeTip. The associated key for the StrokeTip is the pressed key StrokeKey. In Fig. 8d, we choose fingertip 3 in the left hand as StrokeTip. However, considering the effect of camera’s view, sometimes dl (dr) may fail to locate the keystroke accurately. Therefore, for the unselected candidate fingertip (e.g., fingertip 8 in Fig. 8d), we will not discard its associated key directly. Specifically, we sort the previous candidate keys which contain the candidate fingertip based on the coverage ratio in descending order. Finally, we select top four candidates keys and show them on the screen. The user can press the candidate key for text input (see Fig. 1), to tolerate the localization error. 5 OPTIMIZATIONS FOR KEYSTROKE LOCALIZATION AND IMAGE PROCESSING Considering the deviation caused by image processing, the influence of light conditions, and other factors, we introduce the initial training to select the suitable values of parameters for image processing and utilize online calibration to improve the performance of keystroke detection and localization. In addition, considering the limited resources of small mobile devices, we also introduce multiple optimization techniques to reduce the time latency and energy cost in CamK. 5.1 Initial Training Optimal Parameters for Image Processing. For key segmentation (see Section 4.1.2), "y is used for tolerating the change of Y caused by the environment. Initially, "y ¼ 50. CamK updates "yi ¼ "yi1 þ 1, when the number of extracted keys decreases, it stops. Then, CamK sets "y to 50 and updates "yi ¼ "yi1 1, when the number of extracted keys decreases, it stops. In the process, when CamK gets maximum number of keys, the corresponding value "yi is selected as the optimal value for "y. In hand segmentation, CamK uses erosion and dilation operations, which respectively use a kernel B [32] to process images. To get a suitable size of B, the user first puts his/her hands on the home row of the keyboard (see Fig. 5a). For simplicity, we set the kernel sizes for erosion and dilation to be equal. The initial kernel size is z0 ¼ 0. Then, CamK updates zi ¼ zi1 þ 1. When CamK can localize each fingertip in the correct key with zi, then CamK sets the kernel size as z ¼ zi. In initial training, the user puts on the hands based on the on-screen instructions, it usually spends less than 10s. Frame Rate Selection. CamK sets the initial/default frame rate of the camera to be f0 ¼ 30fps (frames per second), which is usually the maximal possible frame rate. We use n0i to represent the number of frames containing the ith keystroke. When the user has pressed u keys, we can get the average number of frames during a keystroke as n0 ¼ 1 u Pi¼u i¼1 n0i . In fact, n0 reflects the duration of a keystroke. When the frame rate f changes, the number of frames in a keystroke nf changes. Intuitively, a smaller value of nf can reduce the image processing time, while a larger value of nf can improve the accuracy of keystroke localization. Based on extensive experiments (see Section 7.3), we set nf ¼ 3, thus f ¼ df0 nf n0 e. 5.2 Online Calibration Removing False Positive Keystrokes. Under certain conditions, the user does not type any key while keeping the fingers stationary, CamK may misclassify the non-keystroke as a keystroke. Thus we introduce a temporary character to mitigate this problem. In the process of pressing a key, the StrokeTip moves towards the key, stays at that key, and then moves away. The vertical coordinate of the StrokeTip first increases, then pauses, then decreases. If CamK has detected a keystroke in nf consecutive frames, it displays the current character on the screen as a temporary character. In the next frame(s), if the position of the StrokeTip does not satisfy the features of a keystroke, CamK will cancel the temporary character. This does not have much impact on the user’s experience, because of the short time between two consecutive frames. Besides, CamK also displays the candidate keys around the StrokeTip, the user can choose them for text input. Movement of Smartphone or Keyboard. CamK presumes that the smartphone and the keyboard do not move while in use. For best results, we recommend the user to tape the paper keyboard on a flat surface. Nevertheless, to alleviate the effect caused by the movements of the mobile device or the keyboard, we offer a simple solution. If the user continuously uses the Delete key on the screen multiple times (e.g., larger than 3 times), CamK will inform the user to move 2242 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 17, NO. 10, OCTOBER 2018