CamK: a Camera-based Keyboard for Small Mobile Devices

Yafeng Yin†, Qun Li‡, Lei Xie†, Shanhe Yi‡, Edmund Novak‡, Sanglu Lu†
†State Key Laboratory for Novel Software Technology, Nanjing University, China
‡College of William and Mary, Williamsburg, VA, USA
Email: †yyf@dislab.nju.edu.cn, †{lxie, sanglu}@nju.edu.cn, ‡{liqun, syi, ejnovak}@cs.wm.edu

Abstract—Due to the smaller size of mobile devices, on-screen keyboards become inefficient for text entry. In this paper, we present CamK, a camera-based text-entry method, which uses an arbitrary panel (e.g., a piece of paper) with a keyboard layout to input text into small devices. CamK captures images during the typing process and uses image processing techniques to recognize the typing behavior. The principle of CamK is to extract the keys, track the user's fingertips, and detect and localize keystrokes. To achieve high accuracy of keystroke localization and a low false positive rate of keystroke detection, CamK introduces initial training and online calibration. Additionally, CamK optimizes computation-intensive modules to reduce the time latency. We implement CamK on a mobile device running Android. Our experimental results show that CamK can achieve above 95% accuracy of keystroke localization, with only 4.8% false positive keystrokes. Compared to on-screen keyboards, CamK can achieve a 1.25X typing speedup for regular text input and 2.5X for random character input.

I. INTRODUCTION

Recently, mobile devices have converged to a relatively small form factor (e.g., smartphones, Apple Watch), in order to be carried everywhere easily, while avoiding carrying bulky laptops all the time. Consequently, interacting with small mobile devices involves many challenges; a typical example is text input without a physical keyboard.

Currently, many visual keyboards have been proposed. However, wearable keyboards [1], [2] introduce additional equipment. On-screen keyboards [3], [4] usually take up a large area of the screen and only support a single finger for text entry. Projection keyboards [5]–[9] often need an infrared or visible light projector to display the keyboard to the user. Audio-signal [10] or camera-based keyboards [11]–[13] remove the need for additional hardware. By leveraging the microphone to localize keystrokes, UbiK [10] requires the user to click keys with their fingertips and nails to make an audible sound, which is not typical of typing. Existing camera-based keyboards either slow down typing [12] or must be used in controlled environments [13]; they cannot provide a user experience similar to that of physical keyboards [11].

In this paper, we propose CamK, a more natural and intuitive text-entry method, in order to provide a PC-like text-entry experience. CamK works with the front-facing camera of the mobile device and a paper keyboard, as shown in Fig. 1. CamK takes pictures as the user types on the paper keyboard, and uses image processing techniques to detect and localize keystrokes. CamK can be used in a wide variety of scenarios, e.g., the office, coffee shops, outdoors, etc.

Fig. 1. A typical use case of CamK.

There are three key technical challenges in CamK. (1) High accuracy of keystroke localization: The inter-key distance on the paper keyboard is only about two centimeters [10]. When using image processing techniques, there may be a position deviation between the real fingertip and the detected fingertip.
To address this challenge, CamK introduces initial training to get the optimal parameters for image processing. Besides, CamK uses an extended region to represent the detected fingertip, aiming to tolerate the position deviation. In addition, CamK utilizes the features of a keystroke (e.g., the visually obstructed area of the pressed key) to verify the validity of a keystroke. (2) Low false positive rate of keystroke detection: A false positive occurs when a non-keystroke (i.e., a period in which no fingertip is pressing any key) is treated as a keystroke. To address this challenge, CamK combines keystroke detection with keystroke localization. If no valid key is pressed by the fingertip, CamK removes the possible non-keystroke. Besides, CamK introduces online calibration to further remove false positive keystrokes. (3) Low latency: When the user presses a key on the paper keyboard, CamK should output the character of the key without any noticeable latency. Usually, the computation in image processing is heavy, leading to a large time latency in keystroke localization. To address this challenge, CamK changes the sizes of images, optimizes the image processing pipeline, adopts multiple threads, and removes image write/read operations, in order to make CamK work on the mobile device.

We make the following contributions in this paper.

• We propose CamK, a novel method for text entry. CamK only uses the camera of the mobile device and a paper keyboard, which is easy to carry. CamK allows the user to type with all fingers and provides a user experience similar to that of physical keyboards.
• We design a practical framework for CamK, which can detect and localize keystrokes with high accuracy, and output the character of the pressed key without any noticeable time latency. Based on image processing, CamK can extract the keys, track the user's fingertips, and detect and localize keystrokes. Besides, CamK introduces initial training to optimize the image processing results and utilizes online calibration to reduce false positive keystrokes. Additionally, CamK optimizes the computation-intensive modules to reduce the time latency, in order to make CamK work on mobile devices.

• We implement CamK on a smartphone running Google's Android operating system (version 4.4.4). We first measure the performance of each module in CamK. Then, we invite nine users¹ to evaluate CamK in a variety of real-world environments. We compare the performance of CamK with other methods, in terms of keystroke localization accuracy and text-entry speed.

¹All data collection in this paper has gone through IRB approval.

II. OBSERVATIONS OF A KEYSTROKE

In order to show the feasibility of localizing keystrokes based on image processing techniques, we first describe the observations of a keystroke. Fig. 2 shows the frames/images captured by the camera during two consecutive keystrokes. The origin of coordinates is located in the top left corner of the image, as shown in Fig. 2(a). We call the hand located in the left area of the image the left hand, while the other is called the right hand, as shown in Fig. 2(b). From left to right, the fingers are called finger i in sequence, i ∈ [1, 10], as shown in Fig. 2(c). The fingertip pressing the key is called the StrokeTip. The key pressed by the StrokeTip is called the StrokeKey.

• The StrokeTip has the largest vertical coordinate among the fingers of the same hand. An example is finger 9 in Fig. 2(a). However, this feature may not work well for thumbs, which should be identified separately.

• The StrokeTip stays on the StrokeKey for a certain duration, as shown in Fig. 2(c) - Fig. 2(d). If the position of the fingertip remains unchanged, a keystroke may have happened.

• The StrokeTip is located in the StrokeKey, as shown in Fig. 2(a) and Fig. 2(d).

• The StrokeTip obstructs the StrokeKey from the view of the camera, as shown in Fig. 2(d). The ratio of the visually obstructed area to the whole area of the key can be used to verify whether the key is pressed.

• The StrokeTip has the largest vertical distance from the remaining fingertips of the corresponding hand. As shown in Fig. 2(a), the vertical distance dr between the StrokeTip (i.e., finger 9) and the remaining fingertips of the right hand is larger than the corresponding distance dl in the left hand. Considering the difference caused by the distance between the camera and the fingertip, sometimes this feature may not be satisfied. Thus this feature is used to assist keystroke localization, instead of directly determining a keystroke.

III. SYSTEM DESIGN

As shown in Fig. 1, CamK works with a mobile device (e.g., a smartphone) with an embedded camera and a paper keyboard. The smartphone uses the front-facing camera to watch the typing process. The paper keyboard is placed on a flat surface. The objective is to keep the keyboard layout within the camera's view, while making the keys in the camera's view look as large as possible.
CamK does not require the keyboard layout to be fully located in the camera's view, because sometimes the user may only want to input letters or digits. Even if some part of the keyboard is out of the camera's view, CamK can still work. CamK consists of the following four components: key extraction, fingertip detection, keystroke detection and localization, and text-entry determination.

Fig. 3. Architecture of CamK.

A. System Overview

The architecture of CamK is shown in Fig. 3. The input is the image taken by the camera and the output is the character of the pressed key. Before a user begins typing, CamK uses Key Extraction to detect the keyboard and extract each key from the image. When the user types, CamK uses Fingertip Detection to extract the user's hands and detect the fingertips based on the shape of a finger, in order to track the fingertips. Based on the movements of the fingertips, CamK uses Keystroke Detection and Localization to detect a possible keystroke and localize it. Finally, CamK uses Text-entry Determination to output the character of the pressed key.
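To make the data flow of Fig. 3 concrete, the sketch below shows one plausible way to organize the per-frame loop. It is only an illustration of the architecture, not CamK's implementation: the four helper functions are hypothetical placeholders for the components described in the following subsections, and the capture code simply uses OpenCV's VideoCapture.

```python
import cv2

# Hypothetical placeholders for the four components shown in Fig. 3.
def extract_keys(frame):                       raise NotImplementedError  # key extraction
def detect_fingertips(frame):                  raise NotImplementedError  # fingertip detection
def detect_keystroke(prev, tips, keys, frame): raise NotImplementedError  # detection + localization
def key_to_char(key):                          raise NotImplementedError  # text-entry determination

def run_camk(camera_index=0):
    cap = cv2.VideoCapture(camera_index)
    ok, frame = cap.read()
    keys = extract_keys(frame)          # runs once, before the user starts typing
    prev_tips = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        tips = detect_fingertips(frame)                        # track fingertips
        key = detect_keystroke(prev_tips, tips, keys, frame)   # possible keystroke?
        if key is not None:
            print(key_to_char(key), end="", flush=True)        # output the character
        prev_tips = tips
    cap.release()
```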
Fig. 2. Frames during two consecutive keystrokes: (a) frame 1, (b) frame 2, (c) frame 3, (d) frame 4, (e) frame 5.

B. Key Extraction

Without loss of generality, CamK adopts the common QWERTY keyboard layout, which is printed in black and white on a piece of paper, as shown in Fig. 1. In order to eliminate background effects, we first detect the boundary of the keyboard. Then, we extract each key from the keyboard. Therefore, key extraction contains three parts: keyboard detection, key segmentation, and mapping the characters to the keys, as shown in Fig. 3.

1) Keyboard detection: We use the Canny edge detection algorithm [14] to obtain the edges of the keyboard. Fig. 4(b) shows the edge detection result of Fig. 4(a). However, interference edges (e.g., the paper's edge, the longest edge in Fig. 4(b)) should be removed. Based on Fig. 4(b), the edges of the keyboard should be close to the edges of the keys. We use this feature to remove the interference edges; the result is shown in Fig. 4(c). Additionally, we adopt the dilation operation [15] to join the dispersed edge points which are close to each other, in order to get better edges/boundaries of the keyboard. After that, we use the Hough transform [12] to detect the lines in Fig. 4(c). Then, we use the uppermost line and the lowermost line to describe the position range of the keyboard, as shown in Fig. 4(d). Similarly, we can use the Hough transform [12] to detect the left/right edge of the keyboard. If there are no suitable edges detected by the Hough transform, it is usually because the keyboard is not perfectly located in the camera's view. In this case, we simply use the left/right boundary of the image to represent the left/right edge of the keyboard. As shown in Fig. 4(e), we extend the four edges (lines) to get four intersections P1(x1, y1), P2(x2, y2), P3(x3, y3), P4(x4, y4), which are used to describe the boundary of the keyboard.

Fig. 4. Keyboard detection and key extraction: (a) an input image, (b) Canny edge detection result, (c) optimization for edges, (d) position range of the keyboard, (e) keyboard boundary, (f) key segmentation result.
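The keyboard-detection chain above maps onto standard image-processing primitives. The following sketch illustrates the idea with OpenCV (Canny edges, dilation, probabilistic Hough transform); the thresholds, kernel size, and line-selection rule are assumptions for illustration, not CamK's tuned parameters. The left/right edges would be handled analogously, falling back to the image borders when no suitable line is found.

```python
import cv2
import numpy as np

def keyboard_top_bottom(image_bgr):
    """Find the uppermost and lowermost line segments bounding the keyboard
    (simplified sketch; left/right edges are handled analogously)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)

    # 1) Canny edge detection (Fig. 4(b)).
    edges = cv2.Canny(gray, 50, 150)

    # 2) Dilation joins dispersed edge points that are close to each other (Fig. 4(c)).
    edges = cv2.dilate(edges, np.ones((3, 3), np.uint8), iterations=1)

    # 3) Probabilistic Hough transform detects line segments (Fig. 4(d)).
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                            minLineLength=100, maxLineGap=10)
    if lines is None:
        return None, None          # fall back to the image borders in this case
    segments = lines[:, 0, :]      # each row: (x1, y1, x2, y2)
    mid_y = (segments[:, 1] + segments[:, 3]) / 2.0
    return segments[np.argmin(mid_y)], segments[np.argmax(mid_y)]
```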
2) Key segmentation: With the known location of the keyboard, we can extract the keys based on color segmentation. In YCrCb space, the color coordinate (Y, Cr, Cb) of a white pixel is (255, 128, 128), while that of a black pixel is (0, 128, 128). Thus, we can use only the difference in the Y value between pixels to distinguish the white keys from the black background. If a pixel is located in the keyboard and satisfies 255 − εy ≤ Y ≤ 255, the pixel belongs to a key. The offset εy ∈ N of Y is mainly caused by lighting conditions. εy can be estimated in the initial training (see Section IV-A); its initial/default value is εy = 50.

When we obtain the white pixels, we need to get the contours of the keys and separate the keys from one another. To handle pitfall areas, such as small white areas which do not belong to any key, we first estimate the area of a key. Based on Fig. 4(e), we use P1, P2, P3, P4 to calculate the area Sb of the keyboard as Sb = (1/2) · (|P1P2 × P1P4| + |P3P4 × P3P2|), where P1P2, P1P4, P3P4, P3P2 denote the corresponding edge vectors. Then, we calculate the area of each key. We use N to represent the number of keys in the keyboard. Considering the size difference between keys, we treat larger keys (e.g., the space key) as multiple regular keys (e.g., A-Z, 0-9). For example, the space key is treated as five regular keys. In this way, we change N to Navg. Then, we can estimate the average area of a regular key as Sb/Navg. In addition to the size difference between keys, different distances between the camera and the keys can also affect the area of a key in the image. Therefore, we introduce αl, αh to describe the range of a valid area Sk of a key as Sk ∈ [αl · Sb/Navg, αh · Sb/Navg]. We set αl = 0.15, αh = 5 in CamK, based on extensive experiments.

The key segmentation result of Fig. 4(e) is shown in Fig. 4(f). Then, we use the location of the space key (the biggest key) to locate the other keys, based on the relative locations between keys.
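One possible realization of this step is sketched below: threshold the Y channel to collect white key pixels, extract contours, and keep only the contours whose area falls in the valid range [αl · Sb/Navg, αh · Sb/Navg]. The constants follow the text (εy = 50, αl = 0.15, αh = 5); the helper names and the use of OpenCV contours are assumptions for illustration.

```python
import cv2
import numpy as np

def keyboard_area(p1, p2, p3, p4):
    """Sb = 1/2 * (|P1P2 x P1P4| + |P3P4 x P3P2|), with Pi given as (x, y)."""
    p1, p2, p3, p4 = (np.asarray(p, dtype=float) for p in (p1, p2, p3, p4))
    return 0.5 * (abs(np.cross(p2 - p1, p4 - p1)) + abs(np.cross(p4 - p3, p2 - p3)))

def segment_keys(image_bgr, sb, n_avg, eps_y=50, alpha_l=0.15, alpha_h=5.0):
    """Extract candidate key contours (simplified; masking to the keyboard
    region bounded by P1..P4 is omitted for brevity)."""
    ycrcb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2YCrCb)
    y_channel = ycrcb[:, :, 0]

    # White key pixels: 255 - eps_y <= Y (the upper bound Y <= 255 holds trivially).
    key_mask = (y_channel >= 255 - eps_y).astype(np.uint8) * 255

    # Contours of connected white regions are candidate keys.
    contours, _ = cv2.findContours(key_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)

    # Keep contours whose area lies in [alpha_l * Sb/Navg, alpha_h * Sb/Navg],
    # dropping small white blobs that do not belong to any key.
    avg_area = sb / n_avg
    return [c for c in contours
            if alpha_l * avg_area <= cv2.contourArea(c) <= alpha_h * avg_area]
```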
C. Fingertip Detection

In order to detect keystrokes, CamK needs to detect the fingertips and track their movements. Fingertip detection consists of hand segmentation and fingertip discovery.

1) Hand segmentation: Skin segmentation [15] is a common method used for hand detection. In YCrCb color space, a pixel (Y, Cr, Cb) is determined to be a skin pixel if it satisfies Cr ∈ [133, 173] and Cb ∈ [77, 127]. However, the threshold values of Cr and Cb can be affected by the surroundings, such as lighting conditions, and it is difficult to choose suitable threshold values for Cr and Cb. Therefore, we combine Otsu's method [16] and the red channel in YCrCb color space for skin segmentation. In YCrCb color space, the red channel Cr is essential to human skin coloration. Therefore, for a captured image, we use the grayscale image split from the Cr channel as the input for Otsu's method. Otsu's method [16] can automatically perform clustering-based image thresholding, i.e., it can calculate the optimal threshold to separate the foreground and background. Therefore, this skin segmentation approach can tolerate the effects caused by the environment, such as lighting conditions. For the input image in Fig. 5(a), the hand segmentation result is shown in Fig. 5(b), where the white regions represent the hand regions, while the black regions represent the background. However, around the hands, there exist some interference regions, which may change the contours of the fingers, resulting in detecting wrong fingertips. Thus, CamK introduces the erosion and dilation operations [17]. We first use the erosion operation to isolate the hands from the keys and separate each finger. Then, we use the dilation operation to smooth the edges of the fingers. Fig. 5(c) shows the optimized result of hand segmentation. Intuitively, if the color of the user's clothes is close to his/her skin color, the hand segmentation result will become worse. In this case, we only focus on the hand region located in the keyboard area. Due to the color difference between the keyboard and human skin, CamK can still extract the hands efficiently.

Fig. 5. Fingertip detection: (a) an input image, (b) hand segmentation, (c) optimization, (d) fingers' contour, (e) fingertip discovery, (f) fingertips.

2) Fingertip discovery: After we extract the fingers, we need to detect the fingertips. As shown in Fig. 6(a), the fingertip is usually a convex vertex of the finger. For a point Pi(xi, yi) located on the contour of a hand, by tracing the contour, we can select the point Pi−q(xi−q, yi−q) before Pi and the point Pi+q(xi+q, yi+q) after Pi. Here, i, q ∈ N. We calculate the angle θi between the two vectors PiPi−q and PiPi+q according to Eq. (1). In order to simplify the calculation of θi, we map θi into the range θi ∈ [0°, 180°]. If θi ∈ [θl, θh], θl < θh, we call Pi a candidate vertex. Considering the relative locations of the points, Pi should also satisfy yi > yi−q and yi > yi+q; otherwise, Pi will not be a candidate vertex. If there are multiple candidate vertexes, such as P′i in Fig. 6(a), we choose the vertex with the largest vertical coordinate, such as Pi in Fig. 6(a), because this point has the largest probability of being a fingertip. Based on extensive experiments, we set θl = 60°, θh = 150°, q = 20 in this paper.

θi = arccos( (PiPi−q · PiPi+q) / (|PiPi−q| · |PiPi+q|) )    (1)

Considering the specificity of thumbs, which may press a key (e.g., the space key) in a different way from the other fingers, the relative positions of Pi−q, Pi, Pi+q may change. Fig. 6(b) shows the thumb of the left hand. Obviously, Pi−q, Pi, Pi+q do not satisfy yi > yi−q and yi > yi+q. Therefore, we use (xi − xi−q) · (xi − xi+q) > 0 to describe the relative locations of Pi−q, Pi, Pi+q for thumbs. Then, we choose the vertex with the largest vertical coordinate as the fingertip.

Fig. 6. Features of a fingertip: (a) fingertips (excluding thumbs), (b) a thumb.

In fingertip detection, we only need to detect the points located on the bottom edge (from the leftmost point to the rightmost point) of the hand, such as the blue contour of the right hand in Fig. 5(d). The shape feature θi and the vertical coordinates yi along the bottom edge are shown in Fig. 5(e). If we can detect five fingertips in a hand with θi and yi−q, yi, yi+q, we do not detect the thumb separately. Otherwise, we detect the fingertip of the thumb in the rightmost area of the left hand or the leftmost area of the right hand according to θi and xi−q, xi, xi+q. The detected fingertips of Fig. 5(a) are marked in Fig. 5(f).
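The candidate-vertex test of Eq. (1) can be written down directly: for each contour point, compute the angle between the vectors to its q-th neighbors and require the point to lie below both neighbors (larger y in image coordinates). The sketch below is a simplified illustration under these assumptions; it omits the thumb-specific rule and the restriction to the bottom edge of the hand, which follow the rules described above.

```python
import numpy as np

def candidate_fingertips(contour, q=20, theta_l=60.0, theta_h=150.0):
    """Candidate fingertip vertices on a hand contour, following Eq. (1).

    contour: (N, 2) array of (x, y) contour points in image coordinates
             (y grows downward, so a fingertip has a locally larger y).
    """
    pts = np.asarray(contour, dtype=float)
    n = len(pts)
    candidates = []
    for i in range(n):
        p, p_prev, p_next = pts[i], pts[(i - q) % n], pts[(i + q) % n]
        v1, v2 = p_prev - p, p_next - p
        cos_t = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
        theta = np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))
        # Candidate vertex: angle in [theta_l, theta_h] and y_i > y_{i-q}, y_i > y_{i+q}.
        if theta_l <= theta <= theta_h and p[1] > p_prev[1] and p[1] > p_next[1]:
            candidates.append((i, theta, tuple(p)))
    # Among nearby candidates, the vertex with the largest vertical coordinate is
    # kept as the fingertip (grouping candidates into individual fingers is omitted).
    return candidates
```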
D. Keystroke Detection and Localization

When CamK detects the fingertips, it tracks the fingertips to detect a possible keystroke and localize it. The keystroke localization result can be used to remove false positive keystrokes. We illustrate the whole process of keystroke detection and localization together.

1) Candidate fingertip in each hand: CamK allows the user to use all the fingers for text entry, thus a keystroke may be caused by either the left or the right hand. According to the observations (see Section II), the fingertip (i.e., StrokeTip) pressing the key usually has the largest vertical coordinate in that hand. Therefore, we first select the candidate fingertip with the largest vertical coordinate in each hand. We respectively use Cl and Cr to represent the points located on the contours of the left hand and the right hand. For a point Pl(xl, yl) ∈ Cl, if Pl satisfies yl ≥ yj (∀Pj(xj, yj) ∈ Cl, j ≠ l), then Pl will be selected as the candidate fingertip in the left hand. Similarly, we can get the candidate fingertip Pr(xr, yr) in the right hand. In this step, we only need to get Pl and Pr to know the moving states of the hands; it is unnecessary to detect the other fingertips.

2) Moving or staying: As described in the observations, when the user presses a key, the fingertip stays on that key for a certain duration. Therefore, we can use the location variation of the candidate fingertip to detect a possible keystroke. In frame i, we use Pli(xli, yli) and Pri(xri, yri) to represent the candidate fingertips in the left hand and the right hand, respectively. Based on Fig. 5, the interference regions around a fingertip may affect the contour of the fingertip, so there may be a position deviation between the real fingertip and the detected fingertip. Therefore, if the candidate fingertips in frames i − 1 and i satisfy Eq. (2), the fingertips are treated as static, i.e., a keystroke probably happens. Based on extensive experiments, we set ∆r = 5 empirically.

√((xli − xli−1)² + (yli − yli−1)²) ≤ ∆r,
√((xri − xri−1)² + (yri − yri−1)²) ≤ ∆r.    (2)
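The static check of Eq. (2) is a per-hand displacement test between consecutive frames. A minimal sketch, assuming fingertip positions are given as (x, y) tuples and using ∆r = 5 from the text:

```python
import math

DELTA_R = 5  # pixels, per the text

def is_static(prev_tip, cur_tip, delta_r=DELTA_R):
    """True if a candidate fingertip moved at most delta_r between frames (Eq. (2))."""
    if prev_tip is None or cur_tip is None:
        return False
    dx, dy = cur_tip[0] - prev_tip[0], cur_tip[1] - prev_tip[1]
    return math.hypot(dx, dy) <= delta_r

def keystroke_possible(prev_left, cur_left, prev_right, cur_right):
    """A keystroke is possible only if the candidate fingertip of each hand is static."""
    return is_static(prev_left, cur_left) and is_static(prev_right, cur_right)
```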
3) Discovering the pressed key: For a keystroke, the fingertip is located at the pressed key and a part of the key is visually obstructed by that fingertip, as shown in Fig. 2(d). We treat the thumb as a special case, and also select it as a candidate fingertip at first. Then, we get the candidate fingertip set Ctip = {Pl, Pr, left thumb in frame i, right thumb in frame i}. After that, we can localize the keystroke by using Alg. 1.

Eliminating impossible fingertips: For convenience, we use Pi to represent a fingertip in Ctip, i.e., Pi ∈ Ctip, i ∈ [1, 4]. If a fingertip Pi is not located in the keyboard region, CamK eliminates it from the candidate fingertips Ctip.

Fig. 7. Candidate keys and candidate fingertips: (a) candidate keys, (b) locating a fingertip.

Selecting the nearest candidate keys: For each candidate fingertip Pi, we first search for the candidate keys which are probably pressed by Pi. As shown in Fig. 7(a), the detected fingertip Pi may deviate from the real fingertip; we use Pi to search for the candidate keys. We use Kcj(xcj, ycj) to represent the centroid of key Kj. We get the two rows of keys nearest to the location Pi(xi, yi) (i.e., the rows with the two smallest |ycj − yi|). For each row, we select the two nearest keys (i.e., the keys with the two smallest |xcj − xi|). In Fig. 7(a), the candidate key set Ckey consists of K1, K2, K3, K4. Fig. 8(a) shows the candidate keys of the fingertip in each hand.

Keeping candidate keys containing the candidate fingertip: If a key is pressed by the user, the fingertip will be located in that key. Thus we use the location of the fingertip Pi(xi, yi) to verify whether a candidate key contains the fingertip, in order to remove the invalid candidate keys. As shown in Fig. 7(a), there exists a small deviation between the real fingertip and the detected fingertip. Therefore, we extend the range of the detected fingertip to Ri, as shown in Fig. 7(a). If any point Pk(xk, yk) in the range Ri is located in a candidate key Kj, Pi is considered to be located in Kj. Ri is calculated as {Pk ∈ Ri | √((xk − xi)² + (yk − yi)²) ≤ ∆r}; we set ∆r = 5 empirically.

As shown in Fig. 7(b), a key is represented as a quadrangle ABCD. If a point is located in ABCD, then when we move around ABCD clockwise, the point will be located on the right side of each edge of ABCD. As shown in Fig. 2(a), the origin of coordinates is located in the top left corner of the image. Therefore, if a fingertip point Pk ∈ Ri satisfies Eq. (3), it is located in the key, and CamK will keep that key as a candidate key. Otherwise, CamK removes the key from the candidate key set Ckey. In Fig. 7(a), K1, K2 are the remaining candidate keys. The candidate keys containing the fingertips of Fig. 8(a) are shown in Fig. 8(b).

AB × AP ≥ 0,  BC × BP ≥ 0,  CD × CP ≥ 0,  DA × DP ≥ 0.    (3)

Calculating the coverage ratios of candidate keys: When a key is pressed, it is visually obstructed by the fingertip, as the dashed area of the key shown in Fig. 7(a). We use the coverage ratio to measure the visually obstructed area of a candidate key, in order to remove the wrong candidate keys. For a candidate key Kj, whose area is Skj, the visually obstructed area is Dkj; the coverage ratio is then ρkj = Dkj / Skj. For a larger key (e.g., the space key), we update ρkj by multiplying a key size factor fj, i.e., ρkj = min(ρkj · fj, 1), where fj = Skj / S̄k. Here, S̄k denotes the average area of a regular key, as described in Section III-B2. If ρkj > ρl, the key Kj is still a candidate key. Otherwise, CamK removes it from the candidate key set Ckey. We set ρl = 0.25 in this paper. For each hand, if there is more than one candidate key, we keep the key with the largest coverage ratio as the final candidate key. For a candidate fingertip, if there is no candidate key associated with it, the candidate fingertip is eliminated. Fig. 8(c) shows each candidate fingertip and its associated key.

Fig. 8. Candidate fingertips/keys in each step: (a) keys around the fingertip, (b) keys containing the fingertip, (c) visually obstructed key, (d) vertical distance with remaining fingertips.
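Both the containment test of Eq. (3) and the coverage ratio reduce to a few vector cross products and a pixel count. The sketch below assumes the key corners A, B, C, D are given clockwise in image coordinates (origin at the top left) and that binary masks of the key region and the segmented hand are available; it is an illustration, not CamK's implementation.

```python
import numpy as np

def point_in_key(p, a, b, c, d):
    """Eq. (3): P lies inside quadrangle ABCD (corners ordered clockwise in image
    coordinates) if AB x AP, BC x BP, CD x CP, and DA x DP are all non-negative."""
    p, a, b, c, d = (np.asarray(v, dtype=float) for v in (p, a, b, c, d))
    edges = [(a, b), (b, c), (c, d), (d, a)]
    return all(np.cross(v2 - v1, p - v1) >= 0 for v1, v2 in edges)

def coverage_ratio(key_mask, hand_mask, key_area, avg_key_area):
    """rho_kj = D_kj / S_kj, scaled by the key-size factor f_j = S_kj / S_avg
    and clipped to 1 (for larger keys such as the space key)."""
    obstructed = np.count_nonzero(np.logical_and(key_mask, hand_mask))
    rho = obstructed / float(key_area)
    f = key_area / float(avg_key_area)
    return min(rho * f, 1.0)
```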
4) Vertical distance with remaining fingertips: Until now, there is at most one candidate fingertip in each hand. If there are no candidate fingertips, we infer that no keystroke happens. If there is only one candidate fingertip, that fingertip is the StrokeTip, and the associated candidate key is the StrokeKey. However, if there are two candidate fingertips, we utilize the vertical distance between the candidate fingertip and the remaining fingertips to choose the most probable StrokeTip, as shown in Fig. 2(a).

We use Pl(xl, yl) and Pr(xr, yr) to represent the candidate fingertips in the left hand and the right hand, respectively. Then we calculate the distance dl between Pl and the remaining fingertips in the left hand, and the distance dr between Pr and the remaining fingertips in the right hand. Here, dl = (1/4) · Σj∈[1,5], j≠l |yl − yj|, while dr = (1/4) · Σj∈[6,10], j≠r |yr − yj|, where yj represents the vertical coordinate of fingertip j. If dl > dr, we choose Pl as the StrokeTip; otherwise, we choose Pr as the StrokeTip. The associated key for the StrokeTip is the pressed key StrokeKey. In Fig. 8(d), we choose fingertip 3 in the left hand as the StrokeTip. However, based on the observations, the