1536-1233 (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TMC.2020.3034354, IEEE Transactions on Mobile Computing.

IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. XX, NO. XX, 2020

Fig. 4: The model of the scenario. (a) 3-D coordinate system of the scenario. (b) Simplified 2-D coordinate system.

The coordinates of $M_1$ and $M_2$ are $(-l, 0, \Delta h)$ and $(l, 0, \Delta h)$. $2l$ represents the horizontal distance between the two microphones and $2\Delta h$ represents the height difference between the two microphones when the mobile phone is in landscape orientation. $S(x, y, h)$ represents the sound source, where $h$ is the height difference between the x-y plane and the sound source. Since $\Delta h \ll l$ and $h \ll x$ or $y$, we can simplify the scenario into a 2-D model as shown in figure 4b, which means $h$ and $\Delta h$ can be ignored.

3.2.2 Preprocessing of the Acoustic Signals

In this section we split the acoustic signals into small segments and calculate the cross-correlation of the corresponding segments to get the time delay.

To further study the time delay between the two acoustic signals, we need to split the signals into segments sorted by time. The size of the segment needs to be discussed. We get one time delay from each pair of corresponding segments. The more sampling points one segment includes, the longer the segment lasts and, as a result, the fewer segment pairs and time delays we get. This causes two problems. First, the automobile changes its position within one segment. If the segment is too large, the automobile travels a long distance within it.
This makes the time delay inaccurate, since the sound source can no longer be considered a point. Second, if the number of time delays is too small, the time delay curve we draw will be coarse-grained, which reduces the estimation precision. However, the fewer sampling points one segment includes, the more easily the segment is influenced by environmental noise. As a result, we need to choose an appropriate segment size.

In our scenario, we let one segment consist of $n_s = f_s/100 = 441$ samples, which means one segment lasts for 0.01 second. In this case, the signals from the top and the bottom microphones are similar enough to calculate the time delay. Suppose the speed of the automobile is about 50 m/s (180 km/h). In one segment the automobile then moves only 0.5 meter, which means we can approximately consider that the position of the sound source remains unchanged within one segment.

Fig. 5: Cross-correlation of the corresponding segments. (a) Time delays at different times (x-axis: time in s; y-axis: time delay in samples; constraint 1 marked). (b) Maximum correlation distribution (x-axis: serial number $i$ of the segments; y-axis: maximum correlation; constraint 2 marked).

After the segmentation of the acoustic signals, we get two sequences of segments $S_1 = \{W_{11}, W_{12}, \ldots, W_{1n}\}$ and $S_2 = \{W_{21}, W_{22}, \ldots, W_{2n}\}$ from the top and the bottom microphones respectively. The following equations calculate the cross-correlations $R_i$ and the delays $\Delta d_i$, where $i$ represents the serial number of the segment pair:

\[ R_i(n) = \sum_{m=-n_s}^{n_s} W_{1i}(m)\, W_{2i}(m+n), \tag{2} \]

\[ \Delta d_i = \arg\max_{t \in \mathbb{N}} \big( R_i(t) \big). \tag{3} \]

After we get the cross-correlation $R_i(n)$, we find its largest element $R_i(t)$. The value $\Delta d_i = t$ that maximizes $R_i(n)$ is the time delay of the $i$-th pair of segments.

After we get the time delay $\Delta d$ between the corresponding segments, we need to check whether $\Delta d$ is suitable for our system. Some of the values obtained from equation (3) may be erroneous due to different kinds of noise.
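
The per-segment computation in equations (2) and (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `segment_delays`, the use of NumPy's `np.correlate`, and the unnormalized correlation sum are all assumptions, since the text does not state how $R_i$ is scaled.

```python
import numpy as np

def segment_delays(top, bottom, n_s=441):
    """Hypothetical helper: per-segment delay (eq. 3) and peak correlation (eq. 2).

    top, bottom: 1-D sample arrays from the two microphones.
    n_s: samples per segment (441 corresponds to 0.01 s at 44.1 kHz).
    """
    top = np.asarray(top, dtype=float)
    bottom = np.asarray(bottom, dtype=float)
    n_seg = min(len(top), len(bottom)) // n_s
    delays = np.empty(n_seg, dtype=int)
    peaks = np.empty(n_seg)
    # Lag axis of the 'full' correlation of two length-n_s segments.
    lags = np.arange(-(n_s - 1), n_s)
    for i in range(n_seg):
        w1 = top[i * n_s:(i + 1) * n_s]
        w2 = bottom[i * n_s:(i + 1) * n_s]
        # r[k] = sum_m w1[m] * w2[m + lags[k]], the finite-length form of eq. (2).
        r = np.correlate(w2, w1, mode="full")
        k = int(np.argmax(r))
        delays[i] = lags[k]   # eq. (3): lag that maximizes the correlation
        peaks[i] = r[k]       # raw (unnormalized) maximum correlation
    return delays, peaks
```

With this sign convention, a positive delay means that the bottom-microphone segment lags the top-microphone segment.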
The time delays with little noise, which are suitable for further calculation, should satisfy the following constraints:

1) The delay $\Delta d$ should be less than the maximum time delay $\Delta d_m$ determined by the type of the mobile phone.
2) The correlation of the corresponding segments should exceed a preset threshold $R_s$.

The upper bound of the valid delay in constraint 1 is inferred from the triangle inequality. We can see from figure 4b that $|M_1S - M_2S| < M_1M_2$, where $|M_1S - M_2S|$ can be calculated from the time delay and $M_1M_2$ is the distance between the two microphones. As a result, the bound is mainly determined by the distance between the top and bottom microphones. Suppose the sampling rate is $f_s$; the
maximum valid delay $\Delta d_m$ between the two signals is then given by equation (4):

\[ \Delta d_m = \frac{2 l f_s}{C}, \tag{4} \]

where $2l$ is the distance between the two microphones and $C$ is the speed of sound. For example, we set $f_s = 44.1$ kHz and $C = 343$ m/s and, without loss of generality, take the microphone distance of the Samsung Note 8 (the experiments in Section 5 are based on this type of mobile phone), $2l = 15$ cm. Thus $\Delta d_m = \frac{0.15\,\text{m} \times 44100\,\text{s}^{-1}}{343\,\text{m/s}} \approx 19.3$, which we round up to 20 samples. We denote the delay between the segment pair as $\Delta d$. According to the triangle inequality, the valid delay we get from cross-correlation should be an integer whose absolute value $|\Delta d|$ is less than $\Delta d_m$. That means $\Delta d$ should be an integer ranging from $-\Delta d_m$ to $\Delta d_m$, just as figure 5a shows. We define the region where the time delays vary from $\Delta d_m$ to $-\Delta d_m$ (or the reverse) as the Major Detection Region. For example, the Major Detection Region in figure 5a starts at 2.5 s and ends at 5 s.

Fig. 6: Generating Time Delay curve. (a) Time delay curve. (b) Smoothed time delay curve (x-axis: time in s; y-axis: time delay in samples; original and smoothed curves).

In constraint 2, the threshold $R_s$ has a physical interpretation. It implies that the automobile should be close enough to the mobile phone, which means the signals from the two corresponding segments should be similar enough. Cross-correlation is a measure of the similarity of two signals: the larger the correlation is, the more similar the two signals are. If the automobile is far from the mobile phone, the sound made by the automobile is too weak to dominate the signal, so the signals from the top and the bottom microphones are not similar enough. In this case, the cross-correlations of these segments are quite small, and these time delays are not suitable for speed calculation. The threshold changes with the scenario, and it can be determined with the help of constraint 1.
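
Equation (4) and the two validity constraints can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `max_valid_delay` and `valid_mask` are hypothetical, and the threshold `r_s` is taken as a given input rather than derived from the data.

```python
import math
import numpy as np

def max_valid_delay(mic_distance_m, fs_hz, c_mps=343.0):
    """Equation (4): largest physically possible delay, in samples.

    mic_distance_m is the full microphone distance 2l in meters.
    """
    return mic_distance_m * fs_hz / c_mps

def valid_mask(delays, peaks, delay_max, r_s):
    """Boolean mask of segment pairs that satisfy both constraints.

    Constraint 1: |delay| < delay_max.
    Constraint 2: peak correlation > r_s (assumed to be supplied by the caller).
    """
    delays = np.asarray(delays)
    peaks = np.asarray(peaks)
    return (np.abs(delays) < delay_max) & (peaks > r_s)

# Worked example with the values quoted in the text (2l = 15 cm, fs = 44.1 kHz).
delay_max = math.ceil(max_valid_delay(0.15, 44100))  # rounded up as in the text
```

Rounding up with `math.ceil` mirrors the rounding used in the text. A caller would pass the per-segment delays and maximum correlations together with `delay_max` and a chosen `r_s` to `valid_mask`.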
Since we focus on the Major Detection Region, we can set the maximum cross-correlation of the boundary segments of the Major Detection Region as the threshold. In other words, according to constraint 1, there must exist a period in which the time delays are around $\Delta d_m$, and the threshold can be determined by the correlation of these time delays. For example, in figure 5b, the threshold can be set to 0.5. Constraint 2 helps us remove some of the noise that appears in the Major Detection Region. Figure 6a draws the time delay curve, with blue points representing the valid time delays and red points representing the invalid time delays.

Fig. 7: Slope Calculating. (a) Hyperbolas generated by the time delays. (b) A simplified model of the asymptotes and the trace.

3.2.3 Candidate Trajectories Estimation

After we get a series of time delays, we need to recover the trace of the automobile. We utilize the Major Detection Region to estimate candidate trajectories of the automobile. The duration of the Major Detection Region is less than 3 seconds in most situations. For example, in figure 6a the duration of the Major Detection Region is 1.5 seconds. Since the duration is short, we can assume that the trace of the automobile is a line. In two dimensions, a linear trace can be represented as:

\[ y = mx + b, \tag{5} \]

which means that we need two parameters to determine a line. The parameter $m$ determines the slope of the line and the parameter $b$ determines the vertical distance between the automobile and the mobile phone.

First we calculate the parameter $m$ from the time delay curve. If the time delay between the top and the bottom microphones is $\Delta d$ at time $t$, the automobile should be located at this moment on the hyperbola whose foci are $M_1(-l, 0)$ and $M_2(l, 0)$ and whose vertices are $V_1\left(-\frac{1}{2}\Delta d, 0\right)$ and $V_2\left(\frac{1}{2}\Delta d, 0\right)$. The mathematical expression of the hyperbola is:

\[ \frac{x^2}{a^2} - \frac{y^2}{b^2} = 1, \tag{6} \]

where $a = \frac{\Delta d}{2}$ and $b = \sqrt{l^2 - a^2}$.
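
The hyperbola of equation (6) can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: the function names are hypothetical, and the conversion of a delay in samples to a path-length difference in meters ($\Delta d \cdot C / f_s$, which inverts equation (4)) is an assumption, because equation (6) uses $\Delta d$ as a distance.

```python
import math

def hyperbola_params(delay_samples, l_m, fs_hz=44100.0, c_mps=343.0):
    """Return (a, b) of equation (6) for one time delay.

    Assumption: the delay in samples is converted to a path-length
    difference in meters as delay * C / f_s (the inverse of equation (4)).
    l_m is the half distance l between the microphones, in meters.
    """
    path_diff = abs(delay_samples) * c_mps / fs_hz
    a = path_diff / 2.0
    if not 0.0 < a < l_m:
        # a = 0 degenerates to the y-axis; a >= l violates constraint 1.
        raise ValueError("delay must be non-zero and satisfy constraint 1")
    b = math.sqrt(l_m ** 2 - a ** 2)
    return a, b

def y_on_hyperbola(x, a, b):
    """Upper-half y coordinate on x^2/a^2 - y^2/b^2 = 1 for |x| >= a."""
    return b * math.sqrt(x ** 2 / a ** 2 - 1.0)

# Example: a delay of 10 samples with l = 7.5 cm (2l = 15 cm as in the text).
a, b = hyperbola_params(10, 0.075)
x = 5.0
y = y_on_hyperbola(x, a, b)
d1 = math.hypot(x + 0.075, y)   # distance from the point to M1(-l, 0)
d2 = math.hypot(x - 0.075, y)   # distance from the point to M2(l, 0)
```

For any point on the curve, the difference between its distances to the two foci equals $2a$, which is how a single time delay constrains the position of the sound source. The asymptotes of such a hyperbola have slope $\pm b/a$, a standard property of equation (6).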
Figure 7a shows the hyperbolas generated by different time delays. We can see from the figure that the hyperbolas look like lines. The reason is that in our scenario $x, y \gg l$, where $(x, y)$ is the coordinate of the automobile in figure 4b, since $l$ is usually shorter than 10 cm and $x$ and $y$ are usually longer than 5 meters. So we can use the asymptote of