Data Structures and Algorithm Xiaoqing Zheng Zhengxq@fudan.edu.cn
Data Structures and Algorithm Xiaoqing Zheng zhengxq@fudan.edu.cn
String matching problem Pattern P occurs with shift s in text T(or, equivalently that pattern P occurs beginning at position S I in text T)if Ts+1….s+m]=P1….m」and0≤s≤n-m.If P occurs with shift s in T, the we call s a valid shift The string-matching problem is the problem of finding all valid shifts with which a given pattern P occurs in a given text 1. Text T b c lab b c a b Pattern p albl a
String matching problem Pattern P occurs with shift s in text T (or , equivalently, that pattern P occurs beginning at position s + 1 in text T) if T[ s + 1 ... s + m] = P[1 ... m ] and 0 ≤ s ≤ n − m. If P occurs with shift s in T, the we call s a valid shift. The string-matching problem is the problem of finding all valid shifts with which a given pattern P occurs in a given text T. a b c a b a a b c a b a b a a Text T Pattern P s = 3
Notation and terminology ∑ finite alphabet 2-fa 2 set of all finite-length strings formed using characters from the alphabet 2 zero-length empty string belongs to 2 I x length of a string x concatenation of two string x and y WDx w is a prefix of a string x, if x= wy for some string y∈2 wax w is a suffix of a string x, ifx= yw for some string y∈2 k-character prefix P[l... k] of the pattern P
Notation and terminology finite alphabet. ∑ = { a, b, ..., z }. set of all finite-length strings formed using characters from the alphabet ∑. zero-length empty string belongs to ∑ *. * Σ Σ ε | | x length of a string x. xy concatenation of two string x and y. w is a prefix of a string x, if x = wy for some string y ∈ ∑ *. w x w is a suffix of a string x, if x = yw for some string y ∈ ∑ *. w x k-character prefix P[1 ... k] of the pattern P. Pk
Naive string-matching algorithm NAIVE-STRING-MATCHER(T P 1.n← length[T 2.m← length[P 3.fors←0ton-m 4. do if Pll.m=T[s+l.S+m then print "Pattern occurs with shift" s Running time is O((n-m+ I)m)
Naive string-matching algorithm NAIVE-STRING-MATCHER ( T, P ) 1. n ← length [ T ] 2. m ← length [ P ] 3. for s ← 0 to n − m 4. do if P[1 ... m] = T[ s +1 ... s + m ] 5. then print "Pattern occurs with shift" s Running time is O(( n − m + 1) m )
Idea of rabin-Karp algorithm Input characters and string can be represented by y graphical symbols or digits Given a pattern Pll.ml, let p denote its corresponding decimal value p=P[m]+10(P[m-1)+ 10(P[m-2]+….+10([2]+10P[]).) We also let t denote the decimal value of the length-m substring T[s+1….s+ml,fors=0,12…,n-m Then,t= p if and only if Ts+1….s+m]=P[1….m
Idea of Rabin-Karp algorithm Input characters and string can be represented by graphical symbols or digits. Given a pattern P[1 ... m ], let p denote its corresponding decimal value. Then, ts = p if and only if T[ s + 1 ... s + m] = P[1 ... m ]. We also let ts denote the decimal value of the lengthm substring T[ s + 1 ... s + m ], for s = 0, 1, ..., n − m. p = P [ m] + 10(P [ m − 1]) + 10( P [ m − 2] + ... + 10( P[2] + 10 P[1])...))