Preliminary: q-gram
• q-gram: a substring of length q
• Example: the 2-grams of "youtbecom" are yo, ou, ut, tb, be, ec, co, om
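The definition above can be sketched in a few lines of Python. The function name `qgrams` is an illustrative choice and does not come from the slides; the sketch assumes overlapping grams taken at every position, as in the "youtbecom" example.

```python
def qgrams(s, q):
    """Return the overlapping q-grams of s, left to right.

    A string shorter than q has no q-grams, so the result is empty.
    """
    return [s[i:i + q] for i in range(len(s) - q + 1)]


if __name__ == "__main__":
    print(qgrams("youtbecom", 2))
```

A string of length n yields n - q + 1 grams, which is why "youtbecom" (9 characters) has eight 2-grams.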
Preliminary: q-gram
• 1 edit operation destroys at most q grams.
• τ edit operations destroy at most qτ grams.
• If r and s have more than qτ mismatch grams, ED(r, s) > τ.
• Example (q = 2): in youtdecom, with grams yo, ou, ut, td, de, ec, co, om, the edited character d touches only the two grams td and de.
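A minimal sketch of this count filter follows. It assumes that "mismatch grams" means the grams of r that have no counterpart in s, counted as a multiset; the names `mismatch_grams` and `count_filter_prunes` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def qgrams(s, q):
    """Overlapping q-grams of s."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def mismatch_grams(r, s, q):
    """Number of q-grams of r (with multiplicity) that are not matched in s."""
    diff = Counter(qgrams(r, q)) - Counter(qgrams(s, q))
    return sum(diff.values())


def count_filter_prunes(r, s, q, tau):
    """True if the pair can be discarded, i.e. ED(r, s) > tau is guaranteed.

    False only means the filter is inconclusive; the pair still needs
    verification with a real edit-distance computation.
    """
    return mismatch_grams(r, s, q) > q * tau


if __name__ == "__main__":
    print(mismatch_grams("youtbecom", "youtdecom", 2))
    print(count_filter_prunes("youtbecom", "youtdecom", 2, 1))
```

For "youtbecom" and "youtdecom" with q = 2, the single changed character leaves two mismatch grams (tb and be), which does not exceed qτ = 2 for τ = 1, so the pair survives the filter.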
Preliminary: Prefix Filter
• Sort all q-grams by a global ordering, such as idf.
• q(r): the sorted q-gram set of string r
• q(s): the sorted q-gram set of string s
• Pre(•) is the prefix of q(•), with |Pre(•)| = qτ + 1; suffix(•) is the remainder of q(•).
• Prefix Filter: if Pre(r) ∩ Pre(s) = ∅, then ED(r, s) > τ.
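The prefix filter can be sketched as below, under several assumptions that the slides do not spell out: the global ordering is ascending document frequency (rare grams first, which corresponds to descending idf) with ties broken alphabetically; repeated grams in one string are made distinct by tagging each with its occurrence number; and strings with fewer than qτ + 1 grams are never pruned, because all of their grams could be destroyed by τ edits. All function names are hypothetical.

```python
from collections import Counter


def qgrams(s, q):
    """Overlapping q-grams of s."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def tokens(s, q):
    """q-grams tagged with an occurrence index, so duplicates stay distinct."""
    seen = Counter()
    out = []
    for g in qgrams(s, q):
        out.append((g, seen[g]))
        seen[g] += 1
    return out


def document_frequency(strings, q):
    """For each q-gram, the number of strings in the collection containing it."""
    df = Counter()
    for s in strings:
        df.update(set(qgrams(s, q)))
    return df


def prefix(s, q, tau, df):
    """First q*tau + 1 tokens of s under the global ordering (rarest first)."""
    ordered = sorted(tokens(s, q), key=lambda t: (df[t[0]], t[0], t[1]))
    return ordered[: q * tau + 1]


def prefix_filter_prunes(r, s, q, tau, df):
    """True if the prefixes are disjoint, which guarantees ED(r, s) > tau."""
    need = q * tau + 1
    if len(r) - q + 1 < need or len(s) - q + 1 < need:
        return False  # too few grams: the filter is inconclusive
    return not (set(prefix(r, q, tau, df)) & set(prefix(s, q, tau, df)))


if __name__ == "__main__":
    collection = ["youtbecom", "youtdecom", "abcdefghi"]
    df = document_frequency(collection, 2)
    print(prefix_filter_prunes("youtbecom", "youtdecom", 2, 1, df))
    print(prefix_filter_prunes("youtbecom", "abcdefghi", 2, 1, df))
```

Only the prefixes need to be indexed or compared, which is the point of the filter: a pair whose prefixes share no gram is discarded without looking at the suffixes.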
Preliminary: Prefix Filter (example)
[Figure: two sorted gram lists, g1 g2 g5 g6 g11 g12 g13 and g3 g4 g7 g8 g9 g10 g12, with Pre(r), suffix(r) and Pre(s) marked; six grams are labeled "> g10".]
Preliminary: disjoint q-gram
• One edit operation destroys at most 1 disjoint gram.
• τ edit operations destroy at most τ disjoint grams.
• If r and s have more than τ mismatch disjoint grams, ED(r, s) > τ.
• Example: in youtdecom, with disjoint grams yo, ut, de, om, the edited character d touches only de.
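The disjoint-gram bound can be sketched as follows. This is an illustration under assumptions: r is cut into consecutive non-overlapping grams starting at position 0 (the slide's example picks yo, ut, de, om, which is a different but equally valid choice), and a disjoint gram counts as a mismatch when it does not occur anywhere in s as a substring. The function names are hypothetical.

```python
def disjoint_grams(r, q):
    """Consecutive non-overlapping q-grams of r; a short tail is dropped."""
    return [r[i:i + q] for i in range(0, len(r) - q + 1, q)]


def disjoint_filter_prunes(r, s, q, tau):
    """True if more than tau disjoint grams of r are absent from s.

    Each edit touches at most one disjoint gram, so every untouched gram
    must still occur in s; more than tau absences imply ED(r, s) > tau.
    """
    mismatches = sum(1 for g in disjoint_grams(r, q) if g not in s)
    return mismatches > tau


if __name__ == "__main__":
    print(disjoint_grams("youtbecom", 2))
    print(disjoint_filter_prunes("youtbecom", "youtdecom", 2, 1))
```

Compared with overlapping grams, the threshold drops from qτ to τ, at the cost of extracting fewer grams per string.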