Query: Schwarzenegger?

Star            Title           Year  Genre
Keanu Reeves    The Matrix      1999  Sci-Fi
Samuel Jackson  Iron Man        2008  Sci-Fi
Schwarzenegger  The Terminator  1984  Sci-Fi
Samuel Jackson  The Man         2006  Crime

The user doesn't know the exact spelling!
Relaxing Conditions

Star            Title           Year  Genre
Keanu Reeves    The Matrix      1999  Sci-Fi
Samuel Jackson  Iron Man        2008  Sci-Fi
Schwarzenegger  The Terminator  1984  Sci-Fi
Samuel Jackson  The Man         2006  Crime

Find movies with a star "similar to" Schwarrzenger
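As a rough illustration of relaxing an exact-match condition, the sketch below filters the movie table by stars whose names are close to the misspelled query. It uses Python's standard `difflib.get_close_matches`, a ratio-based matcher that stands in for whatever similarity function a real system would use. The function name `movies_with_similar_star` and the `cutoff` value of 0.6 are assumptions made for this example, not part of the talk.

```python
import difflib

# Rows of the movie table from the slide: (star, title, year, genre).
movies = [
    ("Keanu Reeves", "The Matrix", 1999, "Sci-Fi"),
    ("Samuel Jackson", "Iron Man", 2008, "Sci-Fi"),
    ("Schwarzenegger", "The Terminator", 1984, "Sci-Fi"),
    ("Samuel Jackson", "The Man", 2006, "Crime"),
]


def movies_with_similar_star(query, rows, cutoff=0.6):
    """Return rows whose star name is similar to `query`.

    Hypothetical helper: similarity is difflib's ratio, and `cutoff`
    is an assumed threshold, not a value taken from the slides.
    """
    stars = sorted({row[0] for row in rows})
    close = set(
        difflib.get_close_matches(query, stars, n=len(stars), cutoff=cutoff)
    )
    return [row for row in rows if row[0] in close]


# An exact-match condition finds nothing for the misspelled name ...
exact = [row for row in movies if row[0] == "Schwarrzenger"]
# ... while the relaxed condition still reaches the intended row.
relaxed = movies_with_similar_star("Schwarrzenger", movies)
```

Here `exact` is empty, while `relaxed` contains the row for The Terminator.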
String Similarity Search

String Similarity Search finds all entries from the dictionary that approximately match the query.

Applications:
● Biology, Bioinformatics
● Information Retrieval
● Data Quality, Data Cleaning
● …

2/2/2021 TopkSearch @ ICDE2013
Outline

● Motivation
● Problem Formulation
● Progressive Framework
● Pivotal Entry-based Method
● Range-based Method
● Experiment
● Conclusion
Problem Formulation

Top-k String Similarity Search: Given a string set S and a query string q, top-k string similarity search returns a string set R ⊆ S such that |R| = k and for any string r ∈ R and s ∈ S − R, ED(r, q) ≤ ED(s, q), where ED denotes edit distance.

TABLE I: A STRING SET S AND A QUERY q = "srajit"

ID      s1     s2      s3     s4    s5       s6
String  sarit  seraji  suijt  suit  surajit  thrifty

The top-3 similar strings of srajit: sarit, seraji, surajit (circled on the slide)
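The definition can be checked with a brute-force sketch: compute the edit distance from the query to every string and keep the k smallest. This is only a naive baseline for illustration, not the progressive, pivotal entry-based, or range-based methods listed in the outline. The names `edit_distance` and `top_k` are hypothetical, and the six strings are our reading of Table I from a partly garbled slide, so treat them as assumptions.

```python
import heapq


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def top_k(strings, query, k):
    """Brute-force top-k: the k strings with the smallest ED to `query`.

    Ties are broken alphabetically so the result is deterministic.
    """
    return heapq.nsmallest(
        k, strings, key=lambda s: (edit_distance(s, query), s)
    )


# Strings as read from Table I (assumed; the slide text is partly garbled).
S = ["sarit", "seraji", "suijt", "suit", "surajit", "thrifty"]
result = top_k(S, "srajit", 3)
```

With these strings, `result` holds surajit (distance 1) followed by sarit and seraji (distance 2 each). Every remaining string is at distance 3 or more, so the condition ED(r, q) ≤ ED(s, q) holds for all r ∈ R and s ∈ S − R.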