Application Scenarios
Complex object identification
• Data quality
  – Real-life data is often dirty: 1%–5% of business data contains errors.
  – Dirty data costs US businesses 600 billion dollars each year.
  – Wrong price data in retail databases alone costs US consumers $2.5 billion annually.
  – Data cleaning tools deliver an overall business value of more than 600 million GBP each year at BT.
• Data cleaning
  – Data repairing
  – Record matching (also known as object identification, entity resolution, or data deduplication)
• Complex object identification
  – Modeling complex objects as graphs
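To make the record matching bullet concrete, here is a minimal sketch in Python. It is not the method used by any tool named on the slide. The helper names (`normalize`, `similarity`, `match_records`), the sample records, and the 0.85 threshold are all assumptions chosen for illustration. It flags two records as the same real-world object when their normalized fields are sufficiently similar.

```python
from difflib import SequenceMatcher

def normalize(value):
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    cleaned = "".join(c if c.isalnum() else " " for c in value.lower())
    return " ".join(cleaned.split())

def similarity(a, b, fields):
    """Average string similarity (0.0 to 1.0) over the chosen fields."""
    scores = [
        SequenceMatcher(None, normalize(a[f]), normalize(b[f])).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)

def match_records(records, fields, threshold=0.85):
    """Return index pairs (i, j), i < j, whose similarity reaches the threshold."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j], fields) >= threshold:
                pairs.append((i, j))
    return pairs

# Hypothetical records: the first two describe the same person.
records = [
    {"name": "John A. Smith", "city": "Edinburgh"},
    {"name": "john a smith", "city": "Edinburgh "},
    {"name": "Mary Jones", "city": "London"},
]
duplicates = match_records(records, ["name", "city"])
```

The pairwise loop is quadratic in the number of records, which is acceptable for a toy example but would typically be replaced by blocking or indexing on real data.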
[Figure: example source code and its program dependence graph]
Source code:
    int sum(int array[], int count) {
        int i, sum;
        sum = 0;
        for (i = 0; i < count; i++) {
            sum = add(sum, array[i]);
        }
        return sum;
    }
    int add(int a, int b) {
        return a + b;
    }
Program dependence graph nodes:
    0: declaration, int count
    1: declaration, int i
    2: declaration, int sum
    3: declaration, int array
    4: return, return sum
    5: control, i < count
    6: assignment, i = 0
    7: assignment, sum = 0
    8: increment, i++
    9: assignment, sum = add()
    10: call-site, add(sum, array[i])
Software plagiarism detection [13]
• Traditional plagiarism detection tools may not be applicable to serious software plagiarism problems.
• A new tool based on graph pattern matching:
  – Represent the source code as program dependence graphs [14].
  – Use graph pattern matching to detect plagiarism.
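The idea of matching program dependence graphs can be sketched as follows. This is a simplified illustration, not the tool from [13]. It uses a plain backtracking search for a label-preserving subgraph embedding, and the dependence edges, the suspect program, and the function name `find_embedding` are assumptions made for the example. Nodes are labeled only by statement kind, so renaming variables does not hide the copied structure.

```python
def find_embedding(pattern_nodes, pattern_edges, target_nodes, target_edges):
    """Return one injective mapping from pattern nodes to target nodes that
    preserves node labels and every pattern edge, or None if none exists."""
    order = list(pattern_nodes)

    def extend(mapping):
        if len(mapping) == len(order):
            return dict(mapping)
        p = order[len(mapping)]
        for t, label in target_nodes.items():
            if label != pattern_nodes[p] or t in mapping.values():
                continue
            mapping[p] = t
            # Check every pattern edge whose endpoints are both mapped so far.
            consistent = all(
                (mapping[u], mapping[v]) in target_edges
                for (u, v) in pattern_edges
                if u in mapping and v in mapping
            )
            if consistent:
                result = extend(mapping)
                if result is not None:
                    return result
            del mapping[p]
        return None

    return extend({})

# Fragment of the PDG of sum(); node kinds follow the slide, edges are assumed.
original_nodes = {
    "i=0": "assignment", "i<count": "control", "i++": "increment",
    "add(sum, array[i])": "call-site", "sum=add()": "assignment",
    "return sum": "return",
}
original_edges = {
    ("i=0", "i<count"), ("i<count", "i++"), ("i<count", "sum=add()"),
    ("add(sum, array[i])", "sum=add()"), ("sum=add()", "return sum"),
}

# Hypothetical suspect: identifiers renamed and one unrelated statement added.
suspect_nodes = {
    "k=0": "assignment", "k<n": "control", "k++": "increment",
    "plus(total, xs[k])": "call-site", "total=plus()": "assignment",
    "return total": "return", "tmp=0": "assignment",
}
suspect_edges = {
    ("k=0", "k<n"), ("k<n", "k++"), ("k<n", "total=plus()"),
    ("plus(total, xs[k])", "total=plus()"), ("total=plus()", "return total"),
}

embedding = find_embedding(original_nodes, original_edges, suspect_nodes, suspect_edges)

# A structurally different program yields no embedding.
no_match = find_embedding(
    original_nodes, original_edges,
    {"x=1": "assignment", "return x": "return"}, {("x=1", "return x")},
)
```

Backtracking search is exponential in the worst case, so it is only suitable here because the graphs are tiny.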
Transport routing [16]
• Graph search is common practice in transportation networks, owing to the wide use of location-based services.
• Example: Mark, a driver in the U.S., wants to go from Irvine to Riverside in California.
  – If Mark wants to reach Riverside by car in the shortest time, the problem can be expressed as a shortest path problem. Existing methods then yield the shortest path from Irvine, CA to Riverside, CA, traveling along State Route 261.
  – If Mark drives a truck delivering hazardous materials, he may not be allowed to cross some bridges or railroad crossings. In this case, a pattern graph containing specific route constraints (such as regular expressions) can be used to find the optimal transport routes.
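The two cases in the example can be sketched with a single routine. The code below is a simplification: instead of the regular-expression path constraints mentioned above, it uses Dijkstra's algorithm with a per-edge filter, which is enough to show how a constraint changes the answer. The road network, junction names, travel times, and the `shortest_route` function are hypothetical and do not reflect real roads between Irvine and Riverside.

```python
import heapq

def shortest_route(graph, source, target, allowed=lambda tags: True):
    """Dijkstra's algorithm restricted to edges whose tag set passes `allowed`.
    Returns (total_minutes, path), or None if the target is unreachable."""
    queue = [(0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, minutes, tags in graph.get(node, []):
            if neighbor not in settled and allowed(tags):
                heapq.heappush(queue, (cost + minutes, neighbor, path + [neighbor]))
    return None

# Hypothetical network: node -> list of (neighbor, minutes, edge tags).
road_network = {
    "Irvine": [("Junction A", 20, {"bridge"}), ("Junction B", 30, set())],
    "Junction A": [("Riverside", 25, set())],
    "Junction B": [("Riverside", 35, set())],
}

# Car: no restrictions, so the faster route over the bridge is chosen.
car_route = shortest_route(road_network, "Irvine", "Riverside")

# Hazmat truck: edges tagged "bridge" are excluded from the search.
truck_route = shortest_route(
    road_network, "Irvine", "Riverside",
    allowed=lambda tags: "bridge" not in tags,
)
```

An edge filter only expresses constraints on individual road segments. Constraints on whole paths, such as those written as regular expressions over edge labels, would need the search state to track progress through the expression as well.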
[Figure: expertise recommendation network G, with nodes labeled HR1, Bio1, Bio2, Bio3, SE1, SE2, DM1, DM2, AI1, AI2, ..., AIk, DMk]
Recommender systems [13]
• Recommendations are used in many emerging applications, such as social matching systems.
• Graph search is a useful tool for recommendations.
  – A headhunter wants to find a biologist (Bio) to help a group of software engineers (SEs) analyze genetic data.
  – To do this, (s)he uses an expertise recommendation network G (depicted in the figure), where
    ✓ a node denotes a person labeled with expertise, and
    ✓ an edge indicates a recommendation, e.g., HR1 recommends Bio1, and AI1 recommends DM1.
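A very small version of such a search can be sketched as follows. The slide does not state the headhunter's actual pattern, so the query used here ("a Bio recommended by an HR person") is an assumption, as are the function name `recommended_by`, the node Bio2 having no HR recommendation, and the edge from SE1 to Bio2. Only the edges HR1 to Bio1 and AI1 to DM1 come from the example above.

```python
# Node -> expertise label.
labels = {
    "HR1": "HR", "Bio1": "Bio", "Bio2": "Bio",
    "AI1": "AI", "DM1": "DM", "SE1": "SE",
}

# Directed edges (recommender, recommended person).
recommends = {
    ("HR1", "Bio1"),  # from the slide
    ("AI1", "DM1"),   # from the slide
    ("SE1", "Bio2"),  # hypothetical, added for contrast
}

def recommended_by(labels, edges, wanted, recommender):
    """Return people labeled `wanted` who are recommended by at least one
    person labeled `recommender`, in sorted order."""
    return sorted({
        dst for src, dst in edges
        if labels[src] == recommender and labels[dst] == wanted
    })

candidates = recommended_by(labels, recommends, "Bio", "HR")
```

Here `candidates` contains only Bio1, because Bio2 is recommended by a software engineer and not by HR. Richer patterns, with several labeled nodes and edges, call for general graph pattern matching in place of this single-edge query.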