Big Data Integration
Xin Luna Dong (Google Inc.)
Divesh Srivastava (AT&T Labs-Research)
What is “Big Data Integration?”
Big data integration = Big data + data integration
Data integration: easy access to multiple data sources [DHI12]
– Virtual: mediated schema, query reformulation, link + fuse answers
– Warehouse: materialized data, easy querying, consistency issues
Big data: all about the V’s ☺
– Size: large volume of data, collected and analyzed at high velocity
– Complexity: huge variety of data, of questionable veracity
– Utility: data of considerable value
What is “Big Data Integration?”
Big data integration = Big data + data integration
Data integration: easy access to multiple data sources [DHI12]
– Virtual: mediated schema, query reformulation, link + fuse answers
– Warehouse: materialized data, easy querying, consistency issues
Big data in the context of data integration: still about the V’s ☺
– Size: large volume of sources, changing at high velocity
– Complexity: huge variety of sources, of questionable veracity
– Utility: sources of considerable value
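The "virtual" style of data integration named above (mediated schema, query reformulation, link + fuse answers) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the three sources, their attribute names, the exact-match linking on a shared key, and the majority-vote fusion rule are hypothetical and are not taken from the slides.

```python
from collections import Counter

# Hypothetical sources, each exposing the same kind of data under its own schema.
SOURCES = {
    "A": [
        {"isbn": "111", "book_title": "Data Integration", "writer": "A. Author"},
        {"isbn": "222", "book_title": "Big Data", "writer": "B. Writer"},
    ],
    "B": [
        {"ISBN": "111", "name": "Data Integration", "author": "A. Author"},
        {"ISBN": "222", "name": "Big Data", "author": "B. Writer"},
    ],
    "C": [
        {"id": "222", "title": "Big Data", "by": "C. Other"},
    ],
}

# Schema mappings: mediated-schema attribute -> source attribute (assumed given).
MAPPINGS = {
    "A": {"isbn": "isbn", "title": "book_title", "author": "writer"},
    "B": {"isbn": "ISBN", "title": "name", "author": "author"},
    "C": {"isbn": "id", "title": "title", "author": "by"},
}


def query_source(rows, mapping, attrs):
    """Reformulate a mediated-schema projection into the source's own
    attribute names and return the answers in mediated-schema terms."""
    return [{a: row[mapping[a]] for a in attrs} for row in rows]


def link(answers, key):
    """Group answers that refer to the same entity (here: exact key match)."""
    groups = {}
    for record in answers:
        groups.setdefault(record[key], []).append(record)
    return groups


def fuse(records, attrs):
    """Resolve conflicts per attribute by majority vote (an assumed rule)."""
    return {a: Counter(r[a] for r in records).most_common(1)[0][0] for a in attrs}


def integrate(attrs, key="isbn"):
    """Answer a mediated-schema query over all sources: reformulate, link, fuse."""
    answers = []
    for name, rows in SOURCES.items():
        answers.extend(query_source(rows, MAPPINGS[name], attrs))
    return {k: fuse(recs, attrs) for k, recs in link(answers, key).items()}


result = integrate(["isbn", "title", "author"])
```

Here the user queries only the mediated attributes (isbn, title, author); no data is materialized ahead of time, which is what distinguishes this from the warehouse style. Source C disagrees with A and B on the author of book 222, and the vote keeps the value reported by the majority.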
Outline
Motivation
– Why do we need big data integration?
– How has “small” data integration been done?
– Challenges in big data integration
Schema alignment
Record linkage
Data fusion
Emerging topics
Why Do We Need “Big Data Integration?”
Building web-scale knowledge bases
– Google Knowledge Graph
– MSR knowledge base (Probase): “A Little Knowledge Goes a Long Way”
– Freebase
– NELL
[Figure: logos of the knowledge bases listed above]