Querying and Mining Data Streams: You Only Get One Look
A Tutorial
Minos Garofalakis, Johannes Gehrke, Rajeev Rastogi
Bell Laboratories, Cornell University
Garofalakis, Gehrke, Rastogi, VLDB’02

Outline
• Introduction & Motivation
  – Stream computation model, Applications
• Basic stream synopses computation
  – Samples, Equi-depth histograms, Wavelets
• Mining data streams
  – Decision trees, clustering, association rules
• Sketch-based computation techniques
  – Self-joins, Joins, Wavelets, V-optimal histograms
• Advanced techniques
  – Sliding windows, Distinct values, Hot lists
• Future directions & Conclusions

Processing Data Streams: Motivation
• A growing number of applications generate streams of data
  – Performance measurements in network monitoring and traffic management
  – Call detail records in telecommunications
  – Transactions in retail chains, ATM operations in banks
  – Log records generated by Web servers
  – Sensor network data
• Application characteristics
  – Massive volumes of data (several terabytes)
  – Records arrive at a rapid rate
• Goal: Mine patterns, process queries and compute statistics on data streams in real time

Data Streams: Computation Model
• A data stream is a (massive) sequence of elements: $e_1, \ldots, e_n$
• Stream processing requirements
  – Single pass: Each record is examined at most once
  – Bounded storage: Limited Memory (M) for storing synopsis
  – Real-time: Per record processing time (to maintain synopsis) must be low
[Figure: Data Streams → Stream Processing Engine (Synopsis in Memory) → (Approximate) Answer]
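
To make the three requirements above concrete, here is a minimal sketch that maintains a fixed-size uniform random sample as the synopsis, using the classic reservoir sampling scheme. It is one possible synopsis chosen for illustration, not necessarily the construction the tutorial presents; the function name `reservoir_sample` and the parameters `m` (synopsis size, standing in for the memory bound M) and `seed` are assumptions introduced here.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], m: int, seed: Optional[int] = None) -> List[T]:
    """Keep a uniform random sample of at most m elements from a stream.

    Single pass: each element is read exactly once from the iterator.
    Bounded storage: the synopsis never holds more than m elements.
    Low per-record cost: O(1) work per element.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, element in enumerate(stream):
        if i < m:
            # Fill phase: the first m elements are always kept.
            reservoir.append(element)
        else:
            # Element i (0-based) replaces a random slot with probability m / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < m:
                reservoir[j] = element
    return reservoir


# Hypothetical usage: the generator can only be consumed once,
# which mirrors the single-pass constraint.
sample = reservoir_sample((x for x in range(1_000_000)), m=10, seed=42)
```

The sample can then be used to answer queries approximately, for example by scaling an aggregate computed over the sample; the accuracy depends on `m` and on the query.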

Network Management Application
• Network Management involves monitoring and configuring network hardware and software to ensure smooth operation
  – Monitor link bandwidth usage, estimate traffic demands
  – Quickly detect faults, congestion and isolate root cause
  – Load balancing, improve utilization of network resources
[Figure: Network and Network Operations Center, linked by Measurements and Alarms]