Evolution of Sciences
1960s:
◼ Data collection, database creation, IMS and network DBMS
1970s:
◼ Relational data model, relational DBMS implementation
1980s:
◼ RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
◼ Application-oriented DBMS (spatial, scientific, engineering, etc.)

Evolution of Sciences: New Data Science Era
1990-now: Data science
◼ The flood of data from new scientific instruments and simulations
◼ The ability to economically store and manage petabytes of data online
◼ The Internet and computing Grid that makes all these archives universally accessible
◼ Scientific info. management, acquisition, organization, query, and visualization tasks scale almost linearly with data volumes
◼ Data mining is a major new challenge!
1990s:
◼ Data mining, data warehousing, multimedia databases, and Web databases
2000s:
◼ Stream data management and mining
◼ Data mining and its applications
◼ Web technology (XML, data integration) and global information systems

Data vs. Information
Concept:
◼ Data: recorded facts
◼ Information: patterns underlying the data
Society produces huge amounts of data:
◼ Sources: business, science, medicine, economics, geography, environment, sports, …
◼ Potentially valuable resource
◼ Raw data is useless: need techniques to automatically extract information from it

What Is Data Mining?
◼ Extraction of implicit, previously unknown, and potentially useful information from data
◼ Needed: programs that detect patterns and regularities in the data
◼ Strong patterns can be used to make predictions

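To make "programs that detect patterns" and "strong patterns can be used to make predictions" concrete, here is a minimal Python sketch. It is an illustration only, not a method taken from the slides: the toy records, the `find_rules` and `predict` helpers, and the 0.75 confidence threshold are all assumptions chosen for the example.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each record is (outlook, windy, play).
records = [
    ("sunny", False, "no"),
    ("sunny", True, "no"),
    ("overcast", False, "yes"),
    ("rainy", False, "yes"),
    ("rainy", True, "no"),
    ("overcast", True, "yes"),
    ("sunny", False, "yes"),
    ("rainy", False, "yes"),
]


def find_rules(rows, attr_index, min_confidence=0.75):
    """Map each attribute value to (most common outcome, confidence),
    keeping only patterns at or above min_confidence (an assumed threshold)."""
    outcomes = defaultdict(Counter)
    for row in rows:
        # The last field of every record is the outcome to be predicted.
        outcomes[row[attr_index]][row[-1]] += 1

    rules = {}
    for value, counts in outcomes.items():
        outcome, hits = counts.most_common(1)[0]
        confidence = hits / sum(counts.values())
        if confidence >= min_confidence:
            rules[value] = (outcome, confidence)
    return rules


def predict(rules, value):
    """Predict an outcome from a strong pattern, or None if no rule applies."""
    rule = rules.get(value)
    return rule[0] if rule else None


outlook_rules = find_rules(records, attr_index=0)
print(outlook_rules)
print(predict(outlook_rules, "overcast"))
```

In this toy data only "overcast" yields a pattern strong enough to keep, so the sketch declines to predict for "sunny" and "rainy" rather than guess from weak evidence.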
What Is Data Mining?
Data mining (knowledge discovery in databases):
◼ Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) information or patterns from data in large databases
Alternative names and their "inside stories":
◼ Data mining: a misnomer?
◼ Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, information harvesting, business intelligence, etc.
What is not data mining?
◼ (Deductive) query processing
◼ Expert systems or small ML/statistical programs
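
The distinction between query processing and data mining can be sketched in a few lines of Python. This is a hedged illustration, not material from the slides: the basket data, the `frequent_pairs` helper, and the `min_support` value of 2 are hypothetical. A query returns exactly what was asked for, while the mining step surfaces co-occurrence patterns nobody asked about.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket data: one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]

# Query processing: answers an explicit, pre-formulated question.
with_bread = [t for t in transactions if "bread" in t]


def frequent_pairs(baskets, min_support):
    """Count every item pair and keep those appearing in at least
    min_support baskets (an assumed, illustrative threshold)."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair a single canonical (a, b) order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Pattern discovery: no specific item is named in advance.
pairs = frequent_pairs(transactions, min_support=2)
print(len(with_bread))
print(pairs)
```

Counting all pairs by brute force is only workable on a tiny example like this one; at database scale the number of candidate pairs is too large to enumerate naively.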