What's big DAM?
◼ Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
◆ The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
◼ Our course: how to do DAM in the big data context
◆ Data Mining ≈ Predictive Analytics ≈ Data Science ≈ Business Intelligence
◆ Big data mining ≈ Massive data analysis
2021/1/30 School of Software Engineering, Tongji University

Let's focus on big DAM - what matters when dealing with data?
[Figure labels: Challenges, Usage, Context, Streaming, Scalability, Collect, Data Modalities, Reason, Data Operators]

Let's focus on big DAM - cultures of data mining
◼ Data mining overlaps with:
◆ Databases: large-scale data, simple queries
◆ Machine learning: small data, complex models
◆ CS Theory: (randomized) algorithms
◼ Different cultures:
◆ To a DB person, data mining is an extreme form of analytic processing: queries that examine large amounts of data. The result is the query answer.
◆ To an ML person, data mining is the inference of models. The result is the parameters of the model.
[Figure labels: Statistics, Machine Learning, Data Mining, Database]

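The two cultures can be contrasted on the same data. The sketch below is only an illustration, not code from the course: the `records` values, the threshold, and the function names `query_answer` and `fit_line` are all made up. The DB-style function returns the answer to a query, while the ML-style function returns the parameters of a fitted model (ordinary least squares for a straight line).

```python
# Hypothetical toy data: (x, y) pairs invented for this illustration.
records = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]


def query_answer(rows, threshold):
    """DB view: the result is the answer to a query.

    Here the query counts the rows whose y value exceeds `threshold`.
    """
    return sum(1 for _, y in rows if y > threshold)


def fit_line(rows):
    """ML view: the result is the parameters of a model.

    Fits y = slope * x + intercept by ordinary least squares.
    """
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


count = query_answer(records, 4.0)      # a query answer
slope, intercept = fit_line(records)    # model parameters
print(count, slope, intercept)
```

Both functions scan the same rows once or twice; what differs is what comes out: a value computed from the data versus a compact model that can be applied to unseen inputs.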
Let's focus on big data mining
◼ This class overlaps with machine learning, statistics, artificial intelligence, and databases, but puts more stress on:
◆ Scalability (big data)
◆ Algorithms
◆ Computing architectures
◆ Automation for handling real big data
◼ The required background:
◆ Data structures and algorithm design
◆ Probability and linear algebra
◆ Operating systems
◆ Java program design
[Figure labels: Statistics, Machine Learning, Data Mining, Database systems]

What will we learn?
◼ We will learn to mine different types of data:
◆ Data is high dimensional
◆ Data is a graph
◆ Data is infinite/never-ending
◆ Data is labeled
◼ We will learn to use different models of computation:
◆ Matlab + Hadoop + Spark
◆ Streams and online algorithms
◆ Single machine in-memory
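
To give a feel for what "streams and online algorithms" means when data is never-ending, here is a minimal sketch of one well-known online algorithm, Welford's method for a running mean and variance. It is an assumed example, not necessarily one the course covers; the class name `RunningStats` and the generator `sensor_stream` (with its made-up values) are hypothetical.

```python
class RunningStats:
    """Running mean and population variance in one pass, O(1) memory.

    Uses Welford's update, so no past stream elements are stored.
    """

    def __init__(self):
        self.n = 0        # number of elements seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Population variance; defined as 0.0 before any element arrives.
        return self.m2 / self.n if self.n else 0.0


def sensor_stream():
    """Hypothetical stand-in for an unbounded stream (finite here for demo)."""
    for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        yield value


stats = RunningStats()
for value in sensor_stream():
    stats.update(value)   # each element is seen once, then discarded

print(stats.n, stats.mean, stats.variance())
```

The point of the design is that the state (`n`, `mean`, `m2`) stays the same size no matter how long the stream runs, which is what separates an online algorithm from the single-machine in-memory setting where the whole data set is available at once.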