The power law theorem of data set size: • The number of data sets of size n is inversely proportional to n • There are vastly more small data sets than very large ones • So small data sets are likely to have a much larger impact on the world than big data sets David Hand: Some comments on big data, December 2013
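The slide's first bullet can be made concrete with a short sketch. It assumes the law takes the form N(n) = c / n, where N(n) is the number of data sets of size n. The constant c is a hypothetical normalising factor, because the slide states only proportionality. Under that assumption c cancels in any comparison, so a data set size 100 times smaller is 100 times as common. The function names below are illustrative, not taken from any source.

```python
import math


def dataset_count(n: int, c: float = 1.0) -> float:
    """Relative number of data sets of size n under the assumed law N(n) = c / n.

    c is a hypothetical normalising constant; only ratios are meaningful here.
    """
    if n <= 0:
        raise ValueError("data set size must be a positive integer")
    return c / n


def size_ratio(small: int, large: int) -> float:
    """How many times more data sets of size `small` than of size `large`.

    The constant c cancels, so the result is simply large / small.
    """
    return dataset_count(small) / dataset_count(large)


if __name__ == "__main__":
    for small, large in [(10, 1_000), (100, 1_000_000)]:
        ratio = size_ratio(small, large)
        print(f"size {small:,} vs size {large:,}: about {ratio:,.0f} times as many")
```

Floating-point division is used for simplicity, so the ratios should be compared approximately (for example with `math.isclose`) rather than for exact equality.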
No-one actually wants data • What people want are answers • Which may be extracted from data • So data are only half the answer • The other half is statistics, data mining, machine learning, and other data analytic disciplines David Hand: Some comments on big data, December 2013
The manure heap theorem of data discoveries: The probability of finding a gold coin in a heap of manure tends towards 1 as the size of the heap tends to infinity. (This theorem is false) David Hand: Some comments on big data, December 2013
Data Science not just for Big Data Gregory Piatetsky, @kdnuggets Analytics, Big Data, Data Mining, and Data Science Resources © KDnuggets 2013
What do we call it? • Statistics, 1830- • Data Mining, 1980- • Knowledge Discovery in Data (KDD), 1989- • Business Analytics, 1997- • Predictive Analytics, 2002- • Data Analytics, 2011- • Data Science, 2011- • Big Data, 2012- Same core idea: finding useful patterns in data. Different emphasis. © KDnuggets 2013