Parameter Estimation and Evaluation
Important Features of Big Data

information from sensors and satellite images, etc. GPS sensors record various key environmental and physical data (e.g., satellite images) on the earth, such as hourly observations of PM 2.5 in major cities in China, traffic flows in major train stations, and the night light brightness of major cities on the earth. Also, telescopes and radio telescopes search the sky 24 hours per day, and the streams of astronomical data are recorded in real time.

Big Data, Machine Learning and Statistics. Introduction to Statistics and Econometrics. July 8, 2020
Important features of Big data:
● Volume. Data is collected from a variety of sources, including business transactions, messages in social media, and information from sensors or machine-to-machine data. In the past, storing data on such a scale would have been a problem, but new technologies (such as Hadoop) have eased the burden of storage.
● Velocity. Most data arrives at an unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in real time or near real time. In many cases, data arrives in clusters, with periodic peaks over time.
● Variety. Data comes in all types of formats, from structured, numerical data in traditional databases to unstructured text documents, emails, photos, video, audio, stock ticker data, etc.
● Veracity. There is usually a vast amount of data, but often with low information density. There may be a lot of noise in the data. In addition, missing data and manipulated data are present, so cleansing is necessary.
Impact of a high volume of Big data:

Tall Big data: Big data may have a very large sample size. In particular, the sample size may be far larger than the number of explanatory or predictive variables. For many data sets, the sample size can be tens of thousands or even several million observations. Such Big data is called "tall big data". A large sample size means that new information can be extracted from Big data, especially from unstructured data, to improve inference about the DGP and the resulting decision making. Often, only a small fraction of a tall big data set is used in feasible statistical analysis (e.g., Engle and Russell 1998, Engle 2000).
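To illustrate the remark that only a fraction of a tall data set is often used, here is a minimal sketch on simulated data. The linear DGP, the coefficient vector `beta_true`, the sample size, and the 1% uniform random subsample are all assumptions made for illustration; this is not the procedure of the cited papers. It compares OLS estimates from the full sample with those from the subsample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tall data set: n is far larger than p.
n, p = 200_000, 3
beta_true = np.array([1.0, -2.0, 0.5])  # assumed coefficients of the simulated DGP

X = rng.standard_normal((n, p))
y = X @ beta_true + rng.standard_normal(n)


def ols(X, y):
    """Least-squares coefficients for y = X b + e."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Estimate on the full sample.
beta_full = ols(X, y)

# Estimate on a 1% random subsample drawn without replacement.
idx = rng.choice(n, size=n // 100, replace=False)
beta_sub = ols(X[idx], y[idx])

print("full sample :", beta_full)
print("1% subsample:", beta_sub)
```

With n this large relative to p, both estimates should lie close to the assumed coefficients; the subsample is simply less precise.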
Fat Big data: A high volume of data does not always mean a large sample size. Instead, it may mean a vast variety of descriptions or characterizations of the DGP over a given time period. In other words, we have a high-dimensional set of explanatory variables or covariates. An example is Google search trends for tourism for some city in China. This provides great potential and flexibility to find many important covariates or predictive variables. However, it poses a challenge to statistical modelling and inference because the number of potential explanatory variables can be far larger than the sample size. Such Big data is called "fat big data". The notorious "curse of dimensionality" problem arises in statistical analysis when there is a high-dimensional set of explanatory variables. It is possible that many explanatory variables have little or no impact on the dependent variable, or that there is multicollinearity among the high-dimensional explanatory variables.
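One concrete way to see why fat data is hard for classical regression: the rank of an n × p design matrix X is at most min(n, p), and X'X has the same rank as X. When p exceeds n, X'X is therefore singular and the OLS estimator is not unique. The sketch below checks this on simulated Gaussian covariates; the function name `column_rank` and the chosen dimensions are hypothetical, picked only for illustration.

```python
import numpy as np


def column_rank(n_obs, n_vars, seed=0):
    """Rank of a simulated n_obs x n_vars design matrix X.

    rank(X'X) equals rank(X), so a rank below n_vars means X'X is singular.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_obs, n_vars))
    return int(np.linalg.matrix_rank(X))


# Tall data: many more observations than covariates, so X has full column rank.
tall_rank = column_rank(n_obs=1000, n_vars=5)

# Fat data: many more covariates than observations, so the rank is capped by n_obs.
fat_rank = column_rank(n_obs=20, n_vars=100)

print("tall: rank", tall_rank, "of 5 covariates")
print("fat : rank", fat_rank, "of 100 covariates")
```

In the fat case the rank cannot exceed the 20 observations, so at most 20 of the 100 coefficients can be identified without further restrictions such as variable selection or shrinkage.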