Lecture 4 Sampling for Big Data
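What's the value of π?
[Figure: MC Approximation of Pi, estimate 3.14616]
Monte Carlo approximation of π: scatter points uniformly over a square and count how many land inside the circle inscribed in it. The fraction of points inside the circle approximates the ratio of the two areas:
$$\frac{\text{Area}_{\text{circle}}}{\text{Area}_{\text{square}}} = \frac{\pi r^2}{(2r)(2r)} = \frac{\pi}{4}
\quad\Longrightarrow\quad
\pi = 4 \cdot \frac{\text{Area}_{\text{circle}}}{\text{Area}_{\text{square}}} \approx 4 \cdot \frac{\#\text{Point}_{\text{circle}}}{\#\text{Point}_{\text{square}}}$$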
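A minimal Python sketch of the Monte Carlo estimate of π, assuming points are drawn uniformly from the square [-0.5, 0.5] × [-0.5, 0.5] (as the figure's axes suggest) with an inscribed circle of radius r = 0.5. The function name `estimate_pi` and its parameters are hypothetical, chosen for illustration.

```python
import random


def estimate_pi(num_points: int, seed: int = 0) -> float:
    """Monte Carlo estimate of pi (hypothetical helper).

    Assumption: points are uniform in the square [-0.5, 0.5] x [-0.5, 0.5],
    whose inscribed circle has radius r = 0.5.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    r = 0.5
    inside = 0
    for _ in range(num_points):
        x = rng.uniform(-r, r)
        y = rng.uniform(-r, r)
        # The point lies inside the circle if its distance to the center is <= r.
        if x * x + y * y <= r * r:
            inside += 1
    # Area_circle / Area_square = pi r^2 / (2r)^2 = pi / 4,
    # so pi ~= 4 * (#points in circle) / (#points in square).
    return 4.0 * inside / num_points


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, estimate_pi(n))
```

The standard error of this estimate shrinks roughly like 1/√N, so each additional decimal digit of accuracy costs about 100× more points.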
Outline >Motivation /Benefits >Basics of sampling Inverse Transform sampling ·Rejection sampling ·Importance sampling Markov chain Monte Carlo (MCMC) MH Sampling Gibbs Sampling >Stream sampling Sample >Conclusion
¾Motivation / Benefits ¾Basics of sampling • Inverse Transform sampling • Rejection sampling • Importance sampling • Markov chain Monte Carlo (MCMC) MH Sampling Gibbs Sampling ¾Stream sampling ¾Conclusion Outline
Why Sampling?
• Big data issue
  • Storage complexity
  • Computational complexity
  • …
• Posterior estimation
• Expectation estimation
• …
• Example: estimating the mean age of people in China from a sample of the population
[Figure: Population vs. Sample]
Bad Sampling
• Perform your research with bad samples, or with poorly designed ones, and you will almost certainly get misleading results.
• Example: sampling only teenagers when estimating the mean age of people in China.
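To make the bias concrete, here is a small Python sketch on a hypothetical synthetic population (ages drawn uniformly from 0 to 90; this is NOT real demographic data for China). It compares a simple random sample with a teenagers-only sample. All function names are hypothetical, chosen for illustration.

```python
import random
import statistics


def make_population(size: int, seed: int = 0) -> list[int]:
    """Hypothetical synthetic population: ages uniform on 0..90.

    Not real data; it only serves to illustrate sampling bias.
    """
    rng = random.Random(seed)
    return [rng.randint(0, 90) for _ in range(size)]


def simple_random_sample(population: list[int], k: int, seed: int = 1) -> list[int]:
    """Good design: every individual has the same chance of being chosen."""
    return random.Random(seed).sample(population, k)


def teenagers_only_sample(population: list[int], k: int, seed: int = 1) -> list[int]:
    """Bad design: only people aged 13-19 can ever be chosen."""
    teens = [age for age in population if 13 <= age <= 19]
    return random.Random(seed).sample(teens, k)


if __name__ == "__main__":
    population = make_population(100_000)
    print("population mean:    ", statistics.mean(population))
    print("random sample mean: ", statistics.mean(simple_random_sample(population, 1_000)))
    print("teenagers-only mean:", statistics.mean(teenagers_only_sample(population, 1_000)))
```

The random sample's mean lands close to the population mean, while the teenagers-only mean can never exceed 19 however many people are drawn: a larger biased sample does not remove the bias.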