Advanced Algorithms: Hashing and Sketching
尹一通 (Yitong Yin), Nanjing University, 2022 Fall
Count Distinct Elements

Input: a sequence $x_1, x_2, \ldots, x_n \in U = [N]$
Output: an estimation of $z = |\{x_1, x_2, \ldots, x_n\}|$

[Diagram: a stream $x_1, x_2, \ldots, x_n$ feeds into an algorithm, which outputs an estimation of $f(x_1, \ldots, x_n) = |\{x_1, x_2, \ldots, x_n\}|$]

• Data stream model: input data items arrive one at a time
• Naïve algorithm: store all distinct data items, using $\Omega(z \log N)$ bits
• Sketch: a (lossy) representation of the data using $\ll z$ space
• Lower bound (Alon-Matias-Szegedy): any deterministic (exact or approximate) algorithm must use $\Omega(N)$ bits of space in the worst case
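The naïve algorithm is easy to state in code. Here is a minimal sketch (my rendering, assuming hashable stream items) that makes the memory cost concrete: the set holds every distinct item seen so far, so space grows with $z$.

```python
# Minimal sketch of the naive exact algorithm in the data stream model:
# it answers exactly, but its memory grows with z, the number of
# distinct items, which is what sketching is designed to avoid.

def count_distinct_exact(stream):
    """Count distinct elements exactly by storing all distinct items."""
    seen = set()
    for x in stream:   # items arrive one at a time (single pass)
        seen.add(x)
    return len(seen)   # z = |{x_1, ..., x_n}|

print(count_distinct_exact([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]))  # -> 7
```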
Count Distinct Elements

Input: a sequence $x_1, x_2, \ldots, x_n \in U = [N]$
Output: an estimation of $z = |\{x_1, x_2, \ldots, x_n\}|$

• Data stream model: input data items arrive one at a time
• $(\epsilon, \delta)$-estimator: a random variable $\hat{Z}$ such that
  $\Pr\left[(1 - \epsilon) z \le \hat{Z} \le (1 + \epsilon) z\right] \ge 1 - \delta$

"Using only memory equivalent to 5 lines of printed text, you can estimate with a typical accuracy of 5% and in a single pass the total vocabulary of Shakespeare."
—— Durand and Flajolet, 2003
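The guarantee is over the estimator's internal randomness. A short simulation (a hypothetical helper, not part of the lecture) shows how one would check the $(\epsilon, \delta)$ condition empirically for a candidate estimator:

```python
import random

# Hypothetical helper (my addition): estimate, by simulation, how often
# a randomized estimator misses the relative-error window
# [(1 - eps) z, (1 + eps) z].  An (eps, delta)-estimator should miss
# with frequency at most delta.

def empirical_failure_rate(estimator, z, eps, trials=10_000):
    misses = sum(
        not ((1 - eps) * z <= estimator() <= (1 + eps) * z)
        for _ in range(trials)
    )
    return misses / trials

# Toy estimator: Z_hat = z * U with U uniform on [0.5, 1.5].  It is
# unbiased but lands outside the 10% window about 80% of the time,
# so it is nowhere near a (0.1, 0.05)-estimator.
z = 1000
print(empirical_failure_rate(lambda: z * random.uniform(0.5, 1.5), z, 0.1))
```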
Count Distinct Elements

Input: a sequence $x_1, x_2, \ldots, x_n \in U = [N]$
Output: an estimation of $z = |\{x_1, x_2, \ldots, x_n\}|$

Simple Uniform Hash Assumption (SUHA): a uniform hash function is available whose preprocessing, representation, and evaluation are considered to be easy.

• (idealized) uniform hash function $h: U \to [0,1]$
• $x_i = x_j \implies$ the same hash value $h(x_i) = h(x_j) \in_r [0,1]$
• $\{h(x_1), \ldots, h(x_n)\}$: $z$ uniform and independent values in $[0,1]$
• they partition $[0,1]$ into $z + 1$ subintervals (with identically distributed lengths)
• $\mathbb{E}\left[\min_{1 \le i \le n} h(x_i)\right] = \mathbb{E}[\text{length of a subinterval}] = \frac{1}{z+1}$ (by symmetry)
• estimator: $\hat{Z} = \frac{1}{\min_i h(x_i)} - 1$?  Variance is too large! (see the sketch below)
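A minimal single-pass sketch of this estimator, with the idealized uniform $h$ simulated by a salted SHA-256 digest mapped into $[0,1)$ (the salt and the 64-bit truncation are my assumptions, not part of the slides):

```python
import hashlib

def h(x, salt=b"demo"):
    """Map item x to a pseudo-uniform value in [0, 1) (stand-in for SUHA)."""
    digest = hashlib.sha256(salt + repr(x).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def estimate_distinct(stream):
    """Single pass, O(1) memory: Z_hat = 1 / min_i h(x_i) - 1."""
    m = 1.0
    for x in stream:      # duplicates hash to the same value,
        m = min(m, h(x))  # so repeats cannot move the minimum
    return 1.0 / m - 1.0

stream = list(range(1000)) * 3           # z = 1000 distinct items
print(round(estimate_distinct(stream)))  # one sample of Z_hat; a single
                                         # sample has large variance
```

Because equal items always hash to the same value, duplicates never affect the minimum; only the $z$ distinct values matter, which is exactly why this works in one pass with constant memory.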
Markov's Inequality

Markov's Inequality: for a nonnegative random variable $X$ and any $t > 0$,
$$\Pr[X \ge t] \le \frac{\mathbb{E}[X]}{t}$$

Proof: let $Y = \begin{cases} 1 & \text{if } X \ge t \\ 0 & \text{otherwise} \end{cases}$, so that $Y \le \lfloor X/t \rfloor \le X/t$. Then
$$\Pr[X \ge t] = \mathbb{E}[Y] \le \mathbb{E}\left[\frac{X}{t}\right] = \frac{\mathbb{E}[X]}{t}$$

• Tight if only the expectation of $X$ is known.
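The tightness claim can be made concrete with a standard two-point example (my addition, not on the slide): a distribution with mean $\mu$ that attains the bound with equality.

```latex
% Standard tightness example for Markov's inequality:
% fix t > 0 and mu in (0, t], and let X be the two-point distribution
%   X = t with probability mu/t,   X = 0 with probability 1 - mu/t.
% Then E[X] = t * (mu/t) = mu, and Markov's bound holds with equality:
\[
  \Pr[X \ge t] \;=\; \frac{\mu}{t} \;=\; \frac{\mathbb{E}[X]}{t},
\]
% so, knowing only E[X], no sharper tail bound is possible.
```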