So if we publish only coarse-grained statistics instead of the detailed tuples, is privacy leakage avoided?

The original dataset:

Name    age   HIV+ indicator
Tom     52    0
Jack    43    1
Henry   21    1
Diego   41    0
Alice   54    1
...     ...   ...

Publishing a histogram in terms of the attribute age (range count query)

[Figure: histogram of HIV+ counts (y-axis "hiv+", ticks 0 to 6) over age bins 20-30, 30-40, 40-50, 50-60, 70- (x-axis "age"); bar labels in extraction order: 1, 2, 1, 3, 5.]

Suppose the attacker knows that Alice's age is 54 and also knows the HIV+ status of every other patient in the 50-60 bin. Subtracting the positives he already knows about from the published count for that bin leaves exactly Alice's contribution, so he can infer whether Alice is HIV+ from the published histogram alone.
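The inference described on this slide is a simple subtraction, which can be sketched as follows. This is a minimal illustration, not part of the original slides: the function name `infer_target_status`, the bin count of 3, and the patients `P1` and `P2` are hypothetical values chosen for the example.

```python
def infer_target_status(bin_count, known_statuses):
    """Differencing attack on one histogram bin.

    bin_count      -- published HIV+ count for the bin
    known_statuses -- 0/1 indicators of every patient in the bin
                      except the target (attacker's background knowledge)
    """
    inferred = bin_count - sum(known_statuses)
    if inferred not in (0, 1):
        raise ValueError("background knowledge is inconsistent with the count")
    return inferred


# Hypothetical 50-60 bin: the published count is assumed to be 3.
# Tom's status comes from the slide's table; P1 and P2 are made up.
known = {"Tom": 0, "P1": 1, "P2": 1}
alice_status = infer_target_status(3, known.values())
print("Inferred HIV+ indicator for Alice:", alice_status)
```

The attack needs no access to the tuples themselves; an exact count plus background knowledge about the rest of the bin is enough.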
Can publishing noisy statistics still leak private information? Dinur and Nissim proposed an algorithm that reconstructs the source data from published statistics: reconstruction by linear programming.

I. Dinur and K. Nissim, "Revealing Information While Preserving Privacy," Proceedings of the 22nd ACM Symposium on Principles of Database Systems (PODS), San Diego, June 2003, pp. 202-210.

Original Dataset (the Count column is unknown to the attacker):

Age    Gender   Employed   Count
<18    M        Yes        x1
<18    M        No         x2
<18    F        Yes        x3
<18    F        No         x4
>=18   M        Yes        x5
>=18   M        No         x6
>=18   F        Yes        x7
>=18   F        No         x8
...

Marginal Distribution (Count + noise)

Marginal 1:

Age    Employed   Noisy Count
<18    Yes        4
<18    No         2
>=18   Yes        3
>=18   No         6

Marginal 2:

Age    Gender   Noisy Count
<18    M        3
<18    F        4
>=18   M        5
>=18   F        2

Minimize ε, subject to ε ≥ 0, xi ≥ 0, and:

2 - ε ≤ x2 + x4 ≤ 2 + ε
3 - ε ≤ x5 + x7 ≤ 3 + ε
3 - ε ≤ x1 + x2 ≤ 3 + ε
2 - ε ≤ x7 + x8 ≤ 2 + ε
....

Each constraint ties one published noisy count to the sum of the unknown cells it aggregates. Once the concrete statistics are available, the values of the Count attribute in the data table can be estimated.
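The reconstruction can be tried on the slide's two marginals with a short sketch. Two assumptions are made here that are not in the original: the search is restricted to integer counts no larger than `max_count`, and exhaustive search replaces a linear-programming solver so that only the standard library is needed. The objective is the same as in the program above (the smallest ε within which every marginal is matched), and because no constraint mixes the two age groups, each group is searched separately. The names `CELLS` and `best_tables` are hypothetical.

```python
from itertools import product

# Unknown cells of one age group, in the slide's order:
# (M, Yes), (M, No), (F, Yes), (F, No)  ->  x1..x4 or x5..x8
CELLS = [("M", "Yes"), ("M", "No"), ("F", "Yes"), ("F", "No")]


def best_tables(by_employed, by_gender, max_count=8):
    """Return (eps, tables): the smallest worst-case deviation from the
    noisy marginals and every integer table that achieves it."""
    best_eps, best = None, []
    for x in product(range(max_count + 1), repeat=len(CELLS)):
        deviations = []
        for value, noisy in by_employed.items():
            total = sum(c for c, (g, e) in zip(x, CELLS) if e == value)
            deviations.append(abs(total - noisy))
        for value, noisy in by_gender.items():
            total = sum(c for c, (g, e) in zip(x, CELLS) if g == value)
            deviations.append(abs(total - noisy))
        eps = max(deviations)
        if best_eps is None or eps < best_eps:
            best_eps, best = eps, [x]
        elif eps == best_eps:
            best.append(x)
    return best_eps, best


# Noisy counts copied from Marginal 1 and Marginal 2.
under_18 = best_tables({"Yes": 4, "No": 2}, {"M": 3, "F": 4})
adults = best_tables({"Yes": 3, "No": 6}, {"M": 5, "F": 2})

for name, (eps, tables) in (("<18", under_18), (">=18", adults)):
    print(name, "eps =", eps, "candidate tables:", len(tables))
```

With only two coarse marginals several tables tie at the minimal ε, so the statistics narrow the candidates rather than pin down one table; each additional published marginal adds constraints and shrinks the candidate set.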
Practical impact of data reconstruction attacks

■ The U.S. Census Bureau tested a data reconstruction attack against a set of statistics it had published from the 2010 census. The results showed that data for 17% of the U.S. population could be reconstructed.
■ In response, the Bureau announced that its 2020 statistical data release would be protected with differential privacy.

Source: https://www.census.gov/content/dam/Census/newsroom/press-kits/2019/jsm/presentation-deploying-differential-privacy-for-the-2020-census-of-pop-and-housing.pdf
Going one step further: does releasing a machine learning model leak private information?

[Figure: pipeline diagram with the boxes "Big Data", "Statistical analysis", "Releasing the statistical results", "Data mining / Machine learning", "Parametric Models or Algorithm Models", and "Prediction or other Results".]
Membership inference attacks in machine learning

[Figure: attack diagram. "Private Training Data" (columns Name, age, Class Label; example row Tom, 52, 1) goes through "Training" to produce the model. The "Attacker" sends a "Data record, Class Label" and receives a "Prediction" / "Label". A "Data set" from the "Same distribution" trains "Shadow models", which feed "Multiple Attack models" that answer "Data record ∈ Private training set?".]

■ The attacker has only black-box access to the model: he can submit a data record and observe the prediction.
■ Using a data set drawn from the same distribution as the private training data, the attacker trains shadow models that imitate the target model.
■ Because the attacker knows which records each shadow model was trained on, the shadow models' outputs can be used to train multiple attack models, which decide whether a given data record belongs to the private training set.

R. Shokri, M. Stronati, C. Song and V. Shmatikov, "Membership Inference Attacks Against Machine Learning Models," 2017 IEEE Symposium on Security and Privacy (SP), 2017, pp. 3-18, doi: 10.1109/SP.2017.41.
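The shadow-model idea can be illustrated with a deliberately small sketch. It is not the attack from the cited paper: the target is a toy nearest-neighbour model that overfits by construction, and the "attack model" is reduced to a single confidence threshold learned from the shadow models. All names (`OverfitModel`, `sample_split`, `train_attack_threshold`, `is_member`) and the synthetic population are hypothetical.

```python
import random


class OverfitModel:
    """Toy classifier: confidence decays with the distance to the nearest
    training record, so training members always score exactly 1.0."""

    def __init__(self, training_set):
        self.training_set = list(training_set)  # (age, label) records

    def predict(self, age):
        """Black-box interface: returns (label, confidence)."""
        nearest_age, label = min(self.training_set,
                                 key=lambda r: abs(r[0] - age))
        return label, 1.0 / (1.0 + abs(nearest_age - age))


def sample_split(rng, population, size):
    """Draw `size` training records; the rest are held out."""
    members = rng.sample(population, size)
    non_members = [r for r in population if r not in members]
    return members, non_members


def train_attack_threshold(rng, attacker_data, n_shadow=5):
    """Train shadow models on the attacker's own data and learn a
    confidence threshold separating their members from non-members."""
    member_conf, non_member_conf = [], []
    for _ in range(n_shadow):
        members, non_members = sample_split(
            rng, attacker_data, len(attacker_data) // 2)
        shadow = OverfitModel(members)
        member_conf += [shadow.predict(age)[1] for age, _ in members]
        non_member_conf += [shadow.predict(age)[1] for age, _ in non_members]
    return (min(member_conf) + max(non_member_conf)) / 2


def is_member(target_model, record, threshold):
    """Attack step: query the target and compare its confidence."""
    return target_model.predict(record[0])[1] > threshold


rng = random.Random(0)
# Synthetic population: one record per age, label = 1 if age >= 40.
population = [(age, int(age >= 40)) for age in range(20, 80)]
private, public = sample_split(rng, population, 30)

target = OverfitModel(private)                     # trained by the data owner
threshold = train_attack_threshold(rng, public)    # attacker never sees `private`
guesses = {r: is_member(target, r, threshold) for r in population}
print("records flagged as members:", sum(guesses.values()))
```

The attacker only calls `target.predict`, which matches the black-box setting on the slide. Real models leak far less sharply than this toy one, which is why the cited attack trains full attack models on the shadow models' prediction vectors rather than a single threshold.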