ReCaptcha与数据再利用( Luis von ahn ·人们需要从计算机光学字符识别程序无法识别的文 本扫描项目中读出两个单词并输入。 其中一个单词其他用户也识别过,从而可以从该用户的输入中判 断注册者是人; 另一个单词则是有待辨识和解疑的新词。为了保证准确度,系统 会将同一个模糊单词发给五个不同的人,直到他们都输入正确后 才确定这个单词是对的。 在这里,数据的主要用途是证明用户是人,但它也 有第二个目的:破译数字化文本中不清楚的单词
ReCaptcha与数据再利用(Luis von Ahn) • 人们需要从计算机光学字符识别程序无法识别的文 本扫描项目中读出两个单词并输入。 • 其中一个单词其他用户也识别过,从而可以从该用户的输入中判 断注册者是人; • 另一个单词则是有待辨识和解疑的新词。为了保证准确度,系统 会将同一个模糊单词发给五个不同的人,直到他们都输入正确后 才确定这个单词是对的。 • 在这里,数据的主要用途是证明用户是人,但它也 有第二个目的:破译数字化文本中不清楚的单词
HumanGepokaicshttp://www.int Wikipedia Particle Physics Large Hadron World wide Web(10GB) 200PB-Caplured (~1PB) 1000 CAGR www.an (15PB) //www.intel,cow ARCHIVE Personal Digital Annual email Estimated On-line Internet Archive RAM in Google Photos 3 Traffic, no spam (300pB+) (1PB+ (8PB)(100p8 100%o CAGR 200 of Londons 2004 Walmart Typical Oil Merck bio Traffic Cams Transaction db Company Research DB (8T8/day)501B)(350B+)(L5B/q UPMC Hospitals MIT Babytalk Terashake One Day of Imaging Data Speech Earthquake Model Instant Messaging (500TB/ yr)(1.4PB) of LA Basin (1PB)(750GB) Total digital data to be created this year 270, 000PB ( IDC)
7
cDwww.data.g 开放数据 ≡ATAG0V DATA TOPICS. IMPACT APPLICATIONS DEVELOPERS CONTACT The home of the U.S. Government's open data Www.dAta.Gov Here you will find data, tools, and resources to conduct research, develop web and mobile applications, design data visualizations, and more 2009年的47个数据集, 2015年3月达到12万, GETSTARTED EARCHOVER 194 824 DATASETS 2016年2月1948万。 Monthly House Price Indexes BROWSE TOPICS 彰后M2 中盒AL Finance Local Manufacturin Public Safety Govemment
开放数据 • www.data.gov • 2009年的47个数据集, 2015年3月达到12万, 2016年2月19.48万
The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East(IDC& EMC, December 2012) The Digital Universe: 50-fold Growth from the Beginning of The Geography of the Digital Universe (2012) 2010 to the End of 2020 United States ● Westem Europe 32% China o India ● Rest of the World 10 092010201120122013201420152016201720182019200 a:2837EB
The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East (IDC & EMC, December 2012)
To promote the development of robust and reusable software take a consistent object-oriented viewpoint data should be presented as being encapsulated with the methods that access and modify them think of data objects as instances of an abstract data type (ADT), which includes a repertoire of methods for performing operations on data objects of this type there may be several different implementation strategies for a particular ADt, and explore the relative pros and cons of these choices
To promote the development of robust and reusable software • take a consistent object-oriented viewpoint • data should be presented as being encapsulated with the methods that access and modify them. • think of data objects as instances of an abstract data type (ADT), which includes a repertoire of methods for performing operations on data objects of this type. • there may be several different implementation strategies for a particular ADT, and explore the relative pros and cons of these choices