Key Value Storage Systems
▪ Key-value stores support
  • put(key, value): stores a value with an associated key
  • get(key): retrieves the stored value associated with the specified key
  • delete(key): removes the key and its associated value
▪ Some systems also support range queries on key values
▪ Document stores also support queries on non-key attributes
  • See book for MongoDB queries
▪ Key value stores are not full database systems
  • Have no/limited support for transactional updates
  • Applications must manage query processing on their own
▪ Not supporting above features makes it easier to build scalable data storage systems
  • Also called NoSQL systems
Database System Concepts - 7th Edition, 10.13, ©Silberschatz, Korth and Sudarshan
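The put/get/delete interface above, plus an optional range query on keys, can be sketched as a single-process, in-memory toy. The class name KeyValueStore and the method name range are illustrative assumptions, not the API of any real system; a real key-value store would partition and replicate the data across many machines.

```python
import bisect


class KeyValueStore:
    """Hypothetical in-memory sketch of the put/get/delete interface."""

    def __init__(self):
        self._data = {}         # key -> value
        self._sorted_keys = []  # keys kept in sorted order, only for range queries

    def put(self, key, value):
        """Store value under key, overwriting any earlier value."""
        if key not in self._data:
            bisect.insort(self._sorted_keys, key)
        self._data[key] = value

    def get(self, key):
        """Return the value stored under key, or None if the key is absent."""
        return self._data.get(key)

    def delete(self, key):
        """Remove key and its value; deleting a missing key is a no-op."""
        if key in self._data:
            del self._data[key]
            idx = bisect.bisect_left(self._sorted_keys, key)
            del self._sorted_keys[idx]

    def range(self, low, high):
        """Return (key, value) pairs with low <= key <= high, in key order."""
        lo = bisect.bisect_left(self._sorted_keys, low)
        hi = bisect.bisect_right(self._sorted_keys, high)
        return [(k, self._data[k]) for k in self._sorted_keys[lo:hi]]


store = KeyValueStore()
store.put("user:17", {"name": "Ann"})
store.put("user:42", {"name": "Bob"})
store.put("video:3", {"title": "intro"})
print(store.get("user:42"))
print(store.range("user:", "user:~"))  # all keys with the "user:" prefix
store.delete("user:17")
print(store.get("user:17"))
```

Note what the sketch leaves out: there is no transaction spanning several keys and no query on the contents of a value, which matches the limitations listed on the slide.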

Parallel and Distributed Databases
▪ Parallel databases run on multiple machines (a cluster)
  • Developed in 1980s, well before Big Data
▪ Parallel databases were designed for smaller scale (10s to 100s of machines)
  • Did not provide easy scalability
▪ Replication used to ensure data availability despite machine failure
  • But typically restart query in event of failure
    ▪ Restarts may be frequent at very large scale
    ▪ Map-reduce systems (coming up next) can continue query execution, working around failures

Replication and Consistency
▪ Availability (system can run even if parts have failed) is essential for parallel/distributed databases
  • Via replication, so even if a node has failed, another copy is available
▪ Consistency is important for replicated data
  • All live replicas have same value, and each read sees latest version
  • Often implemented using majority protocols
    ▪ E.g., have 3 replicas, reads/writes must access 2 replicas
    ▪ Details in Chapter 23
▪ Network partitions (network can break into two or more parts, each with active systems that can't talk to other parts)
  • In presence of partitions, cannot guarantee both availability and consistency
    ▪ Brewer's CAP "Theorem"
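The "3 replicas, reads/writes must access 2" example can be sketched as follows. This is a simplified, single-client illustration and not the protocol from Chapter 23: the class names Replica and QuorumRegister and the per-replica version counter are assumptions made for the sketch, and concurrency between writers is ignored. The point it shows is that any read quorum overlaps any write quorum, so a read always sees the latest version.

```python
class Replica:
    """One copy of a single data item, tagged with a version number."""

    def __init__(self):
        self.version = 0
        self.value = None
        self.alive = True


class QuorumRegister:
    """Hypothetical majority-protocol sketch: n replicas, quorum of size q."""

    def __init__(self, n=3, quorum=2):
        # Two quorums must always share at least one replica.
        assert 2 * quorum > n
        self.replicas = [Replica() for _ in range(n)]
        self.quorum = quorum

    def _live(self):
        live = [r for r in self.replicas if r.alive]
        if len(live) < self.quorum:
            raise RuntimeError("quorum unavailable")
        return live

    def write(self, value):
        # Write to the first `quorum` live replicas with a fresh version.
        targets = self._live()[: self.quorum]
        new_version = max(r.version for r in targets) + 1
        for r in targets:
            r.version, r.value = new_version, value

    def read(self):
        # Deliberately read from the *last* `quorum` live replicas, so the
        # read set usually differs from the write set; the overlap still
        # guarantees that the highest version seen is the latest write.
        sources = self._live()[-self.quorum:]
        return max(sources, key=lambda r: r.version).value


reg = QuorumRegister()
reg.write("a")                 # reaches replicas 0 and 1
reg.replicas[1].alive = False  # replica 1 fails
reg.write("b")                 # reaches replicas 0 and 2
reg.replicas[1].alive = True   # replica 1 recovers with a stale copy
print(reg.read())              # reads replicas 1 and 2, picks the newer version
```

With two of the three replicas down, both read and write raise an error instead of returning possibly stale data, which is the consistency-over-availability choice.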

Replication and Consistency
▪ Very large systems will partition at some point
  • Choose one of consistency or availability
    ▪ Traditional databases choose consistency
    ▪ Most Web applications choose availability
      • Except for specific parts such as order processing
▪ More details later, in Chapter 23

The MapReduce Paradigm
▪ Platform for reliable, scalable parallel computing
▪ Abstracts issues of distributed and parallel environment from programmer
  • Programmer provides core logic (via map() and reduce() functions)
  • System takes care of parallelization of computation, coordination, etc.
▪ Paradigm dates back many decades
  • But very large scale implementations running on clusters with 10^3 to 10^4 machines are more recent
  • Google MapReduce, Hadoop, ..
▪ Data storage/access typically done using distributed file systems or key-value stores
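The division of labor described above (programmer supplies map() and reduce(), the system does the rest) can be illustrated with a word count. This is a single-process sketch under stated assumptions: run_mapreduce is a hypothetical stand-in for the framework, and map_fn and reduce_fn are illustrative names, not the Hadoop or Google MapReduce API. A real system would run the map and reduce calls in parallel across machines and handle failures.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """Programmer-supplied map: emit (word, 1) for every word in a document."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Programmer-supplied reduce: combine all values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    """Hypothetical stand-in for the framework, run sequentially in one process."""
    # Map phase, followed by the shuffle: group emitted values by key.
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    # Reduce phase: one reduce call per distinct key.
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


docs = [
    ("doc1", "map reduce map"),
    ("doc2", "reduce on a cluster"),
]
print(run_mapreduce(docs, map_fn, reduce_fn))
```

Because map_fn sees one record at a time and reduce_fn sees one key at a time, neither function needs to know how many machines are involved; that independence is what lets the system parallelize the work.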