The data, while physically split, is used and managed as a logical whole Users can choose how to partition their data by declaring partitioning keys Users can also determine across which and how many database partitions their table data can be spread, by selecting the table space and the associated database partition group in which the data should be stored. In addition, an updatable partitioning map is used with a hashing algorithm to specify the mapping of partitioning key values to database partitions, which determines the placement and retrieval of each row of data. As a result, you can spread the workload across a partitioned database for large tables, while allowing smaller tables to be stored on one or more database partitions. Each database partition has local indexes on the data it stores, resulting in increased performance for local data access You are not restricted to having all tables divided across all database partitions in the database DB2 supports partial declustering, which means that you can divide tables and their table spaces across a subset of database partitions in the system. An alternative to consider when you want tables to be positioned on each database partition, is to use materialized query tables and then replicate those tables. You can create a materialized query table containing the information that you need, and then replicate it to each node Parallelism Components of a task, such as a database query, can be run in parallel to dramatically enhance performance. The nature of the task, the database configuration, and the hardware environment, all determine how DB2 will perform a task in parallel. These considerations are interrelated, and should be considered together when you work on the physical and logical design of database. The following types of parallelism are supported by DB2 Query · Utility Input/output parallelism When there are multiple containers for a table space, the database manager can exploit parallel 1/O. Parallel I/O refers to the process of writing to, or eading from two or more i/o devices simultaneously; it can result in significant improvements in throughput. Query parallelism There are two types of query parallelism: inter-query parallelism and intra-query parallelism 26 Administration Guide: Planning
The data, while physically split, is used and managed as a logical whole. Users can choose how to partition their data by declaring partitioning keys. Users can also determine across which and how many database partitions their table data can be spread, by selecting the table space and the associated database partition group in which the data should be stored. In addition, an updatable partitioning map is used with a hashing algorithm to specify the mapping of partitioning key values to database partitions, which determines the placement and retrieval of each row of data. As a result, you can spread the workload across a partitioned database for large tables, while allowing smaller tables to be stored on one or more database partitions. Each database partition has local indexes on the data it stores, resulting in increased performance for local data access. You are not restricted to having all tables divided across all database partitions in the database. DB2 supports partial declustering, which means that you can divide tables and their table spaces across a subset of database partitions in the system. An alternative to consider when you want tables to be positioned on each database partition, is to use materialized query tables and then replicate those tables. You can create a materialized query table containing the information that you need, and then replicate it to each node. Parallelism Components of a task, such as a database query, can be run in parallel to dramatically enhance performance. The nature of the task, the database configuration, and the hardware environment, all determine how DB2® will perform a task in parallel. These considerations are interrelated, and should be considered together when you work on the physical and logical design of a database. The following types of parallelism are supported by DB2: v I/O v Query v Utility Input/output parallelism When there are multiple containers for a table space, the database manager can exploit parallel I/O. Parallel I/O refers to the process of writing to, or reading from, two or more I/O devices simultaneously; it can result in significant improvements in throughput. Query parallelism There are two types of query parallelism: inter-query parallelism and intra-query parallelism. 26 Administration Guide: Planning
Inter-query parallelism refers to the ability of multiple applications to query a database at the same time. Each query executes independently of the others but DB2 executes all of them at the same time. DB2 has always supported this type of parallelism Intra-query parallelism refers to the simultaneous processing of parts of a single query, using either intra-partition parallelism, inter-partition parallelism, or both. Intra-partition parallelism Intra-partition parallelism refers to the ability to break up a query into multiple parts. Some DB2 utilities also perform this type of parallelism. into multiple parts, many or all of which can be run in parallel within a single database partition Figure 13 shows a query that is broken into four pieces that can be run in parallel, with the results returned more quickly than if the query were run in serial fashion. The pieces are copies of each other. To utilize intra-partition the degree of u must configure the database appropriately. You can choose parallelism or let the system do it for you. The degree of parallelism represents the number of pieces of a query running in parallel. SELECT. FROM A query is divided into parts, each being Database Partition Figure 13. Intra-partition Parallelism Inter-partition parallelism Inter-partition parallelism refers to the ability to break up a query into multiple parts across multiple partitions of a partitioned database, on one machine or multiple machines. The query is run in parallel. Some DB2 utilities also perform this type of parallelism Chapter 2 Parallel database systems 27
Inter-query parallelism refers to the ability of multiple applications to query a database at the same time. Each query executes independently of the others, but DB2 executes all of them at the same time. DB2 has always supported this type of parallelism. Intra-query parallelism refers to the simultaneous processing of parts of a single query, using either intra-partition parallelism, inter-partition parallelism, or both. Intra-partition parallelism Intra-partition parallelism refers to the ability to break up a query into multiple parts. Some DB2 utilities also perform this type of parallelism. Intra-partition parallelism subdivides what is usually considered a single database operation such as index creation, database loading, or SQL queries into multiple parts, many or all of which can be run in parallel within a single database partition. Figure 13 shows a query that is broken into four pieces that can be run in parallel, with the results returned more quickly than if the query were run in serial fashion. The pieces are copies of each other. To utilize intra-partition parallelism, you must configure the database appropriately. You can choose the degree of parallelism or let the system do it for you. The degree of parallelism represents the number of pieces of a query running in parallel. Inter-partition parallelism Inter-partition parallelism refers to the ability to break up a query into multiple parts across multiple partitions of a partitioned database, on one machine or multiple machines. The query is run in parallel. Some DB2 utilities also perform this type of parallelism. A query is divided into parts, each being executed in parallel. Database Partition Data SELECT... FROM... Figure 13. Intra-partition Parallelism Chapter 2. Parallel database systems 27
Inter-partition parallelism subdivides what is usually considered a single database operation such as index f which database loading, or SQL queries into multiple parts, many or all of can be run in parallel across multiple partitions of a partitioned database on one machine or on multiple machines Figure 14 shows a query that is broken into four pieces that can be run in parallel, with the results returned more quickly than if the query were run serial fashion on a single partition. The degree of parallelism is largely determined by the number of partitions you create and how you define your database partition groups SELECT. FROM Data Database Partition Database Partition Database Partition Database Partition Figure 14 Inter-partition Parallelism Simultaneous intra-partition and inter-partition parallelism You can use intra-partition parallelism and inter-partition parallelism at the same time. This combination provides two dimensions of parallelism, resulting in an even more dramatic increase in the speed at which queries are processed 28 Administration Guide: Planning
Inter-partition parallelism subdivides what is usually considered a single database operation such as index creation, database loading, or SQL queries into multiple parts, many or all of which can be run in parallel across multiple partitions of a partitioned database on one machine or on multiple machines. Figure 14 shows a query that is broken into four pieces that can be run in parallel, with the results returned more quickly than if the query were run in serial fashion on a single partition. The degree of parallelism is largely determined by the number of partitions you create and how you define your database partition groups. Simultaneous intra-partition and inter-partition parallelism You can use intra-partition parallelism and inter-partition parallelism at the same time. This combination provides two dimensions of parallelism, resulting in an even more dramatic increase in the speed at which queries are processed: Database Partition Database Partition Database Partition Database Partition A query is divided into parts, each being executed in parallel. Data Data Data Data SELECT... FROM... Figure 14. Inter-partition Parallelism 28 Administration Guide: Planning
SELECT. FROM executed Database Partition Database Partition Figure 15. Simultaneous Inter-partition and Intra-partition Parallelism Utility parallelism DB2 utilities can take advantage of intra-partition parallelism. They can also take advantage of inter-partition parallelism; where multiple database partitions exist, the utilities execute in each of the partitions in parallel The load utility can take advantage of intra-partition parallelism and I/O parallelism. Loading data is a CPU-intensive task. The load utility takes advantage of multiple processors for tasks such as parsing and formatting data. It can also use parallel I/O servers to write the data to containers in n a partitioned database environment, the LOAD command takes advantage of intra-partition, inter-partition, and I/O parallelism by parallel invocations at each database partition where the table resides During index creation, the scanning and subsequent sorting of the data occurs in parallel. DB2 exploits both I/o parallelism and intra-partition parallelism when creating an index. This helps to speed up in dex creation when a CREATE INDEX statement is issued, during restart (if an index is marked invalid), and during the reorganization of data Backing up and restoring data are heavily I/O-bound tasks. DB2 exploits both /0 parallelism and intra-partition parallelism when performing backup and Chapter 2 Parallel database systems 29
Utility parallelism DB2 utilities can take advantage of intra-partition parallelism. They can also take advantage of inter-partition parallelism; where multiple database partitions exist, the utilities execute in each of the partitions in parallel. The load utility can take advantage of intra-partition parallelism and I/O parallelism. Loading data is a CPU-intensive task. The load utility takes advantage of multiple processors for tasks such as parsing and formatting data. It can also use parallel I/O servers to write the data to containers in parallel. In a partitioned database environment, the LOAD command takes advantage of intra-partition, inter-partition, and I/O parallelism by parallel invocations at each database partition where the table resides. During index creation, the scanning and subsequent sorting of the data occurs in parallel. DB2 exploits both I/O parallelism and intra-partition parallelism when creating an index. This helps to speed up index creation when a CREATE INDEX statement is issued, during restart (if an index is marked invalid), and during the reorganization of data. Backing up and restoring data are heavily I/O-bound tasks. DB2 exploits both I/O parallelism and intra-partition parallelism when performing backup and Database Partition Database Partition A query is divided into parts, each being executed in parallel. Data Data SELECT... FROM... SELECT... FROM... SELECT... FROM... Figure 15. Simultaneous Inter-partition and Intra-partition Parallelism Chapter 2. Parallel database systems 29
restore operations. Backup exploits I/O parallelism by reading from multiple table space containers in parallel, and asvnchronous ly writing to multiple backup media in pa arable Related concepts "Partition and processor environments"on page 30 Partition and processor environments This section provides an overview of the following hardware environment Single partition on a single processor(uniprocessor) Single partition with multiple processors(SMP) Multiple partition configu Partitions with one processor (MPP) Partitions with multiple processors(cluster of SMPs) Logical database partitions (also known as Multiple Logical Nodes, or MLN, in DB2 Parallel Edition for AIX@ Version 1) Capacity and scalability are discussed for each environment. Capacity refers to he number of users and applications able to access the database. This is in arge part determined by memory, agents, locks, I/O, and storage management Scalability refers to the ability of a database to grow and continue to exhibit the same operating characteristics and response times Single partition on a single processor This environment is made up of memory and disk, but contains only a single CPU (see Figure 16 on page 31). It is referred to by many different names, including stand-alone database, client/server database, serial database, uniprocessor system, and single node or non-parallel environment. The database in this environment serves the needs of a department or small office, where the data and system resources(including a single processor or CPU) are managed by a single database manager 0 Administration Guide: Planning
restore operations. Backup exploits I/O parallelism by reading from multiple table space containers in parallel, and asynchronously writing to multiple backup media in parallel. Related concepts: v “Partition and processor environments” on page 30 Partition and processor environments This section provides an overview of the following hardware environments: v Single partition on a single processor (uniprocessor) v Single partition with multiple processors (SMP) v Multiple partition configurations – Partitions with one processor (MPP) – Partitions with multiple processors (cluster of SMPs) – Logical database partitions (also known as Multiple Logical Nodes, or MLN, in DB2® Parallel Edition for AIX® Version 1) Capacity and scalability are discussed for each environment. Capacity refers to the number of users and applications able to access the database. This is in large part determined by memory, agents, locks, I/O, and storage management. Scalability refers to the ability of a database to grow and continue to exhibit the same operating characteristics and response times. Single partition on a single processor This environment is made up of memory and disk, but contains only a single CPU (see Figure 16 on page 31). It is referred to by many different names, including stand-alone database, client/server database, serial database, uniprocessor system, and single node or non-parallel environment. The database in this environment serves the needs of a department or small office, where the data and system resources (including a single processor or CPU) are managed by a single database manager. 30 Administration Guide: Planning