CHAPTER 7
Interaction Between the num_threads Clause and omp_set_dynamic

The following example demonstrates the num_threads clause and the effect of the omp_set_dynamic routine on it.

The call to the omp_set_dynamic routine with argument 0 in C/C++, or .FALSE. in Fortran, disables the dynamic adjustment of the number of threads in OpenMP implementations that support it. In this case, 10 threads are provided. Note that in case of an error (for example, if 10 threads are not available), the OpenMP implementation is free to abort the program or to supply any number of threads available.

C / C++
Example nthrs_dynamic.1c
```c
#include <omp.h>

int main()
{
   omp_set_dynamic(0);  /* disable dynamic adjustment of the team size */
   #pragma omp parallel num_threads(10)
   {
      /* do work here */
   }
   return 0;
}
```
C / C++
Fortran
Example nthrs_dynamic.1f
```fortran
PROGRAM EXAMPLE
   INCLUDE "omp_lib.h"            ! or USE OMP_LIB
   CALL OMP_SET_DYNAMIC(.FALSE.)  ! disable dynamic adjustment of the team size
!$OMP PARALLEL NUM_THREADS(10)
   ! do work here
!$OMP END PARALLEL
END PROGRAM EXAMPLE
```
Fortran

The call to the omp_set_dynamic routine with a non-zero argument in C/C++, or .TRUE. in Fortran, allows the OpenMP implementation to choose any number of threads between 1 and 10.

C / C++
Example nthrs_dynamic.2c
```c
#include <omp.h>

int main()
{
   omp_set_dynamic(1);  /* enable dynamic adjustment: the team may have 1 to 10 threads */
   #pragma omp parallel num_threads(10)
   {
      /* do work here */
   }
   return 0;
}
```
C / C++
Fortran
Example nthrs_dynamic.2f
```fortran
PROGRAM EXAMPLE
   INCLUDE "omp_lib.h"           ! or USE OMP_LIB
   CALL OMP_SET_DYNAMIC(.TRUE.)  ! enable dynamic adjustment of the team size
!$OMP PARALLEL NUM_THREADS(10)
   ! do work here
!$OMP END PARALLEL
END PROGRAM EXAMPLE
```
Fortran

It is good practice to set the dyn-var ICV explicitly by calling the omp_set_dynamic routine, because its default value is implementation defined.
CHAPTER 8
The proc_bind Clause

The following examples demonstrate how to use the proc_bind clause to control the thread binding for a team of threads in a parallel region. The machine architecture is depicted in the figure below. It consists of two sockets, each equipped with a quad-core processor and configured to execute two hardware threads simultaneously on each core. These examples assume a contiguous core numbering starting from 0, such that the hardware threads 0,1 form the first physical core.

[Figure: two sockets with 4 physical cores each and 2 hardware threads per physical core; the cores are labeled p0 p1 p2 p3 and p4 p5 p6 p7.]

The following equivalent place list declarations consist of eight places (which we designate as p0 to p7):

OMP_PLACES="{0,1},{2,3},{4,5},{6,7},{8,9},{10,11},{12,13},{14,15}"
or
OMP_PLACES="{0:2}:8:2"

8.1 Spread Affinity Policy
The following example shows the result of the spread affinity policy on the partition list when the number of threads is less than or equal to the number of places in the parent’s place partition, for
the machine architecture depicted above. Note that the threads are bound to the first place of each subpartition.

C / C++
Example affinity.1c
```c
void work();

int main()
{
   #pragma omp parallel proc_bind(spread) num_threads(4)
   {
      work();
   }
   return 0;
}
```
C / C++
Fortran
Example affinity.1f
```fortran
PROGRAM EXAMPLE
!$OMP PARALLEL PROC_BIND(SPREAD) NUM_THREADS(4)
   CALL WORK()
!$OMP END PARALLEL
END PROGRAM EXAMPLE
```
Fortran

It is unspecified on which place the master thread is initially started. If the master thread is initially started on p0, the following placement of threads will be applied in the parallel region:
• thread 0 executes on p0 with the place partition p0,p1
• thread 1 executes on p2 with the place partition p2,p3
• thread 2 executes on p4 with the place partition p4,p5
• thread 3 executes on p6 with the place partition p6,p7
If the master thread were instead initially started on p2, the placement of threads and the distribution of the place partition would be as follows:
• thread 0 executes on p2 with the place partition p2,p3
• thread 1 executes on p4 with the place partition p4,p5
• thread 2 executes on p6 with the place partition p6,p7
• thread 3 executes on p0 with the place partition p0,p1
The following example illustrates the spread thread affinity policy when the number of threads is greater than the number of places in the parent’s place partition.

Let T be the number of threads in the team, and P be the number of places in the parent’s place partition. The first T/P threads of the team (including the master thread) execute on the parent’s place. The next T/P threads execute on the next place in the place partition, and so on, with wrap-around. When T is not a multiple of P, each place receives either floor(T/P) or ceiling(T/P) consecutively numbered threads.

C / C++
Example affinity.2c
```c
void work();

void foo()
{
   #pragma omp parallel num_threads(16) proc_bind(spread)
   {
      work();
   }
}
```
C / C++
Fortran
Example affinity.2f
```fortran
subroutine foo
!$omp parallel num_threads(16) proc_bind(spread)
   call work()
!$omp end parallel
end subroutine
```
Fortran

It is unspecified on which place the master thread is initially started. If the master thread is initially started on p0, the following placement of threads will be applied in the parallel region:
• threads 0,1 execute on p0 with the place partition p0
• threads 2,3 execute on p1 with the place partition p1
• threads 4,5 execute on p2 with the place partition p2
• threads 6,7 execute on p3 with the place partition p3
• threads 8,9 execute on p4 with the place partition p4
• threads 10,11 execute on p5 with the place partition p5
• threads 12,13 execute on p6 with the place partition p6
• threads 14,15 execute on p7 with the place partition p7
If the master thread were instead initially started on p2, the placement of threads and the distribution of the place partition would be as follows:
• threads 0,1 execute on p2 with the place partition p2
24 OpenMP Examples Version 4.0.2 - March 2015