1 Programming with Shared Memory, Part 2

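2 OpenMP
OpenMP is an accepted standard developed in the late 1990s by a group of industry experts. It consists of a small set of compiler directives, a small set of library routines, and environment variables, with Fortran and C/C++ as the base languages. Many OpenMP compilers are now available (GNU gcc, IBM Linux, Oracle, HP, and others).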

3 OpenMP language extensions
- Parallel control structures
- Work sharing
- Synchronization constructs
- Data environment

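4 OpenMP is not built for distributed-memory systems
- The OpenMP application programming interface (API) is a thread-based programming model for shared-memory architectures; it does not target distributed-memory systems.
- An OpenMP program can create multiple threads, and all threads can access global memory.
- Data can be shared by all threads or private to a single thread.
- Synchronization constructs exist, but much of the synchronization is implicit.
- Characteristics: standardized, concise, practical, easy to use, portable.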

5 OpenMP uses a thread-based fork-join model. Execution starts with a single master thread. The parallel directive creates a team of threads that execute a specified block of code in parallel. Other directives, used inside a parallel construct, specify parallel for loops and assign different blocks of code to different threads.

6 [Figure: fork/join model. The master thread forks a team of threads at each parallel region; the threads synchronize and join back into the master thread at the end of the region.]

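7 Directive syntax
For C/C++, OpenMP directives are written as #pragma statements of the form:
    #pragma omp directive_name [clauses...]
where omp is the OpenMP keyword. The directive name may be followed by optional parameters (clauses). The parts of
    #pragma omp directive-name [clause, ...] newline
are:
- #pragma omp: the directive prefix, required for every OpenMP directive.
- directive-name: a valid OpenMP directive, which must appear between the prefix and any clauses.
- [clause, ...]: optional; unless otherwise restricted, clauses may appear in any order, and this part may be omitted entirely.
- newline: marks the end of the directive.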

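8 Parallel directive (parallel region construct)
    #pragma omp parallel
    structured_block
The directive creates multiple threads, each of which executes structured_block. The structured block may be a single statement or a compound statement enclosed in {...}, but it must have exactly one entry point and one exit point. There is an implicit barrier at the end of the construct. The parallel directive corresponds to the forall construct mentioned earlier.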

9 Clauses of the parallel directive
    #pragma omp parallel [clause[[,] clause] ...] newline
where clause is one of:
    if (scalar_expression)
    private (list)
    shared (list)
    default (shared | none)
    firstprivate (list)
    reduction (operator: list)
    copyin (list)
    num_threads (integer_expression)
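The sketch below combines several of these clauses in one function. It is a hedged illustration, not taken from the slides: the name parallel_clause_demo and the threshold of 100 in the if clause are hypothetical, and the #ifdef _OPENMP fallback is an assumption added so the code also builds without -fopenmp. The if clause keeps small problems serial, firstprivate gives every thread a copy of offset initialized from the caller's value, and the shared variables are updated under atomic and single.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallback so the sketch also builds without -fopenmp. */
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: every thread adds (offset + 1) to a shared total.
   Returns the total and stores the team size in *team_size. */
long parallel_clause_demo(int n, int offset, int *team_size)
{
    long total = 0;
    int size = 1;

    /* if: run in parallel only when n > 100 (arbitrary threshold).
       firstprivate: each thread's copy of offset starts at the caller's value. */
    #pragma omp parallel if (n > 100) shared(total, size) firstprivate(offset)
    {
        offset += 1;              /* changes only this thread's copy */

        #pragma omp atomic
        total += offset;          /* shared update made safe with atomic */

        #pragma omp single
        size = omp_get_num_threads();
    }
    *team_size = size;
    return total;
}
```

With n = 10 the if clause is false, so the region runs on a single thread and the result is 6 for offset = 5; with n = 1000 the result is 6 times the team size.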

10 Hello World!
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        #pragma omp parallel   // the OpenMP directive starts a parallel region
        {                      // the opening brace must start on a new line
            printf("Hello, world! This is thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }
        return 0;
    }
Compile with: gcc -fopenmp -o helloworld_omp helloworld_omp.c

11 Compiling
    gcc -fopenmp -o helloworld_omp helloworld_omp.c   (GNU gcc compiler)
    icc -openmp -o helloworld_omp helloworld_omp.c    (Intel compiler; newer versions use -qopenmp instead)

12
    #include <stdio.h>
    #include <omp.h>

    int main(int argc, char *argv[])
    {
        int nthreads, tid;

        // fork a team of threads
        #pragma omp parallel private(nthreads, tid)   // per-thread private variables
        {
            // obtain and print the thread id
            tid = omp_get_thread_num();
            printf("Hello World from OMP thread %d\n", tid);

            // only the master thread does this
            if (tid == 0) {
                nthreads = omp_get_num_threads();
                printf("Number of threads: %d\n", nthreads);
            }
        }
        return 0;
    }

13 Private and shared variables
Variables can be declared inside each parallel region, but OpenMP also provides the private clause:
    int tid;
    ...
    #pragma omp parallel private(tid)
    {
        tid = omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);
    }
Each thread has its own local copy of tid. A shared clause is also available.
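A minimal sketch of private versus shared, using a hypothetical helper record_thread_ids: tid is private, so each thread keeps its own id, while the ids array and nthreads are shared by the whole team. The #ifdef _OPENMP fallback is an assumption so the code also compiles serially.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallbacks so the sketch also builds without -fopenmp. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: each thread stores its id in ids[tid]. Returns the
   number of entries written (the team size, capped at max). */
int record_thread_ids(int *ids, int max)
{
    int tid;              /* one copy per thread because of private(tid) */
    int nthreads = 1;     /* shared: written by thread 0 only */

    #pragma omp parallel private(tid) shared(ids, max, nthreads)
    {
        tid = omp_get_thread_num();
        if (tid < max)
            ids[tid] = tid;          /* different threads write different slots */
        if (tid == 0)
            nthreads = omp_get_num_threads();
    }   /* implicit barrier: nthreads and ids[] are complete here */

    return nthreads < max ? nthreads : max;
}
```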

14 Typical code structure
    #include <omp.h>

    int main(void)
    {
        int var1, var2, var3;

        /* Serial code ... */

        /* Beginning of parallel section. Fork a team of threads.
           Specify variable scoping. */
        #pragma omp parallel private(var1, var2) shared(var3)
        {
            /* Parallel section executed by all threads ... */

            /* All threads join the master thread and disband */
        }

        /* Resume serial code ... */
        return 0;
    }
Equivalently, since variables declared outside the region are shared by default:
    #pragma omp parallel default(shared) private(var1, var2)

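16 Setting the number of threads in a team
Any of the following methods sets the number of threads:
1. the num_threads clause on the parallel directive, e.g. #pragma omp parallel num_threads(8)
2. the omp_set_num_threads() library routine
3. the OMP_NUM_THREADS environment variable
When more than one is used, the num_threads clause takes precedence over omp_set_num_threads(), which takes precedence over OMP_NUM_THREADS. If none of them is used, the number of threads depends on the system. Dynamic adjustment of the number of threads is enabled or disabled with omp_set_dynamic(int dynamic_threads).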
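A small sketch of the num_threads clause, assuming a hypothetical helper team_size_with_clause. It turns off dynamic adjustment with omp_set_dynamic(0) so the runtime does not shrink the team on its own, then reports the team size that was actually created. The #ifdef _OPENMP guard is an assumption so the code also builds serially, in which case the team size is 1.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallback so the sketch also builds without -fopenmp. */
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: requests `requested` threads with the num_threads
   clause and returns the team size the runtime actually created. */
int team_size_with_clause(int requested)
{
    int size = 0;
#ifdef _OPENMP
    omp_set_dynamic(0);   /* disable dynamic adjustment of the team size */
#endif
    #pragma omp parallel num_threads(requested)
    {
        #pragma omp master
        size = omp_get_num_threads();
    }   /* implicit barrier: size is set here */
    return size;
}
```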

17 Work-sharing constructs
A work-sharing construct divides the execution of the code it encloses among the members of the thread team:
1. for (parallel loops)
2. sections
3. single (executed serially by one thread)
In all cases there is an implicit barrier at the end of the construct unless a nowait clause is included, which overrides the barrier.
Note: these constructs do not start a new team of threads. That is done by an enclosing parallel construct.
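To illustrate nowait, here is a sketch with a hypothetical helper scale_and_shift: two independent loops share one parallel region, and nowait on the first loop lets a thread move on to the second loop without waiting for the rest of the team. This is only safe because the second loop never touches x; if it read x, the barrier would be needed.

```c
/* Hypothetical helper: x[i] *= s, then y[i] += s, inside one parallel region.
   nowait on the first loop removes its barrier; the second loop keeps its own,
   and the end of the parallel region also synchronizes the team. */
void scale_and_shift(double *x, double *y, int n, double s)
{
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            x[i] *= s;

        #pragma omp for           /* implicit barrier at the end */
        for (int i = 0; i < n; i++)
            y[i] += s;
    }
}
```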

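19 Sections directive
The sections directive specifies that the enclosed code is divided among the threads of the team. The construct
    #pragma omp sections
    {
        #pragma omp section
        structured_block
        #pragma omp section
        structured_block
    }
causes the structured blocks to be shared among the threads in the team: each section is executed once, by one thread, and different sections may run on different threads. The first section directive is optional.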

20 Example
    #pragma omp parallel shared(a, b, c, d, nthreads) private(i, tid)
    {
        tid = omp_get_thread_num();

        #pragma omp sections nowait
        {
            #pragma omp section      /* one thread does this section */
            {
                printf("Thread %d doing section 1\n", tid);
                for (i = 0; i < N; i++) {
                    c[i] = a[i] + b[i];
                    printf("Thread %d: c[%d]= %f\n", tid, i, c[i]);
                }
            }

            #pragma omp section      /* another thread does this one */
            {
                printf("Thread %d doing section 2\n", tid);
                for (i = 0; i < N; i++) {
                    d[i] = a[i] * b[i];
                    printf("Thread %d: d[%d]= %f\n", tid, i, d[i]);
                }
            }
        } /* end of sections */
    } /* end of parallel section */

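21 Complete program
    #include <stdio.h>
    #include <omp.h>
    #define N 1000

    int main(void)
    {
        int i, tid, nthreads;
        float a[N], b[N], c[N], d[N];

        /* vector initialization */
        for (i = 0; i < N; i++)
            a[i] = b[i] = i * 1.0;

        /* insert the parallel region from the previous slide here */

        return 0;
    }
Try compiling this code.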
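For reference, here is a self-contained sketch of slides 20 and 21 written as a function so that its result can be checked. The name add_and_multiply is hypothetical, and the printf calls are dropped so the output is just the two result arrays.

```c
/* Hypothetical helper: section 1 computes c = a + b, section 2 computes
   d = a * b; the two sections may run on different threads. */
void add_and_multiply(const float *a, const float *b, float *c, float *d, int n)
{
    int i;
    #pragma omp parallel shared(a, b, c, d, n) private(i)
    {
        #pragma omp sections nowait
        {
            #pragma omp section
            for (i = 0; i < n; i++)
                c[i] = a[i] + b[i];

            #pragma omp section
            for (i = 0; i < n; i++)
                d[i] = a[i] * b[i];
        }   /* nowait: no barrier here; the parallel region still ends with one */
    }
}
```

Called with a[i] = b[i] = i for N = 1000, it yields c[i] = 2i and d[i] = i * i, the same values the slide program prints.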

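22 For directive
The for directive specifies that the loop immediately following it must be executed in parallel by the thread team:
    #pragma omp for
    for (i = 0; ...)
How the loop is divided can be specified with an additional schedule clause. For example, schedule(static, chunk_size) splits the loop into chunks of chunk_size iterations and assigns them to the threads statically, in round-robin order. The for loop must be in simple canonical form, so that the number of iterations can be computed before the loop starts.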

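23 The schedule clause describes how the loop iterations are divided among the threads of the team:
1. schedule(static, chunk_size): iterations are divided statically into chunks of chunk_size and assigned to the threads in round-robin order. If chunk_size is not given, the iterations are divided as evenly as possible among the threads.
2. schedule(dynamic, chunk_size): iterations are divided into chunks of chunk_size and handed out dynamically (whenever a thread becomes idle it receives the next chunk). If chunk_size is not given, it defaults to 1.
3. schedule(guided, chunk_size): like dynamic, but the chunks start large and shrink as the loop proceeds, down to chunk_size (default 1).
4. schedule(runtime): the schedule is chosen at run time from the OMP_SCHEDULE environment variable.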
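The round-robin rule of schedule(static, chunk) is deterministic, which this sketch makes visible. The helper static_schedule_owners is hypothetical: it records which thread ran each iteration, and the #ifdef _OPENMP fallback is an assumption so the code also builds serially.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallbacks so the sketch also builds without -fopenmp. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: owner[i] records which thread ran iteration i under
   schedule(static, chunk). Returns the team size. */
int static_schedule_owners(int *owner, int n, int chunk)
{
    int nthreads = 1;
    #pragma omp parallel shared(owner, n, chunk, nthreads)
    {
        #pragma omp single
        nthreads = omp_get_num_threads();   /* implicit barrier after single */

        #pragma omp for schedule(static, chunk)
        for (int i = 0; i < n; i++)
            owner[i] = omp_get_thread_num();
    }
    return nthreads;
}
```

With chunk = 2 and a team of 4 threads, iterations 0-1 go to thread 0, 2-3 to thread 1, 4-5 to thread 2, 6-7 to thread 3, and 8-9 back to thread 0. With dynamic instead of static, this mapping can change from run to run.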

24 Example
    #pragma omp parallel shared(a, b, c, nthreads, chunk) private(i, tid)
    {
        tid = omp_get_thread_num();
        if (tid == 0) {                   /* executed by one thread only */
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
        printf("Thread %d starting...\n", tid);

        #pragma omp for schedule(dynamic, chunk)   /* for loop shared among the team */
        for (i = 0; i < N; i++) {
            c[i] = a[i] + b[i];
            printf("Thread %d: c[%d]= %f\n", tid, i, c[i]);
        }
    } /* end of parallel section */

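25 Single directive
The single directive specifies that the enclosed code is executed by only one thread of the team. Threads that do not execute the single block wait at the end of the block unless a nowait clause is given.
    #pragma omp single
    structured_block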

26 Combined parallel work-sharing constructs
If a parallel directive is immediately followed by a single for directive, the two can be combined with the same effect. The parallel for directive specifies a parallel region that contains a single for loop:
    #pragma omp parallel for

27 Parallel for example
    #include <omp.h>
    #define N 1000
    #define CHUNKSIZE 100

    int main(void)
    {
        int i, chunk;
        float a[N], b[N], c[N];

        /* Some initializations */
        for (i = 0; i < N; i++)
            a[i] = b[i] = i * 1.0;
        chunk = CHUNKSIZE;

        #pragma omp parallel for shared(a, b, c, chunk) private(i) schedule(static, chunk)
        for (i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        return 0;
    }

28 Parallel sections directive
The parallel sections directive specifies a parallel region containing a single sections construct:
    #pragma omp parallel sections
    {
        #pragma omp section
        structured_block
        #pragma omp section
        structured_block
    }
Note: neither parallel for nor parallel sections allows the nowait clause.

29 Master directive
    #pragma omp master   newline
    structured_block
The master directive specifies that structured_block is executed only by the master thread. Unlike the work-sharing constructs, it has no implied barrier at the end (or at the beginning); other threads skip the block and continue executing.

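30 Reduction clause
The reduction clause reduces the variables in its list with the specified operator, combining the results of the iterations into a single value, much like MPI_Reduce() in MPI. It can be used with the parallel, for, and sections directives. For example:
    sum = 0;
    #pragma omp parallel for reduction(+:sum)   /* operator: +, variable: sum */
    for (k = 0; k < 100; k++) {
        sum = sum + funct(k);
    }
Each thread starts with its own private copy of the variable, initialized to the identity of the operator (0 for +). At the end of the construct, the private copies are combined with the specified operator and the result updates the original (shared) variable.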

31 Reduction example: vector dot product
    #include <stdio.h>

    int main(void)
    {
        int i, n, chunk;
        float a[100], b[100], result;

        /* Some initializations */
        n = 100;
        chunk = 10;
        result = 0.0;
        for (i = 0; i < n; i++) {
            a[i] = i * 1.0;
            b[i] = i * 2.0;
        }

        #pragma omp parallel for default(shared) private(i) \
            schedule(static, chunk) reduction(+:result)
        for (i = 0; i < n; i++)
            result = result + (a[i] * b[i]);

        printf("Final result= %f\n", result);
        return 0;
    }

32 Private variables
- private(variable_list): creates a private copy of each listed variable for each thread, so the variables are local to each thread. The private copies are uninitialized.
- firstprivate(variable_list): like private, but each copy is initialized to the value the variable had immediately before the parallel construct.
- lastprivate(variable_list): like private, but "the value of each lastprivate variable from the sequentially last iteration of the associated loop, or the lexically last section directive, is assigned to the variable's original object."
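A short sketch of firstprivate and lastprivate together, using a hypothetical helper last_value: every thread's copy of base starts at start, and after the loop y holds the value written by the sequentially last iteration (i == n - 1), regardless of which thread ran it.

```c
/* Hypothetical helper: base is firstprivate, so every thread's copy starts at
   `start`; y is lastprivate, so after the loop it holds the value assigned in
   the sequentially last iteration (i == n - 1). */
int last_value(int n, int start)
{
    int i, base = start, y = -1;

    #pragma omp parallel for firstprivate(base) lastprivate(y)
    for (i = 0; i < n; i++)
        y = base + i;

    return y;   /* start + n - 1 when n > 0 */
}
```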

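33 Synchronization constructs: critical
The critical directive specifies that the code in the region can be executed by only one thread at a time; other threads block at the entrance to the critical section.
    #pragma omp critical [(name)]
    structured_block
The name is optional.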

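34
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        int x;
        x = 0;

        #pragma omp parallel shared(x)
        {
            #pragma omp critical
            x = x + 1;
        } /* end of parallel section */

        printf("x = %d\n", x);   /* equals the number of threads */
        return 0;
    }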
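critical is needed when several shared variables must change together, which a single atomic update cannot express. The sketch below uses a hypothetical helper argmax with a named critical section (argmax_update) so the largest value and its index are always updated as a pair. Entering a critical section on every iteration serializes the loop, so this illustrates correctness rather than speed.

```c
/* Hypothetical helper: returns the index of the largest element of v
   (smallest index on ties) and stores its value in *best.
   best_v and best_i must change together, so the update is a named
   critical section rather than an atomic. */
int argmax(const double *v, int n, double *best)
{
    int best_i = 0;
    double best_v = v[0];

    #pragma omp parallel for shared(v, n, best_i, best_v)
    for (int i = 1; i < n; i++) {
        #pragma omp critical (argmax_update)
        {
            if (v[i] > best_v || (v[i] == best_v && i < best_i)) {
                best_v = v[i];
                best_i = i;
            }
        }
    }
    *best = best_v;
    return best_i;
}
```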

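35 Barrier
The barrier directive synchronizes all threads in a team: threads that arrive first block at the barrier and wait for the others.
    #pragma omp barrier   newline
The smallest statement that contains a barrier must be a structured block, so a barrier cannot be the whole body of an unbraced if (and either all threads of the team or none must reach it):
    Wrong:
        if (x == 0)
            #pragma omp barrier
    Correct:
        if (x == 0) {
            #pragma omp barrier
        }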
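A typical use of an explicit barrier is separating a write phase from a read phase. In this sketch, each thread first fills its own slot, then reads its neighbour's slot. The helper barrier_neighbour_demo and the team limit of 4 threads are hypothetical, and the #ifdef _OPENMP fallback is an assumption so the code also builds serially. Without the barrier, a thread could read a slot before its neighbour has written it.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallbacks so the sketch also builds without -fopenmp. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: phase 1 writes slots[tid]; phase 2 reads the
   neighbour's slot. Uses at most 4 threads, so the arrays need 4 entries.
   Returns the team size. */
int barrier_neighbour_demo(int slots[4], int seen[4])
{
    int nthreads = 1;
    #pragma omp parallel num_threads(4) shared(slots, seen, nthreads)
    {
        int tid = omp_get_thread_num();
        int t = omp_get_num_threads();

        slots[tid] = 10 * (tid + 1);          /* phase 1: write own slot */
        if (tid == 0)
            nthreads = t;

        #pragma omp barrier                   /* wait until all slots are written */

        seen[tid] = slots[(tid + 1) % t];     /* phase 2: read neighbour's slot */
    }
    return nthreads;
}
```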

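36 Atomic directive
The atomic directive specifies that a particular memory location is updated atomically.
    #pragma omp atomic
    expression_statement
When a critical section would only perform a simple update of one variable (expression_statement: increment, decrement, or another simple arithmetic update such as x += expr), atomic implements it more efficiently.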
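A histogram is a natural fit for atomic: each update is a single increment, but different iterations may hit the same bin at the same time. The helper histogram below is hypothetical and assumes non-negative input values.

```c
/* Hypothetical helper: counts values into nbins buckets by value % nbins.
   Values are assumed non-negative. Concurrent increments of the same bin
   are made safe with atomic. */
void histogram(const int *values, int n, int *bins, int nbins)
{
    for (int b = 0; b < nbins; b++)
        bins[b] = 0;

    #pragma omp parallel for shared(values, n, bins, nbins)
    for (int i = 0; i < n; i++) {
        int b = values[i] % nbins;
        #pragma omp atomic
        bins[b]++;
    }
}
```

Replacing the atomic with a critical section would give the same result, but every increment would then contend for one lock regardless of the bin.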

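37 Flush directive
The flush directive identifies a synchronization point at which all threads are guaranteed a consistent view of memory.
    #pragma omp flush (list)   newline
A flush is implied in the following situations, unless a nowait clause is present:
- barrier
- critical: on entry and exit
- ordered: on entry and exit
- parallel: on exit
- for: on exit
- sections: on exit
- single: on exit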

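38 Ordered directive
Used in conjunction with the for and parallel for directives to cause an iteration to be executed in the order in which it would have occurred if written as a sequential loop. The enclosed code is executed by only one thread at a time, in loop order. The ordered directive may appear only in the dynamic extent of a for or parallel for directive, and that directive must itself carry the ordered clause.
    #pragma omp ordered   newline
    structured_block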
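In this sketch, a hypothetical helper ordered_squares computes squares in parallel but appends them to an output array in the original loop order. Note the ordered clause on the parallel for directive itself, which the ordered block requires. The shared counter count is safe because it is only touched inside the ordered block, one iteration at a time.

```c
/* Hypothetical helper: squares are computed in parallel, but the ordered
   block appends them to out[] in the original loop order. */
void ordered_squares(int *out, int n)
{
    int count = 0;

    #pragma omp parallel for ordered schedule(dynamic) shared(out, n, count)
    for (int i = 0; i < n; i++) {
        int sq = i * i;               /* may run out of order */
        #pragma omp ordered
        out[count++] = sq;            /* runs in order i = 0, 1, 2, ... */
    }
}
```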

39 More information
Full information on OpenMP is available at http://openmp.org/wp/

40 OpenMP example: computing pi

41 Serial program in C
    #include <stdio.h>

    /* Serial code */
    static long num_steps = 100000;
    double step;

    int main(void)
    {
        int i;
        double x, pi, sum = 0.0;

        step = 1.0 / (double) num_steps;
        for (i = 1; i <= num_steps; i++) {
            x = (i - 0.5) * step;
            sum = sum + 4.0 / (1.0 + x * x);
        }
        pi = step * sum;
        printf("pi = %f\n", pi);
        return 0;
    }
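The serial program evaluates the integral below with the midpoint rule; writing it out makes the loop body easier to read. Here step plays the role of \Delta x and num_steps the role of N. The parallel versions that follow use x = (i + 0.5) * step for i = 0, ..., N - 1, which gives the same midpoints.

```latex
\pi = \int_0^1 \frac{4}{1+x^2}\,dx = 4\arctan 1
    \approx \Delta x \sum_{i=1}^{N} \frac{4}{1+x_i^2},
\qquad x_i = \left(i - \tfrac{1}{2}\right)\Delta x,
\qquad \Delta x = \frac{1}{N}
```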

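42 Parallelized with a parallel region
    #include <stdio.h>
    #include <omp.h>

    static long num_steps = 100000;
    double step;
    #define NUM_THREADS 2

    int main(void)
    {
        int i;
        double pi, sum[NUM_THREADS];

        step = 1.0 / (double) num_steps;
        omp_set_num_threads(NUM_THREADS);
        #pragma omp parallel
        {
            int i, id;          /* private: declared inside the region */
            double x;
            id = omp_get_thread_num();
            /* thread id handles iterations id, id + NUM_THREADS, ...
               (assumes the runtime grants all NUM_THREADS threads) */
            for (i = id, sum[id] = 0.0; i < num_steps; i = i + NUM_THREADS) {
                x = (i + 0.5) * step;
                sum[id] += 4.0 / (1.0 + x * x);
            }
        }
        for (i = 0, pi = 0.0; i < NUM_THREADS; i++)
            pi += sum[i] * step;
        printf("pi = %f\n", pi);
        return 0;
    }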

43 Parallelized with a work-sharing construct
    #include <stdio.h>
    #include <omp.h>

    static long num_steps = 100000;
    double step;
    #define NUM_THREADS 2

    int main(void)
    {
        int i;
        double pi, sum[NUM_THREADS] = {0.0};

        step = 1.0 / (double) num_steps;
        omp_set_num_threads(NUM_THREADS);
        #pragma omp parallel
        {
            double x;
            int id = omp_get_thread_num();
            sum[id] = 0.0;
            #pragma omp for          /* the loop variable i is made private automatically */
            for (i = 0; i < num_steps; i++) {
                x = (i + 0.5) * step;
                sum[id] += 4.0 / (1.0 + x * x);
            }
        }
        for (i = 0, pi = 0.0; i < NUM_THREADS; i++)
            pi += sum[i] * step;
        printf("pi = %f\n", pi);
        return 0;
    }

44 Parallelized with the private clause and a critical section
    #include <stdio.h>
    #include <omp.h>

    static long num_steps = 100000;
    double step;
    #define NUM_THREADS 2

    int main(void)
    {
        double x, sum, pi = 0.0;

        step = 1.0 / (double) num_steps;
        omp_set_num_threads(NUM_THREADS);
        #pragma omp parallel private(x, sum)
        {
            int i, id = omp_get_thread_num();
            for (i = id, sum = 0.0; i < num_steps; i = i + NUM_THREADS) {
                x = (i + 0.5) * step;
                sum += 4.0 / (1.0 + x * x);
            }
            #pragma omp critical
            pi += sum * step;     /* one thread at a time adds its partial sum */
        }
        printf("pi = %f\n", pi);
        return 0;
    }

45 Parallel program using a reduction
    #include <stdio.h>
    #include <omp.h>

    static long num_steps = 100000;
    double step;
    #define NUM_THREADS 2

    int main(void)
    {
        int i;
        double x, pi, sum = 0.0;

        step = 1.0 / (double) num_steps;
        omp_set_num_threads(NUM_THREADS);
        #pragma omp parallel for reduction(+:sum) private(x)
        for (i = 1; i <= num_steps; i++) {
            x = (i - 0.5) * step;
            sum = sum + 4.0 / (1.0 + x * x);
        }
        pi = step * sum;
        printf("pi = %f\n", pi);
        return 0;
    }
