Correcting Threading Errors with Intel® Thread Checker


1 Correcting Threading Errors with Intel® Thread Checker
Yang Quansheng, Department of Computer Science, Chengxian College, Southeast University

2 Agenda
What is Intel® Thread Checker?
Using Intel® Thread Checker
Detecting race conditions
Some other threading errors
Checking libraries for thread safety
Other features of Thread Checker

3 What is Intel® Thread Checker?
A debugging tool for multithreaded applications.
Finds errors in applications that use Windows threads, POSIX threads, and OpenMP threads.
Detects potential threading errors even if they do not occur during the run.
A plug-in to the Intel® VTune™ Performance Analyzer, with the same look, feel, and interface.
Reduces the time needed to detect and isolate errors.
You can use Intel Thread Checker as:
  a design aid
  a debugging aid

Introduce the slide: Multithreaded applications enhance the performance of your application by enabling parallel execution. However, multithreading has its disadvantages because you may introduce bugs while threading an error-free serial application.

Introduce Intel Thread Checker: Intel Thread Checker detects potential threading-related bugs even if they do not occur. Therefore, it helps you create correct and safe multithreaded applications. Intel Thread Checker is a debugging tool used on threaded applications. It can detect threading bugs in Windows threads, POSIX threads, and OpenMP threaded applications. It is a plug-in to VTune Performance Analyzer with the same look, feel, and interface as the VTune Analyzer environment.

With traditional methods, locating threading bugs may take a very long time. In fact, debugging tools and techniques may even hide the effects of threading bugs, making it even more difficult to establish the cause of errors. However, with Intel Thread Checker, you can reduce the turnaround time for bug detection and isolation.

Intel Thread Checker can be used as the following:
Design aid: You can create a prototype application with OpenMP by adding appropriate pragmas. Intel Thread Checker can identify conflicts in the application and generate a report. This report helps you analyze the issues and find solutions during the design phase of your application.
Debug aid: You can identify actual and potential bugs in threaded applications.

4 What is Intel® Thread Checker?
Supports several compilers:
  Intel® C++ and Fortran compilers, v7 and higher
  Microsoft Visual C++, v6
  Microsoft Visual C++ .NET 2002, 2003, and 2005 Editions
Provides a powerful and intuitive user interface.
Identifies threading issues in detail.
Maps potential errors.
Provides an API for user-defined synchronization primitives.

Explain the features of Intel Thread Checker:

Supports various compilers: Intel Thread Checker works with compilers such as Intel® C++ and Fortran compilers, v7 and higher versions; Microsoft Visual C++, v6; and Microsoft Visual C++ .NET 2002, 2003, and 2005 Editions. Intel Thread Checker can be integrated into the Microsoft Visual Studio .NET IDE.

Provides a powerful and intuitive user interface: Intel Thread Checker displays the threading diagnostics in a list. You can categorize, arrange, and sort the diagnostics according to your requirements. A summarizing histogram provides a visual comparison of the number of diagnostics in each category.

Identifies the threading issues in detail: Intel Thread Checker identifies five types of threading issues: error, warning, caution, information, and remark. If debug information is available, the tool identifies the source location. Intel Thread Checker also provides one-click help for diagnostics.

Maps potential errors: Intel Thread Checker uses an advanced error-detection engine to identify data races and deadlocks. It helps you design effective threaded applications. You can also find errors that may not occur during your manual testing.

Provides an application program interface (API) for user-defined synchronization primitives: Intel Thread Checker does not identify user-defined synchronization operations on its own. You can use Intel Thread Checker API functions to identify the start and end of critical regions.

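5 What is Intel® Thread Checker?
Analysis is dynamic: it happens while your software runs.
It monitors run-time activity to detect threading problems such as data races, deadlocks, and stalled threads.
It monitors:
  threading and synchronization APIs
  thread execution order (the scheduler affects the results)
  memory accesses between threads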

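6 What is Intel® Thread Checker?
Instrumentation: adds calls to a library that records information about thread execution, synchronization APIs, and memory accesses.
Workload selection: use the smallest possible data set to keep the application's execution time down. Choosing the workload is important.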

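7 What is Intel® Thread Checker?
Instrumentation
Adds calls to the Intel® Thread Checker library so that the software can be traced.
Records thread information such as thread execution order, synchronization API calls, and memory accesses.
Increases the code size and the run time of the application.
Two types of instrumentation:
  binary
  source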

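8 What is Intel® Thread Checker?
Binary instrumentation
Added at run time to a binary module that has already been built.
Effectively supports program analysis, debugging, security, and simulation.
Works with software built by any supported compiler.
Requires the /fixed:no switch when linking.
Running the application:
  It must be run from within Thread Checker.
  The application is instrumented as it executes.
  External DLLs are instrumented as they are used.

Details: Binary instrumentation adds code to the application at thread API calls, memory access points, etc. (This is why the /fixed:no link option is required.)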

9 What is Intel® Thread Checker?
Source instrumentation
Building with source instrumentation:
  Supported only by Intel® compilers, such as Intel C++ or Fortran.
  Add the /Qtcheck compiler switch to enable source instrumentation of the threaded application.
Running the application:
  Run it in the VTune™ environment, or
  run it from the Windows* command line. Data is collected into the threadchecker.thr result file, which you then view in the VTune™ environment.
Additional DLLs are not instrumented or analyzed.

Details: Only available from Intel compilers.

Background: If an application takes a long time to build, it may not be feasible to use source instrumentation on the whole of the source. However, if the binary was compiled and linked correctly, an initial pass can be done with binary instrumentation. This will at least reveal the modules that have threading errors. If the problem cannot be identified from these results, the specific source files involved can be recompiled with source instrumentation and the application re-run through TC. The source instrumentation will be able to give more details, like variable names, about the problems that were detected.

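10 What is Intel® Thread Checker?
Workload guidelines
A problem can be identified once each thread has executed the problem code one time.
Use the smallest possible working data set:
  Minimize the data set size (for example, smaller image sizes).
  Minimize loop iterations or time steps (simulate minutes rather than days).
  Minimize update rates (for example, lower frame rates).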

11 Agenda
What is Intel® Thread Checker?
Using Intel® Thread Checker
Detecting race conditions
Some other threading errors
Checking libraries for thread safety
Other features of Thread Checker

12 Using Intel® Thread Checker
Compile:
  Use the dynamically linked, thread-safe run-time libraries (/MD, /MDd).
  Generate symbol information (/Zi, /ZI, /Z7).
  Disable optimization (/Od).
Link:
  Preserve symbol information (/debug).
  Specify relocatable code sections (/fixed:no).

Details: Use symbols so that Thread Checker can show source code. Aggressive levels of optimization can modify the order of code within an application. Disabling optimization will keep the binary code closer to the originally written source. Thus, with symbol information included, it will be much easier for Thread Checker to point out the lines of source that are involved in any threading errors that are identified. If a code displays threading errors with high-level optimizations (/O3), but doesn't have any errors with optimization disabled, the problem is more likely with the compiler than with the threads in the application. (Besides, when using TC, we are not interested in the performance of the app, except to be sure we get some answers in a reasonable amount of time.)

13 Using Intel® Thread Checker

14 Using Intel® Thread Checker

15 Using Intel® Thread Checker

16 Using Intel® Thread Checker
The severity categories, in order of priority, are:
Rank 4, Error. Description and impact: Indicates a likely or actual problem in your program. Errors have the highest priority impact, so you should look at this group first. Examples: Data races, deadlocks, and other serious issues fall into this category.
Rank 3, Warning. Description and impact: Indicates situations that probably will not result in incorrect behavior of your program, but may benefit from fixes. Examples: Inaccessible memory.
Rank 2, Caution. Description and impact: May or may not be an issue; indicates that something is unusual. Examples: A thread trying to release a lock which it does not own, or a notify operation that occurred when no other thread was waiting for it, making it a no-op.
Rank 1, Informational. Description and impact: Conveys general information that is specific to your program. Examples: Messages indicating the amount of stack space allocated.
No rank, Remark. Description and impact: Conveys notes that generally do not contain specific data about your program. However, they may provide general information that may apply to your program. Examples: Too many errors to display.

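17 Using Intel® Thread Checker
You can organize the data columns in the diagnostics list as follows (task: action):
Group diagnostics: Drag a column header to the gray area at the top of the list.
Filter diagnostics out of a view: Right-click to open the pop-up menu and select Filter Diagnostic. The view then shows the filtered results.
Sort diagnostics: Click a column header to sort the data by that column. By default, columns are grouped by content.
Add a column: Right-click and select Show Column.
Remove a column: Right-click and select Hide Column.
View the corresponding source code: Double-click a diagnostic to open the Source view.
Understand a diagnostic: Right-click a diagnostic and select Diagnostic Help.
Understand a column: Right-click a column and select Column Help.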

18 Using Intel® Thread Checker
Diagnostics grouping view

19 Using Intel® Thread Checker

20 Using Intel® Thread Checker
1) Right-click here... 2) ...for more help!

21 Agenda
What is Intel® Thread Checker?
Using Intel® Thread Checker
Detecting race conditions
Some other threading errors
Checking libraries for thread safety
Other features of Thread Checker

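22 Detecting race conditions
Activity 1A: Finding potential data races
Objective: Use Intel® Thread Checker to find data races in a real model code.
Additional activity: Examine the source lines behind some of the diagnostics.
Question for discussion: Why do these conflicts occur on certain lines of the code?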

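23 Detecting race conditions
Three types of dependence:
Flow dependence (write-read conflict) between S1 and S2
Anti-dependence (read-write conflict) between S2 and S3
Output dependence (write-write conflict) between S3 and S4

    S1: A = 1.0;
    S2: B = A + 3.14;
    S3: A = 1/3 * (C - D);
    S4: A = (B * 3.8) / 2.7;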

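24 Detecting race conditions
Flow dependence (write-read conflict): one thread modifies a variable and another thread subsequently reads it.
Anti-dependence (read-write conflict): one thread reads a variable and another thread subsequently modifies it.
Output dependence (write-write conflict): one thread modifies a variable and another thread subsequently modifies it as well.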
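
As a rough illustration of why these dependences matter for threading, the sketch below contrasts a loop with no dependence between iterations and a loop with a flow dependence between iterations. The function names `scale` and `prefix_sum` are hypothetical and chosen only for this example.

```c
#include <assert.h>

/* No dependence between iterations: iteration i writes only out[i] and
   reads only in[i], so the iterations can run on different threads. */
void scale(const double *in, double *out, int n, double factor)
{
    int i;

    assert(n >= 0);

    #pragma omp parallel for
    for (i = 0; i < n; i++)
        out[i] = in[i] * factor;
}

/* Flow (write-read) dependence between iterations: iteration i reads
   a[i - 1], which iteration i - 1 has just written. Splitting these
   iterations across threads without synchronization would create a
   data race, so the loop is left serial. */
void prefix_sum(double *a, int n)
{
    int i;

    assert(n >= 0);

    for (i = 1; i < n; i++)
        a[i] = a[i] + a[i - 1];
}
```

If the pragma in `scale` were copied onto the loop in `prefix_sum`, the result would depend on thread scheduling, which is the kind of write-read conflict described above.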

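25 Detecting race conditions
Race conditions
An execution order is assumed but cannot be guaranteed.
Multiple threads access the same variable concurrently.
The most common error in multithreaded programs.
The error does not show up on every run.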

26 Detecting race conditions
Solving race conditions
Method 1: Scope variables local to threads
When to use:
  when the value of a variable identified as a potential data race is used only inside the parallel region
  for temporary or work variables
How to implement:
  use the OpenMP scoping clauses
  declare variables within threaded functions
  allocate variables on the thread stack
  use the Thread Local Storage (TLS) API

Explain the first method to solve race conditions: One way to solve race conditions is to scope variables local to threads. This is a good solution when the value of the variable identified as involved in a potential data race is used inside the parallel region only. You can also use this method when dealing with temporary or work variables such as those used to keep partial sums.

Explain the various ways to implement this method:
Use the OpenMP scoping clauses: OpenMP provides scoping clauses such as private, which can make variables local to threads.
Declare variables within threaded functions: You can explicitly declare variables that will remain local to each thread.
Allocate variables on thread stack: If Intel Thread Checker identifies a variable as a potential data race, you can allocate that variable on the thread stack by using the alloca() function.
Use Thread Local Storage (TLS) API: The TLS API is present in Windows threads and Pthreads. It guarantees that storage is only accessible to each individual thread.
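
The first two implementation options can be sketched in a few lines of C with OpenMP. This is only an illustration under assumed names (`square_plus_one` and `square_plus_one_local` are hypothetical); it shows a work variable made thread-local with the `private` clause and, alternatively, by declaring it inside the threaded code.

```c
#include <assert.h>

/* Hypothetical example: out[i] = in[i] * in[i] + 1, computed through a
   work variable tmp. */
void square_plus_one(const double *in, double *out, int n)
{
    double tmp = 0.0;   /* declared outside the loop, so shared by default */
    int i;

    assert(n >= 0);

    /* private(tmp) gives each thread its own copy of tmp, so the
       assignments below no longer conflict between threads. */
    #pragma omp parallel for private(tmp)
    for (i = 0; i < n; i++) {
        tmp = in[i] * in[i];
        out[i] = tmp + 1.0;
    }
}

/* Same effect without a clause: a variable declared inside the loop
   body lives on the stack of the thread that executes the iteration. */
void square_plus_one_local(const double *in, double *out, int n)
{
    int i;

    assert(n >= 0);

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        double tmp = in[i] * in[i];
        out[i] = tmp + 1.0;
    }
}
```

Both variants only work because the value of `tmp` is not needed after the parallel region, which is the condition stated under "When to use".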

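27 Detecting race conditions
Solving race conditions
Method 2: Control access with critical regions
When to use:
  when the value of a variable identified as a potential data race is used both inside and outside the parallel region
  when every thread must update the same shared variable
How to implement:
  use synchronization objects (mutex, semaphore, and Critical Section)
  use synchronization constructs (critical and atomic)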
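
A minimal sketch of the two OpenMP synchronization constructs named above, using hypothetical functions `count_even` and `sum_with_max`. In both cases the shared variable is needed after the parallel region, so making it thread-local is not an option and each update is protected instead.

```c
#include <assert.h>

/* Counts the even values. count is updated by every thread and read
   after the loop, so it stays shared and each update is made atomic. */
int count_even(const int *values, int n)
{
    int count = 0;
    int i;

    assert(n >= 0);

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        if (values[i] % 2 == 0) {
            /* Without this directive two threads could read the same
               old value of count and one increment would be lost. */
            #pragma omp atomic
            count++;
        }
    }
    return count;
}

/* A critical region can protect several related updates at once. */
long sum_with_max(const int *values, int n, int *max_out)
{
    long sum = 0;
    int max;
    int i;

    assert(n > 0);
    max = values[0];

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        #pragma omp critical
        {
            sum += values[i];
            if (values[i] > max)
                max = values[i];
        }
    }
    *max_out = max;
    return sum;
}
```

Serializing every iteration as `sum_with_max` does removes the race but also most of the parallelism, so in practice the protected region is kept as small as possible.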

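28 Detecting race conditions
Activity 1B: Resolving data races
Objective: Use simple threading techniques to resolve the data races found in the previous lab.
Question for discussion: Does the threaded code produce the same results as the serial program?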

29 Agenda
What is Intel® Thread Checker?
Using Intel® Thread Checker
Detecting race conditions
Some other threading errors
Checking libraries for thread safety
Other features of Thread Checker

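30 Thread Checker as a threading assistant
To use Intel® Thread Checker as a threading assistant:
Prototype the threading by inserting OpenMP pragmas.
Compile the program and run it in Intel Thread Checker.
Review the diagnostics to identify problem areas in your source code.
Restructure the code or protect access to shared variables, as the diagnostics indicate.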
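
The workflow above might look as follows on a small loop. The function `dot` is a hypothetical example, and the comments describe what a checker would be expected to report rather than actual tool output.

```c
#include <assert.h>

/* Prototype: start from the serial loop and add an OpenMP pragma.
   With a bare "#pragma omp parallel for", every thread would update
   the shared variable sum, which a tool such as Thread Checker would
   be expected to flag as conflicting accesses. The reduction clause
   is one way to restructure the code in response: each thread
   accumulates a private partial sum, and the partial sums are
   combined when the loop ends. */
double dot(const double *x, const double *y, int n)
{
    double sum = 0.0;
    int i;

    assert(n >= 0);

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += x[i] * y[i];

    return sum;
}
```

Because the prototype is just the serial code plus a pragma, it can be compiled with or without OpenMP enabled, which makes it easy to compare the threaded result against the serial one.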

31 Agenda
What is Intel® Thread Checker?
Using Intel® Thread Checker
Detecting race conditions
Some other threading errors
Checking libraries for thread safety
Other features of Thread Checker

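32 Some other threading errors
Deadlock
A deadlock occurs when a thread waits for an event that will never happen.
The most common cause of deadlock is a locking hierarchy.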

33 Some other threading errors
Deadlock example

    DWORD WINAPI threadA(LPVOID arg)
    {
        EnterCriticalSection(&L1);
        EnterCriticalSection(&L2);
        processA(data1, data2);
        LeaveCriticalSection(&L2);
        LeaveCriticalSection(&L1);
        return(0);
    }

    DWORD WINAPI threadB(LPVOID arg)
    {
        EnterCriticalSection(&L2);
        EnterCriticalSection(&L1);
        processB(data2, data1);
        LeaveCriticalSection(&L1);
        LeaveCriticalSection(&L2);
        return(0);
    }

ThreadA: L1, then L2
ThreadB: L2, then L1

Present the first example code on the slide to illustrate a locking hierarchy. In the code sample, code in the threadA() function uses data elements data1 and data2. The Critical Section L1 protects the data element data1, and the Critical Section L2 protects the data element data2. Thread A first calls EnterCriticalSection() with L1 and locks it. Then, thread A calls EnterCriticalSection() with L2 and locks it. After using data1 and data2 to execute the processA() function, thread A unlocks the Critical Sections L2 followed by L1.

Present the second code sample on the slide. In the second code sample, thread B enters the threadB() function, which uses data1 and data2. Here again, the Critical Section L1 protects data1, and the Critical Section L2 protects the data element data2. However, in this case, thread B first calls EnterCriticalSection() with L2 and locks it. Then, thread B calls the EnterCriticalSection() function with L1 and locks it. After using data1 and data2 to execute the processB() function, thread B unlocks the Critical Sections L1 followed by L2.

When threads A and B run in parallel, you may have the situation where thread A locks the Critical Section L1 and thread B locks the Critical Section L2, simultaneously. There is no conflict up to this point. However, when thread A tries to lock L2 and thread B tries to lock L1, conflict occurs. This is because thread B already holds L2, and thread A already holds L1. This results in a deadlock.

If a single programmer writes both the threadA() and threadB() functions, such a deadlock is not likely to happen. However, if two programmers write the two functions separately, such a locking hierarchy is more likely to occur.
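
One common way to remove this kind of locking hierarchy is to make every thread acquire the locks in the same order. The sketch below assumes POSIX mutexes in place of the Windows Critical Sections so that it is portable, and the increments are placeholders for processA() and processB(); none of this comes from the original slides.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the locks and data on the slide. */
static pthread_mutex_t L1 = PTHREAD_MUTEX_INITIALIZER; /* protects data1 */
static pthread_mutex_t L2 = PTHREAD_MUTEX_INITIALIZER; /* protects data2 */
static int data1 = 0;
static int data2 = 0;

/* Both thread functions take L1 first and L2 second. No thread can
   hold L2 while waiting for L1, so the circular wait cannot form. */
void *threadA(void *arg)
{
    int rc;

    (void)arg;
    rc = pthread_mutex_lock(&L1);
    assert(rc == 0);
    rc = pthread_mutex_lock(&L2);
    assert(rc == 0);
    data1 += 1;                  /* placeholder for processA(data1, data2) */
    data2 += 1;
    pthread_mutex_unlock(&L2);
    pthread_mutex_unlock(&L1);
    return NULL;
}

void *threadB(void *arg)
{
    int rc;

    (void)arg;
    rc = pthread_mutex_lock(&L1);    /* the deadlocking version took L2 first */
    assert(rc == 0);
    rc = pthread_mutex_lock(&L2);
    assert(rc == 0);
    data2 += 10;                 /* placeholder for processB(data2, data1) */
    data1 += 10;
    pthread_mutex_unlock(&L2);
    pthread_mutex_unlock(&L1);
    return NULL;
}
```

The two functions have the signature that pthread_create() expects, so they can be started as threads directly. The fixed order only helps if it is documented and followed by everyone who writes code that takes both locks.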

34 Some other threading errors
Another deadlock example

    typedef struct {
        // some data things
        SomeLockType mutex;
    } shape_t;

    shape_t Q[1024];

    void swap(shape_t A, shape_t B)
    {
        lock(A.mutex);
        lock(B.mutex);
        // Swap data between A & B
        unlock(B.mutex);
        unlock(A.mutex);
    }

    swap(Q[34], Q[98]);    Thread 1: grabs mutex 34
    swap(Q[98], Q[34]);    Thread 2: grabs mutex 98

Consider another example where an array has a large number of elements and threads have to frequently update elements. You can create a mutual exclusion on an element in the array by locking the entire array. This prevents other threads from accessing other elements in the array and, as a result, lowers performance. When you lock the entire array for one thread, you lock the application's potential for parallel execution. To improve performance, you can add individual locks to individual elements. In this case, threads that do not update the same element will not interfere with each other and simultaneous updates may be done in parallel.

In the declaration sample, the structure type, shape_t, contains some data elements and a lock. Also, there is an array Q[] of 1024 shape_t elements. Each element in Q[] has its own lock. Now, consider that you want to perform a swap operation of two elements in the array Q[].

In the code sample, the swap() function locks the two shape_t elements A and B, respectively. Then, it swaps data between A and B and unlocks B followed by A. As a result, there is no conflict if two threads, threads 1 and 2, try to execute the swap() function on different elements.

Consider the following situation. Suppose thread 1 tries to swap the thirty-fourth element, Q[34], with the ninety-eighth element, Q[98], and at the same time, thread 2 tries to swap the ninety-eighth element, Q[98], with the thirty-fourth element, Q[34]. In such a scenario, the code does create a locking hierarchy. Thread 1 grabs the mutex at Q[34], and thread 2 grabs the mutex at Q[98]. Each thread has locked its first mutex, which is the one the other thread needs next. Therefore, even if you program correctly, potential locking hierarchies may still exist.

Question for Discussion: Is there any way to ensure that such a locking hierarchy does not happen?
Answer: Any solution requires checking the memory address of each lock and then locking in some fixed order, such as locking the lowest memory address first. Because the conflicting case may arise only once in a hundred million calls, this check adds overhead that is rarely needed.
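
The address-ordering answer can be sketched as follows. This is an illustration under assumptions, not the presenter's code: it uses POSIX mutexes for SomeLockType, a single `value` field for the data, and pointer parameters (the slide passes the elements by value) so that the lock stored in the array element itself is taken. The name `swap_shapes` is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

typedef struct {
    int value;                 /* stands in for "some data things" */
    pthread_mutex_t mutex;
} shape_t;

/* Swap the data of two elements. The locks are always taken in order
   of increasing address, so swap_shapes(&Q[34], &Q[98]) and
   swap_shapes(&Q[98], &Q[34]) both lock Q[34] first. */
void swap_shapes(shape_t *a, shape_t *b)
{
    shape_t *first = a;
    shape_t *second = b;
    int tmp;

    assert(a && b);

    if (a == b)
        return;                /* nothing to swap; avoids locking twice */

    if ((uintptr_t)a > (uintptr_t)b) {
        first = b;
        second = a;
    }

    pthread_mutex_lock(&first->mutex);
    pthread_mutex_lock(&second->mutex);

    tmp = a->value;
    a->value = b->value;
    b->value = tmp;

    pthread_mutex_unlock(&second->mutex);
    pthread_mutex_unlock(&first->mutex);
}
```

The comparison costs a few instructions per call, which is the overhead the answer above refers to. Each mutex must be initialized, for example with pthread_mutex_init(), before the first call.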

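35 Some other threading errors
Thread stalls
A thread stall occurs when a thread waits for a long time because of another thread.
Dangling locks can cause thread stalls. A dangling lock arises when a thread locks a resource and then terminates abnormally before releasing it.
Note: Make sure that threads release all their locks under every circumstance, including abnormal termination, to avoid deadlocks and thread stalls.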

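36 Some other threading errors
Thread stalls
In some situations you want threads to wait:
A main thread creates worker threads and waits for them at the end of their concurrent execution.
Threads wait at a barrier for synchronization.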

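37 Some other threading errors
A lock that is never released
What's wrong?

    int data;

    DWORD WINAPI threadFunc(LPVOID arg)
    {
        int localData;
        EnterCriticalSection(&lock);
        if (data == DONE_FLAG)
            return(1);
        localData = data;
        LeaveCriticalSection(&lock);
        process(localData);
        return(0);
    }

The lock is never released: the early return leaves the function while the Critical Section is still held.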
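
One possible repair is to copy the shared value while the lock is held and release the lock at a single point before any return. The sketch below is an assumption-laden rewrite, not the presenter's solution: it uses a POSIX mutex instead of a Critical Section, and DONE_FLAG, `process`, and `processed` are hypothetical stand-ins.

```c
#include <assert.h>
#include <pthread.h>

#define DONE_FLAG (-1)          /* hypothetical sentinel value */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int data = 0;            /* shared; protected by lock */
static int processed = 0;       /* records what process() last received */

static void process(int value)  /* hypothetical stand-in */
{
    processed = value;
}

/* Returns 1 when the done flag is seen and 0 after processing a value.
   The lock is released before either return statement is reached. */
int thread_func(void)
{
    int localData;
    int rc;

    rc = pthread_mutex_lock(&lock);
    assert(rc == 0);
    localData = data;               /* copy while the lock is held */
    pthread_mutex_unlock(&lock);    /* single exit from the locked region */

    if (localData == DONE_FLAG)
        return 1;

    process(localData);
    return 0;
}
```

Keeping the locked region free of return statements makes it much harder to leave a lock held by accident when the function is modified later.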

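38 Some other threading errors
Activity 2: Identifying and locating deadlocks
Objective: Find actual and potential deadlocks, and use Intel® Thread Checker to locate the errors.
Questions for discussion:
What build options are required for any threaded software development?
What build options are required for binary and source instrumentation?
Answer: /Qtcheck is used for source instrumentation; binary instrumentation does not use it.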

39 Agenda
What is Intel® Thread Checker?
Using Intel® Thread Checker
Detecting race conditions
Some other threading errors
Checking libraries for thread safety
Other features of Thread Checker

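40 Checking libraries for thread safety
Understanding thread safety
A piece of code or a routine is thread-safe if it continues to function correctly when multiple threads try to execute it at the same time.
To test for thread safety:
Use an OpenMP-based simulation in Intel® Thread Checker to identify any potential conflicts.
Use OpenMP sections to create parallel execution across multiple threads.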

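41 Checking libraries for thread safety
Example: checking thread safety
Check for safety issues between:
  multiple instances of routine1()
  instances of routine1() and routine2()
Set up sections to test all permutations.
You still need to supply data sets that exercise the relevant portions of the code.

    #pragma omp parallel sections
    {
    #pragma omp section
        routine1(&data1);
    #pragma omp section
        routine1(&data2);
    #pragma omp section
        routine2(&data3);
    }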

42 Checking libraries for thread safety
Ways to ensure thread safety
Reentrant code: no routine updates globally shared variables.
Mutual exclusion: shared variables are protected while a routine updates them.
What if a third-party library is not thread-safe? You may need to control thread access to the library.
Note: It is better to write reentrant code than to add synchronization objects. Reentrant code improves performance and avoids implicit barriers and other overhead.

Explain the two ways to ensure thread-safety:

Reentrant code: You can write your routines to be reentrant so that no globally shared variables are updated by the routine. Any variable that the routine changes must be local. You can write code in such a way that it can be interrupted during one task and reentered to perform another task. When the second task completes, the code can resume its original task.

Mutual exclusion: If your code accesses or modifies shared variables, you can use mutual exclusion to avoid conflicts with other threads. Consider a situation where multiple threads try to access the shared stdout device to print messages. The printf library uses mutual exclusion to ensure that only one thread accesses stdout to print something.

Highlight that it is better to write reentrant code than to add synchronization objects. Using reentrant code improves performance and avoids implicit barriers and potential overhead.

Question for Discussion: How can you ensure that third-party libraries are thread-safe?
Answer: You can:
Read the library documentation.
Contact the library author, especially if the documentation is unclear or does not address the issue.
Test with Intel Thread Checker, especially if you do not trust the library author.
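
The difference between a routine that updates shared state and a reentrant one can be shown with a small C sketch. Both function names are hypothetical and serve only to illustrate the distinction.

```c
#include <assert.h>

/* Not reentrant: the running total lives in a static variable that
   every caller, and therefore every thread, shares. Concurrent calls
   would need mutual exclusion to stay correct. */
int add_to_total_shared(int amount)
{
    static int total = 0;

    total += amount;
    return total;
}

/* Reentrant: all state is passed in by the caller, so two threads
   that work on different totals never touch the same memory and no
   synchronization object is needed. */
int add_to_total(int *total, int amount)
{
    assert(total != 0);

    *total += amount;
    return *total;
}
```

The reentrant version is safe only as long as each thread passes its own `total`; if two threads share one, the caller is responsible for protecting it.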

43 Checking libraries for thread safety
Activity 3: Testing libraries for thread safety
Use Intel® Thread Checker to determine whether a library is thread-safe.

44 Agenda
What is Intel® Thread Checker?
Using Intel® Thread Checker
Detecting race conditions
Some other threading errors
Checking libraries for thread safety
Other features of Thread Checker

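45 Other features of Thread Checker
Instrumentation levels (level: description)
Full Image: Every instruction in the module is instrumented to check for diagnostic information.
Custom Image: Same as Full Image, except that the user can disable instrumentation for selected functions.
All Functions: Turns on full instrumentation for the parts of a module that were compiled with debug information.
Custom Functions: Same as All Functions, except that the user can disable instrumentation for selected functions.
API Imports: Only system API functions are instrumented; user code is not.
Module Imports: Instrumentation is disabled. This is the default for system images, for images that are not relocatable, and for images that contain no debug information.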

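46 Other features of Thread Checker
Instrumentation levels
Higher levels increase memory usage and analysis time but provide more detailed analysis.
By default, binary instrumentation steps down to lower levels until it succeeds.
Adjust the instrumentation level manually to increase speed or to control the amount of information collected.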

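47 Other features of Thread Checker
A large number of diagnostics
What do you do if you have 5000 diagnostics?
  Where do you start debugging?
  Are all the diagnostic messages equally important or serious?
Suggestions for organizing and prioritizing:
  Add the "1st Access" column.
  Group by "1st Access".
  Sort by the "Short Description" column.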

48 Other features of Thread Checker

49 Other features of Thread Checker

50 Other features of Thread Checker
Errors reported for the same source line are grouped; each group can be treated as the same problem.

51 Other features of Thread Checker
Sorted by "Short Description"

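52 Other features of Thread Checker
Summary
Threading errors are easy to introduce.
These errors are hard to debug with traditional techniques.
Intel® Thread Checker catches these errors:
  It can detect errors even when they do not occur.
  It greatly reduces debugging time.
  It improves the robustness of the application.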

