1
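Valgrind: Usage, and Testing Simulation and Reconstruction
Zhao Wenwen (赵问问), School of Physics, Sun Yat-sen University
zhaoww2013@126.com
Thursday afternoon, June 20, 2019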
2
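Overview
Introduction to Valgrind: tool suite and command-line options
Checking a simple C program with Valgrind and analyzing the errors
Checking the simulation and reconstruction process with Valgrind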
3
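Introduction to Valgrind
Free software for memory debugging, memory leak detection, and profiling
Detection of undefined behavior, function and memory profiling, data race detection, memory leak checking
Programs run roughly 10 to 50 times slower under Valgrind
Installation:
  $ sudo apt install valgrind    (Debian/Ubuntu)
  $ sudo yum install valgrind    (RHEL/CentOS)
[Figure label: core]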
4
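The Valgrind tool suite
Valgrind is a suite of software tools for debugging and profiling.
Usage: valgrind [valgrind-options] your-program [your-program-options]
Example: valgrind --tool=memcheck --leak-check=full ./a.out
Options: the most important one selects a tool with --tool
  memcheck    : [default] memory error detection
  cachegrind  : cache and branch-prediction profiling
  callgrind   : adds function call tracing (optionally per thread) on top of cache simulation
  helgrind    : thread checking, data races
  massif      : heap profiling
  lackey      : example tool, a template for writing your own
  none        : no analysis
  kcachegrind : visualization (a separate GUI for viewing the profile data)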
5
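Important Valgrind command-line options
Most commonly used:
  --tool=toolname        select which Valgrind tool to run
  --leak-check=full      report every leak in detail
  --show-reachable=yes   also report blocks that are still reachable at exit
  --trace-children=yes   follow child processes started via exec
  --log-file=filename    write Valgrind's messages to the given file
Basic options:
  -h, --help             show help; --help-debug also lists options for Valgrind developers
  --version              print the version of the Valgrind core, not of an individual tool
  -q, --quiet            run quietly and print only error messages
  -v, --verbose          print more detail; each additional -v raises the verbosity level
  -d                     debugging output, useful mainly to Valgrind developers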
6
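Important Valgrind command-line options
More options:
  --log-socket=          send the messages to the given IP address and port
  --xml= [default: no]   produce output in XML format (supported by Memcheck)
  --num-callers= [default: 12]
                         number of call-stack levels shown in stack traces
  --error-limit= [default: yes]
                         stop reporting errors once too many have been seen
  --error-exitcode= [default: 0]
                         0: return the program's own exit code; non-zero: return this value if errors were found
  --suppressions= [default: $PREFIX/lib/valgrind/default.supp]
                         file describing errors to ignore
  --gen-suppressions= [default: no]
                         print, for each error, a suppression entry that would ignore it
  --max-stackframe= [default: ]
                         maximum size of a stack frame; if the stack pointer moves by more than this amount, Valgrind assumes the program has switched to a different stack
  --separate-threads= [default: no]
                         whether to collect statistics per thread; by default the results of all threads go to one file, otherwise each thread is written to its own file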
7
Testing a simple C program with Valgrind

    #include <stdio.h>
    #include <stdlib.h>

    int main()
    {
        char *p = (char *) malloc(8);   /* never freed: this is the leak Valgrind reports */
        sprintf(p, "%s", "test");
        fprintf(stderr, "p:%s\n", p);
        return 0;
    }

$ gcc -g malloc.c
$ ls
a.out malloc.c
$ valgrind --leak-check=full --show-reachable=yes --trace-children=yes ./a.out
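A minimal sketch of one way to remove the leak in malloc.c above: release the buffer with free() once it is no longer needed. The helper names make_test_string and print_and_release are hypothetical and do not appear in the slides; snprintf replaces sprintf so that the write cannot run past the 8-byte buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the slides): allocates an 8-byte buffer,
 * writes "test" into it and returns it. The caller owns the buffer and
 * must release it with free(). Returns NULL if malloc fails. */
char *make_test_string(void)
{
    const size_t size = 8;
    assert(sizeof "test" <= size);   /* the text plus its terminator must fit */
    char *p = malloc(size);
    if (p == NULL)
        return NULL;
    /* snprintf writes at most `size` bytes, terminator included. */
    snprintf(p, size, "%s", "test");
    return p;
}

/* Same behaviour as the slide's main(), except that the buffer is freed.
 * Returns 0 on success and 1 if the allocation failed. */
int print_and_release(void)
{
    char *p = make_test_string();
    if (p == NULL)
        return 1;
    fprintf(stderr, "p:%s\n", p);
    free(p);   /* the call that is missing in the original program */
    return 0;
}
```

Calling print_and_release() from main() and rerunning the same valgrind command should leave no "definitely lost" entry in the leak summary.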
8
Analyzing Valgrind errors
==19468== Memcheck, a memory error detector
==19468== Copyright (C) , and GNU GPL'd, by Julian Seward et al.
==19468== Using Valgrind and LibVEX; rerun with -h for copyright info
==19468== Command: ./a.out
==19468==
p:test
==19468== HEAP SUMMARY:
==19468==     in use at exit: 8 bytes in 1 blocks
==19468==   total heap usage: 1 allocs, 0 frees, 8 bytes allocated
==19468== 8 bytes in 1 blocks are definitely lost in loss record 1 of 1
==19468==    at 0x4A07A2E: malloc (vg_replace_malloc.c:270)
==19468==    by 0x400545: main (malloc.c:6)
==19468== LEAK SUMMARY:
==19468==    definitely lost: 8 bytes in 1 blocks
==19468==    indirectly lost: 0 bytes in 0 blocks
==19468==      possibly lost: 0 bytes in 0 blocks
==19468==    still reachable: 0 bytes in 0 blocks
==19468==         suppressed: 0 bytes in 0 blocks
==19468== For counts of detected and suppressed errors, rerun with: -v
==19468== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 8 from 6)

Structure of the report:
1. Copyright notice
2. Invalid read/write reports
   2.1 Main thread
   2.2 Thread A
   2.3 Thread B
   2... other threads
   Program return value
3. Heap leak report
   3.1 Heap usage summary (HEAP SUMMARY)
   3.2 Definite leaks (definitely lost)
   3.3 Suspicious memory reports (turned off with --show-reachable=no)
   3.4 Leak summary (LEAK SUMMARY)

Form of each entry:
{problem description}
   at {address, function name, module or source line}
   by {address, function name, source line}
   by ... {the call stack, one level per line}
 Address 0x {where the address lies relative to a block}
9
Analyzing Valgrind errors
==8551== LEAK SUMMARY:
==8551==    definitely lost: 0 bytes in 0 blocks
==8551==    indirectly lost: 0 bytes in 0 blocks
==8551==      possibly lost: 850,062 bytes in 22,022 blocks
==8551==    still reachable: 669,369 bytes in 22,103 blocks
==8551==         suppressed: 0 bytes in 0 blocks
==8551==
==8551== For counts of detected and suppressed errors, rerun with: -v
==8551== ERROR SUMMARY: 96 errors from 96 contexts (suppressed: 8 from 6)

1) definitely lost: a certain leak; the memory was not freed and no pointer to it remains.
2) indirectly lost: an indirect leak; every pointer to the block lives inside a block that is itself lost. Fixing the "definitely lost" blocks is enough.
3) possibly lost: a possible leak; the pointers found do not point to the start of the block, may be unrelated to it, or are pointers to pointers.
4) still reachable: when the program ended, a pointer still referred to the block, so the memory was still in use.
5) suppressed: leaks that match a suppression rule and are therefore ignored.
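The categories above can be reproduced with a few deliberate leaks. The following is only a sketch: the struct and all function names are made up for illustration, and the category noted in each comment is what Memcheck would normally be expected to print with --leak-check=full --show-reachable=yes when the code is built without optimization (gcc -g -O0).

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    struct node *next;
    int value;
};

/* Globals stay visible to the leak checker at exit. */
struct node *g_kept = NULL;   /* points to the start of a block */
char *g_interior = NULL;      /* points into the middle of a block */

static struct node *new_node(int value)
{
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->next = NULL;
    n->value = value;
    return n;
}

/* Expected category: still reachable. The block is never freed,
 * but g_kept still points to its start when the program exits. */
void make_still_reachable(void)
{
    g_kept = new_node(1);
}

/* Expected categories: the head is definitely lost (no pointer to it
 * survives) and the second node is indirectly lost (only the lost
 * head points to it). */
void make_definite_and_indirect(void)
{
    struct node *head = new_node(2);
    head->next = new_node(3);
    head = NULL;   /* drop the only pointer to the list */
}

/* Expected category: possibly lost. Only a pointer into the interior
 * of the block survives, not one to its start. */
void make_possibly_lost(void)
{
    char *buf = malloc(16);
    assert(buf != NULL);
    g_interior = buf + 4;
}
```

Calling the three functions from main() and running the program under Memcheck gives one small program that exercises four of the five lines of the LEAK SUMMARY.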
10
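Analyzing Valgrind errors
Common error types and the messages Memcheck prints for them:
  Illegal read or write, out-of-bounds access
      Invalid read of size 4
  Use of an uninitialised value
      Conditional jump or move depends on uninitialised value(s)
  Uninitialised or unaddressable value passed to a system call
      Syscall param write(buf) points to uninitialised byte(s)
  Illegal free
      Invalid free()
  Heap block released with the wrong deallocation function
      Mismatched free() / delete / delete []
  Overlapping source and destination blocks (memory overlap)
      Source and destination overlap in memcpy()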
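As an illustration of the last error type, the sketch below shifts a string inside its own buffer. The function name drop_prefix is hypothetical. Because source and destination overlap, the copy uses memmove; doing the same copy with memcpy is the pattern Memcheck reports as an overlap.

```c
#include <assert.h>
#include <string.h>

/* Removes the first n characters of the NUL-terminated string s in place.
 * The source (s + n) and the destination (s) overlap, so memmove is
 * required; memcpy has undefined behaviour for overlapping blocks. */
void drop_prefix(char *s, size_t n)
{
    size_t len = strlen(s);
    assert(n <= len);
    memmove(s, s + n, len - n + 1);   /* +1 copies the terminator too */
}
```

For example, applying drop_prefix to a buffer holding "valgrind" with n = 3 leaves "grind" in the same buffer.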
11
Checking the simulation with Valgrind: IHEP
Command: valgrind --leak-check=full --show-reachable=yes --trace-children=yes --log-file=vallog boss.exe jobOptions_sim.txt
Simulating 500 events* on a login node:
  Plain simulation (no tool):        3 : 34.6
  Simulation under memcheck:        32 : 29.2   (about 9 times slower)
  Log file vallog: 1,159,108 lines
* /afs/ihep.ac.cn/users/z/zhaoww/workarea/TestRelease/TestRelease /run/jobOptions_sim.txt
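The factor of 9 can be checked with a short calculation, assuming the two times are written as minutes : seconds (the slide does not state the unit). Under that assumption the plain run takes 214.6 s and the memcheck run 1949.2 s, a ratio of about 9.08. The function names below are illustrative only.

```c
#include <assert.h>

/* Converts a time given as minutes and seconds into seconds.
 * Assumption: the slide's "3 : 34.6" means 3 min 34.6 s. */
double to_seconds(int minutes, double seconds)
{
    assert(minutes >= 0 && seconds >= 0.0 && seconds < 60.0);
    return minutes * 60.0 + seconds;
}

/* Ratio of the run time under Valgrind to the plain run time. */
double slowdown(double plain_s, double checked_s)
{
    assert(plain_s > 0.0);
    return checked_s / plain_s;
}
```

With the figures from the slide, slowdown(to_seconds(3, 34.6), to_seconds(32, 29.2)) falls between 9.0 and 9.2, close to the lower end of the 10 to 50 times quoted in the introduction.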
12
Checking the simulation with Valgrind: Tianhe-2
Login node: 500 events, jobOptions_sim.txt
Exclusively allocated compute node: 500 events, jobID
13
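Summary
The Valgrind tool suite and basic commands
  (valgrind --leak-check=full --show-reachable=yes --trace-children=yes --log-file=log program options)
Analysis of several kinds of errors found in the log output
Testing the simulation and reconstruction process with Valgrind
Running simulation and reconstruction on Tianhe-2 and comparing the run times with those at IHEP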
14
Thank you
Zhao Wenwen (赵问问)
Thursday afternoon, June 20, 2019
16
Backup
17
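1. Copyright notice
2. Invalid read/write reports
   2.1 Main thread
   2.2 Thread A
   2.3 Thread B
   2... other threads
3. Heap leak report
   3.1 Heap usage summary (HEAP SUMMARY)
   3.2 Definite leaks (definitely lost)
   3.3 Suspicious memory reports (turned off with --show-reachable=no)
   3.4 Leak summary (LEAK SUMMARY)

definitely lost: the memory was not freed and no pointer to it remains, so it has certainly leaked. The stack in the report is the call stack at the time of allocation, which usually makes clear which part of the program created the block.
still reachable: the memory was not freed, but a pointer to it still exists and the memory is still in use, so it need not count as a leak. (Perhaps asynchronous system calls that were still in progress when the program exited?)
possibly lost: a leak is possible; this usually appears in cases that are hard to track, such as pointers to pointers.
suppressed: errors from particular libraries that were excluded through Valgrind options are counted here.

==8551== LEAK SUMMARY:
==8551==    definitely lost: 0 bytes in 0 blocks
==8551==    indirectly lost: 0 bytes in 0 blocks
==8551==      possibly lost: 850,062 bytes in 22,022 blocks
==8551==    still reachable: 669,369 bytes in 22,103 blocks
==8551==         suppressed: 0 bytes in 0 blocks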
18
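Invalid free
Invalid free() / delete / delete[] / realloc()
   at 0x402B06C: free (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
   by 0x : main (main.cpp:24)
 Address 0x41f23a0 is 888 bytes inside a block of size 1,024 alloc'd
   at 0x402BE68: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
   by 0x : main (main.cpp:17)

    int main(int argc, char *argv[])
    {
        char* bigBuff = (char*)malloc(1024);
        char* offsetBuff = bigBuff + 888;
        free(offsetBuff);   /* not the pointer malloc returned */
    }

The message names one of free() / delete / delete[] / realloc(); here the invalid call is free(). The position of the address is described by a sentence of the form:
  Address 0x???????? is {x} bytes {inside/before/after} a block of size {y} {alloc'd/free'd}
It gives the location of the freed address relative to a block of length y: before if the address lies in front of the block, inside if it lies within the block, after if it lies behind it. A trailing alloc'd means the block is still valid, and the stack that follows shows where it was allocated; free'd means the block has already been released, and the stack that follows shows where it was freed.
The report above therefore reads: address 0x41f23a0 lies at offset 888 inside a valid block of 1,024 bytes, which was allocated at the call stack shown.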
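A sketch of the corrected pattern, keeping the sizes from the example above: the interior pointer is used for access only, and the pointer returned by malloc is the one passed to free(). The function name use_offset_and_free is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocates 1024 bytes, clears the tail starting at offset 888 through an
 * interior pointer, and then frees the block through its base pointer.
 * Returns the number of bytes cleared, or 0 if the allocation failed. */
size_t use_offset_and_free(void)
{
    const size_t size = 1024;
    const size_t offset = 888;
    assert(offset <= size);

    char *bigBuff = malloc(size);
    if (bigBuff == NULL)
        return 0;

    char *offsetBuff = bigBuff + offset;   /* fine to use, not to free */
    size_t tail = size - offset;
    memset(offsetBuff, 0, tail);

    free(bigBuff);   /* always free the pointer that malloc returned */
    return tail;
}
```

Here the tail is 1024 - 888 = 136 bytes, and Memcheck should report neither an invalid free nor a leak for this function.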
19
Invalid read and write
Invalid write of size 4
   at 0x8048490: main (main.cpp:19)
 Address 0x41f2428 is 0 bytes after a block of size 1,024 alloc'd
   at 0x402BE68: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
   by 0x : main (main.cpp:17)
Invalid read of size 4
   at 0x804849B: main (main.cpp:20)

    int main(int argc, char *argv[])
    {
        char* bigBuff = (char*)malloc(1024);
        uint64_t* bigNum = (uint64_t*)(bigBuff+1020);
        *bigNum = 0x AABBCCDD;
        printf("bigNum is %llu\n",*bigNum);
        free(bigBuff);
    }

Using a memory region beyond its allocated size triggers Invalid write/read, and the report states the size of the access. In this example uint64_t is 8 bytes long, so the access runs 4 bytes past the end of the block. If bigBuff+1020 is changed to bigBuff-20, the report says precisely "Address xxx is 20 bytes before a block of ...".
Another interesting observation: the invalid uint64_t access produced two reports of invalid 4-byte accesses. What does this tell us?
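One way to avoid this class of error is to check the bounds before every access. The sketch below uses hypothetical helper names and refuses an 8-byte store or load that would cross the end of the buffer. On the question raised above, a plausible reading (an assumption, not confirmed by the slides) is that the 32-bit x86 build performs each 64-bit access as two 4-byte accesses, of which only the upper half lies outside the block, giving one 4-byte report for the write and one for the read.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stores value at byte offset `offset` of a buffer of `size` bytes.
 * Returns 1 on success, 0 if the 8 bytes would not fit in the buffer. */
int store_u64(char *buf, size_t size, size_t offset, uint64_t value)
{
    assert(buf != NULL);
    if (offset > size || size - offset < sizeof value)
        return 0;
    /* memcpy also sidesteps the misaligned uint64_t* cast of the original. */
    memcpy(buf + offset, &value, sizeof value);
    return 1;
}

/* Loads 8 bytes from byte offset `offset` into *out, with the same check. */
int load_u64(const char *buf, size_t size, size_t offset, uint64_t *out)
{
    assert(buf != NULL && out != NULL);
    if (offset > size || size - offset < sizeof *out)
        return 0;
    memcpy(out, buf + offset, sizeof *out);
    return 1;
}
```

With a 1024-byte buffer, offset 1020 is rejected because only 4 bytes remain, while offset 1016 is the last position at which an 8-byte value fits.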
20
Mismatched free
Mismatched free() / delete / delete []
   at 0x402A8DC: operator delete[](void*) (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
   by 0x80484FB: main (main.cpp:19)
 Address 0x is 0 bytes inside a block of size 1,024 alloc'd
   at 0x402BE68: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
   by 0x80484E4: main (main.cpp:18)
Use of uninitialised value of size 4
   at 0x416E0DB: _itoa_word (_itoa.c:195)
   by 0x417221A: vfprintf (vfprintf.c:1629)
   by 0x4178B2E: printf (printf.c:35)
   by 0x41454D2: (below main) (libc-start.c:226)

    int main(int argc, char *argv[])
    {
        int unused;
        char* bigBuff = (char*)malloc(1024);
        delete[] bigBuff;
        printf("unused=%d",unused);
    }
    // does not compile with gcc (delete[] is C++)

    #include <stdio.h>
    #include <string.h>
    int main()
    {
        int i;
        if(i == 0)
        {
            printf("[%d]\n", i);
        }
        return 0;
    }
    // compiles with gcc

Whether memory obtained from malloc is released with delete or delete[], or memory obtained from new[] is carelessly released with delete, the result is a Mismatched free() / delete / delete [] report, and the body of the report is essentially the same.
21
Use of uninitialised values
Conditional jump or move depends on uninitialised value(s)
==14667==    at 0x4004B0: main (uninitial.c:7)
==14667==
==14667== Use of uninitialised value of size 8
==14667==    at 0x317B24397B: _itoa_word (in /lib64/libc-2.12.so)
==14667==    by 0x317B246532: vfprintf (in /lib64/libc-2.12.so)
==14667==    by 0x317B24F069: printf (in /lib64/libc-2.12.so)
==14667==    by 0x4004C8: main (uninitial.c:9)
==14667== Conditional jump or move depends on uninitialised value(s)
==14667==    at 0x317B244FC3: vfprintf (in /lib64/libc-2.12.so)

    #include <stdio.h>
    #include <string.h>
    int main()
    {
        int i;
        if(i == 0)
        {
            printf("[%d]\n", i);
        }
        return 0;
    }
    // compiles with both gcc and g++

In the earlier example, int unused is used without ever being assigned, which produces the "Use of uninitialised value of size 4" report. Problems of this kind are usually not fatal, but they still need to be eliminated.
One interesting detail: the last level of that stack shows (below main) for the first time. It indicates code that runs outside the main function and does not come from a thread either; I cannot yet explain this fully. (Such frames belong to the C runtime start-up code that calls main.)
22
Static construction and destruction
Invalid write of size 4
   at 0x804857B: GlobalClass::GlobalClass() (main.cpp:21)
   by 0x804850F: __static_initialization_and_destruction_0(int, int) (main.cpp:31)
   by 0x : _GLOBAL__sub_I_g_globalClass (main.cpp:55)
   by 0x : __libc_csu_init (in /home/jinzeyu/codelocal/build-mcsample-Desktop_Qt_5_3_GCC_32bit-Debug/mcsample)
   by 0x : (below main) (libc-start.c:185)
 Address 0x41f2030 is 8 bytes inside a block of size 10 alloc'd
   at 0x402BE68: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
   by 0x : GlobalClass::GlobalClass() (main.cpp:20)
   at 0x80485B9: GlobalClass::~GlobalClass() (main.cpp:27)
   by 0x4079B80: __run_exit_handlers (exit.c:78)
   by 0x4079C0C: exit (exit.c:100)
   by 0x40604DA: (below main) (libc-start.c:258)
 Address 0x41f2070 is 8 bytes inside a block of size 10 alloc'd
   by 0x80485AF: GlobalClass::~GlobalClass() (main.cpp:26)

    class GlobalClass
    {
    public:
        GlobalClass()
        {
            char* buf = (char*)malloc(10);
            *(int*)(buf+8) = 100;   // writes 4 bytes at offset 8 of a 10-byte block
            free(buf);
        }
        ~GlobalClass()
        {
            // body not preserved on the slide; the report above shows
            // a malloc at main.cpp:26 and an invalid write at main.cpp:27
        }
        void fake(){}
    } g_globalClass;

    int main(int argc, char *argv[])
    {
        g_globalClass.fake();
    }

The construction and destruction of a static object both happen outside main, so both stacks contain (below main), and the function names in the stacks confirm the two phases. This brings another issue to mind: the order of static construction is not necessarily what one expects, so it is strongly recommended that static objects do not depend on each other.
23
Valgrind test on a shell command: ls -l
==22583== Memcheck, a memory error detector
==22583== Copyright (C) , and GNU GPL'd, by Julian Seward et al.
==22583== Using Valgrind and LibVEX; rerun with -h for copyright info
==22583== Command: ls -l
==22583==
total 20
-rwxr-xr-x 1 zhaoww physics 9125 Jun 16 23:27 a.out
-rw-r--r-- 1 zhaoww physics  187 Jun 16 23:16 malloc.c
-rw-r--r-- 1 zhaoww physics  130 Jun 15 19:40 readme
==22583== HEAP SUMMARY:
==22583==     in use at exit: 19,453 bytes in 9 blocks
==22583==   total heap usage: 208 allocs, 199 frees, 81,854 bytes allocated
==22583== 22 bytes in 3 blocks are still reachable in loss record 1 of 7
==22583==    at 0x4A07A2E: malloc (vg_replace_malloc.c:270)
==22583==    by 0x4118D8: ??? (in /bin/ls)
==22583==    by 0x41190B: ??? (in /bin/ls)
==22583==    by 0x403AFC: ??? (in /bin/ls)
==22583==    by 0x40817E: ??? (in /bin/ls)
==22583==    by 0x408B2C: ??? (in /bin/ls)
==22583==    by 0x313041ED1F: (below main) (in /lib64/libc-2.12.so)
==22583== 23 bytes in 1 blocks are still reachable in loss record 2 of 7
==22583==    by 0x40E3CB: ??? (in /bin/ls)
==22583==    by 0x403844: ??? (in /bin/ls)
==22583==    by 0x404377: ??? (in /bin/ls)
==22583== 24 bytes in 1 blocks are still reachable in loss record 3 of 7
==22583==    by 0x40E22B: ??? (in /bin/ls)
==22583==    by 0x40447F: ??? (in /bin/ls)
==22583== 56 bytes in 1 blocks are still reachable in loss record 4 of 7
==22583==    by 0x40E8DA: ??? (in /bin/ls)
==22583==    by 0x408983: ??? (in /bin/ls)
==22583== 56 bytes in 1 blocks are still reachable in loss record 5 of 7
==22583==    by 0x4089E9: ??? (in /bin/ls)
==22583== 72 bytes in 1 blocks are still reachable in loss record 6 of 7
==22583==    by 0x40475E: ??? (in /bin/ls)
==22583==    by 0x40829F: ??? (in /bin/ls)
==22583== 19,200 bytes in 1 blocks are still reachable in loss record 7 of 7
==22583==    by 0x408AA6: ??? (in /bin/ls)
==22583== LEAK SUMMARY:
==22583==    definitely lost: 0 bytes in 0 blocks
==22583==    indirectly lost: 0 bytes in 0 blocks
==22583==      possibly lost: 0 bytes in 0 blocks
==22583==    still reachable: 19,453 bytes in 9 blocks
==22583==         suppressed: 0 bytes in 0 blocks
==22583== For counts of detected and suppressed errors, rerun with: -v
==22583== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 8 from 6)