Valgrind Usage and Simulation/Reconstruction Tests
Zhao Wenwen (赵问问), School of Physics, Sun Yat-sen University
zhaoww2013@126.com
Thursday afternoon, June 20, 2019
http://www.valgrind.org
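Outline
    Introduction to Valgrind, its tools and its command-line options
    Checking a simple C program with Valgrind and analysing the errors
    Checking the simulation and reconstruction process with Valgrind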
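Introduction to Valgrind
    Free software for memory debugging, memory leak detection and performance profiling
    Capabilities: detection of undefined behaviour, function and memory profiling, data race detection, memory leak checking
    Programs run 10 to 50 times slower under Valgrind
    Installation:
        $ sudo apt install valgrind
        $ sudo yum install valgrind
    [Figure: Valgrind architecture; surviving label: core]
    https://blog.csdn.net/zerokkqq/article/details/79742060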
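Valgrind tools
    Valgrind is a suite of tools for debugging and profiling.
    Usage:   valgrind [valgrind-options] your-program [program-options]
    Example: valgrind --tool=memcheck --leak-check=full ./a.out
    The most important option is --tool, which selects one of six tools or the no-op tool:
        memcheck   : [default] memory error detection
        cachegrind : cache and branch-prediction profiling
        callgrind  : adds function-call and thread tracing on top of the cache simulation
        helgrind   : thread checking, race conditions
        massif     : heap profiling
        lackey     : example tool, a template for writing your own tool
        none       : performs no analysis
    kcachegrind : visualisation of the profiling output
    http://www.valgrind.org/docs/manual/manual.html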
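Important Valgrind command-line options
    Most commonly used:
        --tool=toolname         select which Valgrind tool to run
        --leak-check=full       report every memory leak in detail
        --show-reachable=yes    also list blocks that are still reachable at exit
        --trace-children=yes    follow child processes
        --log-file=filename     write Valgrind's messages to the given file
    Basic options:
        -h, --help, --help-debug    help; --help-debug also lists the options for debugging Valgrind itself
        --version                   version of the Valgrind core, not of the individual tools
        -q, --quiet                 run quietly and print only error messages
        -v, --verbose               more detail; each additional -v raises the verbosity level
        -d                          debugging output, useful to Valgrind developers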
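Important Valgrind command-line options
    More options:
        --log-socket=<ip-address:port-number>
            send the messages to the given IP address and port
        --xml=<yes|no> [default: no]
            produce XML output; supported by Memcheck
        --num-callers=<number> [default: 12]
            number of call-stack levels shown for each error
        --error-limit=<yes|no> [default: yes]
            stop reporting errors once the maximum number has been reached
        --error-exitcode=<number> [default: 0]
            0: return the exit value of the program; nonzero: return this value when errors were found
        --suppressions=<filename> [default: $PREFIX/lib/valgrind/default.supp]
            file listing the errors to ignore
        --gen-suppressions=<yes|no|all> [default: no]
            go through the errors one by one and print a suppression that would silence each
        --max-stackframe=<number> [default: 2000000]
            maximum stack frame size; if the stack pointer moves by more than this, Valgrind assumes the program has switched to another stack
        --separate-threads=<yes|no> [default: no]
            (Callgrind) whether to collect statistics per thread; by default the results of all threads go into one file, otherwise each thread is written to its own file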
Testing a simple C program with Valgrind
    malloc.c:
        #include <stdio.h>
        #include <stdlib.h>

        int main()
        {
            char *p = (char *) malloc(8);   /* allocated here and never freed */
            sprintf(p, "%s", "test");
            fprintf(stderr, "p:%s\n", p);
            return 0;
        }

    $ gcc -g malloc.c
    $ ls
    a.out  malloc.c
    $ valgrind --leak-check=full --show-reachable=yes --trace-children=yes ./a.out
Analysing Valgrind errors
    Form of each error entry:
        {problem description}
           at {address, function name, module or source line}
           by {address, function name, source line}
           by ... {the call stack, one level per line}
           Address 0x1234567 {position of the address relative to a block}

    ==19468== Memcheck, a memory error detector
    ==19468== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
    ==19468== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
    ==19468== Command: ./a.out
    ==19468==
    p:test
    ==19468== HEAP SUMMARY:
    ==19468==     in use at exit: 8 bytes in 1 blocks
    ==19468==   total heap usage: 1 allocs, 0 frees, 8 bytes allocated
    ==19468== 8 bytes in 1 blocks are definitely lost in loss record 1 of 1
    ==19468==    at 0x4A07A2E: malloc (vg_replace_malloc.c:270)
    ==19468==    by 0x400545: main (malloc.c:6)
    ==19468== LEAK SUMMARY:
    ==19468==    definitely lost: 8 bytes in 1 blocks
    ==19468==    indirectly lost: 0 bytes in 0 blocks
    ==19468==      possibly lost: 0 bytes in 0 blocks
    ==19468==    still reachable: 0 bytes in 0 blocks
    ==19468==         suppressed: 0 bytes in 0 blocks
    ==19468== For counts of detected and suppressed errors, rerun with: -v
    ==19468== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 8 from 6)

    Structure of the report:
        1. Copyright notice
        2. Invalid read/write reports
           2.1 Main thread
           2.2 Thread A
           2.3 Thread B
           2... Other threads
           Program return value
        3. Heap leak report
           3.1 Heap usage overview (HEAP SUMMARY)
           3.2 Confirmed leaks (definitely lost)
           3.3 Suspicious memory operations (turned off with --show-reachable=no)
           3.4 Leak overview (LEAK SUMMARY)
    https://blog.csdn.net/jinzeyu_cn/article/details/45969877
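Analysing Valgrind errors: leak categories
    ==8551== LEAK SUMMARY:
    ==8551==    definitely lost: 0 bytes in 0 blocks
    ==8551==    indirectly lost: 0 bytes in 0 blocks
    ==8551==      possibly lost: 850,062 bytes in 22,022 blocks
    ==8551==    still reachable: 669,369 bytes in 22,103 blocks
    ==8551==         suppressed: 0 bytes in 0 blocks
    ==8551==
    ==8551== For counts of detected and suppressed errors, rerun with: -v
    ==8551== ERROR SUMMARY: 96 errors from 96 contexts (suppressed: 8 from 6)

    1) definitely lost: a certain leak; the memory was not freed and no pointer refers to it any more.
    2) indirectly lost: an indirect leak; every pointer to the block lives inside a block that is itself lost, so fixing the "definitely lost" block is enough.
    3) possibly lost: a possible leak; the remaining pointer does not point to the start of the block, is unrelated to it, or is reached only through a pointer to a pointer.
    4) still reachable: when the program finished, a pointer still referred to the block, so the memory was still in use.
    5) suppressed: leaks that were matched by a suppression and therefore ignored.
    https://blog.csdn.net/louObaichu/article/details/45507365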
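Analysing Valgrind errors: kinds of errors and their messages
    Illegal read/write, out-of-bounds access
        Invalid read of size 4
    Use of uninitialised values
        Conditional jump or move depends on uninitialised value
    Uninitialised or unaddressable values passed to a system call
        Syscall param write(buf) points to uninitialised bytes
    Illegal free
        Invalid free()
    Heap block released with the wrong deallocation function
        Mismatched free()/delete/delete[]
    Overlapping source and destination blocks (memory overlap)
        Source and destination overlap in memcpy()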
Checking the simulation with Valgrind at IHEP
    Command: valgrind --leak-check=full --show-reachable=yes --trace-children=yes --log-file=vallog boss.exe jobOptions_sim.txt
    Simulation of 500 events* on a login node:
        without a tool (plain simulation) : 3:34.6
        with memcheck                     : 32:29.2   (about 9 times slower)
    Log file vallog: 1,159,108 lines
    * /afs/ihep.ac.cn/users/z/zhaoww/workarea/TestRelease/TestRelease-00-00-86/run/jobOptions_sim.txt
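The factor of 9 follows from the two timings. The slide gives no units, so the check below assumes minutes:seconds; the ratio is essentially the same if the values are hours:minutes.

```latex
\frac{32 \times 60 + 29.2}{3 \times 60 + 34.6} = \frac{1949.2}{214.6} \approx 9.1
```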
Checking the simulation with Valgrind on Tianhe-2
    Login node: 500 events, jobOptions_sim.txt
    Compute node (claimed for the job): 500 events, jobID 1813424
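Summary
    Valgrind tools and basic commands
    Analysis of the kinds of errors found in the log output
    Testing the simulation and reconstruction process with Valgrind
        (valgrind --leak-check=full --show-reachable=yes --trace-children=yes --log-file=log program options)
    Running simulation and reconstruction on Tianhe-2 and comparing the run time with IHEP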
Thank you
Zhao Wenwen (赵问问)  zhaoww2013@126.com
Backup
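Structure of a Memcheck report
    1. Copyright notice
    2. Invalid read/write reports
       2.1 Main thread
       2.2 Thread A
       2.3 Thread B
       2... Other threads
    3. Heap leak report
       3.1 Heap usage overview (HEAP SUMMARY)
       3.2 Confirmed leaks (definitely lost)
       3.3 Suspicious memory operations (turned off with --show-reachable=no)
       3.4 Leak overview (LEAK SUMMARY)

    definitely lost: the memory was not freed and no pointer refers to it any more, so it has certainly leaked. The stack in the report is the call stack at allocation time, which usually makes clear which piece of program logic created the block.
    still reachable: the memory was not freed, but a pointer still refers to it and it is still in use; this need not be counted as a leak. (Asynchronous system calls still in progress when the program exits?)
    possibly lost: there may be a leak; this usually appears in situations that are hard to track, such as pointers to pointers.
    suppressed: errors from particular libraries that were switched off through Valgrind options are counted here.

    ==8551== LEAK SUMMARY:
    ==8551==    definitely lost: 0 bytes in 0 blocks
    ==8551==    indirectly lost: 0 bytes in 0 blocks
    ==8551==      possibly lost: 850,062 bytes in 22,022 blocks
    ==8551==    still reachable: 669,369 bytes in 22,103 blocks
    ==8551==         suppressed: 0 bytes in 0 blocks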
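Invalid free
    Invalid free() / delete / delete[] / realloc()
       at 0x402B06C: free (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
       by 0x8048461: main (main.cpp:24)
     Address 0x41f23a0 is 888 bytes inside a block of size 1,024 alloc'd
       at 0x402BE68: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
       by 0x8048444: main (main.cpp:17)

    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        char* bigBuff = (char*)malloc(1024);
        char* offsetBuff = bigBuff + 888;
        free(offsetBuff);   /* not the address returned by malloc */
    }

    The heading names all four of free() / delete / delete[] / realloc(); here the invalid release is a free. The position of the address is described by a sentence of the form:
        Address 0x???????? is {x} bytes {inside/before/after} a block of size {y} {alloc'd/free'd}
    It gives the position of the released address relative to a block of length y: "before" if the address lies in front of the block, "inside" if it lies within it, "after" if it lies behind it. A trailing "alloc'd" means the block is still valid, and the stack that follows is the one at allocation time; "free'd" means the block has already been released, and the stack that follows is the one at release time.
    The report above therefore reads: address 0x41f23a0 lies at offset +888 inside a valid block of length 1024, whose allocation call stack follows.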
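Invalid read/write
    Invalid write of size 4
       at 0x8048490: main (main.cpp:19)
     Address 0x41f2428 is 0 bytes after a block of size 1,024 alloc'd
       at 0x402BE68: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
       by 0x8048474: main (main.cpp:17)
    Invalid read of size 4
       at 0x804849B: main (main.cpp:20)

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        char* bigBuff = (char*)malloc(1024);
        uint64_t* bigNum = (uint64_t*)(bigBuff+1020);   /* 8 bytes starting 4 bytes before the end */
        *bigNum = 0x12345678AABBCCDD;
        printf("bigNum is %llu\n", (unsigned long long)*bigNum);
        free(bigBuff);
    }

    Using a memory region beyond its allocated size triggers Invalid write/read, and the report states the length of the access. In this example uint64_t is 8 bytes long, so the access runs 4 bytes past the end. If bigBuff+1020 is changed to bigBuff-20, the report says precisely "Address xxx is 20 bytes before a block of ...".
    Another interesting observation: the illegal access to a uint64_t produces two reports of 4-byte illegal accesses. What does this tell us? A likely reading: the report comes from a 32-bit x86 build (vgpreload_memcheck-x86-linux.so), where an 8-byte access is carried out as two 4-byte accesses. The first half (offsets 1020 to 1023) is inside the block and the second half starts exactly at its end, hence "size 4" and "0 bytes after", once for the write on line 19 and once for the read on line 20.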
Mismatched release
    Mismatched free() / delete / delete []
       at 0x402A8DC: operator delete[](void*) (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
       by 0x80484FB: main (main.cpp:19)
     Address 0x4323028 is 0 bytes inside a block of size 1,024 alloc'd
       at 0x402BE68: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
       by 0x80484E4: main (main.cpp:18)
    Use of uninitialised value of size 4
       at 0x416E0DB: _itoa_word (_itoa.c:195)
       by 0x417221A: vfprintf (vfprintf.c:1629)
       by 0x4178B2E: printf (printf.c:35)
       by 0x41454D2: (below main) (libc-start.c:226)

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        int unused;                            /* never assigned */
        char* bigBuff = (char*)malloc(1024);
        delete[] bigBuff;                      /* allocated with malloc, released with delete[] */
        printf("unused=%d",unused);
    }
    // does not build with gcc (delete[] is C++)

    #include <stdio.h>
    #include <string.h>

    int main()
    {
        int i;
        if(i == 0)
        {
            printf("[%d]\n", i);
        }
        return 0;
    }
    // builds with gcc

    Whether memory from malloc is released with delete or delete[], or memory from new[] is carelessly released with delete, the result is a "Mismatched free() / delete / delete []" report, and the body of the report is essentially the same in each case.
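Use of uninitialised values
    Conditional jump or move depends on uninitialised value(s)
    ==14667==    at 0x4004B0: main (uninitial.c:7)
    ==14667==
    ==14667== Use of uninitialised value of size 8
    ==14667==    at 0x317B24397B: _itoa_word (in /lib64/libc-2.12.so)
    ==14667==    by 0x317B246532: vfprintf (in /lib64/libc-2.12.so)
    ==14667==    by 0x317B24F069: printf (in /lib64/libc-2.12.so)
    ==14667==    by 0x4004C8: main (uninitial.c:9)
    ==14667== Conditional jump or move depends on uninitialised value(s)
    ==14667==    at 0x317B244FC3: vfprintf (in /lib64/libc-2.12.so)

    uninitial.c:
        #include <stdio.h>
        #include <string.h>

        int main()
        {
            int i;
            if(i == 0)
            {
                printf("[%d]\n", i);
            }
            return 0;
        }
        // builds with both gcc and g++

    In the example on the previous slide, int unused is used without ever being assigned, which yields the "Use of uninitialised value of size 4" report. Problems of this kind are usually not fatal, but they should still be eliminated.
    One interesting detail is that the last stack level shows "(below main)" for the first time. It means the code runs outside the main function and does not come from a thread either. "(below main)" is the label Valgrind gives to the C runtime start-up frames that call main; I cannot yet explain why main itself is missing from that trace.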
Static construction and destruction
    Invalid write of size 4
       at 0x804857B: GlobalClass::GlobalClass() (main.cpp:21)
       by 0x804850F: __static_initialization_and_destruction_0(int, int) (main.cpp:31)
       by 0x8048551: _GLOBAL__sub_I_g_globalClass (main.cpp:55)
       by 0x8048631: __libc_csu_init (in /home/jinzeyu/codelocal/build-mcsample-Desktop_Qt_5_3_GCC_32bit-Debug/mcsample)
       by 0x4060469: (below main) (libc-start.c:185)
     Address 0x41f2030 is 8 bytes inside a block of size 10 alloc'd
       at 0x402BE68: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
       by 0x8048571: GlobalClass::GlobalClass() (main.cpp:20)

       at 0x80485B9: GlobalClass::~GlobalClass() (main.cpp:27)
       by 0x4079B80: __run_exit_handlers (exit.c:78)
       by 0x4079C0C: exit (exit.c:100)
       by 0x40604DA: (below main) (libc-start.c:258)
     Address 0x41f2070 is 8 bytes inside a block of size 10 alloc'd
       by 0x80485AF: GlobalClass::~GlobalClass() (main.cpp:26)

    #include <stdlib.h>

    class GlobalClass
    {
    public:
        GlobalClass()
        {
            char* buf = (char*)malloc(10);
            *(int*)(buf+8) = 100;   // 4-byte write at offset 8 of a 10-byte block
            free(buf);
        }
        ~GlobalClass()
        {
            // body not preserved on the slide; the report above implies the
            // same malloc(10) and write at buf+8 as in the constructor
        }
        void fake(){}
    } g_globalClass;

    int main(int argc, char *argv[])
    {
        g_globalClass.fake();
    }

    Construction and destruction of a static object both happen outside main, so both stacks contain "(below main)", and the function names in the stacks confirm the two phases. This reminded me of a related problem: the order of static construction is not necessarily the expected one, so static objects should not depend on one another.
Valgrind test on a shell command: ls -l
    ==22583== Memcheck, a memory error detector
    ==22583== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
    ==22583== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
    ==22583== Command: ls -l
    ==22583==
    total 20
    -rwxr-xr-x 1 zhaoww physics 9125 Jun 16 23:27 a.out
    -rw-r--r-- 1 zhaoww physics  187 Jun 16 23:16 malloc.c
    -rw-r--r-- 1 zhaoww physics  130 Jun 15 19:40 readme
    ==22583== HEAP SUMMARY:
    ==22583==     in use at exit: 19,453 bytes in 9 blocks
    ==22583==   total heap usage: 208 allocs, 199 frees, 81,854 bytes allocated
    ==22583== 22 bytes in 3 blocks are still reachable in loss record 1 of 7
    ==22583==    at 0x4A07A2E: malloc (vg_replace_malloc.c:270)
    ==22583==    by 0x4118D8: ??? (in /bin/ls)
    ==22583==    by 0x41190B: ??? (in /bin/ls)
    ==22583==    by 0x403AFC: ??? (in /bin/ls)
    ==22583==    by 0x40817E: ??? (in /bin/ls)
    ==22583==    by 0x408B2C: ??? (in /bin/ls)
    ==22583==    by 0x313041ED1F: (below main) (in /lib64/libc-2.12.so)
    ==22583== 23 bytes in 1 blocks are still reachable in loss record 2 of 7
    ==22583==    by 0x40E3CB: ??? (in /bin/ls)
    ==22583==    by 0x403844: ??? (in /bin/ls)
    ==22583==    by 0x404377: ??? (in /bin/ls)
    ==22583== 24 bytes in 1 blocks are still reachable in loss record 3 of 7
    ==22583==    by 0x40E22B: ??? (in /bin/ls)
    ==22583==    by 0x40447F: ??? (in /bin/ls)
    ==22583== 56 bytes in 1 blocks are still reachable in loss record 4 of 7
    ==22583==    by 0x40E8DA: ??? (in /bin/ls)
    ==22583==    by 0x408983: ??? (in /bin/ls)
    ==22583== 56 bytes in 1 blocks are still reachable in loss record 5 of 7
    ==22583==    by 0x4089E9: ??? (in /bin/ls)
    ==22583== 72 bytes in 1 blocks are still reachable in loss record 6 of 7
    ==22583==    by 0x40475E: ??? (in /bin/ls)
    ==22583==    by 0x40829F: ??? (in /bin/ls)
    ==22583== 19,200 bytes in 1 blocks are still reachable in loss record 7 of 7
    ==22583==    by 0x408AA6: ??? (in /bin/ls)
    ==22583== LEAK SUMMARY:
    ==22583==    definitely lost: 0 bytes in 0 blocks
    ==22583==    indirectly lost: 0 bytes in 0 blocks
    ==22583==      possibly lost: 0 bytes in 0 blocks
    ==22583==    still reachable: 19,453 bytes in 9 blocks
    ==22583==         suppressed: 0 bytes in 0 blocks
    ==22583== For counts of detected and suppressed errors, rerun with: -v
    ==22583== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 8 from 6)