TNQ400-02
Understanding Windows 2000® and NT4 System and Process Activity © 1994-2000 David A. Solomon and Jamie E. Hanrahan
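Session Prerequisites
This session assumes that you understand the following fundamentals:
- Basic operating system concepts (virtual memory, processes, and multitasking)
- Basic Windows NT and Windows 2000 usage and administration
This is a level 300 session.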
What You Will Learn Today
- View process details such as open file handles, I/O activity, DLL usage, and security
- Account for operating system and application CPU time (including interrupts)
- Identify every system process in the system process tree
- Map Windows NT services to the processes that are running them
- Match kernel-mode system thread activity to the driver or OS component that owns the thread
The main objective of this session is to help you, the IT Professional, understand what’s running and why on your Windows NT and Windows 2000 servers and workstations. Using a variety of Microsoft and 3rd party tools, we’ll look inside an NT process to see what resources it is using, such as open files, I/O activity, and DLL usage. If the system feels slow, it’s important to be able to ascertain exactly what is running and why, so this session gives you the information you need to differentiate operating system CPU utilization from that of applications. To get an accurate picture, you also need to know the internals of how NT accounts for CPU time. Finally, if something is running, and it’s not something you ran (e.g. it’s an NT system process), what exactly is its purpose and why is it running? That covers the last three bullets.
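Agenda
- Tool overview
- Understanding process and thread activity
- Understanding CPU time accounting
- Understanding system processes
- Process crashes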
Tool List
Tool                  Executable   Source
Performance Monitor   Sysmon       Windows 2000 (Perfmon in NT4)
Registry Editor       RegEdt32     Windows 2000
Process Viewer        pviewer      Windows 2000 Support Tools
Task List             tlist        Windows 2000 Support Tools
Dependency Walker     depends      Windows 2000 Support Tools
Open Handles          oh           Windows 2000 Server Resource Kit
QuickSlice            qslice       Windows 2000 Server Resource Kit
Handle Viewer         handleex     www.sysinternals.com
ListDLLs              listdlls     www.sysinternals.com
File Monitor          filemon      www.sysinternals.com
Registry Monitor      regmon       www.sysinternals.com
Process Status        pstat        NT4 Resource Kit or Platform SDK
This session will use a number of tools to dig into the internal state of Windows 2000. The tools come from three sources:
- Windows 2000
- Windows 2000 Support Tools (separate installation on the Windows 2000 product CD; see \support\tools)
- Windows 2000 Server Resource Kit
Some of the tools come from a site that many of you may have visited before: www.sysinternals.com. These tools, written by Mark Russinovich, were built by reverse engineering the Windows NT/2000 binaries. The only caveat on using these tools is that many involve the use of a device driver to gain access to internal operating system information that is not available through supported, documented interfaces. Therefore, the usual caveats about adding device drivers apply here: drivers can bypass security and can crash the system if they have bugs.
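Agenda
- Tool overview
- Understanding process and thread activity
- Understanding CPU time accounting
- Understanding system processes
- Process and system crashes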
Processes and Threads
What is a process?
- Represents an instance of a running program
- Each process has a private memory address space
What is a thread?
- An execution context within a process
- All threads in a process share the same process address space
Every process starts with one thread
- It runs the program’s “main” function
- It can create other threads in the same process
- It can create additional processes
[Diagram: per-process address spaces, each containing threads, above the shared system address space]
On NT, threads run; processes don’t run. A process is an instance of an executable image. Creating a process creates a private process address space to contain the image and its private memory. The initial thread begins execution at the main entry point in the program. Once running, the initial thread can create other threads. Illustration: a process is like a sandbox. Threads are the kids in the sandbox. However, a process sandbox is “protected”: it has acrylic walls and a ceiling that prevent the kids in the sandbox from throwing sand outside the box. So there is no way for a thread in one process to “throw sand” at (affect the memory space of) another process, unless the two agree to share memory. Just as kids in the same sandbox can throw sand in each other’s eyes, threads in the same process can corrupt each other’s data structures. Advantages of dividing a program into multiple threads: the user gets the illusion of better throughput (the user interface responds quickly while the application does background processing), AND on a multiprocessor system the threads in the application can run on multiple CPUs simultaneously (of course, thread priorities dictate who runs and when). For details on processes and threads, see:
- Programming Applications for Microsoft Windows (Microsoft Press, by Jeffrey Richter): Processes & Threads chapter
- Inside Windows NT, 2nd edition (Microsoft Press, by David Solomon): Processes & Threads chapter
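32-Bit Virtual Address Space
2 GB per-process space
- Each process has its own address space, accessible from user mode or kernel mode
- A process address space cannot be accessed directly by other processes
2 GB system-wide space
- The operating system is loaded into this space, which appears in every process address space
- The “operating system” has no process of its own (some processes do work on behalf of the operating system, but they more or less run “in the background”)
Address map:
00000000 to 7FFFFFFF   Unique per process, accessible in user or kernel mode: .EXE code, globals, per-thread user-mode stacks, process heaps, .DLL code
80000000               System-wide, accessible only in kernel mode: Exec, Kernel, HAL, drivers, per-thread kernel-mode stacks, Win32K.Sys
C0000000               Per process, accessible only in kernel mode: process page tables, hyperspace
up to FFFFFFFF         System-wide, accessible only in kernel mode: file system cache, paged pool, nonpaged pool
This slide shows the layout of the virtual 4GB 32-bit “sandbox”. Processes get half (2GB): each process “thinks” it can grow to 2GB in size (code + data). Of course, not all of the process sandbox may fit in physical memory, so data may get paged out to the paging file. Code is brought into memory only when needed and is NOT written out to the paging file; the memory is just reused if needed, and the code is reread from disk if it is needed again later. The operating system takes the other half (2GB). For details on the address space layout, see Inside Windows NT, 2nd edition (Memory Management chapter).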
Windows 2000 Job Objects
A new kernel object that defines a group of related processes
- CreateJobObject/OpenJobObject
Can specify job-wide attributes, quotas, and security limits
- Quotas: total and current CPU time, total and active processes, per-process and per-job CPU time, minimum and maximum working set
- Attributes: CPU affinity, priority class, scheduling class
- Security limits: no administrator token, only restricted tokens, a specified token, a filtered token, no access to windows outside the job, no reading or writing of the clipboard
Windows 2000 adds a new kernel object called the job object. A job allows grouping several processes together that share a common set of limits or quotas. The job object can be useful even when there is only a single process in the job, because some of the new controls and quotas can be applied only to a job. For example, the CPU scheduling class attribute controls the length of the timeslice (the length of time NT allows a thread to run before allowing another thread at the same priority to run). Jobs were really added to Windows 2000 as a foundation primitive for building other applications, such as a class scheduler or a more sophisticated batch scheduling system. In fact, there are no tools in Windows 2000 Professional, Server, or Advanced Server to create jobs; using them requires additional 3rd party software. However, Windows 2000 Datacenter will include a tool to create and manage jobs.
Viewing Process Information
Many overlapping tools!
- They all display different subsets of the process and thread information
- But each tool shows something the others cannot
The name of a running image may not tell you what it is
- Find where the EXE lives on disk
- PS.VBS (or pslist.exe) can show the full path
If the system is slow, the first question is: what processes are running?
- A quick approach: run Task Manager and sort by CPU usage
Looking at process and thread information on NT is a mess, because there are so many overlapping tools that all display various subsets of the process and thread information. In fact, each of the tools on this slide shows ONE thing that the others do not show (!). They all show the process list, and some show the thread list, but each shows some tidbit not shown by the other tools. We will use a number of these tools during this session.
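Task Manager
To start it: Ctrl+Shift+Esc; or Ctrl+Alt+Del; or right-click an empty area of the taskbar
Overlaps with the other process display utilities
- Except for Win16 process information, which can be seen only here (on the Processes tab, click Options -> Show 16-bit Tasks)
Applications tab: list of top-level visible windows
- Windows are owned by threads (right-click a window and choose “Go To Process”)
Processes tab: list of processes
- Use View -> Select Columns to choose what is displayed
- Click a column heading to sort by that column
- Right-click a process name to change the process priority, end the process tree (new in Windows 2000), or (on MP systems) set CPU affinity
Performance tab: a subset of the NT performance counters
Let’s start with the most basic process information display tool: Task Manager. SEE DEMO SCRIPT #1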
Process Viewer
Pviewer.exe in the Support Tools
- Shows thread details: the start address and the CPU time of each thread
- Can display the process list of a remote system
- But cannot kill remote processes; use exec.vbs or rkill from the Resource Kit for that
SEE DEMO SCRIPT #2
AUDIENCE QUESTION: How can a thread have a context switch count (the number of times the thread was selected to run) that is greater than zero, yet have zero or very little CPU time? (Find a thread in a process that matches these criteria.)
ANSWER: It follows from the way NT accounts for CPU time, which we will explain later. If a thread is not running when the internal interval clock timer fires, it is never “charged” for that time interval.
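The answer to the audience question can be illustrated with a small simulation. This is only a sketch of tick-based accounting as described in the notes, not NT's actual implementation: the function name `charge_ticks`, the thread names, and the 10-unit tick period are all made up for illustration. A whole tick is charged to whichever thread happens to be running when the timer fires.

```python
def charge_ticks(run_intervals, tick_period, total_time):
    """Charge one full tick to whichever thread is running at each timer tick.

    run_intervals: list of (thread_name, start, end) on a single CPU.
    Returns (charged_time_by_thread, context_switches_by_thread).
    """
    charged = {}
    switches = {}
    for name, _start, _end in run_intervals:
        switches[name] = switches.get(name, 0) + 1  # thread was selected to run
        charged.setdefault(name, 0)

    tick = tick_period
    while tick <= total_time:
        for name, start, end in run_intervals:
            if start <= tick < end:          # this thread is on the CPU at the tick
                charged[name] += tick_period
                break
        tick += tick_period
    return charged, switches


# Hypothetical workload: "short" runs briefly right after every tick,
# "long" runs the rest of the time and is on the CPU whenever the timer fires.
intervals = [
    ("short", 1, 3), ("long", 3, 11),
    ("short", 11, 13), ("long", 13, 21),
    ("short", 21, 23), ("long", 23, 31),
]
charged, switches = charge_ticks(intervals, tick_period=10, total_time=30)
print(charged)   # CPU time charged per thread
print(switches)  # context switches per thread
```

In this made-up workload the "short" thread really runs for 6 time units across 3 context switches, but it is never on the CPU at a tick, so all of the time is charged to "long".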
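Viewing the Process Hierarchy with TLIST /T
Knowing a process’s parent helps identify where the process came from and what it is for
tlist /t displays the process inheritance tree
- If the parent process is no longer alive, the process is shown left-justified (that is, the creator cannot be shown)
- Example: the parent of explorer.exe is dead (explorer.exe is actually started by userinit.exe, which then exits)
Windows 2000
- Can display the parent process ID
- Task Manager has an “End Process Tree” command (right-click the process)
Unless you saw the output from the TLIST /T command, you would never know that NT keeps track of process parent/child relationships. But it does (sort of). Actually, it only keeps track of who your parent is, not your grandparent or grandchildren. SEE DEMO SCRIPT #3 and #4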
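The left-justification rule is easy to see in a small sketch. This is not how tlist is implemented; it only assumes that each process records its parent's ID, as the notes describe. The function name `format_process_tree` and all process IDs below are hypothetical.

```python
def format_process_tree(processes):
    """Return indented lines in the style of `tlist /t`.

    processes: list of (pid, parent_pid, name). A process whose parent is
    not in the list (the parent has exited) is printed left-justified.
    """
    names = {pid: name for pid, _ppid, name in processes}
    children = {}
    roots = []
    for pid, ppid, _name in processes:
        if ppid in names and ppid != pid:
            children.setdefault(ppid, []).append(pid)
        else:
            roots.append(pid)  # no live parent: shown at the left margin

    lines = []

    def walk(pid, depth):
        lines.append("  " * depth + f"{names[pid]} ({pid})")
        for child in children.get(pid, []):
            walk(child, depth + 1)

    for pid in roots:
        walk(pid, 0)
    return lines


# Hypothetical snapshot; PID 700 (userinit.exe) has already exited.
snapshot = [
    (0, 0, "System Process"),
    (8, 0, "System"),
    (140, 8, "smss.exe"),
    (164, 140, "csrss.exe"),
    (160, 140, "winlogon.exe"),
    (212, 160, "services.exe"),
    (224, 160, "lsass.exe"),
    (800, 700, "explorer.exe"),
    (900, 800, "cmd.exe"),
]
tree_lines = format_process_tree(snapshot)
print("\n".join(tree_lines))
```

Because only the parent ID is recorded, explorer.exe appears at the left margin once its creator is gone, even though it is logically a descendant of winlogon.exe.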
Viewing Open Handles
Handle leaks can show up as system memory leaks!
- Task Manager can show the total handle count for each process
- The Resource Kit “oh” tool (the first run sets an NT global flag and requires a reboot; see gflags.exe in the Resource Kit)
- handleex (GUI) or nthandle (console) from www.sysinternals.com (uses a device driver)
Another basic piece of process information that can matter when troubleshooting is which files (or other objects) are open in which processes. For example, if you get a “file locked” error and want to find out which process has the file open, how can you do it? The only way to find local files that are open by local processes (as opposed to files opened by remote network clients, which you can see by looking at the files in use on each network share) is to use either the Resource Kit “oh.exe” tool or the “handleex.exe” tool from www.sysinternals.com. SEE DEMO SCRIPT #5
Viewing DLL Usage
Depends.exe in the Support Tools
- Shows the static links from an EXE to its DLLs
- Can also “profile” a process and show dynamic DLL loads
Another key piece of process information is which DLLs are being used by a running process (and which DLLs *would* be used if you ran a particular EXE). The Dependency Walker tool in the Windows 2000 Support Tools addresses this need. SEE DEMO SCRIPT #6
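Viewing DLL Usage (continued)
To diagnose DLL conflicts, you need to know which DLLs are loaded and where they were loaded from
- tlist <process name> or tlist <process ID> lists the DLLs, but not their paths
- listdlls from www.sysinternals.com shows the full paths
- It also shows the full path of the .EXE, which is very important for tracking down what a process really is!
SEE DEMO SCRIPT #7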
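I/O Activity
How do you isolate system I/O activity?
- Use the I/O counters in the System performance object to get the overall figures
- Use the new per-process I/O counters in Windows 2000 to find the process
- Use Filemon (www.sysinternals.com) to find which process is accessing which file
- Don’t forget that normal file I/O shows up as paging I/O (because of the design of the cache manager)
Windows 2000 supports per-process I/O counters, so now you can pinpoint which processes are responsible for disk and network I/O activity (NT4 kept only system-wide counters). SEE DEMO SCRIPT #8 and #9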
Registry Activity
The registry should stay “quiet” on a stable system
- That is, it should not be used as a frequently accessed database
Run RegMon (www.sysinternals.com) to confirm that your registry is (mostly) inactive
The Registry plays a key role in the operation of NT. Knowing which Registry locations the various administrative tools reference can sometimes be important in troubleshooting. Also, when looking at registry activity on a “stable system”, you may find some system processes making regular queries to registry locations, looking for changes. This is “not the right thing”: the registry is not meant to be polled, and there are other ways for programs to be notified of registry changes. (If you see a process doing regular registry queries on a steady-state system, consider filing a bug report with the vendor!) The Registry Monitor tool from www.sysinternals.com is an excellent way to look at system registry activity. SEE DEMO SCRIPT #10
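Quiz (Processes and Threads)
Q: What is the unit of scheduling in NT?
A: The thread
Q: A thread has been running, yet it shows no CPU time. Why?
A: NT charges CPU time only to the thread that is running when the interval clock timer fires, so a thread that is never running at that moment is never charged
Q: How big is the process sandbox?
A: 2 gigabytes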
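Agenda
- Tool overview
- Understanding process and thread activity
- Understanding CPU time accounting
- Understanding system processes
- Process and system crashes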
Kernel Mode vs. User Mode
A processor state
- Controls access to memory
  - Each memory page is tagged with the processor mode required to access it
  - Protects the system from users
  - Protects user processes from each other
  - The system is not protected from itself
  - Code regions are tagged “no write in any mode”
- Controls the ability to execute privileged instructions
- A Windows NT abstraction
  - Intel: Ring 0, Ring 3
Associated with threads
- A thread can switch from user mode to kernel mode and back again
- Part of the saved context, along with the registers, etc.
- Does not affect scheduling
Performance counters: “Privileged Time” and “User Time”
- Four levels of granularity: thread, process, processor, and system
[from Inside Windows NT, 2nd edition, Chapter 1] To protect user applications from accessing and/or modifying critical operating system data, Windows NT uses two processor access modes (even if the processor on which Windows NT is running supports more than two): user mode and kernel mode. User application code runs in user mode, whereas operating system code (such as system services and device drivers) runs in kernel mode. Kernel mode refers to a mode of execution in a processor that grants access to all system memory and all CPU instructions. By providing the operating system software with a higher privilege level than application software has, the processor provides a necessary foundation for operating system designers to ensure that a misbehaving application can’t disrupt the stability of the system as a whole.
NOTE The architecture of the x86 processor defines four privilege levels, or rings, to protect system code and data from being overwritten either inadvertently or maliciously by code of lesser privilege. Windows NT uses privilege level 0 (or ring 0) for kernel mode and privilege level 3 (or ring 3) for user mode. The reason Windows NT uses only two levels is to maintain source code portability across the RISC-based architectures supported by Windows NT, since all mainstream RISC-based processors have only two privilege levels.
Thus, it’s normal that a user thread spends part of its time executing in user mode and part in kernel mode. In fact, because the bulk of the graphics and windowing system also runs in kernel mode, graphics-intensive applications will spend more of their time in kernel mode than in user mode.
An easy way to test this is to run a graphics-intensive application such as Microsoft Paint or Microsoft Pinball and watch the time split between user mode and kernel mode using one of a number of utilities, such as Process Viewer or Performance Monitor. Although each Win32 process has its own private memory space, the operating system shares a single virtual address space. Each page in virtual memory is tagged as to what access mode the processor must be in to read and/or write the page. Pages in system space (the upper half of the 4-GB virtual address space, from 0x80000000 through 0xFFFFFFFF) can be accessed only from kernel mode, whereas all pages in the user address space (the lower half, addresses 0x00000000 through 0x7FFFFFFF) are accessible from user mode. Read-only pages (such as those that contain executable code) are not writable from any mode. Windows NT doesn’t provide any protection for components running in kernel mode. In other words, once in kernel mode, system code has complete access to system space memory and can bypass Windows NT security to access objects. Because the bulk of the Windows NT operating system code runs in kernel mode, it is vital that it be carefully designed and tested to ensure that it doesn’t violate system security. This lack of protection also emphasizes the need to take care when loading a third-party device driver, since once in kernel mode, the software has complete access to all operating system data. Let’s now talk about the 3 reasons why NT spends time in kernel mode executing operating system or driver code.
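The page-tagging rules in these notes can be written down as a small check. This is a sketch of the rules as stated here, not of the real memory manager or the x86 page-table format; the `Page` type and the `access_allowed` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

SYSTEM_SPACE_START = 0x80000000  # upper 2 GB: kernel mode only


@dataclass(frozen=True)
class Page:
    base: int        # virtual address of the page
    writable: bool   # False for read-only pages such as executable code


def access_allowed(page: Page, mode: str, write: bool) -> bool:
    """Apply the two rules from the notes to one memory access."""
    if mode not in ("user", "kernel"):
        raise ValueError("mode must be 'user' or 'kernel'")
    # Rule 1: pages in system space can be touched only from kernel mode.
    if page.base >= SYSTEM_SPACE_START and mode == "user":
        return False
    # Rule 2: read-only pages are not writable from any mode.
    if write and not page.writable:
        return False
    return True


user_data = Page(base=0x00400000, writable=True)
user_code = Page(base=0x00401000, writable=False)
system_data = Page(base=0x80400000, writable=True)

print(access_allowed(user_data, "user", write=True))      # user page, user mode
print(access_allowed(system_data, "user", write=False))   # system page, user mode
print(access_allowed(system_data, "kernel", write=True))  # system page, kernel mode
print(access_allowed(user_code, "kernel", write=True))    # read-only code page
```

Note what the check does not contain: there is no rule that restricts kernel mode from any writable page, which is exactly the "no protection between kernel-mode components" point made above.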
Getting into Kernel Mode
Windows 2000 switches to kernel mode in three situations:
1. A request from user mode
- Goes through the system service dispatch mechanism
- The kernel-mode code runs in the context of the requesting thread
2. An interrupt request from an external device
- The interrupt dispatcher supplied by Windows NT invokes the interrupt service routine (ISR)
- The ISR runs in the context of the interrupted thread (so-called “arbitrary thread context”)
- The ISR often requests execution of a “DPC routine”, which also runs in kernel mode
- Interrupt handling time is not charged to the interrupted thread
3. Dedicated kernel-mode system threads
- Some threads in the system stay in kernel mode all the time (mostly in the “System” process)
- They are scheduled, preempted, and so on, like any other thread
NT spends time in kernel mode for 3 reasons:
1. An application makes a system call (e.g. opening a file, reading/writing data, allocating memory, creating processes or threads). The time spent inside the operating system to execute the function is charged to the kernel mode (or privileged) time of the thread.
2. An interrupt occurs from a hardware device (this will be explained in detail in the next few slides).
3. NT is executing a part of itself or a device driver that was set up to run as a “system thread” (to be explained in detail in the next section on “System Processes”).
It’s important to understand these three reasons so that you can diagnose a busy system: if the CPU is spending time in kernel mode, it isn’t (directly) executing application code. So the question becomes: “What’s going on and why?” We’re going to see exactly how to trace kernel mode CPU time back to the reason for it.
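Examining Process CPU Time
Looking at how much of a process’s time is spent in kernel mode can tell you what the process is doing
- 100% user time: the process is executing application code
- Part user time, part kernel time: the process is making system calls
- Use qslice.exe (Resource Kit) or PerfMon
SEE DEMO SCRIPT #11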
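The same user/kernel split can be computed for any process from its two CPU time totals. The sketch below is an illustration only and is not one of the session tools: `cpu_split` is a hypothetical helper, and the demonstration measures the current Python process with the standard `os.times()` call rather than with qslice or PerfMon.

```python
import os


def cpu_split(user_seconds, kernel_seconds):
    """Return (user %, kernel %) of the CPU time a process has consumed."""
    total = user_seconds + kernel_seconds
    if total <= 0:
        return 0.0, 0.0  # nothing measurable was consumed
    return 100.0 * user_seconds / total, 100.0 * kernel_seconds / total


# Measure this process across a mixed workload.
before = os.times()
checksum = sum(i * i for i in range(200_000))   # pure computation: user mode
with open(os.devnull, "wb") as sink:            # many small writes: system calls
    for _ in range(2_000):
        sink.write(b"x")
        sink.flush()
after = os.times()

user_pct, kernel_pct = cpu_split(after.user - before.user,
                                 after.system - before.system)
print(f"user {user_pct:.0f}%  kernel {kernel_pct:.0f}%")
```

The exact percentages depend on the machine and on timer resolution, so no expected output is shown. A process near 100% user time is running its own code; a noticeable kernel share means it is spending time in system calls.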
Interrupt Dispatching
[Diagram: user-mode or kernel-mode code is interrupted, and control passes to the interrupt dispatch routine in kernel mode]
Note: there is no thread or process context switch!
Interrupt dispatch routine
- Disable interrupts
- Record the machine state (trap frame) to allow resumption
- Mask interrupts of equal and lower priority
- Find and call the appropriate ISR
- Enable interrupts
- Restore the machine state (including mode and interrupts)
Interrupt service routine
- Tell the device to stop interrupting
- Interrogate the device state, start the next operation on the device, etc.
- Request a DPC
- Return to the caller
Hardware-generated interrupts typically originate from I/O devices that must notify the processor when they need service. Interrupt-driven devices allow the operating system to get the maximum use out of the processor by overlapping central processing with I/O operations. The processor starts an I/O transfer to or from a device and then executes other threads while the device completes the transfer. When the device is finished, it interrupts the processor for service. Pointing devices, printers, keyboards, disk drives, and network cards are generally interrupt driven. System software can also generate interrupts. For example, the kernel can issue a software interrupt to initiate thread dispatching and to asynchronously break into the execution of a thread. The kernel can also disable interrupts so that the processor isn’t interrupted, but it does so only infrequently (at critical moments while it’s processing an interrupt or dispatching an exception, for example). A submodule of the kernel’s trap handler, called the interrupt dispatcher, responds to interrupts. It determines the source of an interrupt and transfers control either to an external routine (the ISR) that handles the interrupt or to an internal kernel routine that responds to the interrupt. Device drivers supply ISRs to service device interrupts, and the kernel provides interrupt handling routines for other types of interrupts. An interrupt is an asynchronous event (one that can occur at any time) that is unrelated to what the processor is executing. Interrupts are generated primarily by I/O devices, processor clocks, or timers, and they can be enabled (turned on) or disabled (turned off).
When an interrupt occurs, the trap handler disables interrupts briefly while it records the machine state (information that would be wiped out if another interrupt or exception occurred). It creates a trap frame in which it stores the execution state of the interrupted thread. This information allows the kernel to resume execution of the thread after handling the interrupt or the exception. Then, the kernel transfers control to the interrupt service routine (ISR) that the device driver provided for the interrupting device.
Interrupt Request Levels
IRQL = Interrupt Request Level
- The priority of an interrupt relative to other interrupts
- Different interrupt sources have different IRQLs
- Not the same thing as an IRQ
- On a multiprocessor system, each CPU can be at a different IRQL
IRQL   Use
31     High
30     Power fail
29     Interprocessor interrupt
28     Clock
...    Device n down to Device 1 (hardware interrupts)
2      Dispatch/DPC (deferrable software interrupt)
1      APC (deferrable software interrupt)
0      Low (normal thread execution)
The interrupt dispatcher maps hardware-interrupt levels onto a standard set of interrupt request levels (IRQLs) recognized by the operating system. IRQL priority levels have a completely different meaning than thread-scheduling priorities (which also happen to run from zero to 31). A scheduling priority is an attribute of a thread, whereas an IRQL is an attribute of an interrupt source, such as a keyboard or a mouse; normal threads run at IRQL 0 (sometimes 1). Note that IRQLs are not the same as the interrupt request lines (IRQs) on an x86 system. The x86 architecture doesn’t implement the concept of IRQLs in hardware (Alpha did). IRQLs rank interrupts by priority. Interrupts are serviced in priority order, and a higher-priority interrupt preempts the servicing of a lower-priority interrupt. On x86 systems, external I/O interrupts actually come into one of the lines on an interrupt controller. The controller in turn interrupts the processor on a single line. Once the processor is interrupted, it queries the controller to get the interrupt vector. The processor uses this vector to index into the hardware IDT and to transfer control to the appropriate interrupt dispatch routine. Although the x86 architecture can support up to 256 interrupt lines, the number of lines a particular machine can support is determined by the design of the interrupt controller the machine uses. Most x86 PCs have interrupt controllers that use 16 interrupt lines.
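The rule that a higher-IRQL interrupt preempts a lower one, while an equal or lower one stays pending until the IRQL drops, can be modeled in a few lines. This is a toy model, not the kernel's interrupt dispatcher: the `Cpu` class is hypothetical, and the device IRQLs 3 and 5 are made-up values (only the clock at 28 comes from the table above).

```python
class Cpu:
    """Toy model of one processor's current IRQL and its pending interrupts."""

    def __init__(self):
        self.irql = 0          # normal thread execution
        self.pending = []      # (irql, name) pairs masked by the current IRQL
        self.log = []

    def interrupt(self, name, irql, during=None):
        if irql <= self.irql:
            # Masked: equal or lower priority than what is being serviced.
            self.pending.append((irql, name))
            self.log.append(f"pend {name}")
            return
        previous = self.irql
        self.irql = irql                     # raise IRQL for the ISR
        self.log.append(f"start {name}")
        if during:
            during(self)                     # things that happen mid-ISR
        self.log.append(f"end {name}")
        self.irql = previous                 # lower IRQL on return
        self._deliver_pending()

    def _deliver_pending(self):
        while True:
            ready = [p for p in self.pending if p[0] > self.irql]
            if not ready:
                return
            irql, name = max(ready)          # highest pending IRQL first
            self.pending.remove((irql, name))
            self.interrupt(name, irql)


def disk_isr_body(cpu):
    cpu.interrupt("keyboard", 3)   # lower than the disk's IRQL: stays pending
    cpu.interrupt("clock", 28)     # higher: preempts the disk ISR


cpu = Cpu()
cpu.interrupt("disk", 5, during=disk_isr_body)
print(cpu.log)
```

In the resulting log the clock ISR runs entirely inside the disk ISR, and the keyboard ISR starts only after the disk ISR has finished and the IRQL has dropped back to 0.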
Deferred Procedure Calls (DPCs)
- Used to defer processing from a higher (device) interrupt level to a lower (dispatch) level
- A list of “work requests”, implicitly ordered by time of request (FIFO)
- Used mainly for driver “after interrupt” functions
- Also used for quantum end and timer expiration
[Diagram: queue head -> DPC object -> DPC object -> DPC object. Each DPC object holds the address of the DPC routine plus DfrdCtx, SysArg1, and SysArg2, which are passed to the routine: XydriverDpcRtn(DpcObj, DfrdCtx, SysArg1, SysArg2) { // ... }]
When a device interrupts, as noted on the previous slide, the kernel’s interrupt dispatcher transfers control to the device driver’s interrupt service routine (ISR). In the Windows NT I/O model, ISRs run at a high device interrupt request level (IRQL). Usually, they try to limit the time spent at device IRQL to avoid blocking lower-level interrupts unnecessarily. So that a driver can finish the work of an I/O request at a lower, less severe interrupt level, NT provides a way for a device driver to ask the system to call it back later at a lower IRQL: the Deferred Procedure Call (DPC). The functions are called deferred because they might not execute immediately. An ISR queues a DPC, which runs at a lower IRQL, to execute the remainder of interrupt processing. (Only drivers for interrupt-driven devices have ISRs; a file system, for example, doesn’t have one.) Because DPCs are generally queued by software running at a higher IRQL, the requested interrupt doesn’t surface until the kernel lowers the IRQL. So once there are no more pending interrupts at higher levels, the DPC queue is run (that is, the drivers are called back in turn to complete their work).
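A FIFO queue that is drained only when the IRQL is about to fall below dispatch level captures the idea. This is a simplified sketch, not the kernel's DPC implementation: the `DpcQueue` class and the request strings are hypothetical, and the routine signature simply mirrors the (DpcObj, DfrdCtx, SysArg1, SysArg2) shape shown on the slide.

```python
from collections import deque

DISPATCH_LEVEL = 2  # Dispatch/DPC IRQL from the IRQL table


class DpcQueue:
    """FIFO list of deferred work requests queued by ISRs."""

    def __init__(self):
        self._queue = deque()

    def insert(self, routine, deferred_context, sys_arg1=None, sys_arg2=None):
        # Called from an ISR at device IRQL: just record the request.
        self._queue.append((routine, deferred_context, sys_arg1, sys_arg2))

    def drain(self, lowering_to_irql):
        """Run queued DPCs in FIFO order once no higher-level work remains.

        Returns the number of DPC routines that ran.
        """
        if lowering_to_irql >= DISPATCH_LEVEL:
            return 0  # still at or above dispatch level: stay deferred
        ran = 0
        while self._queue:
            dpc = self._queue.popleft()
            routine, context, arg1, arg2 = dpc
            routine(dpc, context, arg1, arg2)
            ran += 1
        return ran


completed = []


def xydriver_dpc_routine(dpc_object, deferred_context, sys_arg1, sys_arg2):
    completed.append(deferred_context)  # finish the I/O request here


queue = DpcQueue()
queue.insert(xydriver_dpc_routine, "disk request")    # queued by one ISR
queue.insert(xydriver_dpc_routine, "network packet")  # queued by another ISR

ran_while_busy = queue.drain(lowering_to_irql=5)  # another device ISR pending
ran_when_idle = queue.drain(lowering_to_irql=0)   # returning to thread level
print(ran_while_busy, ran_when_idle, completed)
```

Keeping the ISR's job down to "record a request" is the design point: the expensive completion work runs later, in request order, without holding off other device interrupts.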
Accounting for Kernel-Mode Time
- “Processor Time” = total time the processor was busy (elapsed time minus idle time)
- “Processor Time” = “User Time” + “Privileged Time”
- “Privileged Time” = time spent in kernel mode
- “Privileged Time” includes interrupt time and DPC time
- Note: interrupt and DPC time is not charged to any process or thread
Using Performance Monitor, you can watch the percentage of time your system spends handling interrupts and DPCs. The processor object and the system object both have % Interrupt Time and % DPC Time counters, which means you can monitor the activity on a per-CPU or a systemwide basis. These objects also have counters to measure the number of interrupts and DPCs per second. One situation in which you might want to look at these counters is if your system is spending an inordinate amount of time in kernel mode and you can’t attribute all the kernel-mode CPU time to processes. If total kernel-mode time is greater than the total kernel time of all processes, the remaining time has to be interrupts or DPCs, because time spent at interrupt level and DPC level is not charged to any thread or process. SEE DEMO SCRIPT #12
Screen snapshot from: Programs | Administrative Tools | Performance Monitor; click on the “+” button, or select Edit | Add to chart . . .
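The subtraction described in these notes is simple enough to write out. The sketch assumes that all figures have already been read from the performance counters and are expressed on the same scale (percent of one processor); the function name and the sample numbers are made up for illustration.

```python
def unattributed_privileged_time(total_privileged_pct, process_privileged_pcts):
    """Kernel-mode time that no process accounts for.

    total_privileged_pct: system-wide % Privileged Time.
    process_privileged_pcts: % Privileged Time per process (idle excluded).
    The remainder must be interrupt and DPC time, since that time is not
    charged to any thread or process.
    """
    remainder = total_privileged_pct - sum(process_privileged_pcts.values())
    return max(0.0, remainder)  # sampling skew can make the raw value negative


# Hypothetical readings from a busy server.
per_process = {"sqlservr.exe": 15.0, "System": 5.0}
interrupt_and_dpc = unattributed_privileged_time(60.0, per_process)
print(f"{interrupt_and_dpc:.0f}% of the CPU is going to interrupts and DPCs")
```

With these made-up readings, 40% of the processor is busy in kernel mode on behalf of no process at all, which is the cue to look at the % Interrupt Time and % DPC Time counters.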
Quiz (Time Accounting)
Q: If the system is running slowly and no process appears to be running, what is going on?
A: Interrupts. Look at interrupts/sec in PerfMon.
Agenda
- Tool overview
- Understanding process and thread activity
- Understanding CPU time accounting
- Understanding system processes
- Process and system crashes
Understanding the system process tree helps to troubleshoot system activity. Why? Because if a process is running, and it’s not something you ran (e.g. it’s a system process), it helps to understand what the role of that process is, and hence why it may be running.
Process Creation Hierarchy
tlist.exe (in the Resource Kit) can display the creation hierarchy (“tlist /t”)
- If the parent process is dead, the process is shown left-justified (that is, if the creator is gone, you cannot see the creator)
- Example: the parent of explorer.exe is dead (explorer.exe is actually started by userinit.exe, which then exits)
Using TLIST /T, we will dissect the system process tree to try to understand what each process is and what it does. The reason we will use TLIST /T is that knowing the parent of a process tells you to some degree where the process fits into the picture (e.g. if its parent is the service controller, the process is a service process). System processes divide into three basic categories:
- Server processes that are installed as proper Windows 2000 services, such as the Event Log, RAS, IIS, etc. Many add-on server applications, such as Microsoft SQL Server and Microsoft Exchange Server, also include components that run as NT services.
- Special system support processes, such as the idle process, system process, logon process and the session manager (these are not true Windows NT services, that is, they are not started by the service controller).
- Environment subsystems, which expose the native operating system services to user applications, thus providing an operating system environment, or personality. Windows 2000 ships with three environment subsystems: Win32, POSIX, and OS/2 1.2. The main subsystem is Win32 (process name csrss.exe).
Let’s start from top to bottom looking at the output of a TLIST /T [DEMO: do a TLIST /T in a command prompt and have that visible as you go through the next slides]
System Process Tree
The first two processes are not real processes
- They are not running a user-mode .EXE (they have no image name)
- So each utility makes up its own name for them
(Idle) Process ID 0
- Part of the loaded system image
- Home for the idle threads (not a real process, nor real threads)
(System) Process ID 8 in Windows 2000 (process ID 2 in NT4)
- Home for kernel-defined threads (not a real process)
- Thread 0 (routine name Phase1Initialization) launches the first “real” process, smss.exe (and then becomes the zero page thread)
If you look at the TLIST /T output, the first process, ID#0, is called “System Process”. But, despite the name, this first process is actually the System Idle process. Unfortunately, different utilities report the name of the Idle process differently (Task Manager calls it System Idle Process, Process Viewer calls it Idle, TLIST calls it System Process, etc.). The idle process is the fake process that accumulates idle CPU cycles. On a multiprocessor system, there is one idle thread per CPU. Thus, it is easy to look at the CPU time for the individual threads in the idle process and see how “hard” your multiple CPUs have been working since the system was booted (e.g. if CPU 1 hasn’t carried much load, its Idle thread will have much more CPU time than the Idle thread for CPU 0). The second process, ID#8, is the home for a set of special threads called kernel-mode system threads. Let’s explore these next.
System Threads
Routines in the operating system and in device drivers that need to run as real threads
- For example, they need to run concurrently with other system activity, wait on timers, or perform background “housekeeping” work
- For details, see the DDK documentation for PsCreateSystemThread
What process do system threads appear in?
- NT4: the “System” process (PID 2)
- Windows 2000: the windowing system threads appear in “csrss.exe” (the Win32 subsystem process); all others appear in “System” (PID 8)
System threads are subroutines in the operating system or a device driver that are scheduled as real threads, along with the other normal user threads in the system. NT and some device drivers create system threads when the system boots. They are used to perform operations that need to take place concurrently with other system activity, such as issuing and waiting for I/Os or other objects or polling a device. In NT4, all system threads appeared in the fake process called “System” (id#2). In Windows 2000, all system threads except those created by the window system device driver (win32k.sys) are in this special “System” process (which on Win2000 is id#8). The other ones are in the Windowing subsystem process, csrss.exe (more details coming on this process). Let’s consider a few examples of system threads as used by NT.
Examples of System Threads
Core operating system (NTOSKRNL.EXE)
- Modified page writer
- Balance set manager
- Swapper (kernel stacks, working sets)
- Cache manager lazy writer
- Zero page thread (thread 0, priority 0)
- Pool of general worker threads (ExQueueWorkItem)
File server (SRV.SYS)
Floppy driver (FLOPPY.SYS)
For example, the memory manager uses system threads to implement such functions as writing dirty pages to the page file or mapped files, swapping processes in and out of memory, and so forth. The kernel creates a system thread called the balance set manager that wakes up once per second to check and possibly initiate various scheduling and memory management-related events. The cache manager also uses system threads to implement both read-ahead and write-behind I/Os. The file server device driver (SRV.SYS) uses system threads to respond to network I/O requests for file data. Even the floppy driver has a system thread to poll the floppy device.
AUDIENCE QUESTION: Therefore, if the system process is running, what do you know?
ANSWER: Nothing! It could be a piece of NT, part of a driver, networking, cache activity, whatever. So, the challenge is to somehow map the thread that is running to the component that created it, so you can get some idea of what’s going on. That’s what we’ll do next.
Identifying System Threads
To really understand what is going on, you must find which driver the thread “belongs” to
1. Use PerfMon to monitor the activity of individual threads
2. In Pviewer, find the thread in question and get its “Start Address” (the address of the thread function)
3. Run \ntreskit\pstat to find which driver the thread belongs to (look for the driver whose load address most closely precedes the thread’s start address; you may have to compute the driver’s end address)
When you’re troubleshooting or going through a system analysis, it’s useful to be able to map the execution of individual system threads back to the driver or even to the subroutine that contains the code. For example, on a heavily loaded file server, the System process will likely be consuming considerable CPU time. But knowing that “some system thread” is running whenever the System process is running isn’t enough to determine which device driver or operating system component is running. Finding which driver (or OS component) created a system thread requires three utilities: Performance Monitor, Process Viewer, and pstat.exe, a tool that was in the NT4 Resource Kit but is not in the Windows 2000 Resource Kit (it ships with the Platform SDK, if you are an MSDN Subscriber). Let’s go through the steps to map system thread activity. GO TO DEMO SCRIPT #13.
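Step 3 is a range lookup: find the driver whose load address is the closest one at or below the thread's start address, then confirm that the address falls before the driver's end. The sketch below shows that lookup on a made-up module map. It does not parse real pstat output, and every address, size, and the `find_module` name are assumptions for illustration.

```python
import bisect


def find_module(modules, address):
    """Return the name of the module containing address, or None.

    modules: list of (load_address, size_in_bytes, name) tuples, in the
    spirit of the driver list at the end of the pstat output.
    """
    ordered = sorted(modules)
    bases = [base for base, _size, _name in ordered]
    index = bisect.bisect_right(bases, address) - 1  # closest base <= address
    if index < 0:
        return None
    base, size, name = ordered[index]
    # The "compute the end address" step: base + size is one past the end.
    return name if address < base + size else None


# Hypothetical load map; real values differ per system and service pack.
module_map = [
    (0x80400000, 0x1A0000, "ntoskrnl.exe"),
    (0xBE900000, 0x040000, "srv.sys"),
    (0xBFD00000, 0x008000, "floppy.sys"),
]

thread_start = 0xBE912345  # "Start Address" read from Process Viewer
print(find_module(module_map, thread_start))
```

The end-address check matters: an address that lies in the gap after one driver and before the next belongs to neither, and without the check it would be wrongly attributed to the preceding driver.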
Identifying System Threads (continued)
If it is an NTOSKRNL.EXE thread, you must find the name of the subroutine
1. Use the kernel debugger to dump the symbols in NTOSKRNL.DBG (or NTKRNLMP.DBG). Note: the values differ with each service pack
2. Look up the address
For details, see Chapter 2 of Inside Windows NT, 2nd edition, available free at http://mspress.microsoft.com/prod/books/sampchap/1312.htm
If the start address of a system thread falls within the “driver” NTOSKRNL.EXE (the first driver in the list of drivers in the memory map at the end of the pstat.exe output), then what do you know? Only that it is a piece of NT that is running. If you really want to find out which specific component, you can determine the name of the subroutine by looking it up in the list of global symbols contained in the associated symbol table file NTOSKRNL.DBG. The symbols are provided on the Windows 2000 Customer Diagnostics CD-ROM in the \support\symbols directory. There is a special installation procedure to install them. Although we aren’t going to do this during this session, to generate the list of the global symbols in NTOSKRNL and their values, you can run the kernel debugger I386KD.EXE, which is installed when you install the Windows 2000 Debugging Tools (also part of the Windows 2000 Customer Diagnostics CD), and either connect to a live system or open a crash dump file. Once you are in I386KD.EXE with just NTOSKRNL.DBG loaded, type: x *
Before typing x *, use the !logopen command to create a log file of your kernel-debugging session. That way, you can save the output in a file and then search for the addresses in question. This technique is detailed in Chapter 2 of Inside Windows NT, 2nd edition (this chapter is available for free on the Microsoft Press web site).
System Process Tree (cont.)
smss.exe: Session Manager
- The first process created
- Takes its parameters from HKLM\System\CurrentControlSet\Control\Session Manager
- Launches the required subsystems (csrss) and then winlogon
csrss.exe: Win32 subsystem
winlogon.exe: Logon process
- Launches services.exe and lsass.exe and displays the logon dialog box (“Press CTRL+ALT+DEL to log on”)
- When someone logs on, runs the process named in HKLM\Software\Microsoft\Windows NT\CurrentVersion\WinLogon\Userinit (usually just userinit.exe)
services.exe: Service controller; also the home for several services
- Starts the service processes that are not part of services.exe (driven by HKLM\System\CurrentControlSet\Services)
lsass.exe: Local Security Authentication Server (opens the SAM)
userinit.exe: Started after logon
- Starts the shell (usually Explorer.exe; see HKLM\Software\Microsoft\Windows NT\CurrentVersion\WinLogon\Shell)
- Loads the profile, restores drive letter mappings, and then exits (hence Explorer is shown by itself)
explorer.exe: It and its children are the creators of all interactive applications
Now let’s get back to analyzing the system process tree. So far, we have identified only the first two processes: the Idle process and the System process.
QUESTION: Looking at the TLIST /T output [switch back to that window], what is the first process that is running a real .EXE?
ANSWER: smss.exe
Session Manager (SMSS) The session manager (SMSS.EXE) is the first user-mode process created in the system. Besides performing a number of key system initialization steps, the session manager acts as a switch and monitor between applications and debuggers. Much of the configuration information in the registry that drives the initialization steps of SMSS can be found under \System…\Control\Session Manager. You’ll find it interesting to examine the kinds of data stored there. (For a description of the keys and values, see the Registry Entries help file in the Windows 2000 Resource Kit.) Looking at the TLIST /T output, SMSS has two children:
- csrss.exe
- winlogon.exe
We’ll describe csrss on the next slide; let’s cover winlogon and its children first.
Logon (WINLOGON) The Windows NT logon process, WINLOGON, handles interactive user logons and logoffs. WINLOGON is notified of a user logon request when the secure attention sequence (SAS) keystroke combination is entered. The default SAS on Windows NT is the combination Ctrl-Alt-Delete.
The reason for the SAS is to protect users from password-capture programs that simulate the logon process. Once the username and password have been captured, they are sent to the local security authentication server process (LSASS.EXE) to be validated. If they match, a process named USERINIT.EXE is created. This process loads the user profile and runs the shell, defined in the registry by default to be EXPLORER.EXE. Then USERINIT exits. This is the reason EXPLORER is shown in the TLIST /T output with no parent: its parent has died, and as explained earlier, Tlist left-justifies processes whose parent is not running. (In reality, EXPLORER is the grandchild of WINLOGON.)
Local Security Authentication Server (LSASS)
The local security authentication server process receives authentication requests from WINLOGON and calls the appropriate authentication package (implemented as a DLL) to perform the actual verification, such as checking whether a password matches what is stored in the SAM (the part of the registry that contains the definition of the users and groups). Upon a successful authentication, LSASS generates an access token object that contains the user’s security profile. WINLOGON then uses this access token to create the initial shell process. Processes launched from the shell then by default inherit this access token. In addition, the NetLogon service lives in LSASS.
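The left-justification rule explains why EXPLORER appears as a root in the tree. The sketch below imitates that rule on a list of (pid, parent pid, name) records; it is an illustration of the idea, not Tlist's actual implementation. All process IDs are hypothetical, and parent ID 580 stands for a USERINIT process that has already exited.

```python
def render_tree(processes):
    """Render (pid, parent_pid, name) records as indented lines.

    A process whose parent is not in the list is printed left-justified,
    the same way the notes describe the tlist /t output.
    """
    live = {pid for pid, _, _ in processes}
    children = {}
    roots = []
    for pid, parent_pid, name in processes:
        if parent_pid in live and parent_pid != pid:
            children.setdefault(parent_pid, []).append((pid, name))
        else:
            roots.append((pid, name))  # parent not running: treat as a root

    lines = []

    def walk(pid, name, depth):
        lines.append("  " * depth + f"{name} ({pid})")
        for child_pid, child_name in children.get(pid, []):
            walk(child_pid, child_name, depth + 1)

    for pid, name in roots:
        walk(pid, name, 0)
    return lines


# Hypothetical process IDs; 580 is a userinit.exe that has exited.
SAMPLE = [
    (8, 0, "System"),
    (140, 8, "smss.exe"),
    (164, 140, "csrss.exe"),
    (184, 140, "winlogon.exe"),
    (212, 184, "services.exe"),
    (224, 184, "lsass.exe"),
    (600, 580, "explorer.exe"),
]

print("\n".join(render_tree(SAMPLE)))
```

In the printed tree, explorer.exe lands in the first column even though it was started indirectly by winlogon.exe, because the intermediate parent is gone.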
Win32 Subsystem Process (csrss.exe)
· Contains the user-mode part of the windowing system
· The bulk of the subsystem is in WIN32K.SYS (a kernel-mode driver)
· Infrequently called functions: process creation and deletion; thread creation and deletion; getting temporary file names; drive letters; security checks for the file system redirector; window management for console (character-cell) applications; 16-bit DOS support (NTVDM.EXE)
CSRSS is the Win32 subsystem process. It contains support for: · Console (text) windows · Creating and deleting processes and threads · Portions of the support for 16-bit virtual DOS machine (VDM) processes · Other miscellaneous functions, such as GetTempFile, DefineDosDevice, ExitWindowsEx, and several natural language support functions
CSRSS stands for “client/server run-time subsystem,” but all the subsystems are client/server run-time subsystems. The reason for the name is that originally, the OS/2 and POSIX subsystems ran as threads inside a single subsystem process called CSRSS. When POSIX and OS/2 were removed from it, the process name was not changed, even though it now contains only the Win32 subsystem.
Service Processes
· The Setup program tells the service controller about the service
· At system boot, the service controller reads the registry and starts services as required
· Control Panel can start and stop services and change their startup parameters
[Diagram: Setup program, CreateService, Service Controller, Registry, Service Processes, Control Panel; phases labeled Install time and Management/maintenance]
Service processes are processes defined in the registry to be started by the Windows NT service control manager, services.exe. (Although in the registry Windows NT device drivers are also called “services,” we are referring to services that run as user-mode processes, not kernel-mode device drivers.) Services are defined in the registry under HKLM\System\CurrentControlSet\Services. The Resource Kit registry entries help file documents the subkeys and values for services. Services are really just Win32 programs that call special system functions to interact with the service controller, such as registering their successful startup, responding to status requests, or pausing or shutting down the service. A number of Windows NT components are implemented as services, such as the spooler, event log, support for remote procedure call, RAS, and various other networking components. Services are started and stopped by the service controller, a special system process running the image SERVICES.EXE that is responsible for starting, stopping, and interacting with service processes. As mentioned earlier, using the tlist /t command makes it easy to see which of the processes are service processes: if the parent is services.exe, it is a service process. Services have four different identifications: the process name you see running on the system, the internal name in the registry, the display name shown in Control Panel, and the description (new in Windows 2000) that describes the function of the service in one to two sentences. (Not all services have a display name; if a service doesn’t have a display name, the internal name is shown.) Windows 2000 makes it easy to find out which .EXE contains each service: just right-click the service and select Properties, and the image name is shown.
In NT4, to map a running service process back to the actual service that is started, you had to search the registry for the image name to find the service(s) contained in that image. Let’s look at the installed services on my machine: DEMO SCRIPT #14
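The NT4-style search described above amounts to scanning every service entry for a matching image name. Here is a minimal sketch of that scan over a plain dictionary standing in for the Services key. The service names and image paths are hypothetical; ImagePath and DisplayName are the value names assumed for each entry, and the parsing simply takes the first whitespace-separated token of the path, which would not handle quoted paths containing spaces.

```python
import ntpath

# Hypothetical stand-in for HKLM\System\CurrentControlSet\Services.
SERVICES = {
    "ExampleSvcA": {
        "ImagePath": r"%SystemRoot%\system32\shared.exe",
        "DisplayName": "Example Service A",
    },
    "ExampleSvcB": {
        "ImagePath": r"%SystemRoot%\System32\SHARED.EXE",
    },
    "ExampleSvcC": {
        "ImagePath": r"%SystemRoot%\system32\alone.exe -k",
        "DisplayName": "Example Service C",
    },
}


def services_in_image(services, image_name):
    """Return the names of all services whose image matches image_name.

    The display name is used when present; otherwise the internal
    (registry) name is returned, as the notes describe.
    """
    wanted = image_name.lower()
    matches = []
    for internal_name, values in services.items():
        tokens = values.get("ImagePath", "").split()
        executable = ntpath.basename(tokens[0]).lower() if tokens else ""
        if executable == wanted:
            matches.append(values.get("DisplayName", internal_name))
    return sorted(matches)


print(services_in_image(SERVICES, "shared.exe"))
```

On a real system the dictionary would be filled from the registry rather than written by hand; the matching logic stays the same.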
Mapping Service Processes to Service Names
· If a service process is running, how do you find out which services it contains?
· Tlist /S shows which services are in each service process
· The mapping is not always one-to-one: some service processes contain more than one service
· For example, services.exe contains the Event Log, Workstation, and Server services
There isn’t always a one-to-one mapping between service processes and running services, because some services share a process with other services. In the registry, the type code indicates whether the service runs in its own process or shares a process with other services in the image. The TLIST /S command shows which services are running in each process (not all processes contain services). DEMO SCRIPT #15
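The type code mentioned above is a bit mask. The sketch below decodes it using the flag values as they appear in the Win32 headers (SERVICE_WIN32_OWN_PROCESS and related constants); treat the values as assumptions to be checked against the headers for your platform. The function name and the returned wording are illustrative only.

```python
# Flag values as defined in the Win32 headers (assumed here, verify locally).
SERVICE_KERNEL_DRIVER = 0x001
SERVICE_FILE_SYSTEM_DRIVER = 0x002
SERVICE_WIN32_OWN_PROCESS = 0x010
SERVICE_WIN32_SHARE_PROCESS = 0x020
SERVICE_INTERACTIVE_PROCESS = 0x100


def describe_service_type(type_code):
    """Classify a service entry from its registry type code."""
    if type_code & (SERVICE_KERNEL_DRIVER | SERVICE_FILE_SYSTEM_DRIVER):
        return "device driver (not a service process)"
    if type_code & SERVICE_WIN32_SHARE_PROCESS:
        return "shares a process with other services"
    if type_code & SERVICE_WIN32_OWN_PROCESS:
        return "runs in its own process"
    return "unknown type"


for code in (0x001, 0x010, 0x020, 0x110):
    print(hex(code), "->", describe_service_type(code))
```

A code of 0x110 combines the own-process flag with the interactive flag, so it is still reported as running in its own process.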
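Agenda
· Tools overview
· Understanding process and thread activity
· Understanding CPU time accounting
· Understanding system processes
· Process and system crashes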
Process Crashes
· The registry defines the behavior for unhandled exceptions: HKLM\Software\Microsoft\Windows NT\CurrentVersion\AeDebug
· Debugger = file name of the debugger to run when an application crashes
· Auto: 1 = run the debugger immediately; 0 = ask the user first
· Default on a retail NT system: Auto=1; Debugger=DRWTSN32.EXE
· Default with VC++ installed: Auto=0; Debugger=MSDEV.EXE
Has anyone here ever seen this message box? Let’s look at why a Dr. Watson occurs, and what you should do about it. Either hardware or software can generate exceptions and interrupts. For example, a bus error exception is caused by a hardware problem, whereas a divide-by-zero exception is the result of a software bug. When a program encounters an exception (like an invalid memory reference, or a divide by zero), it is given a chance to handle the error and continue execution. If the program does not handle the exception, Windows 2000 takes over and invokes the default unhandled exception filter. This system function looks in the registry under the HKLM\Software\Microsoft\Windows NT\CurrentVersion\AeDebug key to determine whether to run a debugger immediately or to ask the user first. The default “debugger” on Windows NT4 and Windows 2000 is DRWTSN32.EXE (Dr. Watson), which isn’t really a debugger but rather a postmortem tool that captures the state of the application “crash”, records it in a log file, and creates a crash dump file containing the memory space of the process that is dying. If you have a developer tool such as Visual C++ installed, the debugger that is to be run is changed to MSDEV.EXE so you can debug programs that incur unhandled exceptions. DEMO SCRIPT #16
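The choice between running the debugger immediately and asking the user first can be written as a small decision function. This is a sketch of the logic as the slide describes it, not the code of the system's unhandled exception filter. The dictionary stands in for the AeDebug key, and treating a missing Auto value as 0 is an assumption of the sketch.

```python
def unhandled_exception_action(aedebug):
    """Decide what happens on an unhandled exception.

    aedebug is a plain dict standing in for the AeDebug registry values.
    """
    debugger = aedebug.get("Debugger")
    if not debugger:
        return "no debugger configured"
    # Compare as text so that "1" and 1 behave the same in this sketch.
    # A missing Auto value is treated as 0 (an assumption).
    if str(aedebug.get("Auto", "0")) == "1":
        return f"run {debugger} immediately"
    return f"ask the user before running {debugger}"


RETAIL_DEFAULT = {"Auto": "1", "Debugger": "DRWTSN32.EXE"}
WITH_VISUAL_CPP = {"Auto": "0", "Debugger": "MSDEV.EXE"}

print(unhandled_exception_action(RETAIL_DEFAULT))
print(unhandled_exception_action(WITH_VISUAL_CPP))
```

With the retail defaults the function reports that DRWTSN32.EXE runs immediately; with the Visual C++ defaults it reports that the user is asked first.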
Dr. Watson
· Runs automatically by default
· Run DRWTSN32.EXE to configure its settings
· The log file (“drwtsn32.log”) is in the \Documents and Settings\All Users\Documents\DrWatson directory
If you run DrWtsn32.exe interactively, it comes up with a configuration window so you can see the system default settings. DEMO SCRIPT #17 While the log file is appended to, the crash dump file is overwritten each time. So unless you have a procedure to rename this file and copy it off, only the most recent process crash will exist on the system. This dump file can be opened by the Windows Debugger windbg.exe, which comes with the Windows 2000 Debugging Tools (on the Windows 2000 Customer Diagnostics CD). It was not included with NT4. (It also ships with the Platform SDK and the Windows 2000 Device Driver Kit, or DDK.) The dump file can be used by the vendor who owns the .EXE to debug why the process crashed. Very few vendors ask for this file; you should insist they take it!
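Because the dump file is overwritten on every crash, a small copy-and-rename step preserves each one. The following is a minimal sketch of such a procedure, assuming the dump is an ordinary file that can be copied once Dr. Watson has finished writing it. The function name, the archive directory, and the timestamped naming scheme are all hypothetical choices.

```python
import shutil
from datetime import datetime
from pathlib import Path


def archive_dump(dump_path, archive_dir, timestamp=None):
    """Copy a crash dump to archive_dir under a timestamped name.

    Returns the path of the copy, or None if there is no dump to save.
    """
    dump_path = Path(dump_path)
    if not dump_path.is_file():
        return None
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = (timestamp or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = archive_dir / f"{dump_path.stem}-{stamp}{dump_path.suffix}"
    shutil.copy2(dump_path, target)  # copy2 also preserves file timestamps
    return target
```

A scheduled job could call archive_dump with the dump file location shown in the Dr. Watson configuration window and a directory on another volume, so that each crash leaves its own file to send to the vendor.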
Enhanced User Dump
· Part of the Windows 2000 Debugging Tools
· Included on the Windows 2000 Customer Diagnostics CD, in the Windows 2000 DDK, and in the Windows Platform SDK
· Lets you snapshot a running process, either from the command line or by pressing a predefined “hot key” (useful when the GUI is not responding)
Dr. Watson only runs when a process dies due to an unhandled exception. If a process is hung (e.g. doesn’t respond to user input), you can use the new Enhanced User Dump facility in Windows 2000 to snapshot a running process and create a memory crash dump file to send to the vendor to analyze why the program hung (instead of just killing the process and telling the vendor the application hung).
Questions and Answers
For More Information
· TechNet website: www.microsoft.com/technet/
· Microsoft® Official Curriculum: www.microsoft.com/train_cert/
· Technology Center on Windows 2000: www.microsoft.com/technet/win2000
· IT Professionals User Groups in your area: www.microsoft.com/technet/usergroup/default.asp
Other Windows 2000 OS Internals Information:
· Books: Inside Windows NT (Solomon, MS Press); Windows 2000 Resource Kit books (MS Press)
· www.microsoft.com/hwdev - hardware developers and driver writers
· www.sysinternals.com - Windows NT internals articles and tools
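Session Credits
· Author: David Solomon (www.solsem.com)
· Producer/editor: Ken Kubota
· Thanks to Jamie Hanrahan (www.cmkrnl.com), who co-authored the Windows NT internals seminar from which these slides were derived
· Thanks also to the members of the Windows 2000 development team who answered internals questions, reviewed the book, and provided source code
· In particular: Dave Cutler, Lou Perazzoli, Mark Lucovsky, Tom Miller, Gary Kimura, Landy Wang, Rob Short, Andre Vachon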
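About the Author
· David Solomon is the author of Inside Windows NT, 2nd edition (Microsoft Press) and Windows NT for OpenVMS Professionals (Digital Press)
· Worked at Digital for fourteen years, the last ten as a developer in the VMS operating system development group
· Founded a Windows NT developer training company in 1992
· Regular speaker at industry conferences (WinDev, Tech•Ed, Software Development, DECUS...)
· Recipient of the Microsoft® MVP award for MSWIN32 technical support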