WinDbg the easy way
- Setup and configuration
- CDB command line basics
- Solving real life problems
- Batch files
What is your favourite debugger? If you asked me this question, I would probably reply that it is “Visual Studio + WinDbg”. I like Visual Studio for its natural and productive interface. I like that it lets me get the necessary information quickly and, well, visually. Unfortunately, some kinds of information cannot be easily obtained with the Visual Studio debugger. For example, what if I need to know which thread is holding a particular critical section? Or which function occupies most of the space on the stack? This is where WinDbg comes in. Its commands can answer these and many other interesting questions that arise during debugging sessions. And I do not even have to close Visual Studio to attach WinDbg to the target application: thanks to WinDbg's support for noninvasive debugging (discussed later in this article), we can take advantage of the Visual Studio GUI and WinDbg commands at the same time.
The only problem is that WinDbg is not easy to use. It takes time to adapt to its user interface, and even more time to master the commands. But what if you need it today, right now, to debug an urgent problem? Is there a quick and easy way? Yes, there is. CDB, the little brother of WinDbg, exposes the same functionality through a simple command line interface. In this article, I will show you how to take advantage of CDB and start using it right away to complement the Visual Studio debugger. You will see how to set up and configure CDB, and how to use it to solve real life problems. In addition, I will provide you with a set of batch files, which hide most of the remaining complexities of CDB's command line interface and save you a lot of typing.
Of course, before we can start using CDB, we have to install and configure it. WinDbg and CDB are distributed as part of the Debugging Tools for Windows package, which can be downloaded from Microsoft. The installation is simple, and unless you are going to develop applications with the help of the WinDbg SDK, you can simply accept the default settings. (If you are going to use the SDK, you have to select custom setup and enable SDK installation; it is also recommended to use an installation directory whose name does not contain spaces.) After the installation has completed, the installation directory should contain all the necessary files, including WinDbg (windbg.exe) and CDB (cdb.exe).
Debugging Tools also support “xcopy” style of installation. After you have installed it on one machine, you do not necessarily have to run the setup again to install it on other machines. It is enough to collect all the files in the installation directory and copy them onto the target machine or a network share.
Some important WinDbg commands cannot function properly without access to up-to-date symbols for operating system DLLs. In the past, we could obtain the necessary symbols by downloading a large package from Microsoft's FTP server. It was time consuming, and symbols could easily become outdated (and therefore useless) after installing an update for the operating system. Fortunately, nowadays there is a much simpler way to obtain symbols: the symbol server. This technology, supported by the WinDbg and Visual Studio debuggers, allows the debugger to download up-to-date symbols on demand from a server maintained by Microsoft. With the symbol server, we do not have to download the complete symbol package, because the debugger knows which DLLs it is going to inspect, and therefore can download symbols only for those DLLs. If symbols become outdated after an operating system update has been installed, the debugger will notice it and download the necessary symbols again.
To make the symbol server feature work, we should let the debugger know the path to the symbol server. The simplest way to do it is to specify the symbol server path in the _NT_SYMBOL_PATH environment variable. The following path should be used: "srv*c:\symbolcache*http://msdl.microsoft.com/download/symbols" (the c:\symbolcache directory will be used as a cache for symbol files downloaded from the symbol server; you can use any local or network directory path that is valid on your system). For example:
set _NT_SYMBOL_PATH=srv*c:\symbolcache*http://msdl.microsoft.com/download/symbols
After you have set _NT_SYMBOL_PATH environment variable to the proper value, symbol server feature is ready for use. More information about symbol server technology, related settings and, if necessary, troubleshooting tips can be found in WinDbg documentation (Debuggers | Symbols section).
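If you drive the debugger from a script rather than from a console, the same setting can be prepared in code. The following Python sketch is only an illustration: the function names are hypothetical, and the only facts taken from this article are the "srv*cache*server" path format and the name of the environment variable.

```python
import os

# Server URL quoted in the article.
MICROSOFT_SYMBOL_SERVER = "http://msdl.microsoft.com/download/symbols"


def symbol_server_path(cache_dir, server_url=MICROSOFT_SYMBOL_SERVER):
    """Return a symbol path of the form srv*<cache>*<server>."""
    return "srv*{}*{}".format(cache_dir, server_url)


def environment_with_symbols(cache_dir, base_env=None):
    """Copy an environment mapping and set _NT_SYMBOL_PATH in the copy.

    The current process environment is left untouched; the copy can be
    handed to a child process that runs the debugger.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["_NT_SYMBOL_PATH"] = symbol_server_path(cache_dir)
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run when launching cdb.exe, so the setting applies to that one debugger session only.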
|Additional configuration steps are needed to access symbol server from behind a proxy server that requires you to log in. See CDB and proxy servers section of this article for more information.|
When we start learning a new debugger, the first question usually is: how do we start a debugging session? Like most debuggers, CDB allows us to start a new instance of the application or attach to an already running process. Starting a new instance is as simple as the following:
cdb myapp.exe
If we want to attach to an already running process, one of the following options can be used:
|-p Pid|This option attaches to the process with the specified process id. The process id can be obtained from Task Manager or a similar tool.|
|-pn ExeName|This option attaches to the process with the specified name of its main executable file (.exe). It is usually more convenient than “-p Pid”, because we usually know the name of our application's main executable, and do not have to look it up in Task Manager. But this option cannot be used if more than one process with the given executable name is currently running (CDB will report an error).|
|-psn ServiceName|This option attaches to the process that contains the specified service. For example, if you want to attach to, say, the Windows Management Instrumentation service, you should use WinMgmt as the service name.|
CDB can also be used to analyze crash dumps. To open a crash dump, use -z option:
cdb -z DumpFile
cdb -z c:\myapp.dmp
After we have started a new debugging session, CDB displays its own command prompt. You can use this prompt to enter and execute any command supported by CDB.
'q' command ends debugging session and exits CDB:
0:000> q
quit:
>
|Warning: When you end the debugging session and exit CDB, the debuggee will also be terminated by the operating system. If you want to exit CDB and keep the debuggee running, you can use .detach command (supported only in Windows XP and newer operating systems), or use CDB in noninvasive mode (discussed below).|
While it is possible to use CDB command prompt to execute debugger commands, it is often faster to specify the necessary commands on the command line, using -c option.
cdb -pn myapp.exe -c "command1;command2"
(commands are separated with semicolons)
For example, the following command line will attach CDB to our application, display the list of loaded modules, and exit:
cdb -pn myapp.exe -c "lm;q"
Note the use of 'q' command at the end of the command list – it allows us to automatically close CDB after all debugger commands have been executed.
By default, when we use CDB to debug an already running process, it attaches as a fully functional debugger (using Win32 Debugging API). It is possible to set breakpoints, step through the code, get notified about various debugging events (such as exceptions, module load/unload, thread start/exit, and so on). But Visual Studio debugger allows us to do the same, and offers a much better user interface. In addition, only one debugger can be attached to the process at a time. Does it mean that if we are already debugging an application with Visual Studio debugger, we cannot use CDB to obtain additional information about the same application? No, it doesn't, because CDB also supports noninvasive debugging mode.
When CDB attaches to the target process in noninvasive mode, it does not use Win32 Debugging API. Instead, it simply suspends all threads in the target process and starts executing commands specified by the user. After all commands have been executed, just before CDB itself terminates, it resumes the suspended threads. As a result, the target process can continue running as if it wasn't debugged at all. Even if the target process is already being debugged by a fully functional debugger like Visual Studio, CDB still can attach to it in noninvasive mode and obtain the necessary information. After CDB has finished its work and detached, we can continue debugging the application in Visual Studio debugger.
How do we use CDB in noninvasive mode? With the -pv command line option. For example, the following command will attach to our application noninvasively, display the list of loaded modules, and exit. The application will continue running.
cdb -pv -pn myapp.exe -c "lm;q"
The output of CDB commands can be long, and it can be inconvenient to read it from the console window. It would be much better to save the output to a log file, and CDB allows us to do it with the help of -loga and -logo options ('-loga <filename>' appends the output to the end of the specified file, while '-logo <filename>' overwrites the file if it already exists).
Let's enhance our sample command (which lists modules in the target process) with logging and save the output to the out.txt file in the current directory:
cdb -pv -pn myapp.exe -logo out.txt -c "lm;q"
Another important command line option exposed by CDB is -lines. This option turns on the source line information support, which, for example, allows CDB to display source file names and line numbers when reporting call stacks. (By default, source line support is turned off, and CDB does not display source file/line information).
If you are going to use CDB from behind a proxy server that requires you to log in, symbol server access will not work by default. The reason is that in the default configuration CDB is not allowed to show proxy server's login prompt when it is trying to connect to the symbol server. To change this behavior and make symbol server access work, two commands should be added to the beginning of the command list:
cdb -pv -pn myapp.exe -logo out.txt -c "!sym prompts;.reload;lm;q"
When CDB launches a new application, attaches to an existing process, or opens a crash dump, it shows a sequence of startup messages. These messages are followed by the output of CDB commands (which can be specified using -c option, or entered manually). Usually the startup messages serve only informational purposes; but if something goes wrong, they will contain the description of the problem, sometimes followed by recommendations on how to solve it.
For example, the following output contains a message that informs us that symbol path is not set, and as a result some debugger commands may not work:
D:\Progs\DbgTools>cdb myapp.exe

Microsoft (R) Windows Debugger Version 6.5.0003.7
Copyright (c) Microsoft Corporation. All rights reserved.

CommandLine: myapp.exe
Symbol search path is: *** Invalid ***
****************************************************************************
* Symbol loading may be unreliable without a symbol search path.           *
* Use .symfix to have the debugger choose a symbol path.                   *
* After setting your symbol path, use .reload to refresh symbol locations. *
****************************************************************************
Here is the list of CDB command line templates that we will use throughout the remainder of this article (we will always use the same templates, and usually only the list of commands inside -c option will change, depending on the problem we are trying to solve).
Attach to a running process (by process id) in noninvasive mode, execute a set of commands and save the output in out.txt file:
cdb -pv -p <processid> -logo out.txt -lines -c "command1;command2;...;commandN;q"
Attach to a running process (by executable name) in noninvasive mode, execute a set of commands and save the output in out.txt file:
cdb -pv -pn <exename> -logo out.txt -lines -c "command1;command2;...;commandN;q"
Attach to a running process (by service name) in noninvasive mode, execute a set of commands and save the output in out.txt file:
cdb -pv -psn <servicename> -logo out.txt -lines -c "command1;command2;...;commandN;q"
Open a crash dump file, execute a set of commands and save the output to out.txt file:
cdb -z <dumpfile> -logo out.txt -lines -c "command1;command2;...;commandN;q"
If we are going to use CDB from behind a proxy server that requires us to log in, two additional commands should be added to make symbol server access work. For example:
cdb -pv -pn <exename> -logo out.txt -lines -c "!sym prompts;.reload;command1;command2;...;commandN;q"
Looks like a lot of typing? Not really. Later in this article I will present a set of batch files, which will hide the repeating command line options and minimize the amount of information you have to enter manually.
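The templates above differ only in the option that selects the target and in the list of commands, so they are easy to assemble programmatically. Here is a minimal Python sketch under that assumption; build_cdb_args is a hypothetical helper, not part of the Debugging Tools, and it uses only the options already introduced in this article (-pv, -p, -pn, -psn, -z, -logo, -lines, -c).

```python
def build_cdb_args(target_option, target, commands, log_file="out.txt",
                   noninvasive=True, lines=True):
    """Assemble a cdb argument list that follows the article's templates.

    target_option is one of "-p", "-pn", "-psn" or "-z".
    commands is a list of debugger commands; 'q' is appended automatically.
    """
    if target_option not in ("-p", "-pn", "-psn", "-z"):
        raise ValueError("unsupported target option: " + target_option)

    args = ["cdb"]
    # The crash dump template has no -pv; it is used only when attaching
    # to a running process.
    if noninvasive and target_option != "-z":
        args.append("-pv")
    args += [target_option, str(target), "-logo", log_file]
    if lines:
        args.append("-lines")
    # Commands are separated with semicolons; the final 'q' closes CDB
    # after all commands have been executed.
    args += ["-c", ";".join(list(commands) + ["q"])]
    return args
```

For example, build_cdb_args("-pn", "myapp.exe", ["~*kb"]) yields the arguments of the command line cdb -pv -pn myapp.exe -logo out.txt -lines -c "~*kb;q". The list can be passed to subprocess.run on a machine where cdb.exe is on the PATH.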
When our application appears hung or unresponsive, the natural question is: what is it currently doing? Where did it get stuck? Of course, we can attach the Visual Studio debugger to the application and inspect the call stacks of all threads. But we can do the same with CDB, and much more quickly. The following command attaches CDB to the application noninvasively, prints all call stacks to the console and to the log file, and exits:
cdb -pv -pn myapp.exe -logo out.txt -lines -c "~*kb;q"
('kb' command asks CDB to print the call stack of the current thread; '~*' prefix asks the debugger to repeat 'kb' for all existing threads in the process).
The DeadLockDemo.cpp file contains a sample application that demonstrates a typical deadlock scenario. If you compile and run it, its worker threads will get stuck very soon. If we run the command shown above to see what the application's threads are doing, we will see something like the following (here, and in the rest of the article, startup messages are omitted):
. 0 Id: 6fc.4fc Suspend: 1 Teb: 7ffdf000 Unfrozen
ChildEBP RetAddr Args to Child
0012fdf8 7c90d85c 7c8023ed 00000000 0012fe2c ntdll!KiFastSystemCallRet
0012fdfc 7c8023ed 00000000 0012fe2c 0012ff54 ntdll!NtDelayExecution+0xc
0012fe54 7c802451 0036ee80 00000000 0012ff54 kernel32!SleepEx+0x61
0012fe64 004308a9 0036ee80 a0f63080 01c63442 kernel32!Sleep+0xf
0012ff54 00432342 00000001 003336e8 003337c8 DeadLockDemo!wmain+0xd9 [c:\tests\deadlockdemo\deadlockdemo.cpp @ 154]
0012ffb8 004320fd 0012fff0 7c816d4f a0f63080 DeadLockDemo!__tmainCRTStartup+0x232 [f:\rtm\vctools\crt_bld\self_x86\crt\src\crt0.c @ 318]
0012ffc0 7c816d4f a0f63080 01c63442 7ffdd000 DeadLockDemo!wmainCRTStartup+0xd [f:\rtm\vctools\crt_bld\self_x86\crt\src\crt0.c @ 187]
0012fff0 00000000 0042e5aa 00000000 78746341 kernel32!BaseProcessStart+0x23

  1 Id: 6fc.3d8 Suspend: 1 Teb: 7ffde000 Unfrozen
ChildEBP RetAddr Args to Child
005afc14 7c90e9c0 7c91901b 000007d4 00000000 ntdll!KiFastSystemCallRet
005afc18 7c91901b 000007d4 00000000 00000000 ntdll!ZwWaitForSingleObject+0xc
005afca0 7c90104b 004a0638 00430b7f 004a0638 ntdll!RtlpWaitForCriticalSection+0x132
005afca8 00430b7f 004a0638 005afe6c 005afe78 ntdll!RtlEnterCriticalSection+0x46
005afd8c 00430b15 005aff60 005afe78 003330a0 DeadLockDemo!CCriticalSection::Lock+0x2f [c:\tests\deadlockdemo\deadlockdemo.cpp @ 62]
005afe6c 004309f1 004a0638 f3d065d5 00334fc8 DeadLockDemo!CCritSecLock::CCritSecLock+0x35 [c:\tests\deadlockdemo\deadlockdemo.cpp @ 90]
005aff6c 004311b1 00000000 f3d06511 00334fc8 DeadLockDemo!ThreadOne+0xa1 [c:\tests\deadlockdemo\deadlockdemo.cpp @ 182]
005affa8 00431122 00000000 005affec 7c80b50b DeadLockDemo!_callthreadstartex+0x51 [f:\rtm\vctools\crt_bld\self_x86\crt\src\threadex.c @ 348]
005affb4 7c80b50b 003330a0 00334fc8 00330001 DeadLockDemo!_threadstartex+0xa2 [f:\rtm\vctools\crt_bld\self_x86\crt\src\threadex.c @ 331]
005affec 00000000 00431080 003330a0 00000000 kernel32!BaseThreadStart+0x37

  2 Id: 6fc.284 Suspend: 1 Teb: 7ffdc000 Unfrozen
ChildEBP RetAddr Args to Child
006afc14 7c90e9c0 7c91901b 000007d8 00000000 ntdll!KiFastSystemCallRet
006afc18 7c91901b 000007d8 00000000 00000000 ntdll!ZwWaitForSingleObject+0xc
006afca0 7c90104b 004a0620 00430b7f 004a0620 ntdll!RtlpWaitForCriticalSection+0x132
006afca8 00430b7f 004a0620 006afe6c 006afe78 ntdll!RtlEnterCriticalSection+0x46
006afd8c 00430b15 006aff60 006afe78 003332e0 DeadLockDemo!CCriticalSection::Lock+0x2f [c:\tests\deadlockdemo\deadlockdemo.cpp @ 62]
006afe6c 00430d11 004a0620 f3e065d5 00334fc8 DeadLockDemo!CCritSecLock::CCritSecLock+0x35 [c:\tests\deadlockdemo\deadlockdemo.cpp @ 90]
006aff6c 004311b1 00000000 f3e06511 00334fc8 DeadLockDemo!ThreadTwo+0xa1 [c:\tests\deadlockdemo\deadlockdemo.cpp @ 202]
006affa8 00431122 00000000 006affec 7c80b50b DeadLockDemo!_callthreadstartex+0x51 [f:\rtm\vctools\crt_bld\self_x86\crt\src\threadex.c @ 348]
006affb4 7c80b50b 003332e0 00334fc8 00330001 DeadLockDemo!_threadstartex+0xa2 [f:\rtm\vctools\crt_bld\self_x86\crt\src\threadex.c @ 331]
006affec 00000000 00431080 003332e0 00000000 kernel32!BaseThreadStart+0x37
The call stacks (and source line numbers) suggest that ThreadOne is holding critical section CritSecOne and is waiting for critical section CritSecTwo, while ThreadTwo is holding critical section CritSecTwo and is waiting for critical section CritSecOne. This is an example of the classical “lock acquisition order” deadlock, where two threads need to acquire the same set of synchronization objects and do it in a different order. If you want to avoid deadlocks of this kind, make sure that all threads acquire the necessary synchronization objects in the same order (in the sample, both ThreadOne and ThreadTwo could agree to acquire CritSecOne first and CritSecTwo next to avoid the deadlock).
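The "same order" rule can be illustrated with a few lines of Python. This is not the DeadLockDemo.cpp sample (which is a C++ application using critical sections), only a hypothetical analogue with two threading.Lock objects: because every worker takes lock_one before lock_two, no worker can hold the second lock while waiting for the first.

```python
import threading


def run_workers(iterations=1000, worker_count=2):
    """Run workers that all acquire two locks in the same order.

    Returns the final counter value, which equals
    iterations * worker_count when every worker has finished.
    """
    lock_one = threading.Lock()   # plays the role of CritSecOne
    lock_two = threading.Lock()   # plays the role of CritSecTwo
    state = {"counter": 0}

    def worker():
        for _ in range(iterations):
            # Agreed order: lock_one first, lock_two second, in every thread.
            with lock_one:
                with lock_two:
                    state["counter"] += 1

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["counter"]
```

If one of the workers took the locks in the opposite order, the program could stop making progress in the same way as the sample application.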
|By default, 'kb' command displays only the first 20 frames of the call stack. If you want to see a larger number of stack frames, you can explicitly override this limit (e.g., 'kb100' command asks the debugger to display up to 100 stack frames). In a live WinDbg session, it is also possible to use .kframes command to change the default limit for all subsequent commands.|
Our sample application contained only three simple threads, and it wasn't difficult to identify the ones responsible for the deadlock. In large applications, it can be more difficult to identify the suspicious threads and prove their guilt. How should we approach it? In most cases, we already know a thread that isn't functioning properly (otherwise, how would we notice that the application is misbehaving?). Usually this thread is waiting on a synchronization object that is not available for some reason. Why is this object not available? Very often we can answer this question if we know which thread is currently holding this object (owns it, in other words). If the object happens to be a critical section, the !locks command can help us identify its current owner. When used without parameters, this command displays the list of critical sections that are currently held by the application's threads. Free critical sections are not included in the output.
Let's see !locks command in action:
cdb -pv -pn myapp.exe -logo out.txt -lines -c "!locks;q"
Here is the output of this command (also for DeadLockDemo.cpp sample):
CritSec DeadLockDemo!CritSecOne+0 at 004A0620
LockCount 1
RecursionCount 1
OwningThread 3d8
EntryCount 1
ContentionCount 1
*** Locked

CritSec DeadLockDemo!CritSecTwo+0 at 004A0638
LockCount 1
RecursionCount 1
OwningThread 284
EntryCount 1
ContentionCount 1
*** Locked

Scanned 40 critical sections
Looking at the output of the !locks command (the OwningThread field in particular), we can conclude that critical section CritSecOne is held by the thread whose id is 0x3d8, and critical section CritSecTwo is held by thread 0x284. The output of the 'kb' command (shown earlier) lets us identify the threads with these ids.
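Matching owners to threads by eye works for two critical sections, but it gets tedious with longer logs. The following Python sketch shows one way to automate the lookup. It is an assumption-laden illustration: the function names are hypothetical, and the regular expressions are written against the output format shown in this article, which may differ in other debugger versions.

```python
import re

# "CritSec <module>!<name>+<offset> at <address> ... OwningThread <tid>"
LOCK_RE = re.compile(
    r"CritSec\s+(?P<name>[^\s+]+)\S*\s+at\s+[0-9A-Fa-f]+"
    r".*?OwningThread\s+(?P<owner>[0-9A-Fa-f]+)",
    re.DOTALL,
)

# Thread header printed by '~*kb': "<index> Id: <pid>.<tid> Suspend: ..."
THREAD_RE = re.compile(r"(?P<index>\d+)\s+Id:\s+[0-9a-f]+\.(?P<tid>[0-9a-f]+)")


def lock_owners(locks_output):
    """Map critical section name -> owning thread id (as an integer)."""
    return {m.group("name"): int(m.group("owner"), 16)
            for m in LOCK_RE.finditer(locks_output)}


def thread_indexes(stack_output):
    """Map thread id -> debugger thread index, taken from '~*kb' headers."""
    return {int(m.group("tid"), 16): int(m.group("index"))
            for m in THREAD_RE.finditer(stack_output)}
```

Combining the two dictionaries for the DeadLockDemo output tells us that CritSecOne is owned by thread index 1 and CritSecTwo by thread index 2, which are the threads running ThreadOne and ThreadTwo.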
If the application uses other kinds of synchronization objects (e.g. mutexes), it is more difficult to identify their owners (kernel debugger is required), and I will reserve it for a future article.
For most kinds of software applications, excessive CPU consumption (up to 100% on a single-CPU system, according to Task Manager) is a clear sign of a bug. Usually it means that one of the application's threads has entered an infinite loop. Of course, the natural way to debug this problem is to attach the Visual Studio debugger to the process and check what the offending thread is doing. But how can we determine which thread to check? CDB offers us an easy and convenient solution: the !runaway command. When used without parameters, this command displays the time spent by each of the application's threads executing user mode code (additional parameters can also show the time spent in kernel mode, and the time elapsed since the moment when a thread was started).
Here is how to use this command with CDB:
cdb -pv -pn myapp.exe -logo out.txt -c "!runaway;q"
Here is a sample output of !runaway command:
0:000> !runaway
User Mode Time
Thread Time
1:358 0 days 0:00:47.408
2:150 0 days 0:00:03.495
0:d8 0 days 0:00:00.000
It looks like the thread with id 0x358 has used most of the CPU time. But this information is not yet enough to prove that thread 0x358 is guilty, because the command displays the CPU time spent by the thread during its whole lifetime. What we need is to see how the threads' CPU times change. Let's run the same command again. This time, we might see something like the following:
0:000> !runaway
User Mode Time
Thread Time
1:358 0 days 0:00:47.408
2:150 0 days 0:00:06.859
0:d8 0 days 0:00:00.000
Now we should compare this output with the output from the previous run, and find the thread whose CPU time has increased the most. In the sample application, it is clearly thread 0x150. Now we can attach the Visual Studio debugger to the application, switch to this thread and check why it is spinning.
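The comparison of two !runaway snapshots is simple arithmetic, so it can be scripted. The Python sketch below is a hypothetical helper (the names are my own), and it assumes the "index:id days h:mm:ss.mmm" layout shown in the sample output above.

```python
import re

# "<index>:<thread id> <days> days <h>:<mm>:<ss>.<mmm>"
RUNAWAY_RE = re.compile(
    r"(?P<index>\d+):(?P<tid>[0-9a-f]+)\s+(?P<days>\d+) days "
    r"(?P<h>\d+):(?P<m>\d+):(?P<s>\d+)\.(?P<ms>\d{3})"
)


def parse_runaway(output):
    """Map thread id -> user mode CPU time in milliseconds."""
    times = {}
    for m in RUNAWAY_RE.finditer(output):
        hours = int(m.group("days")) * 24 + int(m.group("h"))
        seconds = (hours * 60 + int(m.group("m"))) * 60 + int(m.group("s"))
        times[int(m.group("tid"), 16)] = seconds * 1000 + int(m.group("ms"))
    return times


def busiest_thread(first_output, second_output):
    """Return (thread id, milliseconds gained) for the thread whose
    CPU time grew the most between two !runaway snapshots."""
    before = parse_runaway(first_output)
    after = parse_runaway(second_output)
    # Threads that appear only in the second snapshot count from zero.
    deltas = {tid: after[tid] - before.get(tid, 0) for tid in after}
    tid = max(deltas, key=deltas.get)
    return tid, deltas[tid]
```

For the two sample outputs above, thread 0x358 gains nothing between the snapshots, while thread 0x150 goes from 3.495 to 6.859 seconds, a gain of 3364 milliseconds.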
CDB can also be very useful when we want to find the cause of a stack overflow exception. Of course, uncontrolled recursion is the most typical cause of stack overflows, and it is usually enough to look at the call stack of the offending thread to find the place where it went out of control. Visual Studio can do it just fine, so why use CDB? Let's think about more complicated cases. For example, what if our application contains an algorithm that relies on recursion? We put significant effort into designing the algorithm and keeping recursion under control in all possible situations, but sometimes the stack still overflows. Why? Probably because some functions used by the algorithm occupy too much space on the stack under certain conditions. How can we determine the amount of stack space occupied by a function? Unfortunately, the Visual Studio debugger does not offer an easy way to do it.
It is also possible that an application raises a stack overflow exception even when the call stack does not show any signs of recursion. For example, take a look at the StackOvfDemo.cpp sample. If you compile it and run it under a debugger, a stack overflow will soon occur. But the call stack at the moment of the exception looks innocent:
StackOvfDemo.exe!_woutput
StackOvfDemo.exe!wprintf
StackOvfDemo.exe!ProcessStringW
StackOvfDemo.exe!ProcessStrings
StackOvfDemo.exe!main
StackOvfDemo.exe!mainCRTStartup
KERNEL32.DLL!_BaseProcessStart@4
Obviously, one of the functions on the call stack is using too much stack space. But how can we find this function? With the help of CDB, of course: its 'kf' command displays the number of bytes occupied by every function on the call stack. While the application is still stopped in the Visual Studio debugger, let's run the following command:
cdb -pv -pn stackovfdemo.exe -logo out.txt -c "~*kf;q"
(Be aware that by default 'kf' reports only the first 20 frames of the call stack, as we have already discussed in Debugging Deadlocks section. If you want to display more than 20 frames, change ~*kf to, for example, ~*kf1000. Also note that ~*kf will report the call stacks of all threads. If the application contains lots of threads, this can be undesirable, and the command can be changed to '~~[tid]kf', where 'tid' is the thread id of the target thread (for example, '~~[0x3a8]kf').)
This command would display something like this:
. 0 Id: 210.3a8 Suspend: 1 Teb: 7ffde000 Unfrozen
  Memory ChildEBP RetAddr
         00033440 0041aca5 StackOvfDemo!_woutput+0x22
      44 00033484 00415eed StackOvfDemo!wprintf+0x85
      d8 0003355c 00415cc5 StackOvfDemo!ProcessStringW+0x2d
   fc878 0012fdd4 00415a44 StackOvfDemo!ProcessStrings+0xe5
     108 0012fedc 0041c043 StackOvfDemo!main+0x64
      e4 0012ffc0 7c4e87f5 StackOvfDemo!mainCRTStartup+0x183
      30 0012fff0 00000000 KERNEL32!BaseProcessStart+0x3d
Pay attention to the first column – it reports the number of bytes occupied by the corresponding function on the stack. Obviously, ProcessStrings function is using the lion's share of the available stack space, and is therefore responsible for stack overflow.
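When the call stack is deep, finding the largest value in the first column by eye is error-prone. Here is a short Python sketch that does it for us. The helper names are hypothetical, and the parser assumes 32-bit output without source line information (that is, 'kf' run without the -lines option, as in the command above), with one stack frame per line.

```python
import re

HEX = re.compile(r"^[0-9a-f]+$")
HEX8 = re.compile(r"^[0-9a-f]{8}$")


def frame_sizes(kf_output):
    """Return (function, bytes) pairs for frames that have a Memory value."""
    frames = []
    for line in kf_output.splitlines():
        parts = line.split()
        # A frame with a Memory value has exactly four fields:
        # Memory, ChildEBP, RetAddr, Symbol+offset.
        # The topmost frame has no Memory value and is skipped.
        if (len(parts) == 4 and HEX.match(parts[0])
                and HEX8.match(parts[1]) and HEX8.match(parts[2])):
            function = parts[3].split("+")[0]
            frames.append((function, int(parts[0], 16)))
    return frames


def largest_frame(kf_output):
    """Return the (function, bytes) pair with the biggest Memory value."""
    return max(frame_sizes(kf_output), key=lambda frame: frame[1])
```

For the output above, the result is StackOvfDemo!ProcessStrings with 0xfc878 bytes, which is about one megabyte.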
If you wonder why ProcessStrings function requires so much space on the stack, here is the explanation.
This function uses ATL's A2W macro to convert strings from ANSI to Unicode, and this macro uses _alloca
function internally to allocate memory on the stack. The memory allocated with _alloca is released only
when its caller (ProcessStrings in this case) returns. Until ProcessStrings returns control, every subsequent
call to A2W (and therefore _alloca) will allocate additional space on the stack, quickly exhausting it.
Bottom line: avoid using _alloca in a loop.