Everyone knows how hard it is to find the cause of a bug that appears in only one environment. When that environment is Production, the problem is even bigger, even if you have a solid error logging system in place.
Recently we faced this same situation and we didn’t have any clues to help us, only that the w3wp process was dying and the ASP.NET session remained locked. After some thought, we arrived at the conclusion that there was an infinite loop somewhere, and we had a vague idea of the “zone” of code where this was happening, but we couldn’t reproduce it in any other environment even after several hours of testing.
Ideally we needed to know the exact line of code that was causing the problem, but how? The only way was to use a debugger on the Production web servers, but installing Visual Studio on them was not an option. So I searched for a command-line debugger for Microsoft .NET, and I found, among others, Mdbg.exe in the .NET SDK 2.0 (you should download and install it anyway, as it’s free, harmless and relatively small).
Using it in a situation like this is really easy. The procedure is the following series of steps and commands:
- prepare to reproduce the bug
- start mdbg
- a[ttach] to the w3wp process (find its process ID, or use l[ist] appdomains and match them against the IIS metabase); important: when you attach to a process, mdbg stops it, so pay attention not to harm your customers
- go to let the process run again (use it as soon as possible after the attach)
- reproduce the bug
- at the exception, w[here] to obtain the stack trace (remember to upload the .pdb files with your .dll files if you want the line numbers).
Here is a sample output for the exception that was produced:
    /LM/W3SVC/120002/root STOP: Unhandled Exception thrown
    Exception=System.StackOverflowException
        _className=
        _exceptionMethod=
        _exceptionMethodString=
        _message=
        _data=
        _innerException=
        _helpURL=
        _stackTrace=
        _stackTraceString=
        _remoteStackTraceString=
        _remoteStackIndex=0
        _dynamicMethods=
        _HResult=-2147023895
        _source=
        _xptrs=0
        _xcode=-532459699
    This is unhandled exception, continuing will end the process
    IP: 0 @ System.Enum.System.IConvertible.ToBoolean - MAPPING_APPROXIMATE
Here is a sample output for the interesting part of a (long) stack trace:
    ...
    4037. Project.Core.Competition.Revenue.ComputeAccounts (Revenue.cs:785)
    4038. Project.Core.Competition.Revenue.get_CreditAccount (Revenue.cs:144)
    4039. Project.Core.Competition.Revenue.ComputeAccountsMatch (Revenue.cs:860)
    4040. Project.Core.Competition.Revenue.ComputeAccounts (Revenue.cs:785)
    4041. Project.Core.Competition.Revenue.UpdateStatusValidation (Revenue.cs:360)
    4042. Project.Web.UI.PageBase.PageResourceChangeStatusValidation (PageBase.cs:2473)
    4043. Project.Web.UI.PageBase.PageResourceChangeStatusValidation (PageBase.cs:2383)
    4044. Project.Web.UI.PageBase.Page_ResourceChangeStatusValidation (PageBase.cs:2213)
    4045. Project.Web.UI.PageBase.Page_OperationChangeStatusValidation (PageBase.cs:4331)
    4046. Project.Web.UI.PageBase.PageLoad_Operations (PageBase.cs:3398)
    ...
    4086. System.Web.Hosting.ISAPIRuntime.ProcessRequest (source line information unavailable)
As you can see, we now have the exact lines of code involved, and we obtained them with a low-impact method. Again, it’s important to note that the impact is low not only on the running application but also on prerequisites, because only the .NET SDK needs to be installed on the server before using the debugger.
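With more than 4000 frames, finding the repeating group of calls by eye is tedious. The following is a minimal sketch, not part of mdbg, of how a saved trace could be scanned for the cycle. It assumes each frame has the form "number. method (location)" as in the sample above. The function names parse_frames and find_cycle are hypothetical, and the trace in the example is synthetic, built from the frame names shown earlier.

```python
import re

# Assumed frame format, taken from the sample output above:
#   "4037. Some.Namespace.Method (File.cs:123)"
FRAME_RE = re.compile(r"^\s*(\d+)\.\s+(\S+)\s+\((.+)\)\s*$")


def parse_frames(trace_text):
    """Return (method, location) pairs for every line that looks like a frame."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line)
        if match:  # lines such as "..." are skipped
            frames.append((match.group(2), match.group(3)))
    return frames


def find_cycle(frames, min_repeats=3):
    """Return the first block of frames that repeats back to back
    at least min_repeats times, or an empty list if there is none."""
    n = len(frames)
    for start in range(n):
        max_period = (n - start) // min_repeats
        for period in range(1, max_period + 1):
            span = period * (min_repeats - 1)
            # The block repeats if every frame equals the one `period` later.
            if all(frames[i] == frames[i + period]
                   for i in range(start, start + span)):
                return frames[start:start + period]
    return []


# Synthetic trace: the three recursive frames repeated, then the caller.
cycle_lines = [
    "Project.Core.Competition.Revenue.ComputeAccounts (Revenue.cs:785)",
    "Project.Core.Competition.Revenue.get_CreditAccount (Revenue.cs:144)",
    "Project.Core.Competition.Revenue.ComputeAccountsMatch (Revenue.cs:860)",
]
tail_lines = [
    "Project.Core.Competition.Revenue.ComputeAccounts (Revenue.cs:785)",
    "Project.Core.Competition.Revenue.UpdateStatusValidation (Revenue.cs:360)",
]
sample_trace = "\n".join(
    f"{index}. {text}"
    for index, text in enumerate(cycle_lines * 4 + tail_lines)
)

for method, location in find_cycle(parse_frames(sample_trace)):
    print(method, location)
```

The scan starts from the top of the stack because that is where the runaway recursion sits. The min_repeats threshold is only a guard against ordinary code that happens to call the same pair of methods twice.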