Wednesday, June 24, 2009

CT Code Camp Recap: Part 1

.NET Troubleshooting in a Production Environment - Polina Cherkasova

This session was a review of different strategies for identifying problems that are discovered after software has been deployed to the production environment. Polina categorized the severity of issues from unexpected behaviors (e.g. code working as designed but not as the user believes it should) through complete application failure (i.e. smoking crater where the server used to be).

One strategy for diagnosing a problem is to run the production environment in debug mode. However, a debug build is not identical to the production build, which can skew the findings. Besides, the problem may not be something that can be reproduced at will.

Polina spent a fair amount of time advocating for an on-the-fly debugger. The one she was using was AVIcode ART (www.art4dotnet.com). It was about 3/4 of the way through the presentation before I realized Polina worked for AVIcode, so her advocacy made sense in that light. Her demo was compelling, though. I liked the fact that you just configure the tool to monitor the application (without having to code additional instrumentation into the application) and it can report on a variety of conditions with varying thresholds of sensitivity.

Another strategy is application logging with tools such as log4net, Enterprise Library, Event Log, NLog, etc. This approach allows information to be retrieved at run time but requires development effort and usually only deals with handled exceptions. Missing from her analysis was coding robust event logging into the application. Doing this provides the person diagnosing a problem with additional information for identifying what led to the problem being researched. However, like error logs, this requires development effort, and getting the signal-to-noise ratio right can be a challenge (so that only events which are truly helpful are captured).
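
To make the event logging idea concrete, here is a minimal sketch. It is written in Python with the standard logging module purely for brevity; it is not from Polina's session, and the names build_logger and process_order are hypothetical. The same pattern should carry over to log4net or NLog, where severity levels play the same role: the application records events at several levels, and a configured threshold decides which ones reach the log, which is one way to tune the signal-to-noise ratio without touching the code.

```python
import io
import logging


def build_logger(name, stream, level=logging.INFO):
    """Return a logger that writes events at or above `level` to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.handlers.clear()      # avoid duplicate handlers on repeated calls
    logger.propagate = False     # keep events out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
    logger.addHandler(handler)
    return logger


def process_order(logger, order_id, quantity):
    """Hypothetical business operation instrumented with event logging."""
    # Fine-grained detail: only useful while actively diagnosing a problem.
    logger.debug("order %s: received, quantity=%s", order_id, quantity)
    if quantity <= 0:
        # Not an exception, but exactly the kind of event that explains
        # "what led to the problem" after the fact.
        logger.warning("order %s: rejected, quantity=%s", order_id, quantity)
        return False
    logger.info("order %s: accepted", order_id)
    return True
```

With the threshold at INFO, calling process_order for a valid and an invalid order writes only the "accepted" and "rejected" events. Dropping the threshold to DEBUG adds the "received" lines, so the extra detail can be switched on when a problem is being researched and left off otherwise.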