A simple technique to make your logs more useful when debugging

This article is part of my Confessions of an Unintentional CTO book, which is currently available to read for free online.

It’s not exactly breaking news that bugs are easier to squash when all the relevant information is at hand. For the most proactive programmers, this insight ought to suggest program designs that now only retain information important to the debugging process but also make said information easy to locate at the right time.

To put this all in context, let’s say you received an email from a disgruntled customer. Apparently your system didn’t allow their payment to go through. All this customer saw was a bland and perfectly uninformative message on your credit card page: “Your Payment Could Not Be Completed At This Time”. And as it turns out, neither your database nor your logs nor your exception reports contain any other clues as to what went wrong.

Now what features should your code have had to make solving this bug easy? The obvious answer is it should have retained the failure code/failure reason returned by your payment processor. Additionally it should have made this info easy to find.

It’s all well and good to sit around after the fact and wish you had logged this error code. But as a builder of software systems, you want to get things right the first time. As such, you’ll be asking yourself, “What principles should I follow so that I won’t lose potentially important information?” A good answer in this scenario would be: “I should log any and all (non-sensitive) information returned by a third-party collaborator—and I will learn what information is available by inspecting the API documentation for every third-party API method my system calls”.

Applying this principle to our specific example would lead you to modifying your payment_finalisation method so that it includes something like this:

if ! payment_approved?
# Would print: "50010"

Having read the code, are you happy with its current formulation?

On the plus side, the error code will now be available to you somewhere in the logs. But the “somewhere” part in the last sentence hints at a brand new problem: How are you to locate and identify this info within your logs? If you inspect the code, you’ll see that all you’ve logged is a simple number, the numerical range of which you know nothing about! What’s more is there may well be other log statements from other functions that also print bare numbers. Given all this, how are you to map a particular failed payment with a particular error code in the logs? Sure, you could scroll through the logs around the time the error appeared. Or you could filter the log entries by IP address. But both of these approaches are fraught with difficulties: What if two failed payments happened around the same time? What if two requests shared the same IP address (e.g. because they were both from computers on the same college network)?

The foolproof solution to all this messiness is to get sophisticated about how you write log statements. In particular, logged information ought to always be accompanied by identifying tokens that can be easily found with a search; I call these log_tokens, and here’s how I would rewrite the above to take advantage of this idea:

if ! payment_approved?
  logger.warn(“paypal_error_code for customer:#{customer.id} is #{paypal_response["error_code"]}”)
# Would print: "paypal_error_code for customer:3401 is 50010"

Notice two points about the above:

  1. I write “paypal_error_code” in snake_case instead of the usual English language style (“PayPal Response Code”). This is because I want to avoid the difficulties and ambiguities inherent to searching for terms containing spaces. Within many search tools (such as the one in the third-party logging service I use), spaces are treated oddly. For example, my logging service inserts an invisible “OR” operator between each word, meaning that a search for “PayPal Response Code” would return log lines containing just the word “PayPal” or just the word code “Code”, etc.—not what I had in mind.

  2. I append additional identifying information—in this case the customer’s ID In your case, order numbers/subscription numbers/iPhone device numbers might be more appropriate. This extra information helps you quickly disambiguate log entries whenever the server is processing a few similar requests at the same time.

More Articles: Click here for full archive

Adminimisation: Tips for Building Great Admin Dashboards

How to build a robust administrative area that'll help you automate away the boring and routine tasks involved in running a web site

Why I Stay Behind the Version Rat Race

Whereas updates *sometimes* bring improvements, they are *guaranteed* to bring changes. And, as any programmer knows, software is allergic to change.

A Guide to Testing Web Applications for Developers Who Care about Overall Profitability

Presented in the form of 10 axioms