Making Logs Meaningful: The Role of Mapped Diagnostic Context

Logs let you track a program’s execution and see how it is faring at any given moment. Based on my experience here at QBurst, let me show you how to make the logs of a backend application meaningful with proper contextual information. This post takes a look at logging in Java, shows how logs can be made meaningful, and then introduces Mapped Diagnostic Context (MDC) logging.

Take a look at the log excerpts from a website’s backend server that received requests to save a user setting from two users at the same time:
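
For illustration, the interleaved logs might look like this (all values here are hypothetical; the lines for user 1832 are marked):

```
10:15:32.101 INFO  SettingsController - Received request to save user setting   <-- user 1832
10:15:32.102 INFO  SettingsController - Received request to save user setting
10:15:32.110 INFO  SettingsService    - Validating the new setting value        <-- user 1832
10:15:32.113 INFO  SettingsService    - Validating the new setting value
10:15:32.120 INFO  SettingsService    - Setting saved successfully              <-- user 1832
10:15:32.124 INFO  SettingsService    - Setting saved successfully
```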

We have marked the logs for the user with id 1832 above. But when there are more users or more logs, identifying the logs of a particular user becomes difficult. We could log the user ids too, and that seemed sufficient, or so we thought.

The limitation of this approach shows up if, say, we want to distinguish between two log traces of the same user that happened at about the same time:
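
For illustration, suppose user 1832 sends two requests at nearly the same time (again, a hypothetical excerpt):

```
10:15:32.101 INFO  SettingsController - [user 1832] Received request to save user setting
10:15:32.102 INFO  SettingsController - [user 1832] Received request to save user setting
10:15:32.110 INFO  SettingsService    - [user 1832] Validating the new setting value
10:15:32.113 INFO  SettingsService    - [user 1832] Validating the new setting value
10:15:32.118 WARN  SettingsService    - [user 1832] Setting value rejected
10:15:32.121 INFO  SettingsService    - [user 1832] Setting saved successfully
```

Which of the two requests produced the warning? There is no way to tell.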

So there it was: we could no longer distinguish between the two log traces. We had to look at the problem differently. To address it, we thought of adding to the log an identifier corresponding to each HTTP request hitting our server. We could create such an identifier per request to distinguish the context, but done by hand it would make the code more verbose, as we would have to repeat the identifier in every log statement.

This brought us to Mapped Diagnostic Context (MDC), which does most of the work behind the scenes with little intervention from the developer. 

Let’s see how to set up MDC in a Java Spring Boot project. If you are using Log4j or SLF4J/Logback, you can use:
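
A minimal sketch (assuming SLF4J’s MDC API; the key "myProp" and its value are placeholders):

```java
import org.slf4j.MDC;

// Put a value into the current thread's diagnostic context.
// Typically done once per request, e.g. in a servlet filter or interceptor.
MDC.put("myProp", "some-contextual-value");
try {
    // ... handle the request; every log statement on this thread
    // can now include "myProp" via the log pattern
} finally {
    // Always clean up: server threads are pooled and reused.
    MDC.remove("myProp");
}
```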

If you are using Log4j2 you can use:
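
Log4j2’s equivalent of MDC is the ThreadContext map (again a sketch, with a placeholder key and value):

```java
import org.apache.logging.log4j.ThreadContext;

// Log4j2 keeps the diagnostic context in ThreadContext.
ThreadContext.put("myProp", "some-contextual-value");
try {
    // ... handle the request
} finally {
    ThreadContext.remove("myProp");
}
```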

Now, retrieve “myProp” in the log pattern using %X{myProp}. For example, you can use this:
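
For instance, in a Logback configuration the pattern could look like this (a sketch; adapt it to your own layout):

```xml
<appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
  <encoder>
    <!-- %X{myProp} prints the MDC value that was put earlier -->
    <pattern>%d{HH:mm:ss.SSS} [%X{myProp}] %-5level %logger{36} - %msg%n</pattern>
  </encoder>
</appender>
```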

With MDC, we get these logs. Note the hash after the timestamp in each of the logs. Yes, this looks scalable. 
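
For illustration, the output might look like this (hypothetical values), with a per-request hash right after the timestamp:

```
10:15:32.101 [d1e8a2b4] INFO  MyInterceptor - [user 1832] Incoming request
10:15:32.102 [7f3c9e1a] INFO  MyInterceptor - [user 1832] Incoming request
10:15:32.110 [d1e8a2b4] INFO  MyLogger      - [user 1832] Validating the new setting value
10:15:32.113 [7f3c9e1a] INFO  MyLogger      - [user 1832] Validating the new setting value
10:15:32.118 [d1e8a2b4] WARN  MyLogger      - [user 1832] Setting value rejected
10:15:32.121 [7f3c9e1a] INFO  MyLogger      - [user 1832] Setting saved successfully
```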

Note that a unique identifier has been added to the logs, which ties together the log traces generated from the same request. And it gets even better when we put the user identifier into the MDC context too. Now we have contextual information even in the logs from “MyLogger” and “MyInterceptor”.

A few things to note:

  • Logs can be enriched with information regardless of where the actual logging occurred. You needn’t pass around all the contextual information (like the “user” data in the above example).
  • MDC works well in concurrent backend systems too. If you add a property to MDC from a parent Java thread, the MDC properties can be inherited by the threads spawned from it, depending on the implementation. You can think of MDC as being backed either by a ThreadLocal variable, which is not shared across threads, or by an InheritableThreadLocal variable, which is inherited by child threads.
  • When dealing with microservices, tracing and monitoring the logs is complex and vital at the same time. If you are writing services with Spring Boot, you can use Spring Cloud Sleuth, which automatically injects a unique id (called a trace id) into each request or job. A trace can be split into multiple “spans” if needed. Sleuth can also integrate with aggregators like Zipkin.
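
The ThreadLocal/InheritableThreadLocal distinction from the second point can be demonstrated with plain Java, no logging library involved (a standalone sketch):

```java
public class InheritanceDemo {
    // Not visible in child threads: each thread sees only its own value.
    static final ThreadLocal<String> plain = new ThreadLocal<>();
    // Copied into child threads at the moment they are created.
    static final InheritableThreadLocal<String> inheritable = new InheritableThreadLocal<>();

    // Sets both variables on the calling thread, then reports
    // what a freshly spawned child thread sees for each of them.
    static String[] childView() {
        plain.set("request-42");
        inheritable.set("request-42");
        String[] seen = new String[2];
        Thread child = new Thread(() -> {
            seen[0] = plain.get();        // not inherited -> null
            seen[1] = inheritable.get();  // inherited -> "request-42"
        });
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        String[] seen = childView();
        System.out.println("plain ThreadLocal in child:      " + seen[0]);
        System.out.println("InheritableThreadLocal in child: " + seen[1]);
    }
}
```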

Now let’s take a more complex concurrent system built with the Spring Batch framework and check whether MDC works well there.

MDC in a Spring Batch Application

We have a small Spring Batch program that runs in a web application. The objective of the program is to process a large sequence of documents from a MongoDB database, extract meaningful data in the shortest time possible, and eventually save the inferences back to the database. We had a simple batch processing system set up for this.

Architecture of a Spring Batch processing system.

In our case, the batch job is a multi-threaded program consisting of a single step (which is itself multi-threaded). In this step, we have a reader, a processor, and a writer: the reader takes an input, the processor processes it, and the writer handles the output. The processor, too, is multi-threaded.

So in org.springframework.batch.core.configuration.annotation.BatchJobConfigurer, we configured a custom JobLauncher. Here, we populated the context for the whole job, which would be available across the step threads, the reader, the processor threads, the writer, the various job listeners, and even the HTTP interceptors:
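
A sketch of the idea, assuming Spring Batch’s JobLauncher interface; the wrapper class name and the MDC keys ("jobName", "jobRunId") are illustrative assumptions, not our exact code:

```java
import java.util.UUID;

import org.slf4j.MDC;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersInvalidException;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.repository.JobExecutionAlreadyRunningException;
import org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException;
import org.springframework.batch.core.repository.JobRestartException;

// Hypothetical wrapper: seeds the MDC once per job launch so that
// everything logged during the run carries the same identifiers.
public class MdcSeedingJobLauncher implements JobLauncher {

    private final JobLauncher delegate;

    public MdcSeedingJobLauncher(JobLauncher delegate) {
        this.delegate = delegate;
    }

    @Override
    public JobExecution run(Job job, JobParameters parameters)
            throws JobExecutionAlreadyRunningException, JobRestartException,
                   JobInstanceAlreadyCompleteException, JobParametersInvalidException {
        MDC.put("jobName", job.getName());
        MDC.put("jobRunId", UUID.randomUUID().toString());
        try {
            return delegate.run(job, parameters);
        } finally {
            MDC.clear();
        }
    }
}
```

Note that this seeds only the launching thread’s context; whether worker threads see it is exactly the question explored next.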

This had to work, but it didn’t. We were baffled. It took us a while to realize what was happening. The multi-threading here is a Spring-flavored java.util.concurrent.ThreadPoolExecutor, based on Java’s Executor framework. The Executor framework reduces the overhead of creating threads by reusing threads from a pool to execute tasks. Because the pooled threads are not spawned by the submitting thread, the MDC context is not inherited, so we have to copy it explicitly. The solution was to use a TaskDecorator that copies the MDC context into the Runnable task:
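
A sketch of such a decorator (assuming SLF4J’s MDC and Spring’s org.springframework.core.task.TaskDecorator; the class name is illustrative):

```java
import java.util.Map;

import org.slf4j.MDC;
import org.springframework.core.task.TaskDecorator;

// Captures the submitting thread's MDC and installs it on the worker thread.
public class MdcTaskDecorator implements TaskDecorator {

    @Override
    public Runnable decorate(Runnable runnable) {
        // Snapshot taken on the thread that submits the task.
        Map<String, String> contextMap = MDC.getCopyOfContextMap();
        return () -> {
            try {
                if (contextMap != null) {
                    MDC.setContextMap(contextMap);
                }
                runnable.run();
            } finally {
                // Pooled threads are reused; don't leak context between tasks.
                MDC.clear();
            }
        };
    }
}
```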

And apply the decorator to the SimpleAsyncTaskExecutor:
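
For example (a sketch, assuming Spring 4.3+, where TaskDecorator is a functional interface and SimpleAsyncTaskExecutor exposes setTaskDecorator; here the decorator is written inline as a lambda):

```java
import java.util.Map;

import org.slf4j.MDC;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

SimpleAsyncTaskExecutor executor = new SimpleAsyncTaskExecutor("batch-");
// Decorate each task so the submitter's MDC is visible on the worker thread.
executor.setTaskDecorator(runnable -> {
    Map<String, String> context = MDC.getCopyOfContextMap();
    return () -> {
        if (context != null) {
            MDC.setContextMap(context);
        }
        try {
            runnable.run();
        } finally {
            MDC.clear();
        }
    };
});
```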

Now the logs are more readable with all the context data when we run our Spring Batch program:

You can see how useful it is to have the business context in the logging framework for tracing interactions with your program. It lets you distinguish interleaved log output from different sources, and MDC does most of the heavy lifting. While we looked at a Java/Spring example here, the idea of MDC has been around for some time and is available in most languages.