What is distributed tracing?
Distributed tracing is a mechanism for tracing errors across your microservice architecture. To see why it matters, let us first look at how a distributed, or microservice, architecture works.
Understanding Microservices
Microservices are loosely coupled, independent services that work in tandem to execute a function or a request. Communication between individual services happens via interconnected APIs. Since each service works independently, dependencies between services are reduced, which favours scalability and faster code deployments.
Identifying and Eliminating Bottlenecks
But complex applications tend to bring complex problems. In a microservice architecture, when an error occurs, it is difficult to trace back to the root cause of the error because of the interconnected services. So rather than looking at how to resolve the error, finding the error becomes a challenge in itself.
And it won't help to look at the root cause without enough context. Since every service is independent, correlating the logs, metrics, and traces of the services involved and narrowing them down to the exact issue can take more time and effort than anticipated.
This is where distributed tracing comes in handy.
How Distributed Tracing Works
Distributed tracing enables you to pinpoint exactly where an error occurred in a complex architecture. With distributed tracing, application transactions are captured using request and response headers.
A trace header is added to the original request and passed along to every subsequent request, creating a link throughout the entire transaction that can be traced back to the origin.
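The header propagation described above can be sketched roughly as follows. This is a minimal illustration, not how any particular tracing product implements it. It assumes the W3C Trace Context `traceparent` header layout (version, trace ID, span ID, flags), which the text above does not name, and the function names are hypothetical.

```python
import secrets


def new_traceparent() -> str:
    """Start a new trace at the first service that receives the request."""
    trace_id = secrets.token_hex(16)  # 32 hex characters, shared by the whole transaction
    span_id = secrets.token_hex(8)    # 16 hex characters, unique to this hop
    return f"00-{trace_id}-{span_id}-01"


def child_traceparent(incoming: str) -> str:
    """Keep the incoming trace ID but issue a fresh span ID for the next hop."""
    version, trace_id, _parent_span_id, flags = incoming.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


def outgoing_headers(incoming_headers: dict) -> dict:
    """Build the headers a service attaches to its downstream call (hypothetical helper)."""
    incoming = incoming_headers.get("traceparent")
    value = child_traceparent(incoming) if incoming else new_traceparent()
    return {"traceparent": value}


# Service A receives a request with no trace header and calls service B,
# which in turn calls service C.
a_to_b = outgoing_headers({})
b_to_c = outgoing_headers(a_to_b)
print(a_to_b["traceparent"])
print(b_to_c["traceparent"])
```

Both printed headers carry the same trace ID in the second field, which is what lets every hop be linked back to the original request.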
Real-time Use Case
For instance, a payment transaction might fail for several reasons. The cause might be as simple as incorrect user input, or it might be an issue in the payment gateway or a database component failure in the backend.
To identify what caused a failure, the data has to be correlated across all the interconnected services involved in the transaction. Doing this by hand can mean hours of searching through logs and matching timestamps and data across services. Distributed tracing instead tracks the API flow throughout your services and makes it easy to identify the root cause of a single transaction failure.
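As a rough sketch of why a shared trace ID makes this correlation easier, the example below groups span records by trace ID and looks for the deepest failing call. The span records, field names, and service names are all made up for illustration, and the rule used (a failing span with no failing child is treated as the origin) is a simplification rather than how a real tracing backend determines root cause.

```python
# Hypothetical span records, as they might be collected from three services.
spans = [
    {"trace_id": "t1", "span_id": "a", "parent_id": None, "service": "checkout", "status": "error"},
    {"trace_id": "t1", "span_id": "b", "parent_id": "a", "service": "payment-gateway", "status": "error"},
    {"trace_id": "t1", "span_id": "c", "parent_id": "b", "service": "orders-db", "status": "error"},
    {"trace_id": "t2", "span_id": "d", "parent_id": None, "service": "checkout", "status": "ok"},
]


def root_cause_service(spans, trace_id):
    """Return the service of the deepest failing span in a trace, or None."""
    # Select only the spans that belong to this one transaction.
    in_trace = [s for s in spans if s["trace_id"] == trace_id]
    failed = [s for s in in_trace if s["status"] == "error"]
    # Span IDs that have at least one failing child.
    failed_parents = {s["parent_id"] for s in failed}
    # A failing span with no failing child is treated as where the error began.
    origins = [s for s in failed if s["span_id"] not in failed_parents]
    return origins[0]["service"] if origins else None


print(root_cause_service(spans, "t1"))  # orders-db
print(root_cause_service(spans, "t2"))  # None
```

In this made-up payment example the error surfaces in the checkout service, but following the parent and child links within the trace points to the database call as the place where it started.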