Distributed Tracing with OpenTelemetry - Part II
In the previous article, we learned what distributed tracing is, why it is necessary, how to do tracing, encountered challenges with existing tracing tools, and finally discovered that there is a more mature option available for the industry to adopt in terms of telemetry and observability.
In this article, we will be trying to understand OpenTelemetry in more depth.
To begin, we will examine how OpenTelemetry addresses some of the issues confronting the observability ecosystem.
- OpenTelemetry provides specification standards for the industry to adopt for all three key signals, namely Traces, Metrics, and Logs.
- It provides reference implementations of the specifications in multiple languages so that adoption is easy for the developer community. Developers across popular languages can instrument code using the OTel libraries, which produce signals in standard formats.
- It provides a more loosely coupled observability stack architecture, which reduces coupling with vendor agents and allows consumers to change tools as per their requirements.
- It provides a set of tools and SDKs to customize the instrumentation requirements for many languages. (check Understanding OpenTelemetry Libraries section below)
- It also provides Auto Instrumentation capability (which is a big win), wherein developers do not need to code any instrumentation logic; the OpenTelemetry agent does all of it.
  Auto Instrumentation, also called Zero Code Instrumentation, means that OTel can instrument an application without any additional code from developers, for example by injecting bytecode, as long as we use the auto-instrumentation library available for the language. Java is supported, so it can be used without adding any code to your application.
  This feature saves a huge amount of time in instrumenting your codebase.
- It provides rich telemetry data, which is an important requirement for building greater insights into the system under observation.
- It provides a rich set of plugins (receivers, exporters) to help with incremental adoption by building adapters that can work with most of the visualization tools.
To give more color to these advantages and benefits, let's try to see a typical OpenTelemetry architecture.
In the diagram above, you can see the following:
- Instead of using proprietary agents of any FOSS/vendor tools in microservices, we are now using OpenTelemetry agents/libraries. This keeps the client codebase free of any vendor-specific or tool-specific native instrumentation libraries.
- The signal data is communicated via OTLP (OpenTelemetry Protocol), the standard protocol OpenTelemetry uses. OTLP can be carried over gRPC or HTTP.
- All the signal data is sent to a Collector component, which is considered to be the heart of the system. It is optional, but any mature and complex implementation will need an OpenTelemetry Collector in the architecture.
- All the data gets processed by the Collector and then sent to different observability backends. More on Collectors later, but the Collector is responsible for receiving, processing, and sending the signal data to the target visualization tool.
- Application telemetry data can now be exported to multiple backends, depending on the requirements. Also, note that you can plug in various out-of-the-box exporters for the target backend. For example, the Jaeger backend accepts both the Jaeger format and the OTLP format, whereas Zipkin needs data in Zipkin format, so the Zipkin exporter translates the trace data from OTLP to Zipkin's native format.
- In the diagram, you can also see that we can configure the exporters directly from the agent library without routing the traces via the Collector, but such implementations are for simplified requirements only.
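The receive, process, and export flow described above can be sketched in a few lines of plain Java. This is only an illustrative model, not the Collector's actual implementation: every type here (`SpanData`, `Processor`, `Exporter`, `Collector`) is hypothetical, and the "Zipkin-like" output is a heavily reduced shape loosely modeled on Zipkin's JSON, where durations are expressed in microseconds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of a collector pipeline.
// None of these types are part of OpenTelemetry.
class CollectorSketch {

    // A span reduced to the few fields this sketch needs.
    static final class SpanData {
        final String service;
        final String name;
        final long durationMs;
        final String environment; // filled in by a processor, null until then

        SpanData(String service, String name, long durationMs, String environment) {
            this.service = service;
            this.name = name;
            this.durationMs = durationMs;
            this.environment = environment;
        }
    }

    interface Processor { SpanData process(SpanData span); }

    interface Exporter { void export(List<SpanData> batch); }

    // Stands in for a backend that accepts the data as it is.
    static final class OtlpLikeExporter implements Exporter {
        final List<String> sent = new ArrayList<>();

        @Override
        public void export(List<SpanData> batch) {
            for (SpanData s : batch) {
                sent.add(s.service + " " + s.name + " " + s.durationMs + "ms env=" + s.environment);
            }
        }
    }

    // Stands in for an exporter that translates into a backend's native shape.
    static final class ZipkinLikeExporter implements Exporter {
        final List<String> sent = new ArrayList<>();

        @Override
        public void export(List<SpanData> batch) {
            for (SpanData s : batch) {
                long micros = s.durationMs * 1000; // milliseconds to microseconds
                sent.add("{\"name\":\"" + s.name + "\",\"duration\":" + micros
                        + ",\"localEndpoint\":{\"serviceName\":\"" + s.service + "\"}}");
            }
        }
    }

    static final class Collector {
        private final List<Processor> processors;
        private final List<Exporter> exporters;

        Collector(List<Processor> processors, List<Exporter> exporters) {
            this.processors = processors;
            this.exporters = exporters;
        }

        // Receive a batch, run every processor in order, then fan out to every exporter.
        void receive(List<SpanData> batch) {
            List<SpanData> processed = new ArrayList<>();
            for (SpanData span : batch) {
                SpanData current = span;
                for (Processor p : processors) {
                    current = p.process(current);
                }
                processed.add(current);
            }
            for (Exporter e : exporters) {
                e.export(processed);
            }
        }
    }

    public static void main(String[] args) {
        OtlpLikeExporter otlp = new OtlpLikeExporter();
        ZipkinLikeExporter zipkin = new ZipkinLikeExporter();
        Processor addEnvironment =
                span -> new SpanData(span.service, span.name, span.durationMs, "demo");

        Collector collector = new Collector(
                Collections.singletonList(addEnvironment),
                Arrays.<Exporter>asList(otlp, zipkin));
        collector.receive(Collections.singletonList(
                new SpanData("payment-service", "GET /payment", 42, null)));

        System.out.println(otlp.sent);
        System.out.println(zipkin.sent);
    }
}
```

The point of the design is that the application sends its data once, and adding or swapping a backend only changes the list of exporters, not the instrumented services.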
You can also choose SigNoz as your observability backend. SigNoz is an open source observability platform built to support OpenTelemetry natively. It provides metrics monitoring, distributed tracing, and logs management under a single pane of glass. As SigNoz is open source, it can be self-hosted, and the installation comes packed with an OpenTelemetry Collector.
Once your application is instrumented with OTel libraries, you can configure the exporter to send telemetry data directly to SigNoz.
Understanding OpenTelemetry Libraries
Core libraries of OpenTelemetry
The API is the bare-bones interface for instrumentation and does not provide any implementation. Third-party libraries or vendors can instrument their code using the API.
The SDK is a complete language library that provides implementations of the API so we can instrument our code manually (if required). It is what we pull directly into our applications. It doesn’t implement exporters, which are separate libraries that have a dependency on the SDK.
Exporters are libraries that extend the OpenTelemetry package and send the instrumented telemetry data to backends. Vendors may be required to provide an implementation that converts data from OTLP to their native format. Ideally, most vendors should start working with the OTLP format in the long run for best portability.
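To make the split between API, SDK, and exporters more concrete, here is a minimal sketch in plain Java. The interfaces below are hypothetical stand-ins, not the real OpenTelemetry types. They only mirror the idea: library code depends on an interface (the "API"), which does nothing until an implementation (the "SDK") is plugged in, and the implementation hands finished spans to a separate exporter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the API / SDK / exporter layering.
// These are NOT the real OpenTelemetry interfaces.
class ApiSdkSketch {

    // "API": interfaces only, no behavior.
    interface Tracer { Span startSpan(String name); }

    interface Span {
        void setAttribute(String key, String value);
        void end();
    }

    // Without an "SDK", the API falls back to an implementation that does nothing.
    static final Tracer NOOP_TRACER = name -> new Span() {
        @Override public void setAttribute(String key, String value) { }
        @Override public void end() { }
    };

    // "Exporter": a separate piece that ships finished spans somewhere.
    interface SpanExporter { void export(String finishedSpan); }

    // "SDK": the implementation that records data and passes it to an exporter.
    static final class SdkTracer implements Tracer {
        private final SpanExporter exporter;

        SdkTracer(SpanExporter exporter) {
            this.exporter = exporter;
        }

        @Override
        public Span startSpan(final String name) {
            final StringBuilder attributes = new StringBuilder();
            return new Span() {
                @Override
                public void setAttribute(String key, String value) {
                    attributes.append(' ').append(key).append('=').append(value);
                }

                @Override
                public void end() {
                    exporter.export(name + attributes);
                }
            };
        }
    }

    // Library code: written against the "API" only, unaware of SDK or backend.
    static void chargeCard(Tracer tracer, String userId) {
        Span span = tracer.startSpan("chargeCard");
        span.setAttribute("user.id", userId);
        span.end();
    }

    public static void main(String[] args) {
        chargeCard(NOOP_TRACER, "42"); // no SDK configured: nothing is recorded

        List<String> backend = new ArrayList<>();
        chargeCard(new SdkTracer(backend::add), "42"); // SDK + exporter configured
        System.out.println(backend);
    }
}
```

Because `chargeCard` only sees the interface, a third-party library can ship with instrumentation calls in place and leave the choice of SDK and exporter to the application that uses it.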
What is Auto Instrumentation?
And here the magic of OpenTelemetry unfolds. Developers typically need to write instrumentation code for their applications, which can be hard and laborious. This is where OpenTelemetry's Auto Instrumentation shines. It is basically a telemetry collection method that does not require the end user to modify the application’s source code. Methods vary by programming language, and examples include code manipulation (during compilation or at runtime), monkey patching, or running eBPF programs.
For most typical use cases, you won’t need to write instrumentation code yourself. You can use the language-specific Auto Instrumentation library and relax!
If you need to add custom span information or custom data attributes to spans, you can use the manual instrumentation approach (i.e., using the API and SDK) to write your own instrumentation logic.
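The idea behind auto instrumentation, adding telemetry around application code without editing that code, can be illustrated with a small analogy. The sketch below is not how the OpenTelemetry Java agent works (the agent manipulates bytecode, as mentioned above); it uses a JDK dynamic proxy to wrap an interface instead. All class and method names are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Analogy only: the real Java agent rewrites bytecode; this sketch wraps an
// interface with a JDK dynamic proxy. All names here are hypothetical.
class AutoInstrumentationSketch {

    interface PaymentApi {
        String transfer(long userId, int amount);
    }

    // Application code: contains no tracing logic at all.
    static final class PaymentService implements PaymentApi {
        @Override
        public String transfer(long userId, int amount) {
            return "transferred " + amount + " to user " + userId;
        }
    }

    // Names of the "spans" recorded by the wrapper, in call order.
    static final List<String> RECORDED = new ArrayList<>();

    // Returns an object that behaves like target but times every call made through api.
    @SuppressWarnings("unchecked")
    static <T> T instrument(final T target, final Class<T> api) {
        InvocationHandler handler = new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                long start = System.nanoTime();
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow the application's own exception
                } finally {
                    long micros = (System.nanoTime() - start) / 1_000;
                    String spanName = api.getSimpleName() + "." + method.getName();
                    RECORDED.add(spanName);
                    System.out.println(spanName + " took " + micros + " us");
                }
            }
        };
        return (T) Proxy.newProxyInstance(api.getClassLoader(), new Class<?>[] {api}, handler);
    }

    public static void main(String[] args) {
        PaymentApi payments = instrument(new PaymentService(), PaymentApi.class);
        System.out.println(payments.transfer(7L, 25));
    }
}
```

`PaymentService` stays untouched while every call through the wrapped object is timed and recorded. An agent achieves a similar effect for whole libraries and frameworks, without the application even having to call something like `instrument`.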
Demo of Distributed Tracing with OpenTelemetry in a Spring Boot application
Let's work with an example to see OpenTelemetry in Action.
The sample Spring Boot Java application will have three microservices and a service registry:
- user-service
- order-service
- payment-service
- discovery-service (eureka server - service registry)
Here’s the architecture of the sample Java application along with OpenTelemetry and SigNoz.
Pre-requisites
- Java 8 or newer
- MySQL 8
- SigNoz
- Maven
Installing SigNoz
SigNoz can be installed on macOS or Linux computers in just three steps by using a simple install script.
The install script automatically installs Docker Engine on Linux. However, on macOS, you must manually install Docker Engine before running the install script.
git clone -b main https://github.com/SigNoz/signoz.git
cd signoz/deploy/
./install.sh
You can visit our documentation for instructions on how to install SigNoz using Docker Swarm and Helm Charts.
When you are done installing SigNoz, you can access the UI at http://localhost:3301
Installing MySQL
Download MySQL community version from here based on your operating system.
Once installation is complete, run the commands below to create a database for our sample Java app.
➜ ~ mysql -u root
mysql> create database signoz;
mysql> use signoz;
Installing Maven
To install Maven, follow the steps below:
cd ~
mkdir maven
cd maven
curl -L https://dlcdn.apache.org/maven/maven-3/3.8.4/binaries/apache-maven-3.8.4-bin.zip -o maven.zip
unzip maven.zip
printf '\nexport PATH="$HOME/maven/apache-maven-3.8.4/bin:$PATH"\n' >> ~/.zshrc
source ~/.zshrc
Verify the Maven installation using the command below:
mvn -version
Running sample application
Below are the steps to run the sample Java application with OpenTelemetry:
Clone the sample Spring Boot app
We will be using a sample Java app at this GitHub repo.
git clone https://github.com/SigNoz/distributed-tracing-java-sample.git
cd distributed-tracing-java-sample
Run service discovery with Eureka Server
cd discovery-server
mvn clean install -Dmaven.test.skip
docker build -t discovery-service:1.0.1 .
docker run -d --name discovery-service -p 8761:8761 discovery-service:1.0.1
You can go to http://localhost:8761/ and make sure your discovery service registry with Eureka server is up and running.
Setting up OpenTelemetry agent
For instrumenting Java applications, OpenTelemetry has a very handy Java JAR agent that can be attached to any Java 8+ application. The JAR agent can detect a number of popular libraries and frameworks and instrument them right out of the box. You don't need to add any code for that.
Download the latest version of the Java JAR agent, and copy the agent JAR file into your application code. We have placed the agent under the folder named `agents`.
Setting up SigNoz as the OpenTelemetry backend
To set up OpenTelemetry to collect and export telemetry data, you need to specify the OTLP (OpenTelemetry Protocol) endpoint. It consists of the IP of the machine where SigNoz is installed and the port number on which SigNoz listens. The OTLP endpoint for SigNoz is `<IP of the machine>:4317`.
If you have installed SigNoz on your local machine, then your endpoint is `127.0.0.1:4317`.
Create a start.sh script with the environment variables below and move it to the scripts folder. Notice that we have updated the OTLP endpoint under `-Dotel.exporter.otlp.traces.endpoint=http://localhost:4317`.
JAVA_OPTS="${JAVA_OPTS} \
-Xms${JAVA_XMS} \
-Xmx${JAVA_XMX} \
-Dapplication.name=user-service-java \
-Dotel.traces.exporter=otlp \
-Dotel.resource.attributes=service.name=user-service-java \
-Dotel.exporter.otlp.traces.endpoint=http://localhost:4317 \
-Dotel.service.name=user-service-java \
-Dotel.javaagent.debug=false \
-javaagent:../agents/opentelemetry-javaagent.jar"
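Before starting the services, it can be useful to confirm that something is actually accepting connections on the OTLP port, since a wrong endpoint means the services run normally but no traces appear. The following is a minimal sketch of such a pre-flight check. The class name `OtlpEndpointCheck` is hypothetical, and the defaults `127.0.0.1` and `4317` are taken from the local setup described above. Note that it only verifies that a TCP connection can be opened, not that the listener speaks OTLP.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks that a TCP listener is present on the OTLP port.
class OtlpEndpointCheck {

    // Returns true if a TCP connection to host:port can be opened within timeoutMs.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults assume SigNoz is installed on the local machine.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 4317;

        if (isListening(host, port, 2000)) {
            System.out.println(host + ":" + port + " is accepting connections");
        } else {
            System.out.println(host + ":" + port + " is not reachable");
        }
    }
}
```

If SigNoz runs on another machine, pass its IP and port as arguments, for example `java OtlpEndpointCheck <IP of the machine> 4317`.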
Run the microservices
Now you need to run your microservices. Run `user-service`:
cd user-service
mvn clean install -Dmaven.test.skip # Build user-service jar
cd scripts
sh ./start.sh # Run user-service with OTEL java agent
Open a new tab of your terminal, and run `payment-service`:
cd payment-service
mvn clean install -Dmaven.test.skip
cd scripts
sh ./start.sh
Open a new tab of your terminal, and run `order-service`:
cd order-service
mvn clean install -Dmaven.test.skip
cd scripts
sh ./start.sh
Confirm table creation
After running the services, check if the tables `ORDERS` and `USERS` are created using the commands below:
mysql> use signoz;
mysql> show tables;
Visualizing traces data with SigNoz dashboards
To visualize the traces data with SigNoz, we first need to generate some user data by interacting with the Spring Boot application.
Generating user data by interacting with the sample app
You need to generate some user data to see how it appears in the SigNoz dashboard. The sample application comes with a UI to interact with the app. Use the command below in the root folder to launch the UI:
npm install -g serve
serve -l 5000 u
Use the buttons to interact with the app and generate some data. For example, click the Create User button to create a new user in the MySQL database.
Now go to the SigNoz dashboard. You will see the list of service names that we configured:
- user-service
- order-service
- payment-service
You can play around with the dashboard to see what data is captured. Below is a handy guide on how to use the SigNoz dashboard to see the captured data.
How to use SigNoz dashboard to analyze traces
The traces tab of the SigNoz dashboard provides powerful filters to analyze the traces data. You can use a number of filters to see traces data across many dimensions. For example:
See the count of requests by service and HTTP Status code
Run aggregates on your tracing data
You can run aggregates like avg, max, min, 50th percentile, and 90th percentile on your tracing data to analyze performance issues in your application.
Get more granular details by grouping traces data
You can also see these aggregates in more granular detail by grouping them by service name, operation, HTTP URL, HTTP method, etc.
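To give a feel for what these aggregates mean, here is a small sketch that groups span durations by service name and computes avg, max, and percentiles over them. The article does not state how SigNoz calculates percentiles, so the nearest-rank method used here is an assumption, and the `Span` class and the sample durations are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: the Span class and the nearest-rank method are assumptions.
class SpanAggregates {

    static final class Span {
        final String service;
        final long durationMs;

        Span(String service, long durationMs) {
            this.service = service;
            this.durationMs = durationMs;
        }
    }

    // Nearest-rank percentile: the smallest value with at least p percent
    // of all values at or below it.
    static long percentile(List<Long> durationsMs, double p) {
        List<Long> sorted = new ArrayList<>(durationsMs);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.size() / 100.0);
        return sorted.get(Math.max(rank, 1) - 1);
    }

    static double average(List<Long> durationsMs) {
        long sum = 0;
        for (long d : durationsMs) {
            sum += d;
        }
        return (double) sum / durationsMs.size();
    }

    // Collects the durations of all spans per service name (sorted by name).
    static Map<String, List<Long>> groupByService(List<Span> spans) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (Span s : spans) {
            grouped.computeIfAbsent(s.service, k -> new ArrayList<>()).add(s.durationMs);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Made-up sample data.
        List<Span> spans = Arrays.asList(
                new Span("user-service", 12), new Span("payment-service", 40),
                new Span("user-service", 8), new Span("payment-service", 95),
                new Span("order-service", 30), new Span("payment-service", 55));

        for (Map.Entry<String, List<Long>> e : groupByService(spans).entrySet()) {
            List<Long> d = e.getValue();
            System.out.println(e.getKey()
                    + " avg=" + average(d)
                    + " max=" + Collections.max(d)
                    + " p50=" + percentile(d, 50)
                    + " p90=" + percentile(d, 90));
        }
    }
}
```

Percentiles are often more telling than averages for latency: a p90 far above the p50 points to a slow tail of requests that an average would smooth over.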
Identify latency issues with Flamegraphs and Gantt charts
You can inspect each span in the table with Flamegraphs and Gantt charts to see a complete breakdown of the request. Establishing a sequential flow of the user request along with info on time taken by each part of the request can help identify latency issues quickly. Let’s see how it works in the case of our sample Spring Boot app.
Go to the operation filter on the left navigation and apply two filters: operation `/payment/transfer/id/{id}/amount/{amount}` and service name `payment-service`. Click on the single event listed in the table as shown below:
You will be able to see the Flamegraph of the selected span which shows how the request traveled between the `payment-service` and the `user-service`. You can also use the Gantt chart to analyze each event in detail.
SigNoz also provides a detailed view of common semantic conventions like HTTP, network, and other attributes. The end-to-end tracing of user requests can help you to identify latency issues quickly.
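Spans from `payment-service` and `user-service` can be stitched into one Flamegraph because both services share the same trace ID, which travels with the HTTP call between them. The sketch below assumes the services propagate context using the W3C Trace Context `traceparent` header (to our knowledge the default for the OpenTelemetry Java agent, but treat that as an assumption for your setup). The class name is hypothetical, the header value is the example from the W3C specification, and the parser handles only the version `00` layout without the spec's additional checks (such as rejecting all-zero IDs).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the W3C traceparent header layout:
//   version - trace id (32 hex) - parent span id (16 hex) - flags (2 hex)
class TraceParentSketch {

    private static final Pattern FORMAT =
            Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

    final String traceId;      // shared by every span of the request
    final String parentSpanId; // span of the caller that sent the header
    final boolean sampled;     // lowest bit of the flags field

    private TraceParentSketch(String traceId, String parentSpanId, boolean sampled) {
        this.traceId = traceId;
        this.parentSpanId = parentSpanId;
        this.sampled = sampled;
    }

    // Parses an incoming header; only the version 00 layout is accepted here.
    static TraceParentSketch parse(String header) {
        Matcher m = FORMAT.matcher(header);
        if (!m.matches()) {
            throw new IllegalArgumentException("malformed traceparent: " + header);
        }
        boolean sampled = (Integer.parseInt(m.group(3), 16) & 1) == 1;
        return new TraceParentSketch(m.group(1), m.group(2), sampled);
    }

    // Header for an outgoing call: same trace id, but the caller's own span id.
    String forChildSpan(String childSpanId) {
        return "00-" + traceId + "-" + childSpanId + "-" + (sampled ? "01" : "00");
    }

    public static void main(String[] args) {
        // Example value from the W3C Trace Context specification.
        String incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
        TraceParentSketch context = parse(incoming);

        System.out.println("trace id:       " + context.traceId);
        System.out.println("parent span id: " + context.parentSpanId);
        System.out.println("sampled:        " + context.sampled);
        System.out.println("outgoing:       " + context.forChildSpan("b7ad6b7169203331"));
    }
}
```

With auto instrumentation, none of this has to be written by hand: the agent reads and writes the header on instrumented HTTP clients and servers, which is why the trace above spans both services without any tracing code in the sample app.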
Conclusion
Distributed tracing is a powerful and critical toolkit for developers creating applications based on microservices architecture. Using OpenTelemetry, you can implement distributed tracing easily for your distributed application. It also makes your instrumentation future proof, as you avoid any vendor lock-in.
OpenTelemetry and SigNoz provide a great open-source solution to implement distributed tracing for your applications. You can try out SigNoz by visiting its GitHub repo 👇
If you have any questions or need any help in setting things up, join our Slack community and ping us in the #support channel.