Trace Kubernetes applications effectively

Tracing is the process of following an application's code or data flow, such as a request made to the app by a user, to understand performance bottlenecks or find the root cause of an error. Tracking a single service's activity in isolation tells you far less than following a request across every service it touches. Tracing requests across multiple services is referred to as distributed tracing.

Modern systems are often built as microservices instead of monolithic applications. The downsides of a microservices architecture are its complexity and its reduced visibility into events such as why a request fails. Distributed tracing helps DevOps teams and SREs track these events and quickly detect what's gone wrong by monitoring requests and how data is exchanged between services. In this article, we will cover how tracing is done in applications that run on Kubernetes, a widely adopted platform for running microservices.

Tracing on Kubernetes

Effective tracing is an important aspect of monitoring and debugging applications deployed on Kubernetes. It helps developers keep tabs on their application performance, identify issues, and troubleshoot problems.

One of the keys to effective tracing is using a distributed tracing tool, such as Zipkin or Jaeger. These tools present the complete flow from the initial request to the final response in the application. They provide detailed information about the various components and services involved in the request-response flow, including performance and any errors or exceptions that may have occurred.

Therefore, to initiate tracing on Kubernetes, you must install and configure a distributed tracing tool. This typically involves deploying the tool as a containerized application on your Kubernetes cluster and configuring your application to send trace data to the tool. You can then view and analyze the trace data using the tool's web interface or API.

Another critical element of effective tracing is to instrument the application code to generate trace spans. A span is a unit of work in an application, such as a database query or an HTTP request. A trace consists of multiple spans, and each span can have any number of child spans nested under it.

To understand how a service or application is performing, examine the trail of spans that make up a request, along with their metadata.

To instrument your code with trace spans, use a tracing library like OpenTelemetry. These libraries provide APIs for creating trace spans and adding metadata to them. With this metadata, you can then filter and analyze your trace data.
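
For example, here is a minimal sketch of span instrumentation using the OpenTelemetry Go SDK. The function name, span name, and attribute key are illustrative, and it assumes a TracerProvider has been registered globally (as shown later in this article); without one, the tracer is a no-op.

package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

// queryOrders wraps a hypothetical unit of work (a database query) in a span.
func queryOrders(ctx context.Context, userID string) {
	// Obtain a tracer from the globally registered TracerProvider
	// and start a span for this unit of work.
	tr := otel.Tracer("orders-service")
	ctx, span := tr.Start(ctx, "query-orders")
	defer span.End()

	// Attach metadata that can later be used to filter and analyze traces.
	span.SetAttributes(attribute.String("user.id", userID))

	// Run the query with ctx so that any spans created inside it become
	// children of this span.
	_ = ctx
}

func main() {
	queryOrders(context.Background(), "42")
}

Spans produced this way are what the tracing backend groups into a trace for a single request.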

Useful tools for tracing in cloud-native applications

In this section, we will cover popular tools for implementing distributed tracing. Most of them are open source, yet well maintained and used in production applications. Open-source instrumentation interoperates with other open-source software, is maintained by a community of developers who actively run it in production environments, and is easy to learn from its documentation.

OpenTelemetry

OpenTelemetry was created by merging the OpenTracing and OpenCensus projects.

OpenTracing was an open-source specification for distributed tracing that defined a standard set of APIs for instrumenting applications and libraries across various programming languages. It aimed to provide a vendor-neutral way to implement distributed tracing in microservices-based applications.

While OpenTelemetry targets observability in general, it is among the strongest options for implementing tracing. It provides standardized, vendor-agnostic software development kits (SDKs), APIs, and tools to collect telemetry data and send it to your preferred observability backend. One benefit of using OpenTelemetry over bespoke tracing tools is that it is a single, robust library: you don't have to install separate software for each need, and its design keeps dependencies on third-party software to a minimum.
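
As a small illustration of that vendor neutrality, here is a minimal sketch, assuming the OpenTelemetry Go SDK and its stdouttrace exporter package, that builds the same kind of tracing pipeline used later in this article but prints spans to standard output; pointing the pipeline at Jaeger or another backend only means swapping the exporter.

package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
	traceSDK "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	// Exporter that writes spans to stdout; a Jaeger or OTLP exporter
	// could be dropped in here without touching the rest of the pipeline.
	exp, err := stdouttrace.New(stdouttrace.WithPrettyPrint())
	if err != nil {
		log.Fatal(err)
	}

	// The SDK pipeline (provider, batching, tracer) stays the same
	// regardless of the backend.
	tp := traceSDK.NewTracerProvider(traceSDK.WithBatcher(exp))
	defer func() {
		if err := tp.Shutdown(context.Background()); err != nil {
			log.Fatal(err)
		}
	}()
	otel.SetTracerProvider(tp)

	// Create and end one span; it appears on stdout when the provider flushes.
	_, span := tp.Tracer("demo").Start(context.Background(), "demo-span")
	span.End()
}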

Zipkin

Zipkin was developed at Twitter and released under an open-source license. It uses a Dapper-style tracing technique, in which unique identifiers called trace IDs are attached to requests as they flow through the different services in a system (a minimal sketch of this follows the component list below). Zipkin comes with the following components:

  • A backend for trace analysis
  • A collector/daemon process
  • Client libraries
  • Integration support for popular frameworks and RPC libraries
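
The sketch below illustrates the trace-ID mechanism, assuming the OpenTelemetry B3 propagator package (go.opentelemetry.io/contrib/propagators/b3); the function and request URL are illustrative.

package main

import (
	"context"
	"net/http"

	"go.opentelemetry.io/contrib/propagators/b3"
	"go.opentelemetry.io/otel/propagation"
)

// attachTraceHeaders injects Zipkin-style B3 headers (X-B3-TraceId and
// related fields) into an outgoing request. The context is expected to
// carry an active span created by a configured tracer, such as the one
// built later in this article; otherwise there is nothing to inject.
func attachTraceHeaders(ctx context.Context, req *http.Request) {
	propagator := b3.New(b3.WithInjectEncoding(b3.B3MultipleHeader))
	propagator.Inject(ctx, propagation.HeaderCarrier(req.Header))
}

func main() {
	req, _ := http.NewRequest(http.MethodGet, "http://example.com/", nil)
	attachTraceHeaders(context.Background(), req)
}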

Jaeger

Jaeger is another open-source distributed tracing tool that you can use to store, visualize, and filter distributed traces. We will use Jaeger as the distributed tracing client in this article to demonstrate the adoption of tracing in Kubernetes applications.

Jaeger was originally developed at Uber Technologies and is now a CNCF graduated project.

It has components such as:

  • Agents
  • Client libraries
  • Collector
  • Ingester

Jaeger supports the following use cases (context propagation is sketched in the example after this list):

  • Context propagation
  • Service dependency analysis
  • Distributed monitoring
  • Root cause analysis
  • Performance or latency optimization
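
To show what context propagation means in practice, here is a minimal sketch using the otelhttp instrumentation and the W3C Trace Context propagator; the downstream hostname and function name are illustrative.

package main

import (
	"context"
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
)

// callInventory makes an outbound call to a hypothetical downstream service.
// The instrumented transport injects the current trace context into the
// request headers so the downstream service can continue the same trace,
// which is what lets Jaeger stitch both services' spans together.
func callInventory(ctx context.Context) (*http.Response, error) {
	client := http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://inventory:8080/items", nil)
	if err != nil {
		return nil, err
	}
	return client.Do(req)
}

func main() {
	// Register the W3C Trace Context propagator once at startup; the
	// default global propagator is a no-op, so nothing would be injected
	// without this line.
	otel.SetTextMapPropagator(propagation.TraceContext{})

	resp, err := callInventory(context.Background())
	if err == nil {
		resp.Body.Close()
	}
}

On the receiving side, an otelhttp-wrapped handler (like the one used later in this article) extracts the same context, so the spans of both services appear in a single trace.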

Pixie

Pixie is another open-source observability tool for Kubernetes. It is a CNCF sandbox project that uses eBPF to capture telemetry data automatically, with no need for manual instrumentation.

Site24x7

Site24x7 Tracing is a commercially distributed tracing tool for Kubernetes applications. The main advantages of using Site24x7 Tracing over open-source tools include:

  • Enhanced performance and scalability: Site24x7 Tracing is designed to handle large volumes of trace data with minimal overhead, making it suitable for use in high-traffic environments.
  • Advanced visualization and analysis capabilities: Site24x7 Tracing provides a rich set of visualization and analysis tools to help you understand the performance and behavior of your distributed system.
  • Professional support: Site24x7 provides technical support over chat, phone, and email for multiple time zones. Site24x7 will also be rolling out a premium support service shortly.

Open-source tools are great for customization, development setups, and experimentation. However, they can be a nightmare to work with if you want to combine several of them to get a complete solution. Using a professional tool designed for enterprises with round-the-clock tech support spares users the constant hassle of having to find and fix bugs that come with integrating multiple tools.

Site24x7 will get you started with monitoring within minutes of signing up, with 24/7 tech support and an active community to help users with any questions or problems they might have. You can customize your workflow from the data being sent to the dashboards you want to view.

Implementing tracing with Jaeger and OpenTelemetry in Kubernetes

To demonstrate how to implement tracing with Jaeger and OpenTelemetry in a Kubernetes application, we’ll follow these steps:

  • Create the application to be traced with the OpenTelemetry and Jaeger clients added.
  • Test the application locally to confirm it works as expected.
  • Implement tracing with Jaeger on Docker.
  • Containerize and deploy the application on Kubernetes with Jaeger UI connected.

Creating the application

For our example, we will use Go to build a web application and name it k8sTrace, then create a main.go file with the contents below:

package main

import (
	"context"
	"fmt"
	"log"
	"net/http"
	"os"
	"time"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/jaeger"
	"go.opentelemetry.io/otel/sdk/resource"
	traceSDK "go.opentelemetry.io/otel/sdk/trace"
	semConv "go.opentelemetry.io/otel/semconv/v1.7.0"
)

const (
	service     = "k8sTrace"
	environment = "development"
	id          = 1
)

// tracerProvider returns an OpenTelemetry TracerProvider configured to
// export spans to the Jaeger collector at the given URL.
func tracerProvider(url string) (*traceSDK.TracerProvider, error) {
	// Create the Jaeger exporter.
	exp, err := jaeger.New(jaeger.WithCollectorEndpoint(jaeger.WithEndpoint(url)))
	if err != nil {
		return nil, err
	}
	tp := traceSDK.NewTracerProvider(
		// Always be sure to batch in production.
		traceSDK.WithBatcher(exp),
		// Record information about this application in a Resource.
		traceSDK.WithResource(resource.NewWithAttributes(
			semConv.SchemaURL,
			semConv.ServiceNameKey.String(service),
			attribute.String("environment", environment),
			attribute.Int64("ID", id),
		)),
	)
	return tp, nil
}

func main() {
	// Tracer destination.
	tp, err := tracerProvider("http://localhost:14268/api/traces")
	if err != nil {
		log.Fatal(err)
	}
	// Register our TracerProvider as the global so any imported
	// instrumentation in the future will default to using it.
	otel.SetTracerProvider(tp)

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// Cleanly shut down and flush telemetry when the application exits.
	defer func(ctx context.Context) {
		// Do not make the application hang when it is shut down.
		ctx, cancel = context.WithTimeout(ctx, time.Second*5)
		defer cancel()
		if err := tp.Shutdown(ctx); err != nil {
			log.Fatal(err)
		}
	}(ctx)

	tr := tp.Tracer("component-main")

	ctx, span := tr.Start(ctx, "hello")
	defer span.End()

	// HTTP handlers
	helloHandler := func(w http.ResponseWriter, r *http.Request) {
		// Use the global TracerProvider.
		tr := otel.Tracer("hello-handler")
		_, span := tr.Start(ctx, "hello")
		span.SetAttributes(attribute.Key("testset").String("value"))
		defer span.End()

		yourName := os.Getenv("MY_NAME")
		fmt.Fprintf(w, "Hello %q!", yourName)
	}

	otelHandler := otelhttp.NewHandler(http.HandlerFunc(helloHandler), "Hello")
	http.Handle("/", otelHandler)

	log.Println("Listening on localhost:3000")

	log.Fatal(http.ListenAndServe(":3000", nil))
}

Testing the application locally

Next, we will test the code locally. The OpenTelemetry packages imported above need to be resolved first, so initialize a Go module and pull in the dependencies, then provide an environment variable to satisfy MY_NAME and run the code:

go mod init k8strace
go mod tidy
export MY_NAME="Alice" ; go run main.go

To confirm that the application works locally, visit http://localhost:3000 in a browser.

Fig 1: The application works locally

Next, we’ll set up distributed tracing with Jaeger.

Starting Jaeger locally with Docker

We need to initialize Jaeger before running the application via Kubernetes. Run the command below while the Docker engine is running:

docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 \
  -p 5775:5775/udp \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 14268:14268 \
  -p 9411:9411 \
  jaegertracing/all-in-one:1.6

Visit the Jaeger UI at http://localhost:16686 to see the default home page:

Fig 2: Jaeger UI home page

Now that we have the Jaeger UI running as a daemon via Docker, we will deploy the application on Kubernetes and run it.

Containerizing the application and deploying on Kubernetes

To deploy the application on Kubernetes, we'll first build its container image. Write a Dockerfile for the application; the one used in this article is as follows:

FROM golang:1.19-alpine AS build
ADD . /src
WORKDIR /src
RUN go mod download
RUN GOOS=linux GOARCH=amd64 go build -v -o k8strace

FROM alpine:3.17.2
ENV VERSION=1.1.4
COPY --from=build /src/k8strace /usr/local/bin/k8strace
RUN chmod +x /usr/local/bin/k8strace
EXPOSE 3000
CMD ["k8strace"]

Next, run the build command:

docker build -t k8strace .

This will create a Docker image for the application based on the Dockerfile content.

Next, push the Docker image to a container registry; this article uses Docker Hub. If you're deploying on AWS or GCP, you can push the image to the container registry on those platforms instead.

docker tag k8strace your_name/k8strace:1.0 
docker push your_name/k8strace:1.0

This uploads the image to the registry, where Kubernetes can pull it.

Now we’ll create a Kubernetes deployment manifest for the application. The contents of the file, which we’ll name k8s-deployment.yaml, are as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: k8strace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: k8strace
  template:
    metadata:
      labels:
        app: k8strace
    spec:
      containers:
      - name: k8strace
        image: your_name/k8strace:1.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000
        env:
        - name: MY_NAME
          value: "Bob"
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 15
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /readiness
            port: 3000
            scheme: HTTP
          initialDelaySeconds: 5
          timeoutSeconds: 1

The manifest will create a deployment with a single replica of the application. The env field specifies the environment variable needed by the application (i.e., MY_NAME set to “Bob”).
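
One caveat: main.go sends spans to http://localhost:14268/api/traces, which works while the application and the Jaeger container both run on your machine, but inside a pod, localhost refers to the pod itself, so that collector is unreachable from the cluster. A minimal sketch of one way to handle this, assuming a hypothetical JAEGER_ENDPOINT environment variable that is not part of the original application, is to make the endpoint configurable in main.go (which already imports os):

// collectorEndpoint resolves the Jaeger collector address. JAEGER_ENDPOINT
// is an assumed variable name; when it is unset, the local default used
// earlier in this article applies.
func collectorEndpoint() string {
	if ep := os.Getenv("JAEGER_ENDPOINT"); ep != "" {
		return ep
	}
	return "http://localhost:14268/api/traces"
}

With this in place, main would call tracerProvider(collectorEndpoint()), and the deployment manifest could set JAEGER_ENDPOINT alongside MY_NAME to an address the pod can reach, for example http://host.minikube.internal:14268/api/traces when Jaeger runs in Docker on the minikube host, or the address of a Jaeger collector service deployed inside the cluster.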

Make sure minikube is started:

minikube start

Apply the Kubernetes deployment manifest with this command:

kubectl apply -f k8s-deployment.yaml

This will create the Kubernetes deployment using the manifest written above. Once the application is deployed, Kubernetes will automatically create a pod to run it. Expose the application for external access using the command below:

kubectl expose deployment k8strace --type=LoadBalancer --port=80 --target-port=3000
Fig 3: Kubernetes deployment manifests applied and application exposed

You can access the application by navigating to the load balancer's external IP address in a web browser. To get the IP address, run the following:

kubectl get services k8strace -o wide

On minikube, a LoadBalancer service's external IP may remain pending; running minikube tunnel in a separate terminal, or opening the service with minikube service k8strace, provides access instead.

Because Jaeger is running locally in Docker, its UI is already available at http://localhost:16686. (If you instead run Jaeger inside the cluster, you can forward its UI port to your local machine with a command along the lines of kubectl port-forward deployment/jaeger 16686:16686, where jaeger is whatever name you gave that deployment.) In the Jaeger UI, search for traces from the k8sTrace service; provided the exporter endpoint is reachable from the cluster, as noted above, you can then view the details of individual spans and traces, including the duration, tags, and logs associated with each span.

Conclusion

Effective tracing is key to monitoring and debugging applications deployed on Kubernetes. By using a distributed tracing tool and instrumenting your code with trace spans, you can gain deeper insight into how your applications perform.

It is advisable to regularly review and analyze your trace data to identify trends and patterns. Adopting this practice can help you identify potential issues before they grow into major problems and ensure that your applications continue to run smoothly.
