Monitoring and Tracing

Learn about the monitoring and tracing features available in NGINX Service Mesh.

Overview

NGINX Service Mesh can integrate with your Prometheus, Grafana, and tracing backends to export metrics and tracing data.

If you do not have Prometheus, Grafana, or tracing backends deployed, you can follow the Observability Tutorial to deploy a basic demo setup.

Important:
To prevent automatic sidecar injection into your Prometheus, Grafana, and tracing deployments, deploy them in a namespace where auto-injection is disabled. Alternatively, you can disable auto-injection for those deployments specifically by adding the injector.nsm.nginx.com/auto-inject: disabled label to the PodTemplateSpec of each Deployment.
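
For example, a minimal sketch of the label-based approach, assuming a hypothetical Prometheus Deployment named my-prometheus in example-namespace (only the label placement matters; the rest of the manifest is illustrative):

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: my-prometheus              # hypothetical name and namespace
    namespace: example-namespace
  spec:
    selector:
      matchLabels:
        app: my-prometheus
    template:                        # PodTemplateSpec
      metadata:
        labels:
          app: my-prometheus
          # Keeps the NGINX Service Mesh sidecar out of these Pods.
          injector.nsm.nginx.com/auto-inject: disabled
      spec:
        containers:
        - name: prometheus
          image: prom/prometheus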

Prometheus

Warning:
We do not currently support Prometheus deployments running with TLS encryption.

To use NGINX Service Mesh with your Prometheus deployment:

  1. Connect your existing Prometheus Deployment to NGINX Service Mesh:

    The expected address format is <service-name>.<namespace>:<port>

    • At deployment:

      Run the nginx-meshctl deploy command with the --prometheus-address flag.

      For example:

      nginx-meshctl deploy ...  --prometheus-address "my-prometheus.example-namespace:9090"
      
    • At runtime:

      You can use the NGINX Service Mesh API to update the Prometheus address that the control plane uses to get metrics.

      The example below demonstrates how to update the meshconfig resource:

      spec:
         prometheusAddress: my-prometheus.example-namespace:9090
      
  2. Add the nginx-mesh-sidecars scrape config to your Prometheus configuration; a sketch of its general shape follows this list. If you are deploying NGINX Plus Ingress Controller with NGINX Service Mesh, add the nginx-plus-ingress scrape config as well. Consult the Metrics section of the NGINX Ingress Controller Deployment tutorial for more information about the metrics collected.
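
The authoritative scrape configs ship with the NGINX Service Mesh documentation and examples, so copy them from there rather than from here. The snippet below only sketches the general shape of such a Kubernetes pod-discovery scrape job; the job name matches the one referenced above, but the container name and relabeling rules are assumptions for illustration.

  scrape_configs:
    - job_name: nginx-mesh-sidecars        # job name referenced by the mesh docs and dashboards
      kubernetes_sd_configs:
        - role: pod                        # discover Pods across the cluster
      relabel_configs:
        # Keep only Pods that run the mesh sidecar container.
        # The container name below is an assumption for illustration.
        - source_labels: [__meta_kubernetes_pod_container_name]
          action: keep
          regex: nginx-mesh-sidecar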

See Also:
For more information on how to view and understand the metrics that we track, see our Prometheus Metrics guide.

Grafana

The custom NGINX Service Mesh Grafana dashboard NGINX Mesh Top can be imported into your Grafana instance. For instructions and a list of features, see the Grafana example in the nginx-service-mesh GitHub repo.
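
If you prefer to script the import rather than perform it by hand, Grafana's HTTP API can create a dashboard from its JSON definition. The sketch below is a hypothetical example: it assumes you have downloaded the NGINX Mesh Top dashboard JSON from the repo to a local file, created a Grafana API token, and have jq available; the file name, Grafana address, and token variable are placeholders.

  # Wrap the raw dashboard JSON in the payload that Grafana's /api/dashboards/db
  # endpoint expects, then POST it. Names and addresses below are placeholders.
  jq '{dashboard: ., overwrite: true}' nginx-mesh-top.json \
    | curl -s -X POST "http://my-grafana.example-namespace:3000/api/dashboards/db" \
        -H "Authorization: Bearer $GRAFANA_API_TOKEN" \
        -H "Content-Type: application/json" \
        -d @-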

Tracing with OpenTelemetry

NGINX Service Mesh can export tracing data using OpenTelemetry. Tracing data can be exported to an OpenTelemetry Protocol (OTLP) gRPC exporter, such as the OpenTelemetry (OTEL) Collector, which can then forward the data to one or more upstream collectors such as Jaeger, Datadog, or Lightstep, among others. Before installing the mesh, deploy an OTEL Collector that is configured to export data to an upstream collector you have already deployed or have access to.
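
A minimal Collector configuration for this setup might look like the following sketch. The upstream endpoint is an assumption (a Jaeger collector accepting OTLP over gRPC); substitute the exporter for whichever backend you use, and consult the OpenTelemetry Collector documentation for the full set of options.

  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317         # the mesh sidecars send OTLP/gRPC data here

  exporters:
    otlp:
      # Hypothetical upstream collector; recent Jaeger versions accept OTLP on 4317.
      endpoint: jaeger-collector.tracing.svc:4317
      tls:
        insecure: true                   # assumes plaintext traffic inside the cluster

  service:
    pipelines:
      traces:
        receivers: [otlp]
        exporters: [otlp]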

Tracing relies on trace headers being passed through each microservice in an application to build a full trace of a request. If your application does not forward these headers, you will get disjointed traces that are harder to understand. See the OpenTelemetry Instrumentation guide for information on how to instrument your application to pass trace headers.

  • At deployment:

    Use the --telemetry-exporters flag to point the mesh to your OTLP exporter:

    nginx-meshctl deploy ... --telemetry-exporters "type=otlp,host=otel-collector.example-namespace.svc,port=4317"
    

    You can set the sampler ratio used for tracing by adding the --telemetry-sampler-ratio flag to your deploy command. The sampler ratio must be a float between 0 and 1 and sets the probability that any given span is sampled; a ratio of 0.1 gives each span a 10% chance of being sampled. For example:

    nginx-meshctl deploy ... --telemetry-sampler-ratio 0.1
    
  • At runtime:

    You can use the NGINX Service Mesh API to update the telemetry configuration.

    The example below demonstrates how to update the meshconfig resource:

    telemetry:
       exporters:
          otlp:
             host: otel-collector.example-namespace.svc
             port: 4317
       samplerRatio: 0.1
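
    For either runtime change (the Prometheus address shown earlier or the telemetry settings here), one way to apply the update is to edit the meshconfig resource directly with kubectl. The resource and namespace names below are assumptions based on a default installation; list the resources first to confirm what your deployment uses.

    # Confirm the meshconfig resource name and namespace in your cluster
    # (assumed here to be the default nginx-mesh control-plane namespace).
    kubectl get meshconfigs -n nginx-mesh

    # Open the resource for editing and change spec fields such as prometheusAddress
    # or the telemetry block shown above.
    kubectl edit meshconfig <resource-name> -n nginx-mesh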
    

If configured correctly, tracing data that is generated or propagated by the NGINX Service Mesh sidecar is exported to the OTEL Collector and then forwarded to the upstream collector(s), as shown in the following diagram:

Figure: Tracing data flow using the OpenTelemetry Collector