Prometheus Metrics

Overview

F5 NGINX Service Mesh integrates with Prometheus for metrics and Grafana for visualizations.

To configure NGINX Service Mesh to use Prometheus when deploying, refer to the Monitoring and Tracing guide for instructions.

The mesh supports the SMI spec, including traffic metrics. The NGINX Service Mesh creates an extension API Server and shim that query Prometheus and return the results in a traffic metrics format. See SMI Traffic Metrics for more information.

Occasionally metrics are reset when the nginx-mesh-sidecar reloads NGINX Plus. If traffic is flowing and you fail to see metrics, retry after 30 seconds.

If you are deploying NGINX Plus Ingress Controller with the NGINX Service Mesh, make sure to configure the NGINX Plus Ingress Controller to export metrics. Refer to the Metrics section of the NGINX Plus Ingress Controller Deployment tutorial for instructions.

Prometheus Metrics

The NGINX Service Mesh sidecar exposes the following metrics in Prometheus format via the /metrics path on port 8887:

NGINX Plus metrics.
upstream_server_response_latency_ms: a histogram of upstream server response latencies in milliseconds. The response time is the time from when NGINX establishes a connection to an upstream server to when the last byte of the response body is received by NGINX.

All metrics have the namespace nginxplus, for example nginxplus_http_requests_total and nginxplus_upstream_server_response_latency_ms_count.

Examples

This section includes a set of example metrics that you may plug into your existing Prometheus-based tooling to gain insights into the traffic flowing through your applications.

HTTP

View the rate of requests currently flowing:
irate(nginxplus_http_requests_total[30s])

View unsuccessful response codes of your applications:

nginxplus_upstream_server_responses{code=~"3xx|4xx|5xx"}

This can be used to form more complex queries such as current success rate:

sum(irate(nginxplus_upstream_server_responses{code=~"1xx|2xx"}[30s])) by (app, version) / sum(irate(nginxplus_upstream_server_responses[30s])) by (app, version)

UDP/TCP

View the current throughput of clients sending to upstreams:
irate(nginxplus_stream_upstream_server_sent[30s])
You can also see the total number of connections made:
nginxplus_stream_upstream_server_connections

(TCP Only): NGINX Service Mesh exposes a whole host of latency information for TCP connections:

nginxplus_stream_upstream_server_connect_time

nginxplus_stream_upstream_server_first_byte_time

nginxplus_stream_upstream_server_response_time

Labels

All metrics have the following labels:

Metric Name	Description
job	Prometheus job name. All metrics scraped from an nginx-mesh-sidecar have a job name of `nginx-mesh-sidecars`, and all metrics scraped from an NGINX Plus Ingress Controller have a job name of `nginx-plus-ingress`.
pod	Name of the Pod.
namespace	Namespace where the Pod resides.
instance	Address of the Pod.
pod_template_hash	Value of the pod-template-hash Kubernetes label.
deployment, statefulset, or daemonset	Name of the Deployment, StatefulSet, or DaemonSet that the Pod belongs to.

Metrics for upstream servers, such as nginxplus_upstream_server_requests, have these additional labels:

Metric Name	Description
code	Response code of the upstream server. For NGINX Plus metrics, the code will be one of the following: 1xx, 2xx, 3xx, 4xx, or 5xx. For the `upstream_server_response_latency_ms` metrics, the code is the specific response code, such as 201.
upstream	Name of the upstream server group.
server	Address of the upstream server selected by NGINX.

Metrics for outgoing requests have the following destination labels:

Metric Name	Description
dst_pod	Name of the Pod that the request was sent to.
dst_service	Name of the Service that the request was sent to.
dst_deployment, dst_statefulset, or dst_daemonset	Name of the Deployment, StatefulSet, or DaemonSet that the request was sent to.
dst_namespace	Namespace that the request was sent to.

Metrics exported by NGINX Plus Ingress Controller have these additional labels:

Metric Name	Description
ingress	Set to true if ingress traffic is enabled.
egress	Set to true if egress traffic is enabled.
class	Ingress class of the NGINX Plus Ingress Controller.
resource_type	Type of resource: VirtualServer, VirtualServerRoute, or Ingress.
resource_name	Name of the VirtualServer, VirtualServerRoute, or Ingress resource.
resource_namespace	Namespace of the resource. This value is kept for backwards compatibility; for consistency with NGINX Service Mesh metrics you can use `dst_namespace` for queries and filters.
service	Service the request was sent to. This value is kept for backwards compatibility; for consistency with NGINX Service Mesh metrics you can use `dst_service` for queries and filters.
pod_name	Name of the Pod that the request was sent to. This value is kept for backwards compatibility; for consistency with NGINX Service Mesh metrics you can use `dst_pod` for queries and filters.

Filter Prometheus Metrics using Labels

Here are some examples of how you can use the labels above to filter your Prometheus metrics:

Find all upstream server responses with server side errors for deployment productpage-v1 in namespace prod:
nginxplus_upstream_server_responses{deployment="productpage-v1",namespace="prod",code="5xx"}
Find all upstream server responses with successful response codes for deployment productpage-v1 in namespace prod:
nginxplus_upstream_server_responses{deployment="productpage-v1",namespace="prod",code=~"1xx|2xx"}

Find the p99 latency of all requests sent from deployment productpage-v1 in namespace prod to service details in namespace prod over the last 30 seconds:

histogram_quantile(0.99, sum(irate(nginxplus_upstream_server_response_latency_ms_bucket{namespace="prod",deployment="productpage-v1",dst_service="details"}[30s])) by (le))

Find the p90 latency of all requests sent from deployment productpage-v1 in namespace prod to service details in namespace prod over the last 30 seconds, excluding 301 response codes:

histogram_quantile(0.90, sum(irate(nginxplus_upstream_server_response_latency_ms_bucket{namespace="prod",deployment="productpage-v1",dst_service="details",code!="301"}[30s])) by (le))

Find the p50 latency of all successful(response codes of 200, or 201) requests sent from deployment productpage-v1 in namespace prod to service details in namespace prod over the last 30 seconds:
histogram_quantile(0.50, sum(irate(nginxplus_upstream_server_response_latency_ms_bucket{namespace="prod",deployment="productpage-v1",dst_service="details",code=~"200|201"}[30s])) by (le))
Find all active connections for the NGINX Plus Ingress Controller:
nginxplus_connections_active{job="nginx-plus-ingress"}

Grafana

The custom NGINX Service Mesh Grafana dashboard NGINX Mesh Top can be imported into your Grafana instance. For instructions and a list of features, see the Grafana example in the nginx-service-mesh GitHub repo.

To view Grafana, port-forward your Grafana Service:

kubectl port-forward -n <grafana-namespace> svc/grafana 3000

View source

Edit this page

Create a new issue