Traffic Metrics

Overview

NGINX Service Mesh uses Prometheus for metrics and Grafana for visualizations. Both are included in the installation by default.

Note:
You can configure NGINX Service Mesh to use a different Prometheus server when deploying. Refer to the Monitoring and Tracing task for instructions.
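
As an illustration, you can point the mesh at an existing Prometheus instance at deploy time. The address below is a placeholder and the flag is an assumption; check nginx-meshctl deploy --help for the exact option in your release:

# Hypothetical example: deploy the mesh against your own Prometheus server.
# Verify the flag name and address format with `nginx-meshctl deploy --help`.
nginx-meshctl deploy --prometheus-address "prometheus.monitoring.svc.cluster.local:9090"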

The mesh supports the SMI spec, including traffic metrics. NGINX Service Mesh creates an extension API server and shim that query Prometheus and return the results in the SMI traffic metrics format.

Note:
Metrics are occasionally reset when the nginx-mesh-sidecar reloads NGINX Plus. If traffic is flowing but you don't see metrics, retry after 30 seconds.

If you are deploying NGINX Plus Ingress Controller with the NGINX Service Mesh, make sure to configure the NGINX Plus Ingress Controller to export metrics. Refer to the Metrics section of the NGINX Plus Ingress Controller Deployment tutorial for instructions.

Prometheus Metrics

The NGINX Service Mesh sidecar exposes the following metrics in Prometheus format via the /metrics path on port 8887:

  • NGINX Plus metrics.
  • upstream_server_response_latency_ms: a histogram of upstream server response latencies in milliseconds. The response time is the time from when NGINX establishes a connection to an upstream server to when the last byte of the response body is received by NGINX.

All metrics use the nginxplus namespace; for example, nginxplus_http_requests_total and nginxplus_upstream_server_response_latency_ms_count.
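
To see the raw metrics a sidecar exposes, you can port-forward an injected Pod's metrics port and request the /metrics path. The namespace and Pod name below are placeholders:

# Forward local port 8887 to the sidecar's metrics port on an injected Pod.
kubectl port-forward -n <namespace> pod/<injected-pod-name> 8887
# In another terminal, fetch the metrics in Prometheus text format.
curl http://localhost:8887/metrics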

Labels

All metrics have the following labels:

Label Description
job Prometheus job name. All metrics scraped from an nginx-mesh-sidecar have a job name of nginx-mesh-sidecars, and all metrics scraped from an NGINX Plus Ingress Controller have a job name of nginx-plus-ingress.
pod Name of the Pod.
namespace Namespace where the Pod resides.
instance Address of the Pod.
pod_template_hash Value of the pod-template-hash Kubernetes label.
deployment, statefulset, or daemonset Name of the Deployment, StatefulSet, or DaemonSet that the Pod belongs to.

Metrics for upstream servers, such as nginxplus_upstream_server_requests, have these additional labels:

Label Description
code Response code of the upstream server. For NGINX Plus metrics, the code will be one of the following: 1xx, 2xx, 3xx, 4xx, or 5xx. For the upstream_server_response_latency_ms metrics, the code is the specific response code, such as 201.
upstream Name of the upstream server group.
server Address of the upstream server selected by NGINX.

Metrics for outgoing requests have the following destination labels:

Label Description
dst_pod Name of the Pod that the request was sent to.
dst_service Name of the Service that the request was sent to.
dst_deployment, dst_statefulset, or dst_daemonset Name of the Deployment, StatefulSet, or DaemonSet that the request was sent to.
dst_namespace Namespace that the request was sent to.

Metrics exported by NGINX Plus Ingress Controller have these additional labels:

Label Description
ingress Set to true if ingress traffic is enabled.
egress Set to true if egress traffic is enabled.
class Ingress class of the NGINX Plus Ingress Controller.
resource_type Type of resource: VirtualServer, VirtualServerRoute, or Ingress.
resource_name Name of the VirtualServer, VirtualServerRoute, or Ingress resource.
resource_namespace Namespace of the resource. This value is kept for backwards compatibility; for consistency with NGINX Service Mesh metrics you can use dst_namespace for queries and filters.
service Service the request was sent to. This value is kept for backwards compatibility; for consistency with NGINX Service Mesh metrics you can use dst_service for queries and filters.
pod_name Name of the Pod that the request was sent to. This value is kept for backwards compatibility; for consistency with NGINX Service Mesh metrics you can use dst_pod for queries and filters.

Filter Prometheus Metrics using Labels

Here are some examples of how you can use the labels above to filter your Prometheus metrics:

  • Find all upstream server responses with server-side errors for deployment productpage-v1 in namespace prod:

    nginxplus_upstream_server_responses{deployment="productpage-v1",namespace="prod",code="5xx"}
    
  • Find all upstream server responses with successful response codes for deployment productpage-v1 in namespace prod:

    nginxplus_upstream_server_responses{deployment="productpage-v1",namespace="prod",code=~"1xx|2xx"}
    
  • Find the p99 latency of all requests sent from deployment productpage-v1 in namespace prod to service details in namespace prod over the last 30 seconds:

    histogram_quantile(0.99, sum(irate(nginxplus_upstream_server_response_latency_ms_bucket{namespace="prod",deployment="productpage-v1",dst_service="details"}[30s])) by (le))
    
  • Find the p90 latency of all requests sent from deployment productpage-v1 in namespace prod to service details in namespace prod over the last 30 seconds, excluding 301 response codes:

    histogram_quantile(0.90, sum(irate(nginxplus_upstream_server_response_latency_ms_bucket{namespace="prod",deployment="productpage-v1",dst_service="details",code!="301"}[30s])) by (le))
    
  • Find the p50 latency of all successful (response code 200 or 201) requests sent from deployment productpage-v1 in namespace prod to service details in namespace prod over the last 30 seconds:

    histogram_quantile(0.50, sum(irate(nginxplus_upstream_server_response_latency_ms_bucket{namespace="prod",deployment="productpage-v1",dst_service="details",code=~"200|201"}[30s])) by (le))
    
  • Find all active connections for the NGINX Plus Ingress Controller:

    nginxplus_connections_active{job="nginx-plus-ingress"}
    
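To run queries like these outside of Grafana, you can port-forward the Prometheus service and use its expression browser or HTTP API. The service name and namespace below assume the Prometheus instance the mesh deploys by default; adjust them if you use your own server:

# Forward the in-cluster Prometheus (service name and namespace are assumptions).
kubectl port-forward -n nginx-mesh svc/prometheus 9090
# Query the Prometheus HTTP API; -G sends the URL-encoded query as a GET request.
curl -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=nginxplus_connections_active{job="nginx-plus-ingress"}'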

Grafana

By default, NGINX Service Mesh deploys Grafana for metric visualization and configures the NGINX Mesh Top dashboard as the default dashboard. For an example of the dashboard and a list of its features, see the Grafana example in the nginx-service-mesh GitHub repo.

To view the dashboard, port-forward the Grafana service:

kubectl port-forward -n nginx-mesh svc/grafana 3000
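
With the port-forward running, the NGINX Mesh Top dashboard is available at http://localhost:3000 in your browser.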

If you prefer to use your own Grafana instance, refer to the Monitoring and Tracing task for instructions.

How to View Traffic Metrics

To view NGINX Service Mesh traffic metrics, query the SMI metrics API with kubectl get --raw. The example below retrieves the traffic edges for the reviews-v3 Deployment in the default namespace.

kubectl get --raw /apis/metrics.smi-spec.io/v1alpha1/namespaces/default/deployments/reviews-v3/edges

The output looks similar to that shown below.

{
  "kind": "TrafficMetricsList",
  "apiVersion": "metrics.smi-spec.io/v1alpha1",
  "metadata": {
    "selfLink": "/apis/metrics.smi-spec.io/v1alpha1/namespaces/default/deployments/reviews-v3/edges"
  },
  "resource": {
    "kind": "Deployment",
    "namespace": "default",
    "name": "reviews-v3"
  },
  "items": [
    {
      "kind": "TrafficMetrics",
      "apiVersion": "metrics.smi-spec.io/v1alpha1",
      "metadata": {
        "name": "reviews-v3",
        "namespace": "default",
        "selfLink": "/apis/metrics.smi-spec.io/v1alpha1/namespaces/default/deployments/reviews-v3/edges",
        "creationTimestamp": "2019-10-16T22:33:02Z"
      },
      "timestamp": "2019-10-16T22:33:02Z",
      "window": "30s",
      "resource": {
        "kind": "Deployment",
        "namespace": "default",
        "name": "reviews-v3"
      },
      "edge": {
        "direction": "to",
        "resource": {
          "kind": "Deployment",
          "namespace": "default",
          "name": "ratings-v1"
        }
      },
      "metrics": [
        {
          "name": "p99_response_latency",
          "unit": "ms",
          "value": "2969m"
        },
        {
          "name": "p90_response_latency",
          "unit": "ms",
          "value": "2700m"
        },
        {
          "name": "p50_response_latency",
          "unit": "ms",
          "value": "1499m"
        },
        {
          "name": "success_count",
          "value": "9"
        },
        {
          "name": "failure_count",
          "value": "0"
        }
      ]
    },
    {
      "kind": "TrafficMetrics",
      "apiVersion": "metrics.smi-spec.io/v1alpha1",
      "metadata": {
        "name": "reviews-v3",
        "namespace": "default",
        "selfLink": "/apis/metrics.smi-spec.io/v1alpha1/namespaces/default/deployments/reviews-v3/edges",
        "creationTimestamp": "2019-10-16T22:33:02Z"
      },
      "timestamp": "2019-10-16T22:33:02Z",
      "window": "30s",
      "resource": {
        "kind": "Deployment",
        "namespace": "default",
        "name": "reviews-v3"
      },
      "edge": {
        "direction": "from",
        "resource": {
          "kind": "Deployment",
          "namespace": "default",
          "name": "productpage-v1"
        }
      },
      "metrics": [
        {
          "name": "p99_response_latency",
          "unit": "ms",
          "value": "29600m"
        },
        {
          "name": "p90_response_latency",
          "unit": "ms",
          "value": "26"
        },
        {
          "name": "p50_response_latency",
          "unit": "ms",
          "value": "13333m"
        },
        {
          "name": "success_count",
          "value": "11"
        },
        {
          "name": "failure_count",
          "value": "0"
        }
      ]
    }
  ]
}

Supported Metrics Endpoints

NGINX Service Mesh supports the SMI-spec endpoints shown in the table below.

Endpoint Description
/apis/metrics.smi-spec.io/v1alpha1/ retrieves the supported resource kinds
…/namespaces retrieves generic information for traffic between namespaces
…/namespaces/{namespace} retrieves generic information for traffic to and from a particular namespace
…/namespaces/{namespace}/edges retrieves specific information for traffic to and from a particular namespace
…/namespaces/{namespace}/{resourceKind} retrieves generic information for traffic between resources of the specified kind
…/namespaces/{namespace}/{resourceKind}/{resourceName} retrieves generic information for traffic to and from the named resource of the specified kind
…/namespaces/{namespace}/{resourceKind}/{resourceName}/edges retrieves specific edge information to and from the named resource of the specified kind
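
As an illustration, the same kubectl get --raw pattern shown above works against any of these endpoints. The namespace and Deployment name below are placeholders:

# List traffic metrics aggregated between namespaces.
kubectl get --raw /apis/metrics.smi-spec.io/v1alpha1/namespaces
# Retrieve edge metrics for a specific Deployment; replace the namespace and name.
kubectl get --raw /apis/metrics.smi-spec.io/v1alpha1/namespaces/<namespace>/deployments/<deployment-name>/edges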