Technical Specifications

The following document outlines the software versions NGINX Service Mesh uses and the overhead it incurs while running.

Software Versions

NGINX Service Mesh requires Kubernetes version 1.16 or newer.
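If you want to confirm that your cluster meets this requirement before deploying, a quick check like the following works. This is a minimal sketch using the Kubernetes Python client; it assumes the `kubernetes` package is installed and a valid kubeconfig is available.

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (assumes it is already configured).
config.load_kube_config()

version = client.VersionApi().get_code()
# version.minor can include a suffix on some managed clusters, e.g. "21+".
major = int(version.major)
minor = int(version.minor.rstrip("+"))

if (major, minor) < (1, 16):
    raise SystemExit(f"Kubernetes {version.git_version} is too old; 1.16 or newer is required")
print(f"Kubernetes {version.git_version} meets the minimum version requirement")
```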

The following table lists the software versions NGINX Service Mesh uses by default.

NGINX Service Mesh Version | NGINX Plus (sidecar) | Spire Version | NATS | Prometheus | Grafana * | Jaeger (Default) * | Zipkin *
v1.0 | R23 | 0.12.1 | nats:2.1.8-alpine3.11 | prom/prometheus:v2.20.1 | grafana/grafana:7.5.3 | jaegertracing/all-in-one:1.19.2 | openzipkin/zipkin:2.21

* - Software not required by NGINX Service Mesh. See Monitoring and Tracing for details on disabling these components.

We frequently run a series of automated tests to ensure the stability and reliability of the mesh. For small to medium deployments (fewer than 100 pods), we recommend a minimum cluster environment similar to the ones we use in our tests:

Environment | Machine Type | Number of Nodes
GKE | n2-standard-4 (4 vCPU, 16 GB) | 3
AKS | Standard_D4s_v3 (4 vCPU, 16 GiB) | 3
EKS | t3.xlarge (4 vCPU, 16 GiB) | 3
AWS | t3.xlarge (4 vCPU, 16 GiB) | 1 control, 3 workers

Overhead

The overhead of NGINX Service Mesh varies depending on the component in the mesh and the kinds of resources currently deployed. Because the control plane is responsible for holding the state of all managed resources, its resource usage scales linearly with the number of resources being handled, whether those are pods, services, traffic splits, or any other resource managed by NGINX Service Mesh. Spire specifically watches for new workloads; because workloads map 1:1 to pods, Spire's overhead grows as more pods are added to the mesh.

The data plane sidecar must keep track of the other services in the mesh as well as any traffic policies that apply to it, so its resource load increases with the number of services and traffic policies in the mesh. To gauge the stress these components place on the cluster, we run a nightly test that exercises the most critical components of the mesh. The details of this test are provided below so that you can get an idea of the overhead each component is responsible for and size your own cluster accordingly.

Stress Test Overhead

Cluster Information:

  • Environment: GKE
  • Node Type: n2-standard-4 (4 vCPU, 16GB)
  • Number of nodes: 3
  • Kubernetes Version: 1.18.16

Metrics were gathered using the Kubernetes Metrics API. CPU is measured in CPU units, where one CPU unit is equivalent to 1 vCPU/core. For more information on the Metrics API and how the data is recorded, see the Metrics API documentation.
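If you want to gather comparable figures from your own cluster, you can query the Metrics API directly. The sketch below uses the Kubernetes Python client and assumes metrics-server is installed and that the mesh components run in the "nginx-mesh" namespace; adjust the namespace if your deployment differs.

```python
from kubernetes import client, config

config.load_kube_config()
metrics_api = client.CustomObjectsApi()

# Query pod metrics (metrics.k8s.io/v1beta1) for the assumed mesh namespace.
pod_metrics = metrics_api.list_namespaced_custom_object(
    group="metrics.k8s.io",
    version="v1beta1",
    namespace="nginx-mesh",
    plural="pods",
)

for pod in pod_metrics["items"]:
    name = pod["metadata"]["name"]
    for container in pod["containers"]:
        usage = container["usage"]
        # Values are Kubernetes quantity strings, e.g. "75m" or "1234567n" for CPU
        # and "168766Ki" for memory.
        print(f"{name}/{container['name']}: cpu={usage['cpu']} memory={usage['memory']}")
```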

CPU

Num Services | Control Plane (without metrics and tracing) | Control Plane Total | Average Sidecar
10 (20 pods) | 0.075 vCPU | 0.095 vCPU | 0.033 vCPU
50 (100 pods) | 0.097 vCPU | 0.431 vCPU | 0.075 vCPU
100 (200 pods) | 0.148 vCPU | 0.233 vCPU | 0.050 vCPU

Memory

Num Services | Control Plane (without metrics and tracing) | Control Plane Total | Average Sidecar
10 (20 pods) | 168.766 MiB | 767.500 MiB | 33.380 MiB
50 (100 pods) | 215.289 MiB | 2347.258 MiB | 38.542 MiB
100 (200 pods) | 272.305 MiB | 4973.992 MiB | 52.946 MiB

Disk Usage

Spire uses a persistent volume to make restarts more seamless. NGINX Service Mesh automatically allocates a 1 GiB persistent volume in supported environments (see our Persistent Storage setup page for environment requirements). The table below shows the disk usage within that volume. Note that the number of pods is used as the metric here because disk usage scales directly with the number of pods in the mesh.

Num Pods | Disk Usage
20 | 4.2 MB
100 | 4.3 MB
200 | 4.6 MB
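If you want to check this usage in your own cluster, one option is to run `du` against the Spire data directory inside the spire-server pod. The sketch below does this with the Kubernetes Python client; the namespace, label selector, container name, and data path are assumptions and may differ in your deployment.

```python
from kubernetes import client, config
from kubernetes.stream import stream

config.load_kube_config()
core = client.CoreV1Api()

# Label selector is an assumption; adjust it to match your spire-server pod.
pods = core.list_namespaced_pod(
    namespace="nginx-mesh",
    label_selector="app.kubernetes.io/name=spire-server",
)
pod_name = pods.items[0].metadata.name

# Run `du` against the assumed Spire data directory inside the pod.
output = stream(
    core.connect_get_namespaced_pod_exec,
    pod_name,
    "nginx-mesh",
    container="spire-server",
    command=["du", "-sh", "/run/spire/data"],
    stderr=True,
    stdin=False,
    stdout=True,
    tty=False,
)
print(output)
```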