Secure Mesh Traffic using mTLS
TLS authentication is ubiquitous. Because of the baseline level of security TLS provides the client when connecting to an unknown host, and the low barrier to entry created by the advent of services like Let’s Encrypt, TLS has become table stakes for any moderately reputable website. In a microservices, multi-tenant Kubernetes environment, it is no longer sufficient for a client to authenticate a server’s signature. Clients may be compromised, and in a tightly controlled environment such as a service mesh, it is paramount that clients get vetted to ensure they should be making requests to any particular server. NGINX Service Mesh does this authentication through mTLS, and provides the ability to define Access Control policies for the additional authorization piece needed to properly grant access to an incoming request from a given client. This document details the steps required to enable mTLS in your cluster using NGINX Service Mesh.
Within the mTLS umbrella, NGINX Service Mesh allows for some level of configurability. This allows the flexibility to better support testing, development, and production environments. The options available are:
off: Disables mTLS between injected pods, and allows communication from any source or destination over plaintext. Suitable only for development environments.
permissive (default): Enables mTLS communication between injected pods, but also allows plaintext communication where mTLS cannot be established. While this provides flexibility when communicating with services outside of the mesh, it also leaves pods open to bad actors from unverified sources who could gain access to an internal endpoint.
Permissive mode can be appealing when first experimenting with NGINX Service Mesh because it makes deploying your application easy, but we strongly suggest that you move to mTLS strict mode when evaluating NGINX Service Mesh for production scenarios.
strict: Production ready. All traffic between pods is encrypted, and only traffic destined for injected pods is supported. All other outgoing and incoming traffic is denied at the sidecar. See Sidecar Proxy Injection for more information on properly injecting your applications for use within the mesh.
If you need to route traffic to a non-meshed service in a strict environment, see our guide on using the NGINX Ingress Controller for egress traffic. This can be useful when migrating legacy applications. Also, see Deploy with NGINX Plus Ingress Controller for information on how to get external traffic routed securely to resources managed by NGINX Service Mesh.
Important: Due to how tracing is set up within NGINX Service Mesh, mTLS strict mode does not support tracing originating from the application itself. Each mesh sidecar automatically logs request information and aggregates that information in the configured tracing application. See Monitoring and Tracing for more information on how tracing is set up within NGINX Service Mesh.
All Kubernetes Resources that use the NGINX Service Mesh sidecar proxy inherit their mTLS settings from the global configuration. You can override the global setting for individual Resources if needed. Refer to Change the mTLS Settings for a Resource for instructions.
When deploying NGINX Service Mesh with mTLS enabled, you can opt to use permissive or strict mode. The default setting for mTLS is permissive.
Using permissive mode is not recommended for production deployments.
To enable mTLS, specify the --mtls-mode flag with the desired setting when deploying NGINX Service Mesh. For example:
nginx-meshctl deploy ... --mtls-mode strict
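As a rough sketch of how the recommendations above could be enforced in a deployment script, the helper below validates the requested mode before the flag is passed to nginx-meshctl. The function name check_mtls_mode and its second argument (a target environment label) are hypothetical and are not part of NGINX Service Mesh; only the three mode names and the --mtls-mode flag come from this guide.

```shell
# Hypothetical pre-deploy guard: prints the flag to pass to nginx-meshctl,
# or fails when the mode is unknown or not recommended for the target.
check_mtls_mode() {
  mode="$1"
  target="${2:-development}"   # assumed label: "production" or anything else

  case "$mode" in
    off|permissive|strict) ;;
    *) echo "unknown mTLS mode: $mode" >&2; return 2 ;;
  esac

  # This guide recommends strict mode for production deployments.
  if [ "$target" = "production" ] && [ "$mode" != "strict" ]; then
    echo "mTLS mode '$mode' is not recommended for production; use strict" >&2
    return 1
  fi

  echo "--mtls-mode $mode"
}

# Example: build the flag for a production deployment.
flags="$(check_mtls_mode strict production)"
echo "nginx-meshctl deploy ... $flags"
```

The final line only prints the command rather than running it, so the remaining deploy options can be filled in for your environment.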
By default, deployments with mTLS enabled use a self-signed root certificate. For testing and evaluation purposes this is acceptable, but for production deployments you should use a proper Public Key Infrastructure (PKI).
SPIRE uses a mechanism called “Upstream Authority” to interface with PKI systems. In order to use an upstream authority, a user must provide the proper configuration and credentials so that SPIRE may interface with the upstream and obtain the pertinent certificates.
In order to use a proper PKI, you must first choose one of the upstream authorities NGINX Service Mesh supports:
disk: Requires certificates and private key be on disk.
The minimal configuration to successfully deploy the mesh using the disk upstream authority looks like this:
apiVersion: v1
upstreamAuthority: disk
config:
  cert_file_path: /path/to/rootCA.crt
  key_file_path: /path/to/rootCA.key
aws_pca: Uses Amazon Private Certificate Authority to manage certificates.
Here is the minimal configuration to deploy the mesh using the aws_pca upstream authority:
apiVersion: "v1"
upstreamAuthority: "aws_pca"
config:
  region: "us-west-2"
  certificate_authority_arn: "arn:aws:acm-pca::123456789012:certificate-authority/test"
  aws_access_key_id: "<ACCESS_KEY>"
  aws_secret_access_key: "<YOUR_SECRET_ACCESS_KEY>"
awssecret: Loads CA credentials from AWS Secrets Manager.
Here is the minimal configuration to deploy the mesh using the awssecret upstream authority:
apiVersion: "v1"
upstreamAuthority: "awssecret"
config:
  region: "us-west-2"
  cert_file_arn: "arn:aws:acm-pca::123456789012:certificate-authority/test"
  key_file_arn: "arn:aws:acm-pca::123456789012:certificate-authority/test-key"
AWS credentials may be necessary depending on your situation. View the SPIRE guide.
vault: Uses Vault PKI Engine to manage certificates.
- Template: vault.yaml
For a production deployment, you should provide the following:
rootCA.crt: A root CA certificate
rootCA.key: A root CA certificate key
intermediateCA.crt: An intermediate CA certificate (optional)
intermediateCA.key: An intermediate CA certificate key (optional)
For a production deployment, you should use an intermediate CA certificate instead of using the root CA certificate directly. In this case, you would specify the root CA certificate using the appropriate option for the upstream authority:
This keeps the root CA key secure because it adds the certificate, not the key itself, to the chain. The upstream bundle may contain multiple intermediate certificates, all the way up to the root CA.
For example, a production deployment using the disk upstream authority will look something like this:
apiVersion: "v1"
upstreamAuthority: "disk"
config:
  cert_file_path: "/path/to/intermediateCA.crt"
  key_file_path: "/path/to/intermediateCA.key"
  bundle_file_path: "/path/to/rootCA.crt"
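To deploy using one of these upstream authorities, you must specify the --mtls-upstream-ca-conf flag with the path to your upstream authority configuration file when deploying NGINX Service Mesh. For example: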
nginx-meshctl deploy ... --mtls-upstream-ca-conf /path/to/upstream_authority.yaml
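The two steps above (write the configuration file, then pass it to the deploy command) can be combined in a small script. This is a minimal sketch, not an official procedure: the ca_dir value is the same placeholder path used in this guide, the temporary directory is an arbitrary choice, and the deploy command is printed rather than executed because its remaining options depend on your environment.

```shell
set -eu

# Placeholder location of the CA files, as used in the examples in this guide.
ca_dir="/path/to"

# Write the disk upstream authority configuration to a scratch directory.
conf_dir="$(mktemp -d)"
conf_file="$conf_dir/upstream_authority.yaml"

cat > "$conf_file" <<EOF
apiVersion: "v1"
upstreamAuthority: "disk"
config:
  cert_file_path: "$ca_dir/intermediateCA.crt"
  key_file_path: "$ca_dir/intermediateCA.key"
  bundle_file_path: "$ca_dir/rootCA.crt"
EOF

# Print the deploy command that would consume the file.
echo "nginx-meshctl deploy ... --mtls-mode strict --mtls-upstream-ca-conf $conf_file"
```

For the aws_pca authority the same pattern applies, but because that file contains AWS credentials it is worth restricting its permissions and keeping it out of version control.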
To find out more about how nginx-meshctl interprets the upstream authority configuration, refer to the Upstream CA Validation JSON schema.
X.509 certificates have a pathlen field that limits the number of intermediate certificates allowed between the current certificate and the final endpoint certificate, not including the endpoint certificate itself.
SPIRE creates a certificate for itself using the intermediate certificate passed in through the configuration described above, so that certificate's pathlen must be either set to 1 or unset. For the root certificate, the pathlen must be at least 2, or unset.
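For testing, a root and intermediate CA that satisfy these pathlen constraints can be generated with OpenSSL. The sketch below is illustrative only: the subject names, RSA key size, 30-day validity, and the file name ca-ext.cnf are assumptions, and a production PKI should follow your organization's own key-handling practices.

```shell
set -eu

workdir="$(mktemp -d)"
cd "$workdir"

# Extension profiles: pathlen 2 for the root, pathlen 1 for the intermediate.
cat > ca-ext.cnf <<'EOF'
[req]
distinguished_name = dn
prompt = no

[dn]
CN = placeholder

[root_ext]
basicConstraints = critical, CA:TRUE, pathlen:2
keyUsage = critical, keyCertSign, cRLSign
subjectKeyIdentifier = hash

[intermediate_ext]
basicConstraints = critical, CA:TRUE, pathlen:1
keyUsage = critical, keyCertSign, cRLSign
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid
EOF

# Self-signed root CA certificate and key.
openssl req -x509 -newkey rsa:2048 -nodes -days 30 \
  -config ca-ext.cnf -extensions root_ext \
  -subj "/CN=Example Root CA" \
  -keyout rootCA.key -out rootCA.crt

# Intermediate CA key and certificate signing request.
openssl req -new -newkey rsa:2048 -nodes \
  -config ca-ext.cnf \
  -subj "/CN=Example Intermediate CA" \
  -keyout intermediateCA.key -out intermediateCA.csr

# Sign the intermediate with the root, applying the pathlen:1 profile.
openssl x509 -req -in intermediateCA.csr -days 30 \
  -CA rootCA.crt -CAkey rootCA.key -CAcreateserial \
  -extfile ca-ext.cnf -extensions intermediate_ext \
  -out intermediateCA.crt

# Confirm the chain validates and show the resulting constraints.
openssl verify -CAfile rootCA.crt intermediateCA.crt
openssl x509 -in intermediateCA.crt -noout -text | grep "pathlen"
```

The resulting intermediateCA.crt, intermediateCA.key, and rootCA.crt correspond to the cert_file_path, key_file_path, and bundle_file_path entries of the disk upstream authority configuration.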
SPIRE maintains a set of keys that it uses to sign certificates. NGINX Service Mesh supports two methods of storing those keys:
disk (default): Signing keys are kept on disk and are recoverable in the case of a SPIRE server restart, but keys are vulnerable because they are kept on disk.
Note: The disk key manager plugin only maintains the integrity of the SPIRE CA if persistent storage is being used. For most environments, persistent storage will be deployed by default, but see our Persistent Storage setup page for more information on configuring persistent storage in your environment.
memory: Maintains the set of signing keys in memory and out of reach from bad actors should they gain access to your SPIRE server, but keys are lost on SPIRE server restart.
We recommend that you only use the memory key manager plugin when you are using an upstream CA. Otherwise, when the SPIRE Server restarts due to a failure, all agents must be manually restarted and all workload certificates must be re-minted and re-distributed, causing unnecessary load on your resources and a potential disruption to workload communication.
There are benefits and drawbacks to both key manager plugins, but we recommend using the memory key manager alongside an upstream CA for a more secure production experience. When paired with an upstream CA, the drawbacks of the memory key manager can be eliminated. For more information on productionizing your deployment, see our Production Tuning guide.
NGINX Service Mesh lets you modify the global mTLS setting on a per-resource basis. This allows you to patch individual resources with mTLS strict mode as you begin to properly secure your application.
When configuring mTLS for resources, if your global mTLS mode is strict, you will not be able to modify the mode on a per-resource basis. We want to push users towards the most secure deployment possible when evaluating mTLS strict mode for production. In addition, when an admin configures strict mTLS mode globally, it prevents the Application Developer persona from modifying NGINX Service Mesh’s security settings on an ad hoc basis and potentially introducing security holes. We do provide the ability to communicate with non-meshed services using the NGINX Ingress Controller for egress traffic. If not all of your application components are ready for strict mode, we encourage the use of permissive mode and a non-production environment.
If the global mTLS value is set to strict, then the annotation value will be ignored.
To override the global mTLS setting for a specific resource, add an annotation to the Resource definition. For example:
To disable mTLS globally, specify the --mtls-mode off flag when deploying NGINX Service Mesh. For example:
nginx-meshctl deploy ... --mtls-mode off
To disable mTLS for a specific Resource, add the following annotation to the Resource definition:
NGINX Service Mesh deploys additional pods in the configured control plane namespace (default nginx-mesh) for the SPIRE Server and Agent.
To verify deployment, check whether or not the SPIRE Pods are running. You should have a single Server Pod and an Agent Pod for each Kubernetes Node.
kubectl get pods -n nginx-mesh
NAME                READY   STATUS    RESTARTS   AGE
...
spire-agent-mb9jv   1/1     Running   0          24h
spire-server-0      2/2     Running   0          24h
...
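The same check can be scripted. The function below is a minimal sketch that reads kubectl get pods output on standard input and fails if any SPIRE pod is not fully ready. The name check_spire_pods is hypothetical, and it assumes the pod names start with spire-server or spire-agent as in the sample output above.

```shell
# Hypothetical helper: succeeds only when every SPIRE pod listed on stdin
# is in the Running state with all of its containers ready.
check_spire_pods() {
  awk '
    $1 ~ /^spire-(server|agent)/ {
      seen++
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) {
        bad++
        print "not ready: " $1
      }
    }
    END {
      if (seen == 0) { print "no SPIRE pods found"; exit 1 }
      exit (bad > 0)
    }
  '
}

# Usage against a live cluster (default control plane namespace assumed):
#   kubectl get pods -n nginx-mesh | check_spire_pods && echo "SPIRE is up"
```

Note that this only inspects pod status; it does not confirm that there is one Agent Pod per Node.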
We’ll use the Istio bookinfo example to test that traffic is, in fact, encrypted with mTLS enabled.
First, deploy the bookinfo application:
kubectl apply -f bookinfo.yaml
After deploying bookinfo, set up port-forwarding:
kubectl port-forward svc/productpage 9080
Finally, navigate to http://localhost:9080 in a browser. The front-end connection from your browser uses clear text, while all of the service-to-service calls are encrypted with mTLS.
Not all mTLS misconfiguration errors can be caught when the configuration is loaded. For example, NGINX will not detect if the certificate expires during operation. NGINX responds to requests with invalid certificates with a 400 Bad Request error. Debugging information is provided in the error log.
Refer to logging for information about changing the log level.
With mTLS properly set up within your service mesh, it is important to set up authorization to properly verify incoming connections.
See Access Control policies for how to define authorization within your application.