End of Sale Notice:
Commercial support for NGINX Service Mesh is available to customers who currently have active NGINX Microservices Bundle subscriptions. F5 NGINX announced the End of Sale (EoS) for the NGINX Microservices Bundles as of July 1, 2023.
See our End of Sale announcement for more details.
Secure Mesh Traffic using mTLS
Learn about the mTLS options available in F5 NGINX Service Mesh.
Overview
TLS authentication is ubiquitous. Because of the baseline level of security TLS provides the client when connecting to an unknown host, and the low barrier to entry created by the advent of services like Let’s Encrypt, TLS has become table stakes for any moderately reputable website. In a microservices, multi-tenant Kubernetes environment, it is no longer sufficient for a client to authenticate a server’s signature. Clients may be compromised, and in a tightly controlled environment such as a service mesh, it is paramount that clients get vetted to ensure they should be making requests to any particular server. F5 NGINX Service Mesh does this authentication through mTLS, and provides the ability to define Access Control policies for the additional authorization piece needed to properly grant access to an incoming request from a given client. This document details the steps required to enable mTLS in your cluster using NGINX Service Mesh.
Important:
NGINX Service Mesh does not support mTLS communication with UDP at this time. UDP datagrams are currently proxied over plaintext.
Within the mTLS umbrella, NGINX Service Mesh allows for some level of configurability. This allows the flexibility to better support testing, development, and production environments. The options available are:
-
off
: Disables mTLS between injected pods, and allows communication from any source or destination over plaintext. Suitable only for development environments. -
permissive
(default): Enables mTLS communication between injected pods, but also allows plaintext communication where mTLS cannot be established. While this provides flexibility in communicating to services outside of the mesh, it also means pods remain open for potential bad actors from unverified sources to gain access to an internal endpoint.Permissive mode can be appealing when first experimenting with NGINX Service Mesh because of the ease of setup in deploying your application, but we strongly suggest that you move to mTLS
strict
mode when evaluating NGINX Service Mesh for production scenarios. -
strict
: Production ready. All traffic between pods is encrypted, and only traffic destined for injected pods is supported. All other outgoing and incoming traffic is denied at the sidecar. See Sidecar Proxy Injection for more information on properly injecting your applications for use within the mesh.If you need to route traffic to a non-meshed service in a
strict
environment, see our guide on using the NGINX Ingress Controller for egress traffic. This can be useful when migrating legacy applications. Also, see Deploy with NGINX Plus Ingress Controller for information on how to get external traffic routed securely to resources managed by NGINX Service Mesh.Important:
Due to how tracing is set up within NGINX Service Mesh, mTLSstrict
mode does not support tracing originating from the application itself. Each mesh sidecar automatically logs request information and aggregates that information in the configured tracing application. See Monitoring and Tracing for more information on how tracing is set up within NGINX Service Mesh.
All Kubernetes Resources that use the NGINX Service Mesh sidecar proxy inherit their mTLS settings from the global configuration. You can override the global setting for individual Resources if needed. Refer to Change the mTLS Settings for a Resource for instructions.
Usage
Enable mTLS
When deploying NGINX Service Mesh with mTLS enabled, you can opt to use permissive
or strict
mode. The default setting for mTLS is permissive
.
Caution:
Using permissive mode is not recommended for production deployments.
To enable mTLS, specify the --mtls-mode
flag with the desired setting when deploying NGINX Service Mesh. For example:
nginx-meshctl deploy ... --mtls-mode strict
Deploy Using an Upstream Root CA
By default, deployments with mTLS enabled use a self-signed root certificate. For testing and evaluation purposes this is acceptable, but for production deployments you should use a proper Public Key Infrastructure (PKI).
SPIRE uses a mechanism called “Upstream Authority” to interface with PKI systems. In order to use an upstream authority, a user must provide the proper configuration and credentials so that SPIRE may interface with the upstream and obtain the pertinent certificates.
In order to use a proper PKI, you must first choose one of the upstream authorities NGINX Service Mesh supports:
-
disk: Requires certificates and private key be on disk.
-
Template: disk.yaml
-
The minimal configuration to successfully deploy the mesh using the
disk
upstream authority looks like this:apiVersion: v1 upstreamAuthority: disk config: cert_file_path: /path/to/rootCA.crt key_file_path: /path/to/rootCA.key
-
-
aws_pca: Uses Amazon Certificate Manager Private Certificate Authority (ACM PCA) to manage certificates.
-
Template: aws_pca.yaml
-
Here is the minimal configuration to deploy the mesh using the
aws_pca
upstream authority:apiVersion: "v1" upstreamAuthority: "aws_pca" config: region: "us-west-2" certificate_authority_arn: "arn:aws:acm-pca::123456789012:certificate-authority/test"
Important:
This configuration assumes that the SPIRE server has access to your certificate authority in ACM PCA. See below for details on how to configure access.In order to use the
aws_pca
plugin, you need to give the SPIRE server access to your ACM PCA certificate authority.If you’d like NGINX Service Mesh to configure authentication on your behalf, you must specify your AWS Access Key ID and AWS Secret Key in the
aws_pca
config file. NGINX Service Mesh will create an AWS shared credentials file with these credentials, encode and store this file in a Kubernetes Secret, and then mount the secret to~/.aws/credentials
on the SPIRE server Pod(s).If you would like the SPIRE server to use an IAM role to access your certificate authority, make sure your role contains the policy described in the SPIRE
aws_pca
documentation, and do one of the following:- Attach the IAM role to your EC2 instances where NGINX Service Mesh is running.
- Tell SPIRE to assume the IAM role by specifying the role ARN in the
assume_role_arn
field in theaws_pca
config file.
Note:
The SPIRE server will need permission to assume this IAM role. Either attach an IAM role to the EC2 instance with the capability to assume the ACM PCA IAM role, or provide your AWS credentials in theaws_pca
config file. -
-
awssecret: Loads CA credentials from AWS SecretsManager.
-
Template: awssecret.yaml
-
Here is the minimal configuration to deploy the mesh using the
awssecret
upstream authority:apiVersion: "v1" upstreamAuthority: "awssecret" config: region: "us-west-2" cert_file_arn: "arn:aws:acm-pca::123456789012:certificate-authority/test" key_file_arn: "arn:aws:acm-pca::123456789012:certificate-authority/test-key"
Important:
AWS credentials may be necessary depending on your situation. View the SPIRE guide. -
-
vault: Uses Vault PKI Engine to manage certificates.
- Template: vault.yaml
-
cert-manager: Uses an instance of
cert-manager
running in Kubernetes to request intermediate signing certificates for SPIRE server.-
Template: cert-manager.yaml
-
Here is the minimal configuration to deploy the mesh using the
cert-manager
upstream authority:apiVersion: "v1" upstreamAuthority: "cert-manager" config: namespace: "my-cert-namespace" issuer_name: "my-spire-issuer-name"
-
For a production deployment, you should provide the following:
rootCA.crt
- A root CA certificaterootCA.key
- A root CA certificate keyintermediateCA.crt
- An intermediate CA certificate (optional)intermediateCA.key
- An intermediate CA certificate key (optional)
For a production deployment, you should use an intermediate CA certificate instead of using the root CA certificate directly. In this case, you would specify the root CA certificate using the appropriate option for the upstream authority:
- disk:
bundle_file_path
- aws_pca:
supplemental_bundle_path
This keeps the root CA key secure because it adds the certificate, not the key itself, to the chain. The upstream bundle may contain multiple intermediate certificates, all the way up to the root CA.
For example, a production deployment using the disk
upstream authority will look something like this:
apiVersion: "v1"
upstreamAuthority: "disk"
config:
cert_file_path: "/path/to/intermediateCA.crt"
key_file_path: "/path/to/intermediateCA.key"
bundle_file_path: "/path/to/rootCA.crt"
To deploy using one of these upstream authorities, you must specify the --mtls-upstream-ca-conf
flag:
nginx-meshctl deploy ... --mtls-upstream-ca-conf /path/to/upstream_authority.yaml
To find out more about how nginx-meshctl
interprets the upstream authority configuration, refer to the Upstream CA Validation JSON schema
Pathlen
x509 certificates have a pathlen field that is used to limit the number of intermediate certificates in between the current certificate and the final endpoint certificate, not including the endpoint certificate.
SPIRE creates a certificate for itself using the intermediate certificate passed in using the arguments defined above, so the pathlen
must be either set to 1 or unset. For the root certificate, the pathlen must be at least 2, or unset.
Choose a SPIRE Key Manager Plugin
SPIRE maintains a set of keys to sign certificates. NGINX Service Mesh supports two methods of storing those keys: disk
and memory
.
-
disk
(default): Signing keys are kept on disk and recoverable in the case of a SPIRE server restart, but keys are vulnerable due to being kept on disk.Note:
Thedisk
key manager plugin only maintains the integrity of the SPIRE CA if persistent storage is being used. For most environments, persistent storage will be deployed by default. See Persistent Storage setup page for more information on configuring persistent storage in your environment. -
memory
: Maintains the set of signing keys in memory and out of reach from bad actors should they gain access to your SPIRE server, but keys are lost on SPIRE server restart.We recommend that you only use the
memory
key manager plugin when you are using an upstream CA. Otherwise, when the SPIRE Server restarts due to a failure, all agents must be manually restarted and all workload certificates must be re-minted and re-distributed - causing unnecessary load on your resources and a potential disruption to workload communication.
There are benefits and drawbacks to both key manager plugins, but we recommend using the memory
key manager alongside an upstream CA for a more secure production experience. When paired with an upstream CA, the drawbacks of the memory
key manager can be eliminated. For more information on productionizing your deployment, see our Production Tuning guide.
Change the mTLS Setting for a Resource
NGINX Service Mesh provides you the ability to modify the global mTLS setting on a per-resource basis. This allows you to patch individual resources with mTLS strict
mode as you begin to properly secure your application.
When configuring mTLS for resources, if your global mTLS mode is strict
, you will not be able to modify the mode on a per resource basis. The reason for this is that we want to push users towards the most secure deployment possible when evaluating mTLS strict
mode and production. Also if an admin configures strict
mTLS mode globally, it will prevent the Application Developer persona from modifying NGINX Service Mesh’s security settings on an ad hoc basis and potentially introducing security holes. We do provide the ability to communicate with non-meshed services using the NGINX Ingress Controller for egress traffic. If not all of your application components are ready for strict
mode, we encourage the use of permissive
mode and a non-production environment.
Important:
If the global mTLS value is set tostrict
, then the annotation value will be ignored.
To override the global mTLS setting for a specific resource, add an annotation to the resource’s PodTemplateSpec. For example:
config.nsm.nginx.com/mtls-mode: "strict"
Disable mTLS
To disable mTLS globally, specify the --mtls-mode off
flag when deploying NGINX Service Mesh. For example:
nginx-meshctl deploy ... --mtls-mode off
To disable mTLS for a specific resource, add the following annotation to the resource’s PodTemplateSpec:
config.nsm.nginx.com/mtls-mode: "off"
Note:
Refer to NGINX Service Mesh Annotations for more information.
Verify Deployment
NGINX Service Mesh deploys additional pods in the configured control plane namespace (default nginx-mesh
) for the SPIRE Server and Agent.
To verify deployment, check whether or not the SPIRE Pods are running. You should have a single Server Pod and an Agent Pod for each Kubernetes Node.
kubectl get pods -n nginx-mesh
NAME READY STATUS RESTARTS AGE
...
spire-agent-mb9jv 1/1 Running 0 24h
spire-server-0 2/2 Running 0 24h
...
Verify Encryption by Using an Example Service
We’ll use the Istio bookinfo
example to test that traffic is, in fact, encrypted with mTLS enabled.
-
Enable automatic sidecar injection for the
default
namespace. -
Deploy the
bookinfo
application:kubectl apply -f bookinfo.yaml
-
To access
bookinfo
, set up port-forwarding:kubectl port-forward svc/productpage 9080
-
Finally, navigate to
http://localhost:9080
in a browser. On the front side, it uses clear text. All of the service-to-service calls will be SSL-encrypted.
Debug mTLS Issues
Not all MTLS misconfiguration errors can be caught when the configuration is loaded. For example, NGINX will not detect if the certificate expires during operation. NGINX responds to requests with invalid certificates with a 400 Bad Request
error. Debugging information is provided in the error log at the info
level.
Refer to logging for information about changing the log level.
Update mTLS settings after deployment
The following mTLS settings can be changed after the mesh has been deployed:
- mode
- caTTL
- svidTTL
- caKeyType
Important:
When updating
caTTL
,svidTTL
, orcaKeyType
, the SPIRE server will be restarted and brief downtime should be expected. Certificates cannot be given out during this time, so it is advised to not create any applications until the server has fully restarted. Any existing certificates will live until their expirations, but all new certificates will use the updated mTLS settings. Workload certificates can be forced to be updated by re-rolling the workload Pods.If using persistent storage–which is the default and recommended setting when deploying the mesh–, updated
caTTL
andcaKeyType
fields will not take effect until the original root CA certificate expires. The expiry time is based on the originalcaTTL
value. If you want to change these values immediately, then the fastest–but most disruptive–way to do this is to remove and redeploy NGINX Service Mesh.
See the API Usage Guide for instructions on how to update the mTLS settings using the REST API.
What Next
With mTLS properly set up within your service mesh, it is important to set up authorization to properly verify incoming connections.
See Access Control policies for how to define authorization within your application.