Troubleshoot NGINX Controller and the Controller Agent

Overview

If NGINX isn’t behaving how you expect, you can take the following steps to troubleshoot issues. If you need to contact NGINX Support, make sure to create a support package first.

Create a Support Package

You can create a support package for NGINX Controller that you can use to diagnose issues.

Note:
You will need to provide a support package if you open a ticket with NGINX Support via the MyF5 Customer Portal.

Take the following steps to create a support package:

  1. Open a secure shell (SSH) connection to the NGINX Controller host and log in as an administrator.

  2. Run the helper.sh utility with the supportpkg option:

    /opt/nginx-controller/helper.sh supportpkg

    The support package is saved to:

    /var/tmp/supportpkg-<timestamp>.tar.gz

    For example:

    /var/tmp/supportpkg-20200127T063000PST.tar.gz

  3. Run the following command on the machine where you want to download the support package to:

    scp <username>@<controller-host-ip>:/var/tmp/supportpkg-<timestamp>.tar.gz /local/path

Support Package Details

The support package is a tarball that includes NGINX Controller configuration information, logs, and system command output. Sensitive information, including certificate keys, is not included in the support package.

The support package gathers information from the following locations:

.
├── database
│   ├── common.dump                                - full dump of the common database
│   ├── common.dump_stderr                         - any errors when dumping the database
│   ├── common-apimgmt-api-client-api-keys.txt     - contents of apimgmt_api_client_api_keys table from the common database
│   ├── common-apimgmt-api-client-groups.txt       - contents of apimgmt_api_client_groups table from the common database
│   ├── common-email-verification.txt              - contents of email_verification table from the common database
│   ├── common-oauth-clients.txt                   - contents of oauth_clients table from the common database
│   ├── common-settings-license.txt                - contents of settings_license table from the common database
│   ├── common-settings-nginx-plus.txt             - contents of settings_nginx_plus table from the common database
│   ├── common-table-size.txt                      - list of all tables and their size in the common database
│   ├── data-table-size.txt                        - list of all tables and their size in the data database
│   ├── postgres-database-size.txt                 - size of every database
│   ├── postgres-long-running-queries.txt          - all queries running longer than 10 seconds
│   ├── system.dump                                - full dump of the system database
│   ├── system-account-limits.txt                  - contents of account_limits table from the system database
│   ├── system-accounts.txt                        - contents of accounts table from the system database
│   ├── system-deleted-accounts.txt                - contents of deleted_accounts table from the system database
│   ├── system-deleted-users.txt                   - contents of deleted_users table from the system database
│   ├── system-users.txt                           - contents of users table from the system database
│   └── system-table-size.txt                      - list of all tables and their size in the system database
├── k8s                                            - output of `kubectl cluster-info dump -o yaml` augmented with some extra info
│   ├── apiservices.txt                            - output of `kubectl get apiservice`
│   ├── kube-system                                - contents of the kube-system namespace
│   │   ├── coredns-5c98db65d4-6flb9
│   │   │   ├── desc.txt                           - pod description
│   │   │   ├── logs.txt                           - current logs
│   │   │   └── previous-logs.txt                  - previous logs, if any
│   │   ├── ...
│   │   ├── daemonsets.yaml                        - list of daemonsets
│   │   ├── deployments.yaml                       - list of deployments
│   │   ├── events.yaml                            - all events in this namespace
│   │   ├── namespace.yaml                         - details of the namespace, including finalizers
│   │   ├── pods.txt                               - output of `kubectl get pods --show-kind=true -o wide`
│   │   ├── pods.yaml                              - list of all pods
│   │   ├── replicasets.yaml                       - list of replicasets
│   │   ├── replication-controllers.yaml           - list of replication controllers
│   │   ├── resources.txt                          - all Kubernetes resources in this namespace
│   │   └── services.yaml                          - list of services
│   ├── nginx-controller                           - contents of the nginx-controller namespace
│   │   ├── apigw-8fb64f768-9qwcm
│   │   │   ├── desc.txt                           - pod description
│   │   │   ├── logs.txt                           - current logs
│   │   │   └── previous-logs.txt                  - previous logs, if any
│   │   ├── ...
│   │   ├── daemonsets.yaml                        - list of daemonsets
│   │   ├── deployments.yaml                       - list of deployments
│   │   ├── events.yaml                            - all events in this namespace
│   │   ├── namespace.yaml                         - details of the namespace, including finalizers
│   │   ├── pods.txt                               - output of `kubectl get pods --show-kind=true -o wide`
│   │   ├── pods.yaml                              - list of all pods
│   │   ├── replicasets.yaml                       - list of replicasets
│   │   ├── replication-controllers.yaml           - list of replication controllers
│   │   ├── resources.txt                          - all Kubernetes resources in this namespace
│   │   ├── services.yaml                          - list of services
│   ├── nodes.txt                                  - output of `kubectl describe nodes`
│   ├── nodes.yaml                                 - list of nodes
│   ├── resources.txt                              - all non-namespaced Kubernetes resources (including PersistentVolumes)
│   └── version.yaml                               - Kubernetes version
├── logs                                           - copy of /var/log/nginx-controller/
│   └── nginx-controller-install.log
├── os
│   ├── cpuinfo.txt                                - output of `cat /proc/cpuinfo`
│   ├── df-h.txt                                   - output of `df -h`
│   ├── df-i.txt                                   - output of `df -i`
│   ├── docker-container-ps.txt                    - output of `docker container ps`
│   ├── docker-images.txt                          - output of `docker images`
│   ├── docker-info.txt                            - output of `docker info`
│   ├── docker-stats.txt                           - output of `docker stats --all --no-stream`
│   ├── docker-version.txt                         - output of `docker version`
│   ├── du-mcs.txt                                 - output of `du -mcs /opt/nginx-controller/* /var/log /var/lib`
│   ├── env.txt                                    - output of `env`
│   ├── firewall-cmd.txt                           - output of `firewall-cmd --list-all`
│   ├── free.txt                                   - output of `free -m`
│   ├── hostname-all-fqdns.txt                     - output of `hostname --all-fqdns`
│   ├── hostname-fqdn.txt                          - output of `hostname --fqdn`
│   ├── hostname.txt                               - output of `hostname`
│   ├── hostsfile.txt                              - output of `cat /etc/hosts`
│   ├── ip-address.txt                             - output of `ip address`
│   ├── ip-neigh.txt                               - output of `ip neigh`
│   ├── ip-route.txt                               - output of `ip route`
│   ├── iptables-filter.txt                        - output of `iptables -L -n -v`
│   ├── iptables-mangle.txt                        - output of `iptables -L -n -v -t mangle`
│   ├── iptables-nat.txt                           - output of `iptables -L -n -v -t nat`
│   ├── iptables-save.txt                          - output of `iptables-save`
│   ├── journal-kubelet.txt                        - output of `journalctl -q -u kubelet --no-pager`
│   ├── lspci.txt                                  - output of `lspci -vvv`
│   ├── netstat-nr.txt                             - output of `netstat -nr`
│   ├── ps-faux.txt                                - output of `ps faux`
│   ├── pstree.txt                                 - output of `pstree`
│   ├── ps.txt                                     - output of `ps aux --sort=-%mem`
│   ├── resolvconf.txt                             - output of `cat /etc/resolv.conf`
│   ├── selinux-mode.txt                           - output of `getenforce`
│   ├── ss-ltunp.txt                               - output of `ss -ltunp`
│   ├── swapon.txt                                 - output of `swapon -s`
│   ├── sysctl.txt                                 - output of `sysctl -a --ignore`
│   ├── systemd.txt                                - output of `journalctl -q --utc`
│   ├── top.txt                                    - output of `top -b -o +%CPU -n 3 -d 1 -w512 -c`
│   ├── uname.txt                                  - output of `uname -a`
│   ├── uptime.txt                                 - output of `cat /proc/uptime`
│   └── vmstat.txt                                 - output of `cat /proc/vmstat`
├── timeseries
│   ├── table-sizes.stat                           - stat table containing controller table sizes
│   ├── events.csv                                 - events table dump in csv
│   ├── events.sql                                 - events table schema
│   ├── metrics_1day.csv                           - metrics_1day table dump in csv
│   ├── metrics_1day.sql                           - metrics_1day table schema
│   ├── metrics_1hour.csv                          - metrics_1hour table dump in csv
│   ├── metrics_1hour.sql                          - metrics_1hour table schema
│   ├── metrics_5min.csv                           - metrics_5min table dump in csv
│   ├── metrics_5min.sql                           - metrics_5min table schema
│   ├── metrics.csv                                - metrics table dump in csv
│   ├── metrics.sql                                - metrics table schema
│   ├── system-asynchronous-metrics.stat           - shows info about currently executing events or consuming resources
│   ├── system-events.stat                         - information about the number of events that have occurred in the system
│   ├── system-metrics.stat                        - system metrics
│   ├── system-parts.stat                          - information about parts of a table in the MergeTree family
│   ├── system-settings.stat                       - information about settings that are currently in use
│   └── system-tables.stat                         - information about all the tables
└── version.txt                                    - Controller version information

Controller Agent Install Script Failed to Download

When deploying an NGINX Plus instance, the deployment may fail because the Controller Agent install script doesn’t download. When this happens, an error similar to the following is logged to /var/log/agent_install.log: “Failed to download the install script for the agent.”

Take the following steps to troubleshoot the issue:

  • Ensure that ports 443 and 8443 are open between NGINX Controller and the network where the NGINX Plus instance is being deployed.
  • Verify that you can communicate with NGINX Controller from the NGINX Plus instance using the NGINX Controller FQDN that you provided when you installed NGINX Controller.
  • If you’re deploying an NGINX Plus instance on Amazon Web Services using a template, ensure that the Amazon Machine Image (AMI) referenced in the instance_template has a cURL version of 7.32 or newer.

Controller Agent Asks for Password

If the system asks you to provide a password when you’re installing the Controller Agent, the cause may be that you are starting the install script from a non-root account. If so, you need sudo rights. Depending on your system configuration, sudo may ask you for a password if you’re using a non-root account.

Controller Agent Isn’t Reporting Metrics

After you install and start the Controller Agent, it should begin reporting right away, pushing aggregated data to NGINX Controller at regular one-minute intervals. It takes about one minute for a new Instance to appear in the NGINX Controller user interface.

If you don’t see the new Instance in the user interface or the Controller Agent isn’t collecting metrics, make sure of the following:

  1. The Controller Agent package – nginx-controller-agent – installed successfully without any warnings.

  2. The controller-agent service is running and updating its log file. To check the status, run the following command:

    systemctl status controller-agent
    
    See Also:
    For troubleshooting purposes, you can turn on Controller Agent debug logging by editing the agent.conf file. For more information, refer to K64001240: Enabling NGINX Controller Agent debug logging.
  3. The system DNS resolver is correctly configured, and the NGINX Controller server’s fully qualified domain name (FQDN) can be resolved.

  4. The controller-agent service is running as root. To view the user ID for the controller-agent service, run the following command:

    ps -ef | egrep 'agent'
    

    The output looks similar to the following:

    root     19132     1  1 Sep03 ?        00:23:45 /usr/bin/nginx-controller-agent
    
  5. The NGINX instance is started with an absolute path. The Controller Agent cannot detect NGINX instances launched with a relative path. For example: ./nginx.

  6. The Controller Agent and the NGINX Instance user IDs can both run the ps command to see all the system processes. If the ps command is restricted for non-privileged users, the Controller Agent won’t be able to detect the NGINX master process.

  7. The system time is set correctly. If the time on the system where the Controller Agent is running is ahead or behind the NGINX Controller’s system time, you won’t be able to see data in graphs. Make sure that NGINX Controller and any NGINX Instances have their time synchronized using NTP.

  8. The NGINX Plus API is set up correctly and working.

    Refer to the Configuring the API section of the NGINX Plus Admin Guide for instructions.

  9. The nginx user (or the user set in NGINX config) has permission to read the NGINX access.log and error.log files.

  10. All NGINX configuration files are readable by the Controller Agent user ID. Check the owner, group, and permissions settings.

  11. Outbound TLS/SSL traffic from the system to NGINX Controller’s FQDN is not restricted. You can check this by using the curl command. Configure a proxy server for the Controller Agent if required.

  12. selinux, apparmor, or other third-party OS security tools are not interfering with the metrics collection. For example, for selinux, check /etc/selinux/config and try setenforce 0 temporarily to see if it improves the situation for certain metrics.

  13. The virtual private server (VPS) provider has not used hardened Linux kernels that may restrict non-root users from accessing /proc and /sys. Metrics describing the system and NGINX disk I/O are usually affected. There is no easy workaround for this except to allow the Controller Agent to run as root. Fixing permissions for /proc and /sys/block may also help.

See Also:

For more information on installing and configuring the Controller Agent, see the following topics:


This documentation applies to the following versions of NGINX Controller Documentation:
3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 3.10 and 3.11.