Problem Statement
When you deploy a pod that communicates with an internal or external network component and, for some reason, it fails, taking a look at the logs can help; however, sometimes you would like to see more details of the payload, the TCP handshake, or the DNS resolution, or simply to understand the network flow. Then you think: "Can I run tcpdump to see where the traffic is going?". The next step is to try to get into the pod and run commands from a shell, only to find out that there is no shell at all, the tools are not installed in the pod, or there are not enough privileges to run them. The following commands illustrate these errors:
$ oc -n my-ns get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
pod-5cf6f6898d-bdcqx 2/2 Running 0 83s 10.129.2.20 worker-1
# strike one
$ oc -n my-ns exec pod-5cf6f6898d-bdcqx -c helper -it -- ip a
ERRO[0000] exec failed: unable to start container process: exec: "ip": executable file not found in $PATH
command terminated with exit code 255
# strike two
$ oc -n my-ns exec pod-5cf6f6898d-bdcqx -c helper -it -- /bin/sh
ERRO[0000] exec failed: unable to start container process: exec: "/bin/sh": permission denied
command terminated with exit code 255
# strike three :(
$ oc -n my-ns exec pod-5cf6f6898d-bdcqx -c helper -it -- /bin/bash
ERRO[0000] exec failed: unable to start container process: exec: "/bin/bash": stat /bin/bash: no such file or directory
The application reports a problem reaching the my-service.my-ns.svc.cluster.local service, and the "no such host" error points to a DNS problem, but can we see more from the network stack? First, let's check the container logs to see if there is more relevant information.
$ oc -n my-ns logs pod-5cf6f6898d-bdcqx -c helper --tail 10 -f
I1117 00:48:24.075414 1 cm20.go:431] DEBUG: exchange name
E1117 00:48:24.097314 1 rabbitmqapi.go:79] Failed to establish connection to my-service. Error dial tcp: lookup my-service.my-ns.svc.cluster.local: no such host
E1117 00:48:24.097329 1 rabbitmqapi.go:89] error: failed to open connection
E1117 00:48:24.097334 1 rabbitmqapi.go:105] error: failed to open channel
I1117 00:48:24.097339 1 cm20.go:436] error: failed to setup exchange
E1117 00:48:24.097350 1 rabbitmq_handler.go:38] Failed to create AMQP. Error Failed to Setup exchange: Failed to open channel
Another option could be to perform the tcpdump capture on the other side (the DNS server), but what options are left if that is not allowed either?
nsenter to the rescue
nsenter is part of the util-linux basic utilities and is available in CoreOS. This command lets you run a command inside the namespaces of another process:
[core@worker-1 ~]$ rpm -qf $(which nsenter)
util-linux-2.32.1-27.el8.x86_64
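Each namespace type has its own flag: -n for network, -m for mount, -u for UTS, -p for PID, and so on, combined with -t to point at the target process. As a quick sketch (how to find the target PID is covered below; the binaries executed come from the host, not from the container image):

sudo nsenter -t <pid> -u -- hostname   # UTS namespace only: the host's hostname binary prints the pod's hostname
sudo nsenter -t <pid> -n -- ip route   # network namespace only: the pod's routing table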
If you have access to the worker nodes via SSH, it's simple to use with sudo privileges. Let's see an example of accessing the pod shown above, which is running on worker-1.
First, log in to the worker node where the pod is running, and from there, look for the container of the pod you want to check.
$ ssh core@worker-1
[core@worker-1 ~]$ sudo crictl ps | grep helper
2279b2951ca2f       17 minutes ago      Running     helper      0       db415107da15b
Then take the container ID and inspect it to retrieve the process ID (PID).
[core@worker-1 ~]$ sudo crictl inspect --output yaml 2279b2951ca2f | grep 'pid' | awk '{print $2}'
98611
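If the grep matches more than one field, a more precise option, assuming your crictl version supports Go template output and exposes the PID under .info.pid, is:

[core@worker-1 ~]$ sudo crictl inspect --output go-template --template '{{.info.pid}}' 2279b2951ca2f
98611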
Use nsenter, specify the PID with -t, and pass a command available on the host to run it inside the pod's network namespace.
[core@worker-1 ~]$ sudo nsenter -n -t 98611 -- ip a
...
3: eth0@if67: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue state UP group default
    link/ether 0a:58:0a:82:02:2a brd ff:ff:ff:ff:ff:ff link-netns 2b099085-6bc1-46a1-b31f-dab424e9afa3
    inet 10.130.2.42/23 brd 10.130.3.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fd02:0:0:7::2a/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::858:aff:fe82:22a/64 scope link
       valid_lft forever preferred_lft forever
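The same idea works with any other networking tool installed on the node; a couple of sketches, assuming ss and ip from iproute are present on the host (they are on CoreOS):

[core@worker-1 ~]$ sudo nsenter -n -t 98611 -- ss -tunap   # sockets open inside the pod
[core@worker-1 ~]$ sudo nsenter -n -t 98611 -- ip route    # the pod's routing table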
If SSH access is restricted, there is an alternative: use the oc debug command to get access to the worker console. In a disconnected environment, use the --image option and specify the registry to use in /root/.toolboxrc.
$ oc debug node/worker-1 --image=registry.example.com:4443/rhel8/support-tools
...
sh-4.4# chroot /host
sh-4.4# vi /root/.toolboxrc
REGISTRY=registry.example.com:4443
sh-4.4# toolbox
[root@worker-1 /]#
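Depending on the toolbox version, /root/.toolboxrc can also pin the exact image to pull; as a sketch (the IMAGE value is an assumption about what is mirrored in your registry):

REGISTRY=registry.example.com:4443
IMAGE=rhel8/support-tools:latest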
Then follow similar steps: look for the pod, inspect it, and grep the PID. Here, the crictl commands need to be prefixed with chroot /host, and not all commands are available since we are running inside a toolbox image.
[root@worker-1 /]# chroot /host crictl ps
[root@worker-1 /]# chroot /host crictl inspect --output yaml 2279b2951ca2f | grep 'pid' | awk '{print $2}'
[root@worker-1 /]# nsenter -n -t 98611 -- ip a
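If you do this often, the lookup can be wrapped in a small helper. This is only a sketch for the toolbox scenario: the script name and the container name passed to it are placeholders, and it assumes crictl ps returns a single match.

#!/bin/bash
# netns-exec.sh <container-name> <command...>
# Sketch: run a host command inside the network namespace of a container looked up by name.
# Drop the 'chroot /host' prefix if you run crictl directly on the node instead of inside a toolbox.
NAME="$1"; shift
CID=$(chroot /host crictl ps --name "$NAME" -q | head -n1)
PID=$(chroot /host crictl inspect --output yaml "$CID" | grep 'pid:' | awk '{print $2}' | head -n1)
nsenter -n -t "$PID" -- "$@"

For example: ./netns-exec.sh helper ip a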
Get your tcpdump filter skills ready
To use tcpdump we need to log in to the worker node with the oc debug command, since CoreOS does not ship tcpdump, even in connected environments. The only difference in a connected environment is that you might not have to specify the registry and the image to use.
Once logged into the toolbox, you can run nsenter and, this time, pass the tcpdump command with the interface we listed previously. Storing the output generated by tcpdump is optional; just use a location where you have privileges to write, in this case /host/var/tmp/. Finally, you can unleash your creativity and knowledge of tcpdump to specify filters and inspect the traffic you want.
[root@worker-1 /]# nsenter -n -t 98611 -- tcpdump -nn -i eth0 -w /host/var/tmp/helper_$(date +%d_%m_%Y-%H_%M_%S-%Z).pcap udp port 53
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
In this capture we can see more details of how the helper container tries to resolve the RabbitMQ service name: it queries both IPv4 (A) and IPv6 (AAAA) records and iterates through the search domains until all possibilities are tried and the record is not found. It confirms that the pod can reach the DNS service and shows the source and destination IP addresses, but the name is never resolved, which helps to rule out certain issues.
00:51:38.680894 IP 10.129.2.20.33212 > 172.30.0.10.53: 20408+ A? my-service.my-ns.svc.cluster.local.my-ns.svc.cluster.local. (93)
00:51:38.680900 IP 10.129.2.20.33212 > 172.30.0.10.53: 20748+ AAAA? my-service.my-ns.svc.cluster.local.my-ns.svc.cluster.local. (93)
00:51:38.681427 IP 172.30.0.10.53 > 10.129.2.20.33212: 20748 NXDomain*- 0/1/1 (197)
00:51:38.681441 IP 172.30.0.10.53 > 10.129.2.20.33212: 20408 NXDomain*- 0/1/1 (197)
00:51:38.681465 IP 10.129.2.20.41183 > 172.30.0.10.53: 22208+ A? my-service.my-ns.svc.cluster.local.svc.cluster.local. (83)
00:51:38.681468 IP 10.129.2.20.41183 > 172.30.0.10.53: 22453+ AAAA? my-service.my-ns.svc.cluster.local.svc.cluster.local. (83)
00:51:38.681923 IP 172.30.0.10.53 > 10.129.2.20.41183: 22453 NXDomain*- 0/1/1 (187)
00:51:38.681941 IP 172.30.0.10.53 > 10.129.2.20.41183: 22208 NXDomain*- 0/1/1 (187)
00:51:38.681954 IP 10.129.2.20.44139 > 172.30.0.10.53: 54593+ A? my-service.my-ns.svc.cluster.local.cluster.local. (79)
00:51:38.681956 IP 10.129.2.20.44139 > 172.30.0.10.53: 54814+ AAAA? my-service.my-ns.svc.cluster.local.cluster.local. (79)
00:51:38.682265 IP 172.30.0.10.53 > 10.129.2.20.44139: 54593 NXDomain*- 0/1/1 (183)
00:51:38.682284 IP 172.30.0.10.53 > 10.129.2.20.44139: 54814 NXDomain*- 0/1/1 (183)
00:51:38.682297 IP 10.129.2.20.52085 > 172.30.0.10.53: 3660+ A? my-service.my-ns.svc.cluster.local.cluster4.example.com. (85)
00:51:38.682299 IP 10.129.2.20.52085 > 172.30.0.10.53: 3879+ AAAA? my-service.my-ns.svc.cluster.local.cluster4.example.com. (85)
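Other filters follow the usual tcpdump syntax. As a sketch, the following would capture only TCP connection setup and teardown towards the service; port 5672 is the default AMQP port and only an assumption about what this RabbitMQ service listens on:

[root@worker-1 /]# nsenter -n -t 98611 -- tcpdump -nn -i eth0 'tcp port 5672 and (tcp[tcpflags] & (tcp-syn|tcp-fin|tcp-rst) != 0)'

Since /host/var/tmp/ inside the toolbox is /var/tmp/ on the node, a saved capture can also be copied off and analyzed locally, for example with scp if SSH access to the node is available:

$ scp core@worker-1:/var/tmp/helper_*.pcap .
$ tcpdump -nn -r helper_*.pcap | head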
To be continued...
Not everything can be proven with tcpdump captures, but having the option to use them to confirm certain aspects of the communication between two resources is helpful. In the future, we'll talk more about network traffic flows, OVS, and how to find out where traffic is getting lost.