
KUBERNETES: Operating System Commands To Validate Kubernetes Related OS Settings For Comparing and Troubleshooting Issues

This document provides useful operating system level commands for validating the following OS settings related to Kubernetes:

  • firewalld
  • SELinux
  • rpm packages list
  • iptables & iptables NAT rules
  • System level settings (sysctl -a output)
  • Operating system release and kernel versions
  • List of all the services on the operating system
  • All the repositories that are enabled and disabled
  • Swap settings
  • Loaded modules (lsmod output)
  • Sudo users

The commands listed in this document come in handy for troubleshooting Kubernetes issues where the environment was working before and broke recently, for example after OS patching, a Kubernetes upgrade, or other changes.

If there are plans for an upgrade or maintenance, it is good to capture these outputs on the Kubernetes nodes before and after the upgrade/maintenance. If any issues appear afterwards, you can capture the command outputs again and compare what's changed, as in the sketch below.
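As a minimal sketch, a capture script along the following lines can be run before and after maintenance and the two snapshot directories compared with diff. The output directory naming and the exact command list are assumptions; extend them with the other commands in this document as needed. Run it as root so iptables-save and sysctl -a produce complete output.

#!/bin/bash
# Capture Kubernetes-related OS settings into a timestamped directory.
OUT=/var/tmp/k8s-os-snapshot-$(date +%Y%m%d-%H%M%S)
mkdir -p "$OUT"
lsmod                     > "$OUT/lsmod.txt"
sysctl -a                 > "$OUT/sysctl.txt" 2>/dev/null
uname -a                  > "$OUT/uname.txt"
cat /etc/*release*        > "$OUT/os-release.txt"
iptables-save             > "$OUT/iptables.txt"
systemctl list-unit-files > "$OUT/services.txt"
sestatus                  > "$OUT/sestatus.txt"
cat /proc/swaps           > "$OUT/swaps.txt"
rpm -qa --last            > "$OUT/rpms.txt"
echo "Snapshot saved to $OUT"
# After maintenance, run the script again and compare:
#   diff -r <before-directory> <after-directory>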

The following are the commands to use.

List the loaded modules

lsmod

List all the system level settings (sysctl settings)

sysctl -a

Capture the kernel version

uname -a

Capture the operating system release version.

cat /etc/*release*

Capture and save the current iptables rules.

iptables-save

Capture the current iptables NAT rules.

iptables -L -t nat -vn --line-numbers

List all the system level services that are enabled/disabled/stopped/started.

systemctl list-unit-files

Validate SELinux status

sestatus

List all the firewalld rules

sudo firewall-cmd --list-all-zones

Validate swap settings to see if it is disabled

cat /proc/swaps

List all the Yum/DNF repositories which are enabled and disabled.

sudo dnf repolist all

List all the rpm packages that are installed and when they were last updated

rpm -qa --last

Get the list of all the users

getent passwd

Capture the current sshd config settings

cat /etc/ssh/sshd_config

Capture the current sudoers configuration file

cat /etc/sudoers

Capture the current sudoer users info

ls -lrt /etc/sudoers.d/*

Validate the current users which have sudoer permissions

getent passwd | cut -f1 -d: | sudo xargs -L1 sudo -l -U | grep -v 'not allowed'



KUBERNETES: KUBECTL Command To Check Health Status and Liveliness Probe (livez check) Of Kubernetes Components

The below kubectl command can be used on a Kubernetes control plane node (replace livez with readyz to check readiness instead).

kubectl get --raw='/livez?verbose'

Alternatively, the below curl command can be used instead.

curl -k https://localhost:6443/livez?verbose

Below is sample output of the above command.

[+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/storage-object-count-tracker-hook ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/start-service-ip-repair-controllers ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/start-system-namespaces-controller ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/start-kube-apiserver-identity-lease-controller ok
[+]poststarthook/start-kube-apiserver-identity-lease-garbage-collector ok
[+]poststarthook/start-legacy-token-tracking-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]poststarthook/apiservice-openapiv3-controller ok
[+]poststarthook/apiservice-discovery-controller ok
livez check passed
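The API server health endpoints also support querying a single check or excluding checks from the overall verdict; a couple of examples, assuming the API server is listening on localhost:6443 as above:

# Query only the etcd health check
curl -k https://localhost:6443/livez/etcd

# Exclude the etcd check from the overall livez verdict
curl -k 'https://localhost:6443/livez?verbose&exclude=etcd'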

KUBERNETES: KUBECTL Command To Check Unhealthy Status Pods

The below command can be used; it lists all the pods, in all namespaces, that are in an unhealthy status.

kubectl get pods -o wide --all-namespaces | grep -vE 'Running|Completed'

The above command lists all the pods whose status is anything other than Running or Completed. An alternative is shown below.
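A sketch of an alternative that filters on the pod phase field server-side is below; note that phase-based filtering will not catch pods whose phase is still Running while a container inside is crash-looping, so the grep variant above remains useful.

kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded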

KUBERNETES: KUBECTL Command To Check Kubernetes Components Status

The below command can be used to check the status of the Kubernetes components.

kubectl get cs

Below is sample output.

#kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE   ERROR
controller-manager   Healthy   ok        
scheduler            Healthy   ok        
etcd-0   
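Because ComponentStatus is deprecated, an alternative on kubeadm-provisioned clusters is to list the control plane static pods by label. The tier=control-plane label below is the one kubeadm applies to its static pods; it is an assumption for other setups.

kubectl get pods -n kube-system -l tier=control-plane -o wide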

KUBERNETES: Kubectl Command To View All Pods And List Containers & Images Part Of All Pods (How To)


The below command can be used on a control node. It lists the pods in all namespaces along with each container's ID and image; to limit the output to a single namespace, replace --all-namespaces with the -n <namespace> flag.

kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{"pod: "}{.metadata.name}{"\n"}{range .status.containerStatuses[*]}{"\tname: "}{.containerID}{"\n\timage: "}{.image}{"\n"}{end}'

Below is sample output.

pod: details-v1-cf74bb974-8jfx5
name: cri-o://5630266da8cc049448888d722ea52859cb5fe0bbf9053222db9440ab889173c6
image: docker.io/istio/examples-bookinfo-details-v1:1.19.1
pod: productpage-v1-87d54dd59-2xqm2
name: cri-o://b8a374635c35e6725f9dceac159acf8ea65aba481fcb6d9fb9ef5c3f3182a635
image: docker.io/istio/examples-bookinfo-productpage-v1:1.19.1
pod: ratings-v1-7c4bbf97db-hcxgt
name: cri-o://5d32cd9f25e0f6189c903ff840239bc7b5459427cf861c0c40df3ddfd9041fb4
image: docker.io/istio/examples-bookinfo-ratings-v1:1.19.1
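If only a de-duplicated list of images across all namespaces is needed, a shorter one-liner in the style of the kubernetes.io examples can also be used; this sketch counts how many containers use each image:

kubectl get pods -A -o jsonpath="{.items[*].spec.containers[*].image}" | tr -s '[[:space:]]' '\n' | sort | uniq -c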

KUBERNETES - How To Verify DNS Resolution Is Working and CoreDNS Pods Are Healthy In Kubernetes?

To verify that DNS resolution is working and the CoreDNS pods are healthy in Kubernetes, we can deploy a dnsutils pod and run nslookup tests. More information on deploying the dnsutils pod and running nslookup tests can be found in the kubernetes.io documentation below.

https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
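As a quick sketch of the flow described on that page (the dnsutils manifest URL is taken from the linked documentation):

# Deploy the dnsutils test pod
kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml

# Run an nslookup against the cluster DNS
kubectl exec -i -t dnsutils -- nslookup kubernetes.default

# Clean up when done
kubectl delete pod dnsutils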

KUBERNETES: How To Quickly Test App Pod Without A Service & Not Exposed Externally Whether It is Accessible And Working?

The quickest and easiest way to verify an app pod that has no Service and is not exposed externally is to do the following:

  • Identify the port on which the app is listening/running in the pod.
  • On one of the control nodes, use the kubectl port-forward command to forward requests coming into an IP address/port on the control node to the app pod and the port on which the app is listening.
  • Access the application via curl, wget, or a browser using the IP address and port of the control node on which the kubectl port-forward command is being executed.

The following is the syntax for the kubectl port-forward command that has to be executed on the control node.

kubectl port-forward --address <local machine address> <target pod name> -n <namespace> <local machine port>:<target pod app listening port>

Once the above kubectl port-forward command is executed, it keeps running and the control node listens on the port waiting for connections. For example, on one of the Kubernetes control nodes, you can forward requests on port 8080 of that node to port 80 on one of the nginx pods as shown below.

kubectl port-forward --address 10.10.10.10 nginx-deployment-6595874d85-n6rww 8080:80

You can then access the application using the IP and port of the Kubernetes node, in this case http://10.10.10.10:8080 for the nginx app. If browser (BUI) access is not available, another way to test is a wget check, as in this example.

wget -p 10.10.10.10:8080

You will see output as follows, which indicates the nginx index.html file is accessible on port 8080 (the forwarded port).

 wget -p 10.10.10.10:8080
--2024-03-21 18:07:29--  http://10.10.10.10:8080/
Connecting to 10.10.10.10:8080... connected.
HTTP request sent, awaiting response... 200 OK
Length: 612 [text/html]
Saving to: ‘10.10.10.10:8080/index.html’

100%[=========================================>] 612         --.-K/s   in 0s      

2024-03-21 18:07:29 (1.23 MB/s) - ‘10.10.10.10:8080/index.html’ saved [612/612]

FINISHED --2024-03-21 18:07:29--

KUBERNETES: Kubectl Port-Forward Command To Test Request Forwarding To Pods To Debug Network Issues

The kubectl port-forward command can be used to debug network issues connecting to pods.

The following is the syntax for kubectl port forwarding.

kubectl port-forward --address <local machine address> <target pod name> -n <namespace> <local machine port>:<target pod app listening port>

In the above command, <local machine address> is the IP address of the local machine where you are running the kubectl test, and <local machine port> is the port you want to use on that machine to start the port forwarding test. <target pod app listening port> should be the pod port on which the app is listening, <target pod name> is the pod name, and <namespace> is the namespace under which the pod is running in Kubernetes.

To demonstrate this test, let's take the example of starting an nginx pod that serves an index HTML page, with the app listening on port 80. On one of the Kubernetes nodes, let's forward requests coming to port 8080 on the node to port 80 on the nginx pod. Below are the steps for testing.

1) Create the nginx pods using the below command.

kubectl create -f https://k8s.io/examples/application/deployment.yaml

You will have the below nginx pods deployed.

NAME                                READY   STATUS    RESTARTS   AGE   IP            NODE            NOMINATED NODE   READINESS GATES
nginx-deployment-6595874d85-hhhrs   1/1     Running   0          9s    10.244.1.28   cne14-worker1   <none>           <none>
nginx-deployment-6595874d85-n6rww   1/1     Running   0          9s    10.244.1.29   cne14-worker1   <none>           <none>

2) On one of the Kubernetes nodes, do port forwarding to forward requests on port 8080 on that node to port 80 on one of the nginx pods, for example with the below command.

kubectl port-forward --address 10.10.10.10 nginx-deployment-6595874d85-n6rww 8080:80

You will see console output as follows:

kubectl port-forward --address 10.10.10.10 nginx-deployment-6595874d85-n6rww 8080:80
Forwarding from 10.10.10.10:8080 -> 80
Forwarding from [::1]:8080 -> 80
 
Please note that if you do not specify the --address flag, the port forwarding will by default listen on the localhost 127.0.0.1 IP. You would see output as follows:

kubectl port-forward nginx-deployment-6595874d85-n6rww 8080:80
Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80

3) From any other node that has access to the node where you are running the kubectl port-forward test, open an SSH session to that node and run the below command.

wget -p 10.10.10.10:8080

You will see output as follows, which indicates the nginx index.html file is accessible on port 8080 (the forwarded port).

 wget -p 10.10.10.10:8080
--2024-03-21 18:07:29--  http://10.10.10.10:8080/
Connecting to 10.10.10.10:8080... connected.
HTTP request sent, awaiting response... 200 OK
Length: 612 [text/html]
Saving to: ‘10.10.10.10:8080/index.html’

100%[=========================================>] 612         --.-K/s   in 0s      

2024-03-21 18:07:29 (1.23 MB/s) - ‘10.10.10.10:8080/index.html’ saved [612/612]

FINISHED --2024-03-21 18:07:29--

During the wget test, in the window where kubectl port-forward is running, you will see snippets as follows indicating that the connection is being handled.

kubectl port-forward nginx-deployment-6595874d85-n6rww 8080:80
Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80
Handling connection for 8080
 
Also, if you are port forwarding to an HTTP web app on the pod, you can access the app from a browser using the listen address and port you specified in the kubectl port-forward command. In this article's example, that would be http://10.10.10.10:8080


KUBERNETES: Kubectl Command To Create Temporary Pod and Connect For Testing

Below is the command to create a temporary busybox pod.

kubectl run -it --rm busybox --image=busybox

With the above command, we connect to the pod once it is created. When we exit the pod using the exit command, the pod is deleted since it is a temporary pod.

Below is a snippet.

$ kubectl run -it --rm busybox --image=busybox
If you don't see a command prompt, try pressing enter.
/ # 
/ # exit
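If the temporary pod is being created for DNS tests, pinning the older busybox:1.28 image is a common workaround, since nslookup in newer busybox builds is known to misbehave; a one-shot example:

kubectl run -it --rm busybox --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default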

KUBERNETES: Kubectl Commands To Check CoreDNS (kube-dns) Pods, Services & Logs


Below are commands to check the CoreDNS (kube-dns) pods, services, and logs.

For listing the kube-dns CoreDNS pods, the below command can be used.

# kubectl get pods --namespace=kube-system -l k8s-app=kube-dns

For listing the kube-dns service, the below command can be used.

# kubectl get svc --namespace=kube-system

For checking the logs of the kube-dns pods, the below command can be used.

# kubectl logs --namespace=kube-system -l k8s-app=kube-dns

KUBERNETES: Kubectl Commands To Check CoreDNS (kube-dns) DNS Service Logs

Below commands can be used.

kubectl logs --namespace=kube-system -l k8s-app=kube-dns

For continuously following (tailing) the logs, the below command can be used.

kubectl logs --follow -n kube-system --selector 'k8s-app=kube-dns'

KUBERNETES: Kubectl Command To Check Events On A Pod

Below command can be used.

kubectl alpha events pod <pod name> -n <name space>

For example, if you want to check the coredns pod events in the kube-system namespace, your command will look like this.

kubectl alpha events pod coredns-664c775d6f-nfdsg -n kube-system

Below is sample output of the above command.

13m                 Normal    Killing             Pod/kube-flannel-ds-dkjh4       Stopping container kube-flannel
13m                 Normal    Scheduled           Pod/kube-proxy-4ch2b            Successfully assigned kube-system/kube-proxy-4ch2b to cne14-worker2
13m                 Normal    SuccessfulCreate    DaemonSet/kube-proxy            Created pod: kube-proxy-4ch2b
13m                 Normal    Created             Pod/kube-proxy-4ch2b            Created container kube-proxy
13m                 Normal    Started             Pod/kube-proxy-4ch2b            Started container kube-proxy
13m                 Normal    SuccessfulDelete    DaemonSet/kube-flannel-ds       Deleted pod: kube-flannel-ds-dkjh4
13m                 Normal    Scheduled           Pod/kube-flannel-ds-px4tt       Successfully assigned kube-system/kube-flannel-ds-px4tt to cne14-worker1
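On newer Kubernetes releases, the events command has graduated out of alpha, so the equivalent non-alpha form is:

kubectl events --for pod/<pod name> -n <name space>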

KUBERNETES: How To Enable core_pattern On the Linux Host To Generate Coredump File For Pods/Containers Crash Issue

The core_pattern setting defines the name and path of the core dump file generated when a pod or container crashes due to a segfault or an illegal system call. The coredump file contains an image of the process's memory at the time of termination.


Below are the steps for enabling core_pattern on the Linux hosts to generate a coredump if a pod or container crashes.


1) Make a note of the default core pattern setting using the below command.


 cat /proc/sys/kernel/core_pattern


The default value will be core.


2) Change the core pattern as follows:


sysctl -w "kernel.core_pattern=|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h"


3) Verify that the core handler changed to the systemd core handler by running the below command.


cat /proc/sys/kernel/core_pattern


You should see output as follows:


|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
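Note that a setting changed with sysctl -w does not survive a reboot. To persist it across reboots, a sketch along these lines can be used (the 50-coredump.conf file name is an arbitrary choice):

# Persist the core_pattern setting
echo 'kernel.core_pattern=|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h' | sudo tee /etc/sysctl.d/50-coredump.conf

# Reload all sysctl configuration files
sudo sysctl --system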


How to check and collect the coredump in case of a pod/container crash


1) If the pods are in CrashLoopBackOff or Error state due to a segfault or OS system call, check and identify the pod's corefile PID using the below command.


coredumpctl list


Below is sample output of the above command, which shows testpod under the EXE column. The PID for the pod is 1459 in this case.


# coredumpctl list
TIME                          PID  UID  GID  SIG  PRESENT  EXE
Tue 2024-01-09 20:51:56 GMT  1459    0    0    3  *        /usr/local/bin/testpod



2) Once the PID is identified, dump the core to a file using the below command. In the below command, replace the 1459 PID with the PID you identified.


coredumpctl dump 1459 > /tmp/coredump.out


/tmp/coredump.out will be your coredump file.
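Before (or instead of) dumping the core to a file, the crash metadata and stack trace can be inspected in place; for example:

# Show metadata, signal and backtrace for PID 1459
coredumpctl info 1459

# Open the core directly in gdb (requires gdb to be installed)
coredumpctl debug 1459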


In case you want to revert the core_pattern back to its previous default value, core, run the below command.

sysctl -w "kernel.core_pattern=core" 

KUBERNETES: How to Capture TCPDumps To Check Where CoreDNS Is Communicating To External DNS?

CoreDNS communicates with DNS servers over port 53 using the UDP protocol, and it runs on the pod network. When CoreDNS communicates with an external DNS server, it sends the request over its pod network via UDP to the underlying host on which it is running; NAT then translates the CoreDNS pod network IP to the underlying host IP, and the host sends the request on to the external DNS. The external DNS responds to the underlying host, which NATs the reply back to the CoreDNS pod network IP.

In order to check whether communication is happening from CoreDNS to the external DNS, we have to capture tcpdumps on the host on which CoreDNS is running, filtering on the UDP protocol and port 53. Below are the steps to capture tcpdumps.

1) Execute the below tcpdump command on the host where the CoreDNS pod is running to capture the tcpdumps on UDP port 53.

In the below command, replace XX.XX.XX.XX with the IP of the host where the CoreDNS pod is running, replace YY.YY.YY.YY with the IP of the external DNS, and replace ens3 with the host's network interface name.

sudo tcpdump -nnnni ens3 '(host XX.XX.XX.XX or host YY.YY.YY.YY) and udp port 53'

2) Connect to the CoreDNS pod running on the host where tcpdump is executing (from step 1 above). Run the nslookup command on the CoreDNS pod to query the external DNS, for example for google.com as follows.

nslookup google.com

In the tcpdump capture, you should notice snippets as follows showing the communication in a working scenario.

06:18:21.147166 IP 10.XX.XX.236.1171 > 169.XX.XX.XX.53: 34506+ A? google.com. (28)
06:18:21.147941 IP 169.XX.XX.XX.53 > 10.XX.XX.236.1171: 34506 1/0/0 A 142.250.70.46 (44)
06:18:21.148161 IP 10.XX.XX.XX.45357 > 169.XX.XX.XX.53: 18678+ AAAA? google.com. (28)
06:18:21.148836 IP 169.XX.XX.XX.53 > 10.XX.XX.XX.45357: 18678 1/0/0 AAAA 2404:6800:4009:829::200e (56)
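To identify which host each CoreDNS pod is running on (so that tcpdump is captured on the correct node), the below command can be used; the NODE column shows the host.

kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide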

KUBERNETES: How To Enable & Capture Coredumps From Crashing/Failing Pods At Linux Level?

Follow the below steps:

1) Make a note of the default core pattern setting using the below command.

     cat /proc/sys/kernel/core_pattern

Default value will be core.

2) Change the core pattern as follows:

    sysctl -w "kernel.core_pattern=|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h"

3) Verify that the core handler changed to systemd core handler by running below command.

    cat /proc/sys/kernel/core_pattern

You should see output as follows:

    |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h

4) Restart the Kubernetes pods that are having issues and wait for them to fail/crash.

5) Once the pods are in CrashLoopBackOff or another failed/crashed state, check for the pod corefile PID using the below command.

    coredumpctl list

Below is sample output of the above command, which shows test under the EXE column. In your case, look for the pod that is having issues under the EXE column and identify its PID.

# coredumpctl list
TIME    PID   UID   GID SIG PRESENT EXE
Tue XXXX-XX-XX XX:XX:56 GMT    1358     0     0   3 * /usr/local/bin/test


Once the PID is identified, dump the core to a file using the below command. In the below command, replace PID-NUMBER with the identified pod PID.

    coredumpctl dump PID-NUMBER > dump.out

To revert the core_pattern to the default value at the Linux level, you can run the below command.

    sysctl -w "kernel.core_pattern=core"