Essential OpenShift & Kubernetes Troubleshooting: From Basics to Advanced

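When a deployment goes sideways, knowing which command to run can save you hours of frustration. This guide covers the essential commands for debugging OpenShift/Kubernetes clusters, categorized by the depth of the issue. The examples mix oc and kubectl; oc accepts most kubectl subcommands, so on OpenShift you can usually use either.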

1. The Essentials: Initial Discovery

The first step in any investigation is identifying which components are failing and why.

List All Pods Across Namespaces

Start by getting a bird’s-eye view. Look specifically for statuses like CrashLoopBackOff, Pending, or Error, and for Running pods whose READY count is below the total (such as 0/1).

# List all pods in all namespaces
oc get pods -A

Example Output:

NAMESPACE      NAME            READY   STATUS    RESTARTS        AGE
default        mysql-pod       0/1     Error     1 (5s ago)      24s
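On a large cluster it helps to cut this list down to the rows worth investigating. The sketch below is a hypothetical unhealthy_pods shell function (not an oc feature); it assumes the default oc get pods -A column layout, where READY is the third column and STATUS the fourth. It keeps the header plus any pod that is not Completed and is either not Running or not fully ready. A field selector such as status.phase!=Running would miss many of these, because a pod in CrashLoopBackOff usually still reports the phase Running.

```shell
# Hypothetical helper: reads `oc get pods -A` output on stdin and keeps the
# header plus pods that look unhealthy. Assumes the default column layout:
# NAMESPACE NAME READY STATUS RESTARTS AGE
unhealthy_pods() {
  awk '
    NR == 1 { print; next }            # keep the header row
    $4 == "Completed" { next }         # finished Job pods are expected
    {
      split($3, ready, "/")            # READY is "ready/total", e.g. 0/1
      if ($4 != "Running" || ready[1] != ready[2]) print
    }
  '
}

# Usage (requires a logged-in oc session):
#   oc get pods -A | unhealthy_pods
```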

Inspecting Pod Events

The describe command provides an “Events” log at the bottom, which explains failure reasons like image pull errors or resource constraints.

oc describe pod mysql-pod
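When the describe output is long, you can jump straight to the Events section. A minimal sketch, assuming the section starts with a line beginning "Events:" as in current oc/kubectl output; pod_events is a hypothetical helper name. For the same information as a list, oc get events --field-selector involvedObject.name=mysql-pod shows the events recorded for that object.

```shell
# Hypothetical helper: print only the Events section of `oc describe` output.
# Assumes the section starts with a line beginning "Events:" and runs to the end.
pod_events() {
  sed -n '/^Events:/,$p'
}

# Usage:
#   oc describe pod mysql-pod | pod_events
```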

2. Deep Dive: Logs and Timelines

If the Pod is scheduled but the application is failing, you need to look at the logs.

Application Logs

Use -f to follow live logs, or --previous to read the output of the container instance that ran before the current one, which is usually where the reason for a crash appears.

# View current logs
oc logs mysql-pod

# Follow live logs (press Ctrl+C to stop)
oc logs -f mysql-pod

# See logs from the previous, crashed container instance
oc logs mysql-pod --previous
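When the previous run produced a lot of output, filtering for error-looking lines is a quick first pass. This is a sketch: the crash_errors name, the default of 200 lines, and the keyword list are assumptions to adapt to your application's log format; --tail limits how many lines oc fetches.

```shell
# Hypothetical helper: show error-looking lines from the previous (crashed)
# container instance. The keyword list is an assumption; adjust it per app.
crash_errors() {
  local pod="$1"
  local lines="${2:-200}"   # how many trailing log lines to fetch
  oc logs "$pod" --previous --tail="$lines" \
    | grep -iE 'error|fatal|panic|exception' \
    || true   # no matching lines is not treated as a failure
}

# Usage:
#   crash_errors mysql-pod
#   crash_errors mysql-pod 1000
```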

Global Cluster Events

To see a chronological timeline of everything happening in the cluster, list events from all namespaces and sort them by time. This is great for spotting Node-level issues.

kubectl get events -A --sort-by='.lastTimestamp'
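On a busy cluster the full timeline is noisy. Events have a type of Normal or Warning, and a field selector can keep only the warnings. The function name below is hypothetical; the flags are standard kubectl options.

```shell
# Hypothetical helper: list only Warning events across all namespaces,
# oldest first, so the most recent problems end up at the bottom.
warning_events() {
  kubectl get events -A --field-selector type=Warning --sort-by='.lastTimestamp'
}

# Usage:
#   warning_events
#   warning_events | grep -i node    # narrow to node-related warnings
```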

Real-time Resource Usage

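If your pods are running but slow, compare their current CPU and memory usage against the requests and limits shown by oc describe pod. This command needs cluster metrics (the Metrics API) to be available; on OpenShift, oc adm top pods shows the same data.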

kubectl top pods
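To find the biggest consumers quickly, kubectl top pods can sort its own output with --sort-by (cpu or memory), and current kubectl releases list the heaviest pods first. The sketch below wraps that in a hypothetical heaviest_pods function that keeps the header and the top N rows; the default of 5 is an arbitrary choice.

```shell
# Hypothetical helper: show the N most memory-hungry pods in the current
# namespace (header line plus N rows). Requires the Metrics API.
heaviest_pods() {
  local n="${1:-5}"
  kubectl top pods --sort-by=memory | head -n "$((n + 1))"
}

# Usage:
#   heaviest_pods       # top 5
#   heaviest_pods 10    # top 10
```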

3. Intermediate Troubleshooting: “Getting Inside”

Interactive Shell

Opening a shell inside the container lets you check internal files, environment variables, and local connectivity. If the image doesn't include bash, use /bin/sh instead.

oc exec -it mysql-pod -- /bin/bash
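Many minimal images ship without bash, so the command above fails on them. A hedged workaround is a hypothetical pod_shell function that asks the container's sh to start bash if it exists and fall back to plain sh otherwise. It assumes the image has at least /bin/sh; images with no shell at all are where oc debug becomes useful.

```shell
# Hypothetical helper: open an interactive shell in a pod, preferring bash
# but falling back to sh for images (e.g. Alpine or BusyBox based) without it.
pod_shell() {
  local pod="$1"
  oc exec -it "$pod" -- sh -c 'command -v bash >/dev/null 2>&1 && exec bash || exec sh'
}

# Usage:
#   pod_shell mysql-pod
```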

Service & Endpoint Debugging

If a Service isn’t routing traffic, check its Endpoints. If the list is empty, the Service’s selector doesn’t match the labels of any Pod, or the matching Pods aren’t passing their readiness probes.

oc get svc
oc get endpoints
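To test the selector directly, you can read the Service's spec.selector and list the pods it matches. This is a sketch: svc_pods is a hypothetical name, it renders the selector with oc's go-template output format, and it treats an empty selector as a special case, because a Service without a selector never gets endpoints populated automatically. If the command lists no pods, or only pods that are not Ready, that explains the empty Endpoints list.

```shell
# Hypothetical helper: list the pods that a Service's selector matches.
svc_pods() {
  local svc="$1" selector
  # Render spec.selector as "key1=value1,key2=value2," (note the trailing comma).
  selector=$(oc get svc "$svc" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}') || return 1
  selector="${selector%,}"   # drop the trailing comma
  if [ -z "$selector" ]; then
    echo "Service $svc has no selector, so its endpoints are not populated automatically." >&2
    return 1
  fi
  oc get pods -l "$selector"
}

# Usage:
#   svc_pods mysql-service
```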

4. Advanced Troubleshooting: “Surgical Tools”

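Debugging with oc debug and Ephemeral Containers

oc debug pod/<name> starts a copy of the Pod with a shell in place of its normal command, which helps when the original container crashes too quickly to exec into. If the image lacks troubleshooting tools (like curl or even ls), kubectl debug can instead attach an ephemeral container running a tools image to the live Pod.

# Start a debug copy of an existing pod with an interactive shell
oc debug pod/mysql-pod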
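To attach an ephemeral container to the running pod itself, kubectl debug adds a temporary container without restarting the pod. The sketch assumes your cluster permits ephemeral containers and that the target container is named mysql; the helper name debug_attach and the default busybox image are illustrative choices. The --target flag shares the target container's process namespace, so tools like ps can see its processes where the container runtime supports it.

```shell
# Hypothetical helper: attach an ephemeral debug container to a running pod.
# Arguments: pod name, target container name, optional debug image.
debug_attach() {
  local pod="$1" container="$2" image="${3:-busybox}"
  kubectl debug -it "$pod" --image="$image" --target="$container"
}

# Usage (container name "mysql" is an assumption; check it with
# `oc get pod mysql-pod -o jsonpath='{.spec.containers[*].name}'`):
#   debug_attach mysql-pod mysql
```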

Node-Level Troubleshooting

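If the issue is the Node itself (disk pressure, networking, or kubelet failure), you can debug the host directly. oc debug node/ starts a privileged pod on that Node with the host’s root filesystem mounted at /host; run chroot /host inside it to use the host’s own binaries.

# Start a debug pod on the node, then run: chroot /host
oc debug node/w1.mylab.local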
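For a quick, non-interactive look at the usual suspects, you can pass a command after --. The node_health helper is hypothetical; it assumes a Red Hat CoreOS node where kubelet and CRI-O run as the kubelet and crio systemd units, and it checks /var, where container images and logs typically live.

```shell
# Hypothetical helper: run basic health checks on a node through a debug pod.
# `chroot /host` switches to the node's own filesystem and binaries.
node_health() {
  local node="$1"
  oc debug "node/$node" -- chroot /host sh -c '
    systemctl is-active kubelet crio   # are the node services running?
    df -h /var                         # disk pressure usually shows up here
  '
}

# Usage:
#   node_health w1.mylab.local
```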

Network Connectivity Debugging

Use a temporary “checker” pod to test DNS and connectivity from within the cluster.

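# Run a temporary pod to test connectivity (--rm deletes it when you exit)
oc run checker-pod --image=busybox -it --rm --restart=Never -- /bin/sh

# Inside the checker-pod, test your service DNS (short names resolve within the same namespace)
nslookup mysql-service

# From another namespace, use the fully qualified name (assuming the Service is in "default")
nslookup mysql-service.default.svc.cluster.local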
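The same DNS check can also run non-interactively, which is handy in scripts. A sketch under assumptions: the dns_check function and the dns-check pod name are hypothetical, and the cluster must be able to pull the busybox image. Using -i without -t attaches to the pod's output without allocating a terminal, and --rm deletes the pod once the lookup finishes.

```shell
# Hypothetical helper: resolve a name from inside the cluster with a
# throwaway BusyBox pod that is deleted after the lookup.
dns_check() {
  local name="$1"
  oc run dns-check --image=busybox -i --rm --restart=Never -- nslookup "$name"
}

# Usage:
#   dns_check mysql-service
#   dns_check mysql-service.default.svc.cluster.local
```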

Summary Checklist

If Status is…	Try this first…
ImagePullBackOff	oc describe pod (Check the image name, tag, and registry credentials)
CrashLoopBackOff	oc logs --previous (Check why the last run exited)
Pending	oc describe pod (Check scheduling events: insufficient CPU/memory, taints, or unbound volumes)
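The checklist can also live in a small script. This sketch is a hypothetical first_step function that maps a pod status to the first command from the checklist; grouping ErrImagePull (the status shown before ImagePullBackOff) with it and the events fallback for unlisted statuses are assumptions.

```shell
# Hypothetical helper: print the first command to try for a given pod status,
# following the summary checklist.
first_step() {
  local status="$1" pod="$2"
  case "$status" in
    ImagePullBackOff|ErrImagePull|Pending) echo "oc describe pod $pod" ;;
    CrashLoopBackOff) echo "oc logs $pod --previous" ;;
    *) echo "oc get events --field-selector involvedObject.name=$pod" ;;
  esac
}

# Usage:
#   first_step CrashLoopBackOff mysql-pod   # prints: oc logs mysql-pod --previous
```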