Kubernetes — The Complete Guide
If Docker lets you package an application into a container, Kubernetes lets you run that container at scale — across dozens, hundreds, or thousands of machines — without thinking about which machine runs what.
Kubernetes (K8s) is an open-source container orchestration platform originally developed by Google, based on over a decade of their internal experience running production workloads with a system called Borg. Google open-sourced it in 2014, and it’s now maintained by the Cloud Native Computing Foundation (CNCF).
Today, Kubernetes is the de facto standard for running containerized applications in production. Every major cloud provider offers a managed Kubernetes service: AWS (EKS), Google Cloud (GKE), Azure (AKS), and more. If you're working with containers at any real scale, sooner or later you're working with Kubernetes.
This guide covers Kubernetes from the ground up — architecture, core resources, networking, storage, security, and real-world patterns.
Why Kubernetes Exists
Before Kubernetes, deploying applications looked like this:
Physical servers (2000s): One application per server. Wasteful — most servers sat at 10% CPU utilization. Scaling meant buying more hardware.
Virtual machines (2010s): Multiple VMs per server. Better utilization, but VMs are heavy — each carries a full OS kernel, consuming gigabytes of RAM and minutes to boot.
Containers (2013+): Docker made containers mainstream. Lightweight, fast to start (milliseconds), share the host kernel. But running a single container is easy. Running 500 containers across 50 machines — keeping them healthy, networking them together, scaling them up and down, rolling out updates without downtime — that’s the hard part.
Kubernetes solves the hard part. You tell it what you want (10 replicas of my web server, 3 replicas of my database, connected on this network) and Kubernetes figures out how to make it happen.
Architecture — Control Plane and Worker Nodes
A Kubernetes cluster has two types of components:
┌────────────────────────────────────────────────────────────┐
│                       CONTROL PLANE                        │
│                                                            │
│  ┌────────────┐  ┌───────────┐  ┌────────────────────┐     │
│  │ API Server │  │ Scheduler │  │ Controller Manager │     │
│  └────────────┘  └───────────┘  └────────────────────┘     │
│  ┌────────────┐  ┌────────────┐                            │
│  │    etcd    │  │ Cloud Ctrl │                            │
│  └────────────┘  └────────────┘                            │
│                                                            │
└────────────────────────────────────────────────────────────┘
          |                   |                   |
   ┌──────┴───────┐    ┌──────┴───────┐    ┌──────┴───────┐
   │ Worker Node  │    │ Worker Node  │    │ Worker Node  │
   │ ┌──────────┐ │    │ ┌──────────┐ │    │ ┌──────────┐ │
   │ │ kubelet  │ │    │ │ kubelet  │ │    │ │ kubelet  │ │
   │ ├──────────┤ │    │ ├──────────┤ │    │ ├──────────┤ │
   │ │kube-proxy│ │    │ │kube-proxy│ │    │ │kube-proxy│ │
   │ ├──────────┤ │    │ ├──────────┤ │    │ ├──────────┤ │
   │ │Container │ │    │ │Container │ │    │ │Container │ │
   │ │ Runtime  │ │    │ │ Runtime  │ │    │ │ Runtime  │ │
   │ └──────────┘ │    │ └──────────┘ │    │ └──────────┘ │
   │ ┌───┐ ┌───┐  │    │ ┌───┐ ┌───┐  │    │ ┌───┐ ┌───┐  │
   │ │Pod│ │Pod│  │    │ │Pod│ │Pod│  │    │ │Pod│ │Pod│  │
   │ └───┘ └───┘  │    │ └───┘ └───┘  │    │ └───┘ └───┘  │
   └──────────────┘    └──────────────┘    └──────────────┘
Control Plane Components
API Server (kube-apiserver) — The front door to Kubernetes. Every interaction — kubectl commands, internal component communication, external integrations — goes through the API server. It validates requests, authenticates users, and persists state to etcd.
etcd — A distributed key-value store that holds the entire cluster state. Every resource definition, every pod’s status, every secret — all stored in etcd. It’s the single source of truth. If etcd dies and there’s no backup, you lose the cluster state.
Scheduler (kube-scheduler) — When a new pod needs to run, the scheduler decides which node to place it on. It considers resource requirements (CPU, memory), node capacity, affinity/anti-affinity rules, taints and tolerations, and other constraints.
Controller Manager (kube-controller-manager) — Runs a collection of controllers that watch the cluster state and make changes to move toward the desired state. The most important ones:
- ReplicaSet controller — Ensures the right number of pod replicas are running
- Deployment controller — Manages rolling updates and rollbacks
- Node controller — Monitors node health
- Service Account controller — Creates default service accounts
Cloud Controller Manager — Integrates with cloud provider APIs (AWS, GCP, Azure) for things like load balancers, storage volumes, and node management.
Worker Node Components
kubelet — The agent that runs on every worker node. It receives pod specifications from the API server and ensures the containers described in those specs are running and healthy. If a container crashes, kubelet restarts it.
kube-proxy — Handles networking on each node. Manages iptables or IPVS rules to route traffic to the correct pods when you access a Service.
Container Runtime — The software that actually runs containers. Kubernetes supports containerd (the default since K8s 1.24), CRI-O, and others via the Container Runtime Interface (CRI). Direct Docker Engine support (the dockershim) was deprecated in 1.20 and removed in 1.24; Docker-built images still run fine, since they follow the OCI image standard.
Core Resources
Pods
The pod is the smallest deployable unit in Kubernetes. A pod is one or more containers that share:
- Network namespace (same IP address, same ports)
- Storage volumes
- Lifecycle (created and destroyed together)
Most pods run a single container. Multi-container pods are used for sidecar patterns (logging, proxying, service mesh).
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  labels:
    app: web
spec:
  containers:
  - name: web
    image: nginx:1.25
    ports:
    - containerPort: 80
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
$ kubectl apply -f pod.yaml
pod/my-app created
$ kubectl get pods
NAME     READY   STATUS    RESTARTS   AGE
my-app   1/1     Running   0          5s
$ kubectl logs my-app
... nginx access logs ...
$ kubectl exec -it my-app -- /bin/sh
# hostname
my-app
You almost never create pods directly. Instead, you create higher-level resources that manage pods for you.
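Before moving on, here is a sketch of the sidecar pattern mentioned earlier: two containers in one pod sharing a volume. The fluent-bit log shipper and the volume names are illustrative, not part of the example above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  containers:
  - name: web
    image: nginx:1.25
    volumeMounts:
    - name: logs                  # nginx writes its logs here...
      mountPath: /var/log/nginx
  - name: log-shipper             # illustrative sidecar container
    image: fluent/fluent-bit:2.2
    volumeMounts:
    - name: logs                  # ...and the sidecar reads the same volume
      mountPath: /logs
      readOnly: true
  volumes:
  - name: logs
    emptyDir: {}                  # shared scratch volume, lives as long as the pod
```

Because both containers share a network namespace and the `logs` volume, the sidecar needs no network hop to reach the app's output.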
Deployments
A Deployment manages a set of identical pod replicas and handles rolling updates.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: myapp:v1.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 20
This creates 3 replicas of myapp:v1.0. If one pod crashes, the Deployment controller automatically creates a new one. If a node goes down, the pods are rescheduled to healthy nodes.
Rolling Updates
# Update the image — triggers a rolling update
$ kubectl set image deployment/web-app web=myapp:v2.0
# Watch the rollout progress
$ kubectl rollout status deployment/web-app
Waiting for deployment "web-app" rollout to finish: 1 out of 3 new replicas have been updated...
Waiting for deployment "web-app" rollout to finish: 2 out of 3 new replicas have been updated...
deployment "web-app" successfully rolled out
# Rollback if something goes wrong
$ kubectl rollout undo deployment/web-app
deployment.apps/web-app rolled back
During a rolling update, Kubernetes gradually replaces old pods with new ones — ensuring zero downtime. The readinessProbe prevents traffic from being sent to pods that aren’t ready yet.
Services
Pods are ephemeral — they get new IP addresses every time they’re recreated. A Service provides a stable network endpoint that routes traffic to the right pods.
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  selector:
    app: web          # Routes to pods with label app=web
  ports:
  - port: 80          # Service port
    targetPort: 8080  # Container port
  type: ClusterIP     # Internal only (default)
Service types:
| Type | Scope | Use Case |
|---|---|---|
| ClusterIP | Internal only | Service-to-service communication |
| NodePort | External via node IP:port | Development, simple external access |
| LoadBalancer | External via cloud LB | Production external access |
| ExternalName | DNS alias | Pointing to external services |
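As a sketch, exposing the same pods to the internet only requires changing the type; on a cloud provider this provisions an external load balancer. The service name here is illustrative.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-public        # illustrative name
spec:
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
  type: LoadBalancer      # cloud provider allocates an external IP
```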
# List services
$ kubectl get svc
NAME          TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
web-service   ClusterIP   10.96.45.123   <none>        80/TCP    5m
# Access from within the cluster
$ kubectl exec -it some-pod -- curl http://web-service:80
The Service uses labels and selectors to find its target pods. Any pod with the label app: web receives traffic from this service. Pods can come and go — the Service always routes to the healthy ones.
ConfigMaps and Secrets
ConfigMaps store configuration data as key-value pairs:
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  DATABASE_HOST: "postgres.default.svc"
  LOG_LEVEL: "info"
  config.json: |
    {
      "feature_flags": {
        "new_ui": true
      }
    }
Secrets store sensitive data (base64-encoded, optionally encrypted at rest):
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  username: dGhpbGFu      # base64("thilan")
  password: c2VjcmV0MTIz  # base64("secret123")
Using them in a pod:
spec:
  containers:
  - name: app
    image: myapp:v1.0
    envFrom:
    - configMapRef:
        name: app-config
    env:
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: db-credentials
          key: password
    volumeMounts:
    - name: config-volume
      mountPath: /etc/config
  volumes:
  - name: config-volume
    configMap:
      name: app-config
Important: Kubernetes Secrets are base64-encoded, not encrypted by default. Enable encryption at rest (EncryptionConfiguration) or use external secret managers (HashiCorp Vault, AWS Secrets Manager) for real security.
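As a sketch of what enabling encryption at rest involves: the API server's --encryption-provider-config flag points at a file like the one below. The key material shown is a placeholder, not a real key.

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded-32-byte-key>   # placeholder, generate your own
  - identity: {}   # fallback so existing unencrypted Secrets stay readable
```

New and updated Secrets are encrypted with the first provider in the list; the identity fallback lets the cluster read data written before encryption was enabled.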
Namespaces
Namespaces provide logical isolation within a cluster:
$ kubectl get namespaces
NAME          STATUS   AGE
default       Active   30d
kube-system   Active   30d
kube-public   Active   30d
$ kubectl create namespace production
$ kubectl create namespace staging
# Deploy to a specific namespace
$ kubectl apply -f deployment.yaml -n production
# Set default namespace for your context
$ kubectl config set-context --current --namespace=production
Namespaces are useful for:
- Separating environments (dev, staging, production)
- Team isolation
- Resource quotas (limit CPU/memory per namespace)
- Network policies (restrict traffic between namespaces)
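For instance, the resource-quota use case above might look like this sketch, capping what the staging namespace can consume in total (the name and numbers are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: staging-quota      # illustrative name
  namespace: staging
spec:
  hard:
    requests.cpu: "10"     # sum of CPU requests across all pods
    requests.memory: 20Gi
    limits.cpu: "20"       # sum of CPU limits across all pods
    limits.memory: 40Gi
    pods: "50"             # max pod count in the namespace
```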
Networking
Kubernetes networking follows a flat network model with three fundamental rules:
- Every pod gets its own IP address
- Pods can communicate with any other pod without NAT
- Agents on a node can communicate with all pods on that node
Pod-to-Pod Communication
Pods on the same node communicate via a virtual bridge on that node (commonly named cbr0 or cni0, depending on the network plugin). Pods on different nodes communicate via an overlay network or cloud provider routing.
Popular CNI (Container Network Interface) plugins:
- Calico — L3 networking with BGP, supports network policies
- Cilium — eBPF-based, high performance, advanced security
- Flannel — Simple overlay network, good for getting started
- Weave — Mesh networking with encryption
DNS
Every Service gets a DNS entry:
<service-name>.<namespace>.svc.cluster.local
# From any pod in the cluster:
$ curl http://web-service.default.svc.cluster.local
# Or simply (within the same namespace):
$ curl http://web-service
Pods also get DNS entries, though they’re used less frequently:
<pod-ip-dashed>.<namespace>.pod.cluster.local
# Example: 10-244-1-5.default.pod.cluster.local
Ingress
An Ingress provides HTTP/HTTPS routing from outside the cluster to Services inside:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  tls:
  - hosts:
    - myapp.example.com
    secretName: tls-secret
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-service
            port:
              number: 80
This routes myapp.example.com/api/* to the API service and everything else to the web service. TLS termination happens at the Ingress controller.
You need an Ingress controller (nginx-ingress, Traefik, HAProxy, etc.) installed in your cluster for Ingress resources to work.
Network Policies
By default, all pods can communicate with all other pods. Network Policies restrict this:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: web
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: database
    ports:
    - protocol: TCP
      port: 5432
This policy says: the API pods can only receive traffic from web pods on port 8080, and can only send traffic to database pods on port 5432. Everything else is blocked.
Storage
Volumes
Containers have ephemeral filesystems — data is lost when the container restarts. Volumes provide persistent storage.
spec:
  containers:
  - name: app
    volumeMounts:
    - name: data
      mountPath: /var/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: my-pvc
PersistentVolumes and PersistentVolumeClaims
PersistentVolume (PV) — A piece of storage provisioned by an admin or dynamically:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  storageClassName: standard
  hostPath:
    path: /data/my-pv
PersistentVolumeClaim (PVC) — A request for storage by a user:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: standard
The PVC is bound to a PV that meets its requirements. In cloud environments, dynamic provisioning creates the PV automatically when a PVC is created.
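Dynamic provisioning is driven by a StorageClass. As a sketch, a class backed by the AWS EBS CSI driver might look like this (the class name and gp3 parameter are illustrative choices):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd                   # illustrative name
provisioner: ebs.csi.aws.com       # AWS EBS CSI driver
parameters:
  type: gp3                        # EBS volume type
reclaimPolicy: Delete              # delete the volume when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer  # provision in the zone where the pod lands
allowVolumeExpansion: true
```

A PVC that names storageClassName: fast-ssd then gets a matching PV created on demand, with no admin pre-provisioning.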
StatefulSets
For stateful applications (databases, message queues), StatefulSets provide:
- Stable, unique pod names (postgres-0, postgres-1, postgres-2)
- Stable network identities
- Ordered deployment and scaling
- Persistent storage per pod
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:16
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi
Each pod gets its own PVC (data-postgres-0, data-postgres-1, data-postgres-2). If postgres-1 is rescheduled to a different node, it reattaches to the same volume.
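The serviceName field above refers to a headless Service (clusterIP: None), which is what gives each pod a stable per-pod DNS name like postgres-0.postgres.default.svc.cluster.local. A minimal sketch:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres        # must match serviceName in the StatefulSet
spec:
  clusterIP: None       # headless: no virtual IP, DNS resolves to pod IPs
  selector:
    app: postgres
  ports:
  - port: 5432
```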
Security
RBAC — Role-Based Access Control
Kubernetes uses RBAC to control who can do what:
# Define what actions are allowed
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
---
# Bind the role to a user
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: production
subjects:
- kind: User
  name: alice
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
# Check permissions
$ kubectl auth can-i list pods --namespace production --as alice
yes
$ kubectl auth can-i delete pods --namespace production --as alice
no
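A Role is namespaced. For cluster-wide permissions the same pattern uses ClusterRole and ClusterRoleBinding; here is a sketch (the node-reader name is illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-reader      # illustrative name; nodes are not namespaced
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: read-nodes
subjects:
- kind: User
  name: alice
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: node-reader
  apiGroup: rbac.authorization.k8s.io
```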
Pod Security
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
  containers:
  - name: app
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
Key security practices:
- Run as non-root — Never run containers as root
- Read-only filesystem — Prevent writes to the container filesystem
- Drop capabilities — Remove all Linux capabilities, add back only what’s needed
- No privilege escalation — Prevent processes from gaining more privileges
Service Accounts
Every pod runs with a Service Account that determines its API access:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: production
---
spec:
  serviceAccountName: my-app-sa
  containers:
  - name: app
    image: myapp:v1.0
Best practice: create dedicated service accounts with minimal permissions for each application. Don’t use the default service account.
Scaling
Horizontal Pod Autoscaler (HPA)
Automatically scales pod replicas based on metrics:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
When average CPU exceeds 70%, the HPA adds more replicas (up to 20). When it drops, replicas are removed (down to 2).
$ kubectl get hpa
NAME      REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
web-hpa   Deployment/web-app   45%/70%   2         20        3          1h
Cluster Autoscaler
The HPA scales pods. The Cluster Autoscaler scales nodes. When pods can’t be scheduled (insufficient resources), it adds nodes. When nodes are underutilized, it removes them.
This is cloud-provider specific — EKS, GKE, and AKS all have their own Cluster Autoscaler integrations.
Health Checks
Kubernetes uses probes to determine pod health:
| Probe | Purpose | Action on Failure |
|---|---|---|
| Liveness | Is the container alive? | Restart the container |
| Readiness | Is the container ready for traffic? | Remove from Service endpoints |
| Startup | Has the container started? | Kill and restart (for slow-starting apps) |
containers:
- name: app
  livenessProbe:
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 15
    periodSeconds: 20
    failureThreshold: 3
  readinessProbe:
    httpGet:
      path: /ready
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 10
  startupProbe:
    httpGet:
      path: /healthz
      port: 8080
    failureThreshold: 30
    periodSeconds: 10
The startup probe gives the app up to 300 seconds (30 failures x 10 seconds) to start. After that, the liveness and readiness probes take over.
Jobs and CronJobs
For one-off or scheduled tasks:
# One-time job
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  template:
    spec:
      containers:
      - name: migrate
        image: myapp:v1.0
        command: ["./manage.py", "migrate"]
      restartPolicy: Never
  backoffLimit: 3
---
# Scheduled job
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-backup
spec:
  schedule: "0 2 * * *"  # 2:00 AM daily
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: backup-tool:latest
            command: ["./backup.sh"]
          restartPolicy: OnFailure
Essential kubectl Commands
# Cluster info
kubectl cluster-info
kubectl get nodes -o wide
# Workloads
kubectl get pods -A # All pods, all namespaces
kubectl get deployments
kubectl describe pod <name> # Detailed pod info
kubectl logs <pod> -f # Stream logs
kubectl logs <pod> -c <container> # Specific container in multi-container pod
kubectl exec -it <pod> -- /bin/sh # Shell into a pod
# Debugging
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl top pods # CPU/memory usage
kubectl top nodes
kubectl describe node <name> # Node capacity and allocations
# Scaling
kubectl scale deployment web-app --replicas=5
# Updates
kubectl set image deployment/web-app web=myapp:v2.0
kubectl rollout status deployment/web-app
kubectl rollout undo deployment/web-app
kubectl rollout history deployment/web-app
# Resources
kubectl apply -f manifest.yaml # Create or update
kubectl delete -f manifest.yaml # Delete
kubectl diff -f manifest.yaml # Preview changes
# Port forwarding (for debugging)
kubectl port-forward svc/web-service 8080:80
Production Best Practices
Resource Management:
- Always set requests and limits for CPU and memory
- Use LimitRange to enforce defaults per namespace
- Use ResourceQuota to cap total resource usage per namespace
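The LimitRange mentioned above could be sketched like this; it fills in defaults for containers that don't declare their own (the name and values are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits     # illustrative name
  namespace: production
spec:
  limits:
  - type: Container
    defaultRequest:        # applied when a container sets no requests
      cpu: 100m
      memory: 128Mi
    default:               # applied when a container sets no limits
      cpu: 500m
      memory: 512Mi
```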
Reliability:
- Run at least 3 replicas for critical services
- Use PodDisruptionBudget to prevent too many pods going down during maintenance
- Set proper liveness and readiness probes
- Use topologySpreadConstraints to spread pods across zones
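For example, a PodDisruptionBudget sketch that keeps at least two web pods running through voluntary disruptions such as node drains and upgrades:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2          # evictions are refused if they would drop below this
  selector:
    matchLabels:
      app: web
```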
Security:
- Enable RBAC and follow least privilege
- Run containers as non-root
- Use NetworkPolicies to restrict traffic
- Scan container images for vulnerabilities
- Enable audit logging
- Encrypt Secrets at rest
Observability:
- Centralize logs (Fluentd/Fluent Bit -> Elasticsearch, or Loki)
- Metrics with Prometheus + Grafana
- Distributed tracing with Jaeger or Zipkin
- Set up alerts for pod crashes, high resource usage, and failed deployments
GitOps:
- Store all manifests in Git
- Use tools like ArgoCD or Flux for automated deployment
- Never run kubectl apply manually in production
Final Thoughts
Kubernetes is complex. There’s no sugarcoating it. The learning curve is steep, and the ecosystem is vast. But there’s a reason it won — it solves real problems that every organization running distributed applications faces.
The key insight behind Kubernetes is declarative configuration. You don’t tell Kubernetes how to deploy your application — you tell it what you want (3 replicas, 2 CPU cores each, exposed on port 80, with a health check). Kubernetes figures out the rest. If a pod crashes, it creates a new one. If a node goes down, it reschedules the pods. If you push a new image, it rolls out gradually with zero downtime.
Once you internalize this declarative model — “describe the desired state, let the system converge” — Kubernetes starts making sense. Everything is a resource definition in YAML. Everything is watched by controllers. Everything is reconciled toward the desired state.
Start small. Deploy a single application. Add a Service. Add an Ingress. Then scale. Then add monitoring. Each piece builds on the last.
Thanks for reading!