1. Scaling and Scheduling Pods in Kubernetes: All About HPA
Scaling Made Simple: Understanding the Role of HPA in Kubernetes Scheduling
Horizontal Pod Autoscaler (HPA)
As this is my 50th blog, here’s a detailed exploration of Kubernetes Horizontal Pod Autoscaler (HPA) and how it simplifies scaling and scheduling.
Steps to utilize the Horizontal Pod Autoscaler (HPA) for pods in Kubernetes:
- Set up a server on any cloud provider. In this example, AWS is used, and an EC2 instance of type t2.medium has been created. If you are unfamiliar with AWS, you can refer to any of my previous blogs for guidance on creating an EC2 instance. Then connect to the instance via SSH.
- Now fork the kubestarter repository, clone it, and pull the latest code.
Commands:
git clone https://github.com/Chetan-Mohod/kubestarter.git
cd kubestarter
git pull
- Now go to the kind-cluster directory and create a new cluster with three nodes using the config.yml file that we used in our previous blogs.
cd kind-cluster
kind create cluster --name=acrobat-cluster --config=config.yml
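The config.yml itself is not shown in this post; assuming it defines one control-plane and two worker nodes as described below, a minimal sketch of such a Kind config looks like this (file contents are an assumption, not copied from the repository):

```yaml
# Sketch of a three-node Kind cluster config: one control-plane, two workers
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
```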
The concept behind HPA:
- Assume you have a cluster on a t2.medium instance that includes three nodes: one master and two worker nodes. When you run a pod on worker node 1, the scheduler checks the pod's resource requests and limits on that node. Requests specify the minimum CPU and memory the pod needs to run, while limits cap the maximum resources the pod may consume; once a pod hits its limit, it cannot handle more traffic.
For example, if your pod requests 128 MB of RAM with a limit of 512 MB, then once the pod's usage reaches 512 MB it cannot handle additional traffic. To address this, we need to scale the pod so that another pod is created automatically to share the load. This process is known as Auto-Scaling.
Auto-Scaling relies on metrics. Kubernetes provides a Metrics Server, which collects resource usage (CPU and memory) from each node's kubelet and exposes it through the Metrics API. The Horizontal Pod Autoscaler reads these metrics and, when usage crosses the configured target, automatically creates additional pods to spread the load.
Let's automate this process.
- Run the command below to check if Metrics are available:
kubectl top node
If this command returns an error such as "Metrics API not available", it means our cluster is not ready for HPA.
Why is it not ready?
- To perform HPA, we need the Metrics API, which provides information about your cluster's resource usage, such as CPU and memory.
How to install the Metrics Server?
You can refer to our repository, or follow the commands below.
If you're using a Kind cluster, install the Metrics Server:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
- Edit the Metrics Server Deployment
kubectl -n kube-system edit deployment metrics-server
Our Metrics Server is installed in the kube-system namespace. The command above edits the deployment we just applied.
Add the following arguments to the file. Because the Kind nodes run as Docker containers inside a virtual machine, the kubelets use self-signed TLS certificates that the Metrics Server does not trust by default, so it fails to scrape them. The flags below tell it to skip certificate verification and to prefer the nodes' internal IPs:
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP
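For orientation, the two flags end up under the metrics-server container's args in the Deployment spec, roughly like this (a sketch with the pre-existing args elided, not the full manifest):

```yaml
# Excerpt of the edited metrics-server Deployment (kube-system namespace)
spec:
  template:
    spec:
      containers:
        - name: metrics-server
          args:
            # ...existing args from components.yaml stay as-is...
            - --kubelet-insecure-tls
            - --kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP
```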
- Restart the deployment
kubectl -n kube-system rollout restart deployment metrics-server
- Verify if the metrics server is running
kubectl get pods -n kube-system
kubectl top nodes
You can see that it is running:
Now go back to the previous folder.
cd ..
- Now, we aim to run an application here, apply load to it, and scale it accordingly.
There are two types of scaling: Horizontal and Vertical.
cd HPA_VPA
ls -lrt
Apache is a web server that displays a default webpage and is used to serve HTML files and web applications. It is also known as httpd.
Create an apache-deployment.yml file from scratch. You can refer to the official kubernetes.io website by searching for "Kubernetes HPA YAML". In the code below I have added comments for resources and limits; just replace the metadata, labels, and image name in the containers section.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: apache-deployment
  labels:
    app: apache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: apache
  template:
    metadata:
      labels:
        app: apache
    spec:
      containers:
        - name: apache
          image: httpd:2.4
          ports:
            - containerPort: 80
          resources:
            requests: # minimum resources needed to run the app
              cpu: 100m
              memory: 128Mi
            limits: # maximum; the pod cannot use more than 200m CPU
              cpu: 200m
              memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: apache-service
  labels:
    app: apache
spec:
  selector:
    app: apache
  ports:
    - protocol: TCP
      port: 80
- Now, apply the changes:
kubectl apply -f apache-deployment.yml
kubectl get pods
- Port Forward our service
kubectl port-forward service/apache-service 80:80 --address=0.0.0.0 &
If you receive a permission-denied error (ports below 1024 require root), run the command below. The -E flag in sudo keeps your environment settings (like KUBECONFIG) when running kubectl as superuser.
sudo -E kubectl port-forward service/apache-service 80:80 --address=0.0.0.0 &
- Go to the EC2 instance, copy the public IP, and paste it into the browser.
- To obtain more detailed information about our pod, execute the following command:
kubectl get pods -owide
kubectl get pod pod_name
Auto Scaling using HPA:
We aim to enable our pods to automatically scale in response to increased load.
How can this be achieved?
Remove the existing apache-hpa.yml and apache-vpa.yml from the HPA_VPA directory, then create apache-hpa.yml from scratch; we will create a simple Apache auto-scaler.
#vim apache-hpa.yml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: apache-hpa
spec:
  scaleTargetRef: # HPA targets our deployment, which is why we set the apiVersion, kind, and name of the deployment
    apiVersion: apps/v1
    kind: Deployment
    name: apache-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 5
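Under the hood, the HPA computes the desired replica count as ceil(currentReplicas × currentUtilization / targetUtilization). A quick sketch of that arithmetic; the 25% observed utilization here is an assumed example value, not a measurement:

```shell
# HPA formula: desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)
current_replicas=1
current_cpu=25   # assumed observed average CPU utilization (%)
target_cpu=5     # averageUtilization from apache-hpa.yml
# integer ceiling division
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "desired replicas: $desired"   # prints "desired replicas: 5"
```

With a target of only 5%, even modest load pushes the desired count up quickly, which is why the demo below scales to maxReplicas so fast.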
- Apply the changes:
kubectl apply -f apache-hpa.yml
kubectl get hpa
You can see that only one replica is currently running:
- Next, we will generate load on our server using "BusyBox" (see the GitHub README):
kubectl run -i --tty load-generator --image=busybox /bin/sh
while true; do wget -q -O- http://apache-service.default.svc.cluster.local; done
- Now you can check the HPA, and you'll see it creates 5 replicas.
kubectl get hpa
kubectl get pods
- Now, delete the load-generator pod to stop generating additional load.
kubectl delete pod load-generator
The application will still work:
- Now we will delete our HPA and reapply apache-deployment.yml to change the replicas from 5 back to 1.
kubectl delete -f apache-hpa.yml
kubectl apply -f apache-deployment.yml
This process shows how Horizontal Pod Auto-Scaling works. I'll cover VPA in an upcoming blog.
Happy Learning :)
Chetan Mohod ✨
For more DevOps updates, you can follow me on LinkedIn.