1. Scaling and Scheduling Pods in Kubernetes: All About HPA

Scaling Made Simple: Understanding the Role of HPA in Kubernetes Scheduling

Horizontal Pod Auto-Scaler (HPA)

As this is my 50th blog, here’s a detailed exploration of Kubernetes Horizontal Pod Autoscaler (HPA) and how it simplifies scaling and scheduling.

Steps to utilize the Horizontal Pod Autoscaler (HPA) for pods in Kubernetes:

  1. Set up a server on any cloud provider. In this example, AWS is used with a t2.medium EC2 instance. If you are unfamiliar with AWS, you can refer to any of my previous blogs for guidance on creating an EC2 instance. Then connect to the instance via SSH.

  2. Now fork this repository: kubestarter, then clone your fork and pull the latest code.

Commands:

git clone https://github.com/Chetan-Mohod/kubestarter.git

cd kubestarter

git pull
  3. Now go to the kind-cluster directory and create a new cluster with three nodes using the config.yml file from our previous blogs.
cd kind-cluster

kind create cluster --name=acrobat-cluster --config=config.yml

The concept behind HPA:

  • Assume you have a cluster on a t2.medium instance with three nodes: one master and two worker nodes. When you run a pod on worker node 1, the scheduler checks that node's resources and limits. Resource requests specify the minimum CPU and RAM a pod needs to run, while limits cap the maximum resources the pod is allowed to use.

  • For example, if your pod has a memory limit of 512 MB, then once it approaches 512 MB of usage it cannot handle additional traffic. To address this, we scale out: another pod is created automatically to share the load. This process is known as Auto-Scaling.

    • Auto-Scaling relies on metrics. Kubernetes provides a Metrics Server, which collects resource usage (CPU and memory) from our pods; based on these metrics, additional pods are created automatically to distribute the load.

    • Let's automate this process.
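To make the requests-and-limits idea concrete: HPA measures CPU utilization relative to a pod's *request*, not its limit. A small back-of-the-envelope sketch (the usage number below is hypothetical):

```shell
# HPA computes utilization against the pod's CPU *request*
request_millicores=100   # resources.requests.cpu: 100m
usage_millicores=150     # hypothetical observed usage from the Metrics Server
utilization=$(( usage_millicores * 100 / request_millicores ))
echo "${utilization}%"   # prints 150% - a pod can exceed its request, up to its limit
```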

  4. Run the command below to check if metrics are available:
kubectl top node

If it shows an error such as "Metrics API not available", it means our cluster is not yet ready for HPA.

WHY is it not ready?

  • To perform HPA, we need the Metrics API, which provides information about cluster resource usage, such as CPU and memory.

HOW to install the Metrics Server?

  • You can refer to our repository, or follow the commands below:

  • If you’re using a Kind cluster, install the Metrics Server:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
  • Edit the Metrics Server Deployment
kubectl -n kube-system edit deployment metrics-server

The Metrics Server is installed in the kube-system namespace. The command above opens that deployment for editing.

Add the following arguments to the container spec. In a Kind cluster, the nodes are Docker containers whose kubelets use self-signed TLS certificates, which the Metrics Server cannot verify. The first flag tells it to skip that verification (fine for a local development cluster, not recommended in production), and the second tells it which node addresses to try first.

- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP
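For orientation, after the edit the container section of the deployment looks roughly like this (abridged; flags other than the two we added may differ between Metrics Server releases):

```yaml
# Abridged metrics-server container spec after editing
containers:
- name: metrics-server
  args:
    - --cert-dir=/tmp
    - --kubelet-insecure-tls
    - --kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP
```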
  • Restart the deployment
kubectl -n kube-system rollout restart deployment metrics-server
  • Verify if the metrics server is running
kubectl get pods -n kube-system
kubectl top nodes

You can see that it is running:

Now go back to the previous folder.

cd ..
  5. Now, we aim to run an application here, apply load to it, and scale it accordingly.

There are two types of scaling: Horizontal and Vertical.

cd HPA_VPA

ls -lrt

Apache is a web server that displays a default webpage and is used to serve HTML files and applications. It is also known as httpd.

Create an apache-deployment.yml file from scratch. You can refer to the official kubernetes.io website by searching for "Kubernetes HPA YAML". In the code below I have added comments for resources and limits; just replace the metadata, labels, and container image name as needed.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: apache-deployment
  labels:
    app: apache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: apache
  template:
    metadata:
      labels:
        app: apache
    spec:
      containers:
      - name: apache
        image: httpd:2.4
        ports:
        - containerPort: 80
        resources:
          requests:    # minimum resources requested to schedule the pod
            cpu: 100m
            memory: 128Mi
          limits:      # maximum allowed: CPU is throttled at 200m; exceeding 256Mi memory gets the container OOM-killed
            cpu: 200m
            memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: apache-service
  labels:
    app: apache
spec:
  selector:
    app: apache
  ports:
    - protocol: TCP
      port: 80
  6. Now, apply the changes:
kubectl apply -f apache-deployment.yml

kubectl get pods

  7. Port-forward our service:
kubectl port-forward service/apache-service 80:80 --address=0.0.0.0 &

If you receive a permission error (binding to ports below 1024 requires elevated privileges):

Then run the command below; -E in sudo keeps your environment settings (like KUBECONFIG) when running kubectl as superuser.

sudo -E kubectl port-forward service/apache-service 80:80 --address=0.0.0.0 &
  8. Go to the EC2 console, copy the instance's public IP, and paste it into the browser.

  9. To obtain more detailed information about our pods, execute the following commands:
kubectl get pods -o wide

kubectl get pod <pod-name>

Auto Scaling using HPA:

  1. We aim to enable our pods to automatically scale in response to increased load.

    How can this be achieved?

    • Remove the existing apache-hpa.yml & apache-vpa.yml from the HPA_VPA directory.

    • Create apache-hpa.yml from scratch; we will create a simple Apache auto-scaler.

        #vim apache-hpa.yml
      
        apiVersion: autoscaling/v2
        kind: HorizontalPodAutoscaler
        metadata:
          name: apache-hpa
        spec:
          scaleTargetRef: # the deployment HPA scales; we specify its apiVersion, kind, and name
            apiVersion: apps/v1
            kind: Deployment
            name: apache-deployment
      
          minReplicas: 1
          maxReplicas: 5
          metrics:
          - type: Resource
            resource:
              name: cpu
              target:
                type: Utilization
                averageUtilization: 5   # intentionally low so that a little load triggers scaling
      
      • Apply the changes:
        kubectl apply -f apache-hpa.yml

        kubectl get hpa

You can see that only one replica is currently running:
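Under the hood, HPA picks the replica count with a documented formula: desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped between minReplicas and maxReplicas. A quick sketch with hypothetical numbers:

```shell
# HPA scaling formula: desired = ceil(current_replicas * current_util / target_util)
current_replicas=1
current_util=250     # hypothetical average CPU utilization (%) under load
target_util=5        # averageUtilization from apache-hpa.yml
max_replicas=5       # maxReplicas from apache-hpa.yml
# integer ceiling division
desired=$(( (current_replicas * current_util + target_util - 1) / target_util ))
if [ "$desired" -gt "$max_replicas" ]; then desired=$max_replicas; fi
echo "$desired"      # prints 5 (clamped to maxReplicas)
```

This is why even heavy load never produces more than the 5 replicas we configured.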

  2. Next, we will generate load on our server using "BusyBox" (refer to the GitHub README):
kubectl run -i --tty load-generator --image=busybox /bin/sh

Inside the BusyBox shell, run:

while true; do wget -q -O- http://apache-service.default.svc.cluster.local; done
  3. Now you can check the HPA; you'll see it scales up to 5 replicas.
kubectl get hpa

kubectl get pods

  4. Now, delete the load-generator pod to stop generating additional load.
kubectl delete pod load-generator

The application will still work:

  5. Now we will delete our HPA and reapply apache-deployment.yml to bring the replicas from 5 back down to 1.
kubectl delete -f apache-hpa.yml

kubectl apply -f apache-deployment.yml

This process shows how Horizontal Pod Auto-Scaling works. I’ll cover VPA in an upcoming blog.


Happy Learning :)

Chetan Mohod ✨

For more DevOps updates, you can follow me on LinkedIn.