GKE cluster scale to zero — tips

Ben Mizrahi
Plarium-engineering

--

Kubernetes is an amazing orchestration system. On Google Cloud Platform it can also be very cost-effective, because GKE can downscale node pools to zero, which lets you reduce cost to a minimum.

Our use case for scaling Kubernetes to zero is running Apache Spark batch jobs. Using the Spark operator gives us autoscaling, resource management and Docker containers, which makes the Spark jobs consistent and reliable.

To do the job we defined three node pools in the cluster:

  1. Low resources (2 CPU, 1 GB, autoscaling off): this pool holds the Spark operator pods and needs only a minimal amount of resources.
  2. High resources (8 CPU, 64 GB, autoscaling 0 to 10): this pool is for the Spark driver pods. We used non-preemptible nodes here, because the driver pod is the only place the application lives and relies on; we cannot afford a node being evicted from this pool.
  3. High preemptible (16 CPU, 128 GB, autoscaling 0 to 100): this pool is for the Spark executors. Since Spark is resilient by design, we can safely use preemptible nodes to reduce cost. A gcloud sketch of creating these pools follows the list.
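As a rough sketch, here is how such pools could be created with gcloud. The pool names, machine types and node labels below are my own illustrative choices, not necessarily what we run in production; e2-small, e2-highmem-8 and e2-highmem-16 only approximate the resource shapes listed above. Add --zone or --region to match your cluster location.

# low-resources pool for the Spark operator (no autoscaling)
gcloud container node-pools create spark-operator \
  --cluster <cluster name> --project <project name> \
  --machine-type e2-small --num-nodes 1

# high-resources pool for Spark drivers, non-preemptible, scales 0 to 10
gcloud container node-pools create spark-driver \
  --cluster <cluster name> --project <project name> \
  --machine-type e2-highmem-8 \
  --enable-autoscaling --min-nodes 0 --max-nodes 10 \
  --node-labels=pool=spark-driver

# high-resources preemptible pool for Spark executors, scales 0 to 100
gcloud container node-pools create spark-executor \
  --cluster <cluster name> --project <project name> \
  --machine-type e2-highmem-16 --preemptible \
  --enable-autoscaling --min-nodes 0 --max-nodes 100 \
  --node-labels=pool=spark-executor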

Yes, we have also defined more node pools, each with its own purpose and job definitions, depending on the application and job specification; but for our use case these three configurations are enough to start running your jobs. A minimal Spark application manifest that targets these pools is sketched below.
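To tie the pools to the Spark pods, the driver and executors can be pinned to the matching pools with nodeSelector. The manifest below is only a minimal sketch, assuming the spark-on-k8s-operator CRD (sparkoperator.k8s.io/v1beta2) and the pool=... node labels from the gcloud sketch above; the namespace, image, class and resource values are placeholders.

kubectl apply -f - <<'EOF'
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: example-batch-job       # placeholder name
  namespace: spark-jobs         # placeholder namespace
spec:
  type: Scala
  mode: cluster
  image: gcr.io/<project name>/spark:3.1.1     # placeholder image
  mainClass: com.example.BatchJob              # placeholder class
  mainApplicationFile: local:///opt/spark/jars/job.jar
  sparkVersion: "3.1.1"
  driver:
    cores: 1
    memory: 4g
    serviceAccount: spark
    nodeSelector:
      pool: spark-driver      # non-preemptible pool, the driver must survive
  executor:
    instances: 10
    cores: 4
    memory: 16g
    nodeSelector:
      pool: spark-executor    # preemptible pool, executors can be restarted
EOF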

Sooner or later you will notice multiple issues:

  1. Autoscaling is slow: there is roughly a 10-minute delay between scaling operations, which is not very cost-optimized.
  2. Scale to zero never happens: when the jobs are done you will notice that not all nodes are evicted from the pools. At least one node stays up and running and is never evicted, because kube-system pods keep the node alive (a command for inspecting such a node follows this list).
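To see what is keeping a lingering node alive, list the pods scheduled on it; in our case the blockers were always kube-system pods:

kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<node name>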

Here are some tricks that can help you tackle those issues:

  1. Tuning the autoscaler profile

The default autoscaling profile is balanced. Switching to optimize-utilization gives more aggressive downscaling of nodes and helps reduce cost. From the Google docs:

optimize-utilization: Prioritize optimizing utilization over keeping spare resources in the cluster. When selected, the cluster autoscaler scales down the cluster more aggressively: it can remove more nodes, and remove nodes faster. This profile has been optimized for use with batch workloads that are not sensitive to start-up latency. We do not currently recommend using this profile with serving workloads.

This is how to apply that on the cluster:

gcloud container clusters update <cluster name>  --autoscaling-profile optimize-utilization --project <project name>
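You can verify that the profile took effect with a describe call. The format expression below is my assumption about the field path; if it comes back empty, inspect the full describe output instead:

gcloud container clusters describe <cluster name> --project <project name> \
  --format="value(autoscaling.autoscalingProfile)"
# expected value: OPTIMIZE_UTILIZATION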

2. Change Kube-System Pods Configurations

To tackle the second issue we have to tune the kube-system pods. If you wait a while, you will see that the autoscaler produces error logs when trying to downscale the node pools to zero. Here are some log examples:

{
  "noDecisionStatus": {
    "measureTime": "1643186659",
    "noScaleDown": {
      ...
      "reason": {
        "parameters": [
          "event-exporter-gke-5479fd58c8-hkzh6"
        ],
        "messageId": "no.scale.down.node.pod.has.local.storage"
      },
      ...
    }
  }
}

{
  "noDecisionStatus": {
    "noScaleDown": {
      ...
      "reason": {
        "parameters": [
          "metrics-server-v0.4.4-857776bc9c-76kv2"
        ],
        "messageId": "no.scale.down.node.pod.has.local.storage"
      }
      ...
    }
  }
}
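These noScaleDown events come from the cluster autoscaler visibility logs in Cloud Logging. A query along these lines should pull them out; treat the filter as an approximation and adjust it if your log names differ:

gcloud logging read \
  'resource.type="k8s_cluster" AND logName:"cluster-autoscaler-visibility" AND jsonPayload.noDecisionStatus.noScaleDown:*' \
  --project <project name> --limit 10 --format json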

You can read more about these issues in the documentation provided by Google Cloud:

And in an open issue on the Google Cloud GitHub repository:

I found this information good, but it did not give a practical example of how to apply it in my own cluster, so here are the instructions and changes you need to apply to reach the desired state of autoscaling to zero:

  • On the cluster you want to tune, open the kube-dns-autoscaler ConfigMap in the kube-system namespace and change (a patch sketch follows this list):
preventSinglePointFailure -> false
  • The next step is to create a PodDisruptionBudget for the glbc selector:
kubectl create poddisruptionbudget l7-default-backend-pdb --namespace=kube-system --selector name=glbc --max-unavailable 1
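For the kube-dns-autoscaler change, a merge patch does the trick. The coresPerReplica and nodesPerReplica numbers below are only the common GKE defaults; keep whatever values are already in your ConfigMap and only flip preventSinglePointFailure to false:

kubectl patch configmap kube-dns-autoscaler -n kube-system --type merge \
  -p '{"data":{"linear":"{\"coresPerReplica\":256,\"nodesPerReplica\":16,\"preventSinglePointFailure\":false}"}}'

# verify both changes
kubectl get configmap kube-dns-autoscaler -n kube-system -o yaml
kubectl get pdb -n kube-system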

I hope this helps someone,

Ben Mizrahi
