GKE cluster scale to zero — tips

Ben Mizrahi
Plarium-engineering

--

Kubernetes is an amazing orchestration system. On Google Cloud Platform it can also be very cost-effective, because GKE can downscale node pools to zero, which lets you reduce cost to a minimum.

Our use case for scaling Kubernetes to zero is running Apache Spark batch jobs. Using the Spark operator gives us autoscaling, resource management and Docker containers, which makes the Spark jobs consistent and reliable.

To do the job we defined three node pools in the cluster:

  1. Low resources (2 CPU, 1 GB, autoscaling off): this pool holds the Spark operator pods and needs only a minimal amount of resources.
  2. High resources (8 CPU, 64 GB, autoscaling 0 to 10): this pool is for the Spark driver pods. We used non-preemptible nodes here, because the driver pod is the only place the application lives and relies on; we cannot afford a node being evicted from this pool.
  3. High preemptible (16 CPU, 128 GB, autoscaling 0 to 100): this pool is for the Spark executors. Since Spark is resilient by design, we can safely use preemptible nodes to reduce cost. A gcloud sketch of creating these pools follows the list.
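As a rough sketch, here is how such pools could be created with gcloud. The pool names, machine types and node labels below are my own illustrative choices, not necessarily what we run in production; e2-small, e2-highmem-8 and e2-highmem-16 only approximate the resource shapes listed above. Add --zone or --region to match your cluster location.

# low-resources pool for the Spark operator (no autoscaling)
gcloud container node-pools create spark-operator \
  --cluster <cluster name> --project <project name> \
  --machine-type e2-small --num-nodes 1

# high-resources pool for Spark drivers, non-preemptible, scales 0 to 10
gcloud container node-pools create spark-driver \
  --cluster <cluster name> --project <project name> \
  --machine-type e2-highmem-8 \
  --enable-autoscaling --min-nodes 0 --max-nodes 10 \
  --node-labels=pool=spark-driver

# high-resources preemptible pool for Spark executors, scales 0 to 100
gcloud container node-pools create spark-executor \
  --cluster <cluster name> --project <project name> \
  --machine-type e2-highmem-16 --preemptible \
  --enable-autoscaling --min-nodes 0 --max-nodes 100 \
  --node-labels=pool=spark-executor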

Yes, we have also defined more node pools, each with its own purpose and job definitions, depending on the application and job specification; but for our use case these three configurations are enough to start running your jobs. A minimal Spark application manifest that targets these pools is sketched below.
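To tie the pools to the Spark pods, the driver and executors can be pinned to the matching pools with nodeSelector. The manifest below is only a minimal sketch, assuming the spark-on-k8s-operator CRD (sparkoperator.k8s.io/v1beta2) and the pool=... node labels from the gcloud sketch above; the namespace, image, class and resource values are placeholders.

kubectl apply -f - <<'EOF'
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: example-batch-job       # placeholder name
  namespace: spark-jobs         # placeholder namespace
spec:
  type: Scala
  mode: cluster
  image: gcr.io/<project name>/spark:3.1.1     # placeholder image
  mainClass: com.example.BatchJob              # placeholder class
  mainApplicationFile: local:///opt/spark/jars/job.jar
  sparkVersion: "3.1.1"
  driver:
    cores: 1
    memory: 4g
    serviceAccount: spark
    nodeSelector:
      pool: spark-driver      # non-preemptible pool, the driver must survive
  executor:
    instances: 10
    cores: 4
    memory: 16g
    nodeSelector:
      pool: spark-executor    # preemptible pool, executors can be restarted
EOF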

Sooner or later you will notice multiple issues:

  1. Autoscaling is slow: there is roughly a 10-minute delay between scaling operations, which is not very cost-optimized.
  2. Scale to zero never happens: when the jobs are done you will notice that not all nodes are evicted from the pools. At least one node stays up and running and is never evicted, because kube-system pods keep the node alive (a command for inspecting such a node follows this list).
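To see what is keeping a lingering node alive, list the pods scheduled on it; in our case the blockers were always kube-system pods:

kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<node name>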

Here are some tricks that can help you tackle those issues:

  1. Tuning the autoscaler profile

The default autoscaling profile is balanced. Switching to optimize-utilization gives more aggressive downscaling of nodes and helps reduce cost. From the Google docs:

optimize-utilization: Prioritize optimizing utilization over keeping spare resources in the cluster. When selected, the cluster autoscaler scales down the cluster more aggressively: it can remove more nodes, and remove nodes faster. This profile has been optimized for use with batch workloads that are not sensitive to start-up latency. We do not currently recommend using this profile with serving workloads.

This is how to apply that on the cluster:

gcloud container clusters update <cluster name>  --autoscaling-profile optimize-utilization --project <project name>
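You can verify that the profile took effect with a describe call. The format expression below is my assumption about the field path; if it comes back empty, inspect the full describe output instead:

gcloud container clusters describe <cluster name> --project <project name> \
  --format="value(autoscaling.autoscalingProfile)"
# expected value: OPTIMIZE_UTILIZATION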

2. Change Kube-System Pods Configurations

To tackle the second issue we have to tune the kube-system pods. If you wait a while, you will see that the autoscaler produces error logs when trying to downscale the node pools to zero. Here are some log examples:

{
  "noDecisionStatus": {
    "measureTime": "1643186659",
    "noScaleDown": {
      ...
      "reason": {
        "parameters": [
          "event-exporter-gke-5479fd58c8-hkzh6"
        ],
        "messageId": "no.scale.down.node.pod.has.local.storage"
      },
      ...
    }
  }
}

{
  "noDecisionStatus": {
    "noScaleDown": {
      ...
      "reason": {
        "parameters": [
          "metrics-server-v0.4.4-857776bc9c-76kv2"
        ],
        "messageId": "no.scale.down.node.pod.has.local.storage"
      }
      ...
    }
  }
}
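These noScaleDown events come from the cluster autoscaler visibility logs in Cloud Logging. A query along these lines should pull them out; treat the filter as an approximation and adjust it if your log names differ:

gcloud logging read \
  'resource.type="k8s_cluster" AND logName:"cluster-autoscaler-visibility" AND jsonPayload.noDecisionStatus.noScaleDown:*' \
  --project <project name> --limit 10 --format json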

You can read more about these issues in the documentation provided by Google Cloud:

And in an open issue on the Google Cloud GitHub repository:

I found this information good, but it did not give a practical example of how to apply it in my own cluster, so here are the instructions and changes you need to apply to reach the desired state of autoscaling to zero:

  • On the cluster you want to tune, open the kube-dns-autoscaler ConfigMap in the kube-system namespace and change (a patch sketch follows this list):
preventSinglePointFailure -> false
  • The next step is to create a PodDisruptionBudget for the glbc selector:
kubectl create poddisruptionbudget l7-default-backend-pdb --namespace=kube-system --selector name=glbc --max-unavailable 1
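For the kube-dns-autoscaler change, a merge patch does the trick. The coresPerReplica and nodesPerReplica numbers below are only the common GKE defaults; keep whatever values are already in your ConfigMap and only flip preventSinglePointFailure to false:

kubectl patch configmap kube-dns-autoscaler -n kube-system --type merge \
  -p '{"data":{"linear":"{\"coresPerReplica\":256,\"nodesPerReplica\":16,\"preventSinglePointFailure\":false}"}}'

# verify both changes
kubectl get configmap kube-dns-autoscaler -n kube-system -o yaml
kubectl get pdb -n kube-system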

I hope this helps someone,

Ben Mizrahi
