Managing your cloud ecosystems: Maintaining workload continuity during worker node upgrades

Planning and managing your cloud ecosystem and environments is critical for reducing production downtime and maintaining a functioning workload. In the “Managing your cloud ecosystems” blog series, we cover different strategies for ensuring that your setup functions smoothly with minimal downtime.

To start things off, the first topic in this blog series is ensuring workload continuity during worker node upgrades.

What are worker node upgrades?

Worker node upgrades apply important security updates and patches and should be completed regularly. For more information on types of worker node upgrades, see Updating VPC worker nodes and Updating Classic worker nodes in the IBM Cloud Kubernetes Service documentation.

During an upgrade, some of your worker nodes may become unavailable. It's important to make sure your cluster has enough capacity to continue running your workload throughout the upgrade process. Building a pipeline that updates your worker nodes without causing application downtime makes it easy to apply worker node upgrades regularly.

For classic worker nodes

Create a Kubernetes ConfigMap that defines the maximum number of worker nodes that can be unavailable at a time, including during an upgrade. The maximum is specified as a percentage, and you can use labels to apply different rules to different worker nodes. For complete instructions, see Updating Classic worker nodes in the CLI with a ConfigMap in the Kubernetes service documentation. If you don't create a ConfigMap, by default a maximum of 20% of your worker nodes can become unavailable at a time.
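As a minimal sketch, the ConfigMap might be applied as follows. The name ibm-cluster-update-configuration in the kube-system namespace follows the linked documentation, but the data key shown here is a hypothetical placeholder; verify the exact schema for your cluster version in the docs.

```sh
# Minimal sketch: apply the update ConfigMap with kubectl.
# Assumption: the ConfigMap is named ibm-cluster-update-configuration in
# kube-system, per the linked docs. The data key below is a hypothetical
# placeholder -- check the documentation for the exact schema.
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: ibm-cluster-update-configuration
  namespace: kube-system
data:
  # Hypothetical key: allow at most 30% of worker nodes to be
  # unavailable at a time during an upgrade.
  max_unavailable_percentage: "30%"
EOF
```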

If you need all of your worker nodes to remain up and running, use the ibmcloud ks worker-pool resize command to temporarily add extra worker nodes to your cluster for the duration of the upgrade. When the upgrade is complete, run the same command to remove the extra worker nodes and return your worker pool to its previous size.
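For example, for a hypothetical cluster named mycluster with a worker pool named default that normally runs three workers per zone, the resize commands might look like this (the cluster name, pool name, and sizes are placeholders for your own values):

```sh
# Before the upgrade: temporarily grow the pool from 3 to 4 workers per zone.
ibmcloud ks worker-pool resize --cluster mycluster --worker-pool default --size-per-zone 4

# After the upgrade completes: return the pool to its previous size.
ibmcloud ks worker-pool resize --cluster mycluster --worker-pool default --size-per-zone 3
```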

For VPC worker nodes

VPC worker nodes are upgraded by removing the old worker node and provisioning a replacement that runs at the new version. You can upgrade several worker nodes at once, but they all become unavailable at the same time while they are replaced. To make sure you have enough capacity to run your workload during the upgrade, either temporarily resize your worker pools to add extra worker nodes (similar to the process described for classic worker nodes) or upgrade your worker nodes one by one, as sketched below.
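A one-by-one upgrade with the ibmcloud ks CLI might look like the following sketch; the cluster name and worker ID are placeholders:

```sh
# List the worker nodes in the cluster to get their IDs.
ibmcloud ks worker ls --cluster mycluster

# Replace one worker at a time; the --update flag provisions the
# replacement at the new version. Wait for the new worker to reach
# the Ready state before replacing the next one.
ibmcloud ks worker replace --cluster mycluster --worker <worker_id> --update
```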

Wrap up

Whether you choose to implement a ConfigMap, resize your worker pool, or upgrade worker nodes one by one, creating a workload continuity plan before you upgrade your worker nodes helps you build a more streamlined, efficient setup with limited downtime.

Now that you have a plan to prevent disruptions during worker node upgrades, keep an eye out for the next blog in our series, which will discuss how, when and why to implement major, minor or patch upgrades to your clusters and worker nodes.

Learn more about IBM Cloud Kubernetes Service clusters
