April 17, 2024
What a well-optimized Kubernetes cluster looks like
5 min read
Kubernetes. Since its 1.0 release in 2015, it has become the de facto standard for the modern compute stack. The scale and pace of its adoption have also created another challenge: it is now one of the largest cost centers in tech-enabled product and business organizations. Given its architecture and the nature of the platform, Kubernetes (or K8s) can be a very effective and cost-efficient tool if utilized well. To achieve this, however, platform teams have to ensure they are making full use of the elasticity K8s provides. They also need to invest in deeply understanding the cloud enablers behind the right compute choices. Finally, teams need to build a system of continuous upkeep and review of cluster configuration to accommodate changing workloads and fast-paced development.
The integral leg in enabling all of this is a well-optimized Kubernetes cluster. The key focus areas for a well-optimized K8s cluster are:
The right instrumentation
Node capacity provisioning plans
Autoscaling strategy
Workload configuration
A culture of collaborative cost optimization
In this blog, we'll dive into each of these facets to explore what it takes to run a lean, efficient and optimized Kubernetes cluster.
Instrumenting your cluster
A K8s cluster is a complex orchestration of a vast multitude of components. Having the right instrumentation strategy for the cluster is essential, not just for optimization, but for operability and reliability as well.
The good news, however, is that the Kubernetes ecosystem is rich with plug-and-play tools that give teams a head start when kicking off the instrumentation process. While advanced instrumentation is a broad and complex topic (one that merits a standalone blog of its own), there are some simple ways to bootstrap cluster instrumentation that any team can adopt.
Kube State Metrics (KSM) is the de facto standard for extracting metrics from your cluster. KSM exposes metrics for all objects within the cluster, including nodes, workloads, containers, storage, services, namespaces and more. Scraping these metrics with Prometheus and writing them into a metrics store is an optimal starting point in the process.
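As a concrete starting point, here is a minimal sketch of a Prometheus scrape configuration for KSM. It assumes KSM runs as a service named kube-state-metrics in the kube-system namespace (a common chart default); adjust the names to match your deployment.

```yaml
scrape_configs:
  - job_name: kube-state-metrics
    kubernetes_sd_configs:
      - role: endpoints   # discover scrape targets from Kubernetes endpoints
    relabel_configs:
      # Keep only the kube-state-metrics service endpoints (assumed names).
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        regex: kube-system;kube-state-metrics
        action: keep
```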
Finally comes the need to visualize these metrics using an observability platform of your choice, such as Grafana. These metrics form the starting point for any optimization exercise you might be looking to commence in your cluster.
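To make those dashboards useful for optimization, it helps to precompute a few capacity signals. Below is a hypothetical set of Prometheus recording rules, written as a PrometheusRule (which assumes the Prometheus Operator is installed), comparing requested CPU against allocatable capacity and actual usage against requests; the metric names come from kube-state-metrics and cAdvisor.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cluster-capacity-signals   # hypothetical rule name
spec:
  groups:
    - name: capacity
      rules:
        # Fraction of allocatable CPU that workloads have requested.
        - record: cluster:cpu_requests:ratio
          expr: |
            sum(kube_pod_container_resource_requests{resource="cpu"})
              / sum(kube_node_status_allocatable{resource="cpu"})
        # How much of the requested CPU is actually being used.
        - record: cluster:cpu_usage_vs_requests:ratio
          expr: |
            sum(rate(container_cpu_usage_seconds_total{container!=""}[5m]))
              / sum(kube_pod_container_resource_requests{resource="cpu"})
```

A requests ratio close to 1 combined with a low usage-to-requests ratio is the classic signature of an over-provisioned cluster.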
How to choose your nodes
Nodes constitute the majority of cluster costs. You can create the right provisioning strategy for your nodes by using the right data points and combining them with a solid understanding of the workload profile (both present and future).
A couple of important variables to consider are:
VM instance types - Among the modern cloud providers, there is a decent amount of price arbitrage across the compute classes they offer. Choices between Intel, AMD, Graviton, or similar compute classes can have a real effect on your cluster bills.
Workload fault tolerance - By marrying the elasticity of the cloud with the elasticity of Kubernetes, it is possible to achieve a significant level of optimization. This is especially true for more fault-tolerant workloads. Choosing Spot instances over On-Demand ones can help scale your cluster for sporadic workloads in the most cost-efficient way possible (see the sketch after this list).
Base compute requirements - Understanding the minimum compute requirements of your cluster helps you take advantage of the commitment-based discounts offered by cloud providers. Reserved Instances (RIs), Compute Savings Plans (CSPs), or Committed Use Discounts can lower cloud bills by 20-30%, if not more.
Compute vs memory optimized instances - A proportional look at compute and memory requirements at the cluster level can help you choose the right specifications for your nodes. This can also help you plan workloads and their node affinities more efficiently.
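To illustrate the fault-tolerance point above, here is a sketch of a Deployment that prefers Spot capacity but can still fall back to on-demand nodes. It assumes nodes carry the karpenter.sh/capacity-type label (set by Karpenter); substitute your provider's equivalent label. The workload name and image are placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker             # hypothetical fault-tolerant workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      affinity:
        nodeAffinity:
          # Prefer Spot nodes; because this is a preference rather than a
          # requirement, the scheduler falls back to on-demand capacity
          # when no Spot nodes are available.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: karpenter.sh/capacity-type
                    operator: In
                    values: ["spot"]
      containers:
        - name: worker
          image: example.com/batch-worker:latest   # placeholder image
```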
Autoscaling strategy
In principle, the continuous optimization of a cluster requires three different automations.
Cluster autoscaler - (e.g. Karpenter) - Adjusts the cluster size based on workload demands (see the NodePool sketch after this list).
Pod autoscaler - (e.g. HPA) - Scales pods according to workload requirements.
Bin Packing - Efficiently utilizes resources by packing pods onto nodes.
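As a sketch of the first automation, here is a Karpenter NodePool (v1beta1 API; verify field names against your installed version) that can provision both Spot and on-demand capacity across architectures and consolidate underutilized nodes, which also covers much of the bin-packing concern.

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: general-purpose          # hypothetical pool name
spec:
  template:
    spec:
      nodeClassRef:
        name: default            # references a cloud-specific NodeClass
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
  limits:
    cpu: "1000"                  # cap the total CPU this pool may provision
  disruption:
    # Repack and remove underutilized nodes to keep the cluster tight.
    consolidationPolicy: WhenUnderutilized
```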
An optimized Kubernetes cluster uses various signals to autoscale pods on the basis of workloads. If needed, it autoscales the cluster itself by provisioning or de-provisioning nodes as the workload requires. (Watch out for Amnic's dedicated blogs on each of these topics.)
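For the pod-scaling leg, a minimal HPA sketch might look like the following. It targets the hypothetical batch-worker Deployment from earlier; the utilization target and replica bounds are illustrative, not recommendations.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: batch-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: batch-worker           # hypothetical workload from earlier
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70% of requests
```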
Workload Configurations
Resource requirement configurations within workloads serve as the foundation for provisioning and autoscaling strategies. Platform teams need to align themselves to a broader objective: finding the optimal resource request configurations that balance availability and affordability.
Making this decision requires cross-functional collaboration, especially since it can dramatically change the cost profile of your clusters.
It is important to start with the right configuration for your workloads. More importantly, though, the best configuration can only be found through a continuous process of trial and error. The right algorithms for this configuration can have a huge impact on your cluster optimization. By using the past, present, and forecasted resource requirements of your workloads, teams should be able to find the configuration that works best.
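As a concrete example, here is a hedged sketch of container resource settings. The numbers are placeholders; a common heuristic is to set requests near a high percentile (say, p90) of observed usage and then revisit them as the workload changes.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-example              # hypothetical workload
spec:
  containers:
    - name: api
      image: example.com/api:latest   # placeholder image
      resources:
        requests:
          cpu: "250m"            # ~p90 of observed CPU usage
          memory: "512Mi"        # ~p90 of observed memory, plus headroom
        limits:
          memory: "1Gi"          # guardrail against leaks (OOMKill)
          # A CPU limit is intentionally omitted in this sketch; CPU
          # throttling can hurt latency-sensitive services more than it saves.
```

Revisiting these numbers on a regular cadence is what turns a one-off guess into the continuous process described above.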
This process involves a significant amount of guesswork and estimation, and accuracy is often allowed to slide. To provide greater visibility and eliminate the vagueness of estimating configurations, platforms like Amnic can support teams with insights backed by data. These platforms analyze usage patterns and resource requirements quickly and efficiently, allowing teams to simplify the search for the right configuration and experiment more safely with different heuristic algorithms.
Culture of Collaborative Cost Optimization
An optimized cluster is the outcome of a culture of continuous optimization between engineering and platform teams.
The right visualizations and granularity of data are essential to drive this culture and build cost ownership, especially in the hands of cost custodians. This is why a cost observability platform that empowers your teams is an essential part of running an optimized Kubernetes cluster.
Such tools offer the ability to view costs at their most granular level, providing a single-pane-of-glass view across the cloud infrastructure. Today, these tools are adept at providing drill-downs into specific cost elements, attributing costs to the right cost centers, and enabling a comprehensive understanding of the true cost of products, customers, and features. Additionally, they facilitate team reporting, allowing for collaborative analysis and decision-making based on unit economics.
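Much of that attribution depends on workloads carrying consistent metadata. A simple, hypothetical labeling convention like the one below is often all a cost observability tool needs to roll costs up to teams, products, and environments.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: checkout          # hypothetical namespace
  labels:
    team: payments        # hypothetical cost-center owner
    product: checkout     # product or feature for cost rollups
    env: production       # environment for cost segmentation
```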
Curious about which tool can help you accomplish this? Amnic is a Kubernetes and cloud cost observability platform that helps teams understand their cloud costs at the most granular level. You can get started by visiting app.amnic.com/signup.