Rightsize OpenShift Applications: A Guide for Developers
Introduction:
Consider the following scenario: your application has been running in Red Hat OpenShift for months when, all of a sudden, the container platform informs you that there are insufficient resources to spin up a new pod in the cluster. Or consider another circumstance in which the application grinds to a halt due to heavy CPU usage. Both of these scenarios are undesirable, so we need to talk about rightsizing your application in order to resolve these problems. This article discusses the capacity management features of the OpenShift platform as well as the author's considerations and administrative advice.
Disclaimer:
This article's information reflects the author's opinions and viewpoints, not necessarily Red Hat's best practices or recommendations. Consider the information in this article to be informal advice rather than authoritative guidance.
I'd like to start by emphasizing that rightsizing your application is an art rather than a science. There is no single method for this, and approaches vary widely depending on the needs of the client and the demands of the organization. In this article, I'll discuss my opinions on several subjects and practical advice I've given to many of my Red Hat OpenShift clients. The strategies I describe here may not be the most effective ones for you, but they should be thought of as conversation starters.
The Platform
Before we start talking about rightsizing, we need to understand how the Kubernetes platform (and OpenShift, by extension) applies resource constraints at the container and node levels. Although there are other factors to take into account, we'll examine only CPU and memory for the purposes of this rightsizing discussion.
Resource requests and limits can be set for each pod and container. Requests guarantee that the resources set aside for pods will be available, while limits act as safety measures to protect the cluster infrastructure. Kubernetes captures the relationship between a pod's requests and limits as its Quality of Service (QoS) class. The kubelet (the node agent) passes this information to the container runtime, which uses kernel cgroups to enforce the resource constraints.
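As an illustration only (the names and values below are placeholders, not recommendations), the following pod declares requests and limits on a single container; because its requests are lower than its limits, Kubernetes assigns it the Burstable QoS class.

apiVersion: v1
kind: Pod
metadata:
  name: example-app                                 # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/team/app:latest   # placeholder image
      resources:
        requests:
          cpu: 100m          # the share the scheduler reserves on a node
          memory: 256Mi
        limits:
          cpu: 500m          # ceiling enforced by the container runtime via cgroups
          memory: 512Mi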
When scheduling a new pod, the Kubernetes scheduler considers the pod's resource requests and finds a valid placement on the available nodes. OpenShift pre-configures system-reserved to set aside resources for the operating system and Kubernetes system components. The remaining amount is defined as allocatable, and the scheduler treats it as the node's capacity. The scheduler can fill a node up to that capacity based on the total resource requests of all pods. Note that the total resource limits of all pods may exceed the node's capacity; this is referred to as overcommitting.
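To make the distinction concrete, here is a trimmed sketch of what a node object's status might look like (the figures are illustrative, not taken from a real node); allocatable is what remains after system-reserved is deducted from capacity.

status:
  capacity:
    cpu: "8"
    memory: 32120700Ki
    pods: "250"
  allocatable:
    cpu: 7500m
    memory: 30970492Ki
    pods: "250"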
There are two situations that we want to steer clear of when managing our node capacity. In the first case, a node's actual memory usage reaches its eviction threshold, and the kubelet, acting on eviction signals, initiates a node-pressure eviction. If the node runs out of memory before the kubelet can reclaim it, the node's oom-killer reacts, choosing which pods to kill based on an oom_score_adj value derived from each pod's QoS class. The applications that make up those pods are affected as a result.
In contrast to memory, CPU is compressible: when the CPU is overcommitted, available CPU time is distributed among containers. High CPU usage results in throttling, but it does not trigger node-pressure eviction or cause Kubernetes to terminate pods automatically. Application pods may still degrade, fail their liveness probes, and restart as a result of CPU starvation.
There is one more situation that we want to avoid. At the node level, requests are guaranteed and must not exceed capacity, since the Kubernetes scheduler does not oversubscribe them. If requests are significantly and consistently greater than the resources actually used, the surplus capacity is essentially wasted. While setting aside resources for peak processing times may be worthwhile, the administrator should weigh this against the ongoing cost of maintaining excess capacity that may never be needed. Configuring requests to reflect real usage is a balancing act, and the application's risk tolerance should be considered as well.
The Administrator
Abstracting infrastructure away from developers, so they can concentrate on creating applications, is a significant goal of an OpenShift administrator. Administrators are responsible for managing and sizing cluster capacity, and OpenShift collects metrics on cluster usage for command-line and web console dashboards. Additionally, OpenShift gives administrators the Machine API Operator so they can flexibly manage nodes and autoscaling on supported infrastructure providers; nodes can also be added or removed manually. A fellow Red Hatter wrote a fantastic blog series, How Full Is My Cluster, if you want to learn more about managing cluster capacity.
In this blog, we'll focus more on the interactions between administrators and developers. Even though administrators can adjust cluster capacity themselves using OpenShift's built-in capabilities, the running applications are a large part of the rightsizing problem. Different developers may solve the same problem in different ways, resulting in different performance characteristics, and no single strategy works for every application. Administrators have less control over a developer's application, and in a large organization it may be difficult for a single administration team to communicate with many different development teams. Therefore, the administrator's main goal should be to put guardrails in place that enable developers to rightsize their own applications.
Implementing Limits
Administrators can achieve this by putting LimitRanges in place, which provide developers with suggested sizing constraints for individual containers and pods. For the sake of this discussion, an example LimitRange follows. Your exact figures will differ, because each cluster and application has unique business and risk requirements.
apiVersion: v1
kind: LimitRange
metadata:
  name: "resource-limits"
spec:
  limits:
    - max:
        cpu: "4"
        memory: 4Gi
      min:
        cpu: 50m
        memory: 12Mi
      type: Pod
    - default:
        cpu: "1"
        memory: 1Gi
      defaultRequest:
        cpu: 50m
        memory: 1Gi
      max:
        cpu: "4"
        memory: 4Gi
      maxLimitRequestRatio:
        cpu: "25"
        memory: "2"
      min:
        cpu: 50m
        memory: 12Mi
      type: Container
When developing on a container platform, applications should be built as microservices rather than massive monoliths. To encourage the growth of microservices, limits should be implemented to restrict the maximum size of pods. A node's physical capacity can inform this maximum size, since a node ought to fit several of the largest pods without issues. An illustration is a cup that holds rocks of various sizes: if the largest rocks are placed first, pebbles and grains of sand can fill in the spaces around them. But if the pebbles and sand are added first, the largest rock might not fit, depending on the size of the cup.
Let's continue with the LimitRange example from earlier. The minimum pod and container size will likely be dictated by the requirements of the running application, so it is less important for administrators to enforce. Additionally, for simplicity, developers are encouraged to run a single container per pod (a notable exception is the use of sidecar containers, e.g. Red Hat OpenShift Service Mesh, based on Istio). For this reason, both pods and containers use identical resource values in the example above.
For developers, the default requests and limits serve as suggested values. Short-lived pods created by workload resources (such as a DeploymentConfig or BuildConfig) that don't explicitly declare container sizes will inherit the default values (for example, the deployer pod from a DeploymentConfig or the build pod from a BuildConfig). Developers should avoid relying on the default values and instead explicitly declare resource requests and limits in their workload resources.
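As a sketch of that explicit declaration (the name, image, and numbers below are hypothetical), a Deployment can state its container sizing directly in the pod template rather than inheriting the LimitRange defaults:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-service                    # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-service
  template:
    metadata:
      labels:
        app: example-service
    spec:
      containers:
        - name: example-service
          image: registry.example.com/team/example-service:1.0   # placeholder image
          resources:
            requests:
              cpu: 100m
              memory: 512Mi
            limits:
              cpu: 500m
              memory: 1Gi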
Developers should adhere to the maxLimitRequestRatio when configuring CPU and memory bursting. A high CPU maxLimitRequestRatio is ideal when a prototype application frequently sits idle in a development environment yet needs reasonable resources on demand. Developers might work only during regular business hours, code offline in their own IDE, sporadically test a single microservice, or exercise a completely separate stage of a CI/CD pipeline. In contrast, you'll notice a higher baseline utilization if numerous end users hit the service concurrently throughout the day. This is closer to your production situation, so the maxLimitRequestRatio may be reduced, and requests and limits may even be set 1:1. Because differing utilization patterns at different pipeline stages lead to different requests and limits, it is crucial to test with simulated workloads before production to determine the proper pod sizing.
Developers will use the maxLimitRequestRatio as a guideline for rightsizing. Since the Kubernetes scheduler bases scheduling decisions on resource requests, developers should configure requests to reflect actual usage. Then, based on their application's risk profile, developers set limits that comply with the maxLimitRequestRatio. An administrator who sets maxLimitRequestRatio to 1 forces developers to configure requests equal to limits, which reduces risk and prioritizes safety in production.
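As a sketch of how that ratio plays out against the earlier LimitRange (maxLimitRequestRatio of 25 for CPU and 2 for memory), a bursty development container might carry resources like the following; the CPU ratio works out to 1000m / 50m = 20 and the memory ratio to 1Gi / 512Mi = 2, both within bounds, whereas a production profile would push these ratios toward 1:1. The numbers are illustrative only.

resources:
  requests:
    cpu: 50m          # reflects the mostly idle baseline
    memory: 512Mi
  limits:
    cpu: "1"          # 1000m / 50m = 20, within the CPU ratio of 25
    memory: 1Gi       # 1Gi / 512Mi = 2, at the memory ratio ceiling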
As we discussed earlier when comparing memory with CPU, the two resources respond differently under pressure, with excessive memory consumption potentially leading to pod eviction or a restart from an Out Of Memory condition. Therefore, to avoid application pod restarts, it is preferable to define a lower maxLimitRequestRatio for memory across environments. Memory configuration for OpenJDK pods needs additional consideration: the JVM heap inside the container is unaware of the container's requests and limits, yet the constraints imposed on the container still affect the application. The OpenShift documentation offers suggestions and considerations for tuning OpenJDK workloads specifically.
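One common pattern for OpenJDK containers, shown here only as an assumption-laden sketch (Red Hat's OpenJDK images honor the JAVA_OPTS_APPEND variable, but verify the supported tuning knobs for your base image), is to cap the heap as a percentage of the container's memory limit so the JVM leaves headroom for non-heap memory:

containers:
  - name: openjdk-app                                   # hypothetical name
    image: registry.example.com/team/openjdk-app:1.0    # placeholder image
    env:
      - name: JAVA_OPTS_APPEND                          # assumed mechanism for passing JVM flags
        value: "-XX:MaxRAMPercentage=75.0"              # cap the heap at 75% of the memory limit
    resources:
      requests:
        memory: 1Gi
      limits:
        memory: 1Gi                                     # 1:1 memory ratio to avoid OOM-driven restarts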
Implementing Quotas
To help developers appropriately size their applications based on projected estimates, administrators can additionally create ResourceQuotas, which place capacity-based constraints on namespaces. For the purposes of this discussion, the following is an example of a ResourceQuota (shortened to quota in this blog for brevity).
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
spec:
  hard:
    limits.memory: 24Gi
    requests.cpu: "4"
    requests.memory: 24Gi
When an application namespace is first created, the development team should work with the administrator to estimate the size of the application and apply the appropriate quota. An administrator should project application size based on the number of services, the number of replicas, and the anticipated size of the pods. To simplify administering many namespaces, the administrator may use a "t-shirt size" approach as a starting point, with small, medium, and large applications receiving corresponding predefined quotas.
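As a sketch of the t-shirt approach (all numbers here are arbitrary placeholders an administrator would tune to their own cluster), a "small" quota might cover, say, four services with two replicas each, where every pod requests 250m CPU and 1Gi of memory, which works out to roughly 2 CPU and 8Gi of requests in total:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: small-quota              # hypothetical "small" t-shirt size
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 8Gi
    limits.memory: 8Gi

Medium and large variants would simply scale these values up.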
Applications are promoted through several CI/CD pipeline stages, each in its own namespace with a custom quota. In development and testing namespaces, where performance and high availability are not important, applications should configure minimum-sized pods and one pod replica per service. In a production namespace, on the other hand, larger pods and at least two pod replicas per service should be deployed to handle higher volume and provide high availability. Through stress and performance testing with simulated workloads in the CI/CD pipeline, developers can determine the right production pod sizes, replica counts, and quotas before a production release.
When allocating quota for future growth, the application's usage pattern, peak traffic, and any configured pod or node autoscalers should all be taken into account. For instance, additional quota might be allotted in a development namespace that is rapidly adding new microservices, in a performance-testing namespace used to establish the right production pod sizes, or in a production namespace that uses pod autoscalers to adapt to peak load. An administrator should provide enough quota overhead to account for these and other eventualities while balancing the risk to infrastructure capacity.
Administrators and developers should anticipate that quotas will change over time. Developers can reclaim quota without the intervention of an administrator by reviewing each service and lowering pod requests or limits to match real usage. Developers can use the metrics provided by the OpenShift interface to determine appropriate pod limits: the OpenShift web console shows actual CPU and memory consumption for deployments and running pods. If developers have followed these steps but still need more quota, they should contact the administrator. When a developer requests quota on a regular basis, administrators should treat it as an opportunity to compare actual consumption against the earlier projected estimates, confirm or adjust the quota size as necessary, and record new projected estimates.
The remaining paragraphs of this section cover some secondary factors for setting quota sizes. The ratio of CPU to memory quota should take node capacity into account so that both resources are used effectively. For instance, an AWS EC2 instance of type m5.2xlarge has 8 vCPUs and 32 GiB of RAM. A cluster made up of m5.2xlarge nodes can effectively utilize both CPU and memory by allocating application quota in a ratio of 1 vCPU for every 4 GiB of RAM (disregarding the node's system-reserved). If application workloads do not match the node size (i.e., they are CPU- or memory-heavy), a different node size may be considered.
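Continuing the m5.2xlarge example, a quota that follows the 1 vCPU : 4 GiB ratio (the absolute values are illustrative) lets the namespace consume CPU and memory in the same proportion the nodes provide:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: balanced-quota           # hypothetical name
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 16Gi        # 4 vCPU x 4 GiB per vCPU preserves the node's ratio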
For applications deployed across many namespaces, administrators may choose to use multi-project quotas, also known as ClusterResourceQuotas. Here are some situations where this strategy might be suitable. Perhaps an application's components are logically separated and spread across various namespaces. A CI/CD pipeline may assign a fixed capacity to an application team based on the hardware purchased for its various development stages. Or a development team using feature branches may rapidly create and destroy namespaces and need a shared pool of resources to draw from.
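A minimal sketch of a multi-project quota, assuming the team's projects carry a shared label (the label key, value, and figures below are hypothetical):

apiVersion: quota.openshift.io/v1
kind: ClusterResourceQuota
metadata:
  name: team-alpha-quota         # hypothetical name
spec:
  selector:
    labels:
      matchLabels:
        team: alpha              # assumed label applied to the team's projects
  quota:
    hard:
      requests.cpu: "8"
      requests.memory: 32Gi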
Administrators disagree on whether to impose CPU limits in quotas, so here we'll offer points to keep in mind rather than official advice. This informative article includes a section on resource constraints and compressibility. As previously discussed, CPU starvation of a pod causes throttling but does not, by itself, result in pod termination. If an administrator wishes to overcommit and use every CPU core on a node, CPU limits should not be specified in the quota. Conversely, CPU limits should be set in the quota to minimize overcommitment and the risk to application performance; this may be a cost- and business-driven choice rather than a technical one.
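If the decision is to constrain overcommitment, the earlier compute-resources quota simply gains a limits.cpu line (the value here is illustrative):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
spec:
  hard:
    limits.cpu: "6"              # caps the sum of CPU limits in the namespace
    limits.memory: 24Gi
    requests.cpu: "4"
    requests.memory: 24Gi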
Finally, there are a few cases in which implementing quotas is not recommended. The purpose of quotas is to give the administrator some influence over capacity planning for custom-built applications. OpenShift infrastructure projects should not use quotas, since they need predefined amounts of resources that are tested and supported by Red Hat. For similar reasons, quotas should not be used with commercial off-the-shelf (COTS) applications provided by third-party vendors.
Summary
In this blog post, we discussed how the Kubernetes platform protects the infrastructure by enforcing resource constraints. We also discussed rightsizing considerations when applying limits and quotas to application namespaces. Rightsizing is an art, not a science, as was said at the beginning of this post. I hope this blog has presented many of the considerations an administrator must navigate, such as the risk appetite of each application and the capacity of the OpenShift cluster.