Red Hat OpenShift power monitoring technology preview
In this blog, we will learn about the Red Hat OpenShift power monitoring technology preview.
We are thrilled to announce today’s release of the Red Hat OpenShift power monitoring technology preview, a significant milestone in our journey. We would like to express our sincere gratitude to all of the early adopters and tech enthusiasts who have enthusiastically contributed to deploying power monitoring and kindly provided invaluable feedback.
For those who haven’t had a chance to try it out, Red Hat OpenShift power monitoring is a collection of tools that lets you monitor how much power the workloads in an OpenShift cluster are using. This data can be used for a number of purposes, such as identifying the namespaces that consume the most power or building a plan to reduce energy usage.
If you would like to test out this preview release, we have included information below on how to get started. Please refer to the Technology Preview statement for additional details regarding official support.
Red Hat OpenShift power monitoring installation
Since our goal is to deliver a cohesive experience, these installation instructions closely resemble those of the previous iteration of power monitoring, which depended on the community operator.
- Follow the instructions in the Red Hat OpenShift documentation to enable user-workload-monitoring.
- Before installing, remove any previous iteration of kepler-operator that you may have installed via the community operators catalog, in order to prevent needless conflicts.
- Go to Operators -> OperatorHub in the OpenShift console (4.14 and above) to install the Operator. Search for Red Hat OpenShift power monitoring, select it, and click the “Install” button.
- After the Operator has been installed, click “View Operator” and then “Create instance” under the Kepler API to create an instance of the Kepler Custom Resource.
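For readers who prefer the CLI, the final console step above amounts to applying a Kepler custom resource. A minimal sketch is shown below; the API group/version and the instance name are assumptions based on the kepler-operator defaults, so verify them against the resource the console would generate for your installed operator version:

```yaml
# Minimal Kepler custom resource; apply with: oc apply -f kepler.yaml
# The apiVersion and the instance name below are assumptions taken from
# kepler-operator defaults; confirm them with: oc get crd | grep kepler
apiVersion: kepler.system.sustainable.computing.io/v1alpha1
kind: Kepler
metadata:
  name: kepler
```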
That’s it! Once Kepler is installed, two new dashboards can be accessed from the OpenShift console under the Observe -> Dashboards tab: Power Monitoring / Overview and Power Monitoring / Namespace.
To learn more about the subject, users are strongly encouraged to read the official documentation on power monitoring found under the OpenShift docs.
The power of observability
Using these dashboards, you can now learn the following details about the cluster and its workloads:
- Track the total amount of energy consumed in your cluster over the previous 24 hours, along with the number of monitored nodes and the selected CPU architecture.
- View a summary of the namespaces that use the most power.
- Identify which pods and containers are consuming the most power. You can accomplish this by analyzing the metrics Kepler exposes under the “Observe -> Metrics” tab.
- Pro tip: Use this query to retrieve all available power monitoring metrics: {__name__=~"kepler.+"}
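The same name-prefix filter can be applied outside the console, for example to Kepler’s raw scrape output. Below is a minimal Python sketch over illustrative sample data; the metric names mirror Kepler’s kepler_* counters, but treat the exact names and labels as assumptions to be checked against your deployment:

```python
import re

# Illustrative sample scrape output in Prometheus exposition format.
# Metric names and labels mimic Kepler's counters but are assumptions.
SAMPLE = """\
kepler_container_package_joules_total{container_namespace="hermes-ns",pod_name="mock"} 120.5
kepler_container_dram_joules_total{container_namespace="hermes-ns",pod_name="mock"} 3.2
process_cpu_seconds_total 42.0
"""

def kepler_metrics(text):
    """Return (name, value) pairs for metrics whose name matches 'kepler.+',
    the same filter as the PromQL query {__name__=~"kepler.+"}."""
    out = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comment lines
        m = re.match(r'(kepler.+?)(\{[^}]*\})?\s+(\S+)$', line)
        if m:
            out.append((m.group(1), float(m.group(3))))
    return out

print(kepler_metrics(SAMPLE))
```

Non-Kepler series such as process_cpu_seconds_total are filtered out, just as they would be by the PromQL name matcher.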
As previously mentioned, these metrics are available thanks to the integration of Kepler with OpenShift. The set of metrics Kepler collects depends heavily on the hardware and cluster configuration it is running on. As of right now, Kepler can extract precise measurements from a limited set of configurations, namely bare metal deployments on Intel hardware that exposes ACPI (Advanced Configuration and Power Interface) and Running Average Power Limit (RAPL).
For other configurations, an initial machine learning model is supplied, and Red Hat is collaborating with the larger community to increase the precision of the machine learning-based estimators. As things stand, these estimates are consistent, so you can rely on them to show differences between runs of the same workload. It is crucial to remember, however, that they are estimates of the actual power usage.
So that users can more easily determine whether Kepler is providing metric-based or model-based values, we have added a new “Components Source” column to the Node – CPU Architecture panel on the Overview page.
With this improvement, users can now see the source of metrics, such as rapl-sysfs or rapl-msr, which increases transparency. If hardware power consumption metrics cannot be obtained, Kepler will display “estimator” as the source, falling back to the machine learning model and the estimators derived from it. Ongoing development efforts aim to improve these models’ accuracy and the range of footprints they can cover.
Examining the metrics
So, what does this look like in practice? Let’s dig deeper with an example. After installing Kepler in accordance with the official documentation, we deployed an HTTP/2 traffic generator and a mock server, and applied a small load to the system.
Let’s now examine the power consumption across our OpenShift cluster. Navigating to Observe -> Dashboards -> Power Monitoring / Overview in the OpenShift console, we can see that:
- The nodes shown in the Architecture panel report rapl-sysfs in the Components Source column, so they are providing metrics-based results.
- The cluster has consumed relatively few kWh because it has been idle for some time.
- A list of the namespaces with the biggest impact on the energy bill is also displayed.
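The dashboard reports energy in kWh, while the underlying counters accumulate joules; the conversion is a fixed factor (1 kWh = 3,600,000 J). A minimal sketch with an illustrative value:

```python
def joules_to_kwh(joules: float) -> float:
    """Convert joules to kilowatt-hours (1 kWh = 3.6e6 J)."""
    return joules / 3.6e6

# An idle cluster accumulating 1.8 MJ over 24 hours amounts to just 0.5 kWh,
# which is why a quiet cluster shows only a small kWh figure.
print(joules_to_kwh(1_800_000))  # → 0.5
```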
What more is there? Now that we know the power and energy consumption profiles, let’s explore the second dashboard, “Power Monitoring / Namespace.” After selecting the namespace of interest (hermes-ns):
- First, it is evident that the PKG component is the main contributor to the power consumption in watts, which steadily increases over time after a few peaks. The Package (PKG) domain measures the total energy consumption of the socket. It covers all cores, integrated graphics, and uncore components (memory controller, last-level caches). This makes sense because power is a rate of energy consumption, so no accumulation is involved.
- We can also observe that energy consumption rises over time. In this instance, the power contribution of DRAM, which gauges the energy usage of the RAM attached to the integrated memory controller, appears insignificant.
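Kepler exposes energy as monotonically increasing joule counters, and the watts plotted in the dashboard correspond to the rate of change of those counters, which is what PromQL’s rate() function computes. A minimal sketch of that calculation with illustrative sample values (the metric name in the comment follows Kepler’s naming but should be verified against your deployment):

```python
def average_watts(joules_t0: float, joules_t1: float, seconds: float) -> float:
    """Average power over an interval: delta energy (J) / delta time (s) = W.
    This mirrors rate(kepler_container_package_joules_total[1m]) in PromQL,
    assuming the counter did not reset during the interval."""
    return (joules_t1 - joules_t0) / seconds

# The PKG counter went from 1000 J to 1600 J over 60 s => 10 W on average.
print(average_watts(1000.0, 1600.0, 60.0))  # → 10.0
```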
Scrolling down a little, we can examine the PKG and DRAM contributions in more detail, per container. The first thing that draws our attention is the apparent cause of the initial peaks: a prior workload. That sounds intriguing, and we may have to troubleshoot it in our mock or application!
Once the second deployment is ready, we can see that the power consumption of both containers is the same.
What does this tell us? Does it imply that a traffic generator using OpenTelemetry, metrics, and traces, along with intricate logic and instrumentation, consumes as much energy as a straightforward mock that simply prints “200 OK” to the console? Even though we adore contemporary observability, it is occasionally necessary to print to the standard output.
The mock is written in Go, and the traffic generator in C++. Recent research indicates that, in some scenarios, C++ can be 2.5 times more energy efficient than Go; however, that topic is well outside the scope of this blog. Every language has its own domains in which to flourish, and we adore them all. We’re eager to see how you intend to apply power monitoring. Using your own data, you might even be able to join the power consumption vs. programming language debate.
What comes next?
We remain dedicated to taking your suggestions into consideration, adjusting, and adding improvements. We will continue to work in partnership with the community to support the international effort to better track energy use. The future looks bright: plans could include anything from helping developers view the power footprint of their code on the OpenShift platform, to integrating power monitoring with larger sustainability initiatives, to exporting data via OpenTelemetry.