What’s new in OpenShift Monitoring 4.14, Logging 5.8, Distributed Tracing 2.9?
In this blog, we will look at the new features in OpenShift Monitoring 4.14, Logging 5.8, and Distributed Tracing 2.9.
Red Hat OpenShift Observability’s Five Pillars
We are happy to announce that OpenShift Monitoring 4.14, Logging 5.8, and Distributed Tracing 2.9 will soon include more Observability features. Red Hat OpenShift Observability’s plan to turn your data into answers continues to take shape as our teams work on important data collection, storage, delivery, visualization, and analytics features.
Data Collection
Distributed Tracing 2.9 takes the OpenTelemetry collector operator Technology Preview to a whole new level. It makes it possible to gather OpenTelemetry Protocol (OTLP) metrics for the first time, and it also helps users gather metrics and traces from remote clusters via OTLP/HTTP(s). The OpenTelemetry operator has been raised to capability level 4 (Deep Insights), so it now supports upgrades, monitoring, and alerting of the OpenTelemetry collector instances themselves. Managed and unmanaged states are now supported as well.
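As a rough sketch of what receiving OTLP data with the operator looks like, the OpenTelemetryCollector resource below enables the OTLP receiver over both gRPC and HTTP and wires metrics and traces into simple pipelines. The name, namespace, and the logging exporter are placeholders for illustration; consult the Distributed Tracing 2.9 documentation for the supported options.

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel                # illustrative name
  namespace: observability  # illustrative namespace
spec:
  mode: deployment
  config: |
    receivers:
      otlp:
        protocols:
          grpc:             # OTLP/gRPC from local or remote clusters
          http:             # OTLP/HTTP(s)
    processors:
      batch: {}
    exporters:
      logging: {}           # placeholder exporter for the sketch
    service:
      pipelines:
        metrics:
          receivers: [otlp]
          processors: [batch]
          exporters: [logging]
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [logging]
```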
Admins Can Customize Alerting Rules
Admins will soon be able to create new alerting rules based on any platform metrics that are exposed in namespaces such as default, kube-*, and openshift-*. This major improvement makes it possible to create new alerting rules that target metrics from any namespace. Moreover, administrators can now easily create new rules by copying existing ones, which also makes it possible to adapt alerting rules that already exist. All of this was built in response to an obvious need: giving administrators the ability to add rules tailored to their particular environments to the OpenShift Container Platform.
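As one possible shape of such a rule, the sketch below defines a platform alerting rule with the AlertingRule custom resource in the openshift-monitoring namespace. The group name, alert name, expression, and threshold are purely illustrative; check the OpenShift 4.14 documentation for the exact schema and supported fields.

```yaml
apiVersion: monitoring.openshift.io/v1
kind: AlertingRule
metadata:
  name: example-platform-alerts
  namespace: openshift-monitoring
spec:
  groups:
    - name: cluster-capacity
      rules:
        - alert: ClusterNodeCountLow           # illustrative alert name
          expr: count(kube_node_info) < 3      # platform metric from kube-state-metrics
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: The cluster reports fewer than three nodes.
```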
Improved Adjustments for Node Exporter Collectors (Second Phase)
We are taking the Node Exporter customizations to the next level. Users will get on/off switches for various collectors, such as systemd, hwmon, mountstats, and ksmd. Alongside these options, general node-exporter settings such as maxProcs will also be introduced.
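A minimal sketch of what this tuning could look like in the cluster-monitoring-config ConfigMap is shown below. The exact collector names and fields exposed in 4.14 may differ, so treat this as illustrative and verify it against the release documentation.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    nodeExporter:
      maxProcs: 4            # general node-exporter setting (illustrative value)
      collectors:
        systemd:
          enabled: true      # turn individual collectors on or off
        mountstats:
          enabled: true
        ksmd:
          enabled: false
```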
Scrape Profiles (Technology Preview) in CMO
To provide more flexibility and optimization, we are introducing optional scrape profiles for service monitors in the Cluster Monitoring Operator (CMO). With this change, administrators will have more control over how many metrics the in-cluster stack gathers, and CMO will show improved scaling behavior in both small and large environments. The main idea is to give administrators more control over the collected data by allowing them to drop metrics that are not necessary.
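Assuming the profile is selected through the CMO configuration, a fragment of data.config.yaml in the cluster-monitoring-config ConfigMap might look like the following; the field name and allowed values are illustrative, so confirm them in the 4.14 Technology Preview documentation.

```yaml
# Excerpt of data.config.yaml in the cluster-monitoring-config ConfigMap
# (openshift-monitoring namespace); field name and values are illustrative.
prometheusK8s:
  collectionProfile: minimal   # e.g. "full" (default) or "minimal"
```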
Setting Resource Caps for Every Component
Currently, users can set resource requests and limits for components such as Prometheus, Alertmanager, Thanos Querier, and Thanos Ruler. In our next release, we are extending this capability to the remaining components: node-exporter, kube-state-metrics, openshift-state-metrics, prometheus-adapter, prometheus-operator, the admission webhook, and telemeter-client.
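For example, once the feature lands, setting requests and limits for node-exporter and kube-state-metrics might look like the fragment below, an excerpt of data.config.yaml in the cluster-monitoring-config ConfigMap; the values are arbitrary examples, not recommendations.

```yaml
# Excerpt of data.config.yaml in the cluster-monitoring-config ConfigMap
# (openshift-monitoring namespace); the numbers are arbitrary examples.
nodeExporter:
  resources:
    requests:
      cpu: 20m
      memory: 50Mi
    limits:
      cpu: 100m
      memory: 150Mi
kubeStateMetrics:
  resources:
    limits:
      cpu: 100m
      memory: 200Mi
```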
User-Customizable TopologySpreadConstraints for All Relevant Pods
Our platform has undergone a major update that enables users to configure TopologySpreadConstraints for every pod that CMO deploys. The list is extensive and includes prometheus-adapter, kube-state-metrics, telemeter-client, thanos-querier, the UWM Alertmanager, UWM Prometheus, UWM Thanos Ruler, prometheus-operator, and the config reloader.
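As an illustration, spreading the platform Prometheus pods across availability zones could be configured as follows, again as an excerpt of data.config.yaml in cluster-monitoring-config; the same pattern applies to the user-workload-monitoring-config ConfigMap for UWM components. The label selector shown is an assumption about how the pods are labeled.

```yaml
# Excerpt of data.config.yaml in the cluster-monitoring-config ConfigMap;
# the label selector is an assumption about the Prometheus pod labels.
prometheusK8s:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: prometheus
```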
Data Storage
Tempo, our brand-new distributed tracing storage, will soon achieve General Availability (GA) status, and Distributed Tracing 2.9 brings it a number of improvements. Tempo is a scalable distributed tracing storage solution that can store and query traces from large-scale microservices architectures. As of today, the Technology Preview lets users ingest and store distributed traces through the Distributor service using the following protocols: Zipkin, Jaeger gRPC, Jaeger Thrift binary, and Jaeger Thrift compact. We have also worked to raise the Tempo operator to capability level 4 (Deep Insights), just as we did with the OpenTelemetry operator.
We now support both managed and unmanaged states for the TempoStack custom resource, just as we do for the OpenTelemetry collector. We have also been working on the Tempo Gateway, which provides authentication and authorization and supports OTLP gRPC as well as the Query Frontend service. Deployed via the operator, the Tempo Gateway is an independent component that can be used for both data ingestion and querying Tempo traces. Additionally, we have improved multitenancy so that it can be used without the Gateway.
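A minimal TempoStack resource, assuming an S3-compatible object storage secret already exists, could look like the sketch below; the name, namespace, secret, and sizes are placeholders, and enabling the Gateway is optional.

```yaml
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: sample                   # illustrative name
  namespace: tracing-system      # illustrative namespace
spec:
  storageSize: 10Gi
  storage:
    secret:
      name: tempo-object-store   # S3-compatible object storage credentials
      type: s3
  template:
    gateway:
      enabled: true              # optional: front ingestion and queries with the Gateway
```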
Loki log storage has seen numerous enhancements with Logging 5.8. Customers run the OpenShift Logging stack on clusters that span multiple availability zones, and the new Loki-based stack we are releasing with Logging 5.8 now supports zone-aware data replication. With this new feature, data ingestion remains possible for all tenants across availability zones, and certain query capabilities are guaranteed even in the event of an availability zone failure.
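Assuming the LokiStack custom resource exposes replication settings along these lines, a zone-aware configuration might resemble the following sketch; the exact field names and the storage details are illustrative, so consult the Logging 5.8 documentation for the final schema.

```yaml
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  size: 1x.small
  storage:
    secret:
      name: logging-loki-s3      # object storage credentials (illustrative)
      type: s3
  storageClassName: gp3-csi      # illustrative storage class
  replication:
    factor: 2                    # replicate data across zones
    zones:
      - topologyKey: topology.kubernetes.io/zone
        maxSkew: 1
  tenants:
    mode: openshift-logging
```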
We are also adding cluster restart and reliability hardening to Loki as of Logging 5.8. These features improve Loki’s availability and reliability: Loki keeps functioning when clusters restart and recovers on its own without requiring human assistance. Customers will be able to adjust their affinity/anti-affinity rulesets for Loki, and Loki will be more aware of node placement to ensure that no crucial components share a node.
Data Delivery
If you were wondering where all those collected OTLP metrics end up: in Technology Preview, users can now choose to store them in user workload monitoring via the Prometheus exporter, forward them via OTLP/HTTP(s), or forward them via OTLP/gRPC. The OpenShift distribution of the OpenTelemetry Collector shipped with Distributed Tracing 2.9 also includes the resourcedetection and k8sattributes processors. These processors detect resource information from the host and append it to, or override values in, the telemetry data. With just a small configuration adjustment, users can now enrich their data whenever they want. The following resource attributes are retrieved by querying the OpenShift and Kubernetes APIs: cloud.provider, cloud.platform, cloud.region, and k8s.cluster.name. You can then attach these to your OpenTelemetry signals.
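Putting that together, a collector that enriches incoming OTLP data and then either exposes metrics for Prometheus scraping or forwards signals via OTLP/HTTP could be configured roughly as follows. The namespace and endpoints are assumptions for illustration, and the extra pieces you would need in practice, such as RBAC for the k8sattributes processor and the ServiceMonitor that pulls the Prometheus endpoint into user workload monitoring, are not shown.

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-enriched
  namespace: observability          # illustrative namespace
spec:
  mode: deployment
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      resourcedetection:
        detectors: [openshift]      # adds cloud.provider, cloud.platform, cloud.region, k8s.cluster.name
      k8sattributes: {}             # adds Kubernetes metadata to telemetry
    exporters:
      prometheus:
        endpoint: 0.0.0.0:8889      # scraped into user workload monitoring via a ServiceMonitor
      otlphttp:
        endpoint: https://central-collector.example.com:4318   # hypothetical remote endpoint
    service:
      pipelines:
        metrics:
          receivers: [otlp]
          processors: [resourcedetection, k8sattributes]
          exporters: [prometheus]
        traces:
          receivers: [otlp]
          processors: [resourcedetection, k8sattributes]
          exporters: [otlphttp]
```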
One of our most exciting new features in Logging 5.8 is the ability to set up multiple log forwarders. Multiple, isolated ClusterLogForwarder instances can now be created in any namespace, so independent groups can forward their preferred logs to their preferred destinations. Users can manage their log forwarding configurations independently without interfering with those of other users.
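A hedged sketch of what a team-scoped forwarder could look like is below; the namespace, service account, output name, and Loki URL are all hypothetical, and the service account still needs the appropriate log-collection permissions bound to it.

```yaml
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: team-a-forwarder                      # any name, in the team's own namespace
  namespace: team-a                           # hypothetical team namespace
spec:
  serviceAccountName: team-a-log-collector    # needs log-collection cluster roles bound
  outputs:
    - name: team-a-loki
      type: loki
      url: https://loki.team-a.example.com    # hypothetical destination
  pipelines:
    - name: application-logs
      inputRefs:
        - application
      outputRefs:
        - team-a-loki
```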
Data Visualization
We continue to improve the overall functionality and user experience of our primary Observability visualization tool, the OpenShift Web Console, as mentioned in previous release blogs. We not only want to make navigation easier, but also to give you, the user, more power to navigate the growing amount of data and signals and to spend less time debugging individual clusters. To improve the overall monitoring console experience, the console team has moved the Monitoring features into an optional console plugin. These resources are deployed via CMO, and whenever CMO is present, the monitoring console pages are visible. From a monitoring standpoint, OpenShift 4.14 also offers an entirely new Silences tab in the Developer perspective of the OpenShift Web Console. Thanks to this new feature, developers can now manage alert silences directly and expire them in bulk, automatically reducing silence noise.
Logging 5.8 offers several fantastic features. First, log-based alerts in the OpenShift Web Console’s Developer perspective will be helpful to you; they were already made available in the Admin perspective with Logging 5.7. Developers also benefit from being able to search for patterns across different namespaces. This new functionality lets users track down issues across various services and spend less time troubleshooting.
We are introducing Loki dashboards in Logging 5.8 to give users a visual representation of the health and performance of their log storage. Lastly, users gain access to logs in the Web Console’s Developer perspective, where they can search logs, and thus patterns, across all namespaces, which will facilitate application debugging.
We are happy to announce that you will be able to take advantage of a Dev Preview feature that allows you to experience correlation for the first time right in the OpenShift Web Console with Logging 5.8. The Red Hat Observability team has been working on korrel8r, an open source project that aims to make correlation across observability signals accessible to everyone, since it was first introduced at KubeCon Europe 2023. In what ways can correlation help you? By switching quickly between observability signals, you can minimize the time spent troubleshooting individual clusters and, consequently, the time required to identify issues. The good news is that you can now quickly switch from an alert to its equivalent log and/or from a log to its equivalent metric because we’ve integrated korrel8r into our OpenShift Observability experience. From now on, all it will take is a few clicks to identify issues!