Red Hat observability guide for OpenShift 4.17
With the release of Red Hat OpenShift 4.17, we continue to enhance our observability capabilities, which are crucial for monitoring, troubleshooting, and optimizing OpenShift clusters. This update introduces a variety of new features and integrations designed to improve the observability experience within your OpenShift environment.
Unified Cluster Observability
In September 2024, we introduced Cluster Observability Operator 0.4.0. It lets users install specific observability components, such as a more flexible monitoring stack and UI plugins that provide improved platform visualization, analytics, and profiling tools. Although the operator is still a Technology Preview, this release brings several important updates:
- Traces UI Plugin (Technology Preview): Users can now explore trace details more easily using a new Gantt chart powered by Perses. For more details, see the Distributed Tracing Improvements section and the related blog post.
- Observability Signal Correlation (Technology Preview): This new feature, powered by Korrel8r, lets users navigate between correlated observability signals and Kubernetes resources, making it easier to find the root cause of issues. See the related blog article for more information.
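The following Python sketch shows roughly how these plugins are enabled. It builds UIPlugin custom resources for the traces view and the troubleshooting panel, then prints them as JSON that can be piped to oc apply -f -. The observability.openshift.io/v1alpha1 API version, the plugin names, and the spec.type values are assumptions based on the Cluster Observability Operator documentation at the time of writing. Check them against the operator version you install.

```python
"""Build Cluster Observability Operator UIPlugin manifests (sketch).

Assumption: the API group/version, plugin names, and type values follow
the COO documentation at the time of writing; verify before applying.
"""
import json

# Plugin name -> UIPlugin spec.type (assumed: each name matches its type).
PLUGINS = {
    "distributed-tracing": "DistributedTracing",      # traces UI (Gantt chart)
    "troubleshooting-panel": "TroubleshootingPanel",  # Korrel8r-backed panel
}


def ui_plugin(name: str, plugin_type: str) -> dict:
    """Return a cluster-scoped UIPlugin custom resource as a dict."""
    return {
        "apiVersion": "observability.openshift.io/v1alpha1",
        "kind": "UIPlugin",
        "metadata": {"name": name},
        "spec": {"type": plugin_type},
    }


def manifest_list() -> dict:
    """Wrap every plugin in a v1 List so one oc apply creates them all."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [ui_plugin(n, t) for n, t in PLUGINS.items()],
    }


if __name__ == "__main__":
    # oc accepts JSON as well as YAML, so this can be piped to oc apply -f -
    print(json.dumps(manifest_list(), indent=2))
```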
Enhanced Troubleshooting Journey
The observability troubleshooting journey offers a series of key analytics features aimed at helping users resolve issues faster. The latest Cluster Observability Operator release includes a Technology Preview of the signal correlation tool, which enhances the troubleshooting panel in the OpenShift web console. This interactive panel lets users focus on specific signals and navigate directly to the relevant console page for details on a resource or signal, such as a pod, deployment, or metric.
Korrel8r’s backend correlates various observability signals across data stores, such as metrics, logs, and alerts. Improvements include:
- The ability to focus on a specific signal in the troubleshooting panel.
- A new entry point for the panel through the OpenShift web console’s application launcher.
- Increased visibility into the queries Korrel8r makes, along with customizable investigation depth.
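The following Python code is a simplified model of depth-limited correlation, which illustrates what investigation depth means. Starting from an alert, it follows correlation rules outward breadth-first and stops after a configurable number of hops. It does not use Korrel8r's API or rule language. The signal names and the rule table are hypothetical.

```python
"""Depth-limited correlation search (simplified model, not Korrel8r's API).

Hypothetical: the signal names and RULES table are illustrative only.
"""
from collections import deque

# Hypothetical rules: from each object, these related objects can be found.
RULES = {
    "alert:KubePodCrashLooping": ["pod:web-7f9c"],
    "pod:web-7f9c": ["deployment:web", "logs:web-7f9c", "metrics:web-7f9c"],
    "deployment:web": ["pod:web-7f9c", "pod:web-5d2a"],
    "pod:web-5d2a": ["logs:web-5d2a"],
}


def correlate(start: str, depth: int) -> dict[str, int]:
    """Breadth-first search from start; map each object found within
    depth hops to the hop count at which it was first reached."""
    found = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if found[node] == depth:
            continue  # depth limit reached: do not expand this node
        for neighbor in RULES.get(node, []):
            if neighbor not in found:
                found[neighbor] = found[node] + 1
                queue.append(neighbor)
    return found


if __name__ == "__main__":
    for d in (1, 3):
        print(d, sorted(correlate("alert:KubePodCrashLooping", d)))
```

In this model, raising the depth from 1 to 3 adds the deployment, the crashing pod's logs and metrics, and a sibling pod. More depth finds more context, but each extra hop also means more store queries.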
Optimized In-Cluster Monitoring
OpenShift provides a preconfigured monitoring stack for core platform components, and a new feature improves metrics storage efficiency. With this enhancement, Prometheus instances in the User Workload Monitoring (UWM) stack can tolerate scrape time jitter, which can reduce storage overhead by up to 50%. This helps mitigate issues caused by small timing variations, especially in high-availability configurations. The trade-off is a slight loss of timestamp accuracy in exchange for better compression and lower storage consumption.
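The compression gain comes from how Prometheus stores timestamps. After the first two samples, each timestamp is encoded as a delta of deltas. That costs a single bit when scrapes are perfectly regular and noticeably more when they drift by even a millisecond. The Python code below is a simplified model of this effect, not the actual Prometheus or OpenShift implementation. It assumes millisecond timestamps and a 2 ms tolerance, and its bit costs are modeled on the Prometheus TSDB's Gorilla-style encoding.

```python
"""Why tolerating scrape jitter helps compression (simplified model).

Assumptions: millisecond timestamps; the tolerance and the bit buckets
are modeled on Prometheus' Gorilla-style encoding, not copied from it.
"""
import random

INTERVAL_MS = 30_000  # hypothetical 30 s scrape interval
TOLERANCE_MS = 2      # assumed jitter tolerance


def align(timestamps: list[int]) -> list[int]:
    """Snap each timestamp to previous + interval when it falls within the
    tolerance; otherwise keep the real scrape timestamp."""
    out = [timestamps[0]]
    for ts in timestamps[1:]:
        expected = out[-1] + INTERVAL_MS
        out.append(expected if abs(ts - expected) <= TOLERANCE_MS else ts)
    return out


def dod_bits(dod: int) -> int:
    """Approximate bit cost of one delta-of-delta value."""
    if dod == 0:
        return 1
    for prefix, width in ((2, 14), (3, 17), (4, 20)):
        if -(1 << (width - 1)) + 1 <= dod <= (1 << (width - 1)):
            return prefix + width
    return 4 + 64


def timestamp_bits(timestamps: list[int]) -> int:
    """Sum the delta-of-delta cost of every sample after the first two."""
    total = 0
    for i in range(2, len(timestamps)):
        delta = timestamps[i] - timestamps[i - 1]
        prev_delta = timestamps[i - 1] - timestamps[i - 2]
        total += dod_bits(delta - prev_delta)
    return total


if __name__ == "__main__":
    rng = random.Random(42)
    # Scrapes that start up to 1 ms early or late, e.g. from scheduling noise.
    raw = [i * INTERVAL_MS + rng.randint(-1, 1) for i in range(1_000)]
    print("raw bits:    ", timestamp_bits(raw))
    print("aligned bits:", timestamp_bits(align(raw)))
```

When every scrape lands within 1 ms of its schedule, the aligned series costs one bit per timestamp. The raw series pays a multi-bit cost for most samples. By construction, each aligned timestamp stays within the tolerance of the real scrape time, which is the small accuracy cost mentioned above.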
Advanced Logging Features
OpenShift’s logging subsystem aggregates various types of logs, including node system audit logs, application container logs, and infrastructure logs. The latest release, Logging 6.1, introduces a Technology Preview of end-to-end OTLP (OpenTelemetry Protocol) support. This allows logs collected by the Cluster Logging Operator to be forwarded to external OTLP-enabled endpoints or to the internal Loki log store.
Logs from both the Cluster Logging Operator and the Red Hat Build of OpenTelemetry can now be stored in the same log store and visualized together in the OpenShift Observability UI, providing a unified view of logs across different systems.
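The following sketch builds a ClusterLogForwarder that uses the new output in Python and prints it as JSON. The field names and the opt-in annotation reflect our reading of the Logging 6.1 documentation for the Technology Preview OTLP output. The service account and endpoint URL are hypothetical placeholders. In Logging 6.x, the service account must also be granted permission to collect each log type you forward.

```python
"""ClusterLogForwarder sending logs over OTLP (sketch).

Assumptions: field names and the tech-preview annotation follow the
Logging 6.1 docs; the service account and URL are hypothetical.
"""
import json

OTLP_URL = "https://otel-collector.example.com:4318/v1/logs"  # hypothetical
SERVICE_ACCOUNT = "log-collector"                             # hypothetical


def otlp_forwarder(name: str = "clf-otlp",
                   namespace: str = "openshift-logging") -> dict:
    return {
        "apiVersion": "observability.openshift.io/v1",
        "kind": "ClusterLogForwarder",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # The OTLP output is Technology Preview and must be opted into.
            "annotations": {
                "observability.openshift.io/tech-preview-otlp-output": "enabled"
            },
        },
        "spec": {
            "serviceAccount": {"name": SERVICE_ACCOUNT},
            "outputs": [
                {"name": "otlp", "type": "otlp", "otlp": {"url": OTLP_URL}}
            ],
            "pipelines": [{
                "name": "all-to-otlp",
                # Forward all three built-in log types to the OTLP output.
                "inputRefs": ["application", "infrastructure", "audit"],
                "outputRefs": ["otlp"],
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(otlp_forwarder(), indent=2))
```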
Red Hat Build of OpenTelemetry
The Red Hat Build of OpenTelemetry has been enhanced with new tools to help users better understand and manage their observability data. A new dashboard in the OpenShift console provides insights into the amount of data being processed, including details on ingested vs. rejected data, signal information, and resource consumption by the OpenTelemetry collector. The build also adds new components, such as a metrics transform processor and a Prometheus remote write exporter, for greater flexibility and control.
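The following Python sketch shows how the two new components can fit together. It builds an OpenTelemetryCollector resource with a metrics pipeline that:
- receives OTLP,
- renames a metric with the metrics transform processor, and
- sends the result to Prometheus through the remote write exporter.

It assumes the opentelemetry.io/v1beta1 schema with a structured spec.config. The metric names and endpoint are hypothetical. The small helper checks that every component named in a pipeline is actually defined, because the collector refuses to start when a pipeline references an undefined component.

```python
"""OpenTelemetryCollector with metricstransform and prometheusremotewrite.

Assumptions: opentelemetry.io/v1beta1 schema with a structured spec.config;
metric names and the remote write endpoint are hypothetical.
"""
import json

REMOTE_WRITE_URL = "https://prometheus.example.com/api/v1/write"  # hypothetical


def collector(name: str = "otel") -> dict:
    config = {
        "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
        "processors": {
            "metricstransform": {
                "transforms": [{
                    # Rename one metric before export (hypothetical names).
                    "include": "app.requests",
                    "action": "update",
                    "new_name": "app_requests_total",
                }]
            }
        },
        "exporters": {"prometheusremotewrite": {"endpoint": REMOTE_WRITE_URL}},
        "service": {"pipelines": {"metrics": {
            "receivers": ["otlp"],
            "processors": ["metricstransform"],
            "exporters": ["prometheusremotewrite"],
        }}},
    }
    return {
        "apiVersion": "opentelemetry.io/v1beta1",
        "kind": "OpenTelemetryCollector",
        "metadata": {"name": name},
        "spec": {"mode": "deployment", "config": config},
    }


def undefined_refs(config: dict) -> list[str]:
    """Return pipeline references that have no matching component."""
    missing = []
    for pipeline in config["service"]["pipelines"].values():
        for kind in ("receivers", "processors", "exporters"):
            for ref in pipeline.get(kind, []):
                if ref not in config.get(kind, {}):
                    missing.append(f"{kind}/{ref}")
    return missing


if __name__ == "__main__":
    cr = collector()
    assert not undefined_refs(cr["spec"]["config"])
    print(json.dumps(cr, indent=2))
```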
Distributed Tracing Improvements
OpenShift now includes an embedded Gantt chart for distributed traces, powered by Perses and available through a UI plugin in the Cluster Observability Operator. This feature lets users visualize trace spans and their relationships directly within the OpenShift console. You can also inspect the context of individual spans, including their duration and whether they indicate a problem. This builds on earlier additions, such as the bubble chart and tracing tables, to provide a more complete tool for observing application performance.
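The Gantt view is essentially a tree of spans laid out on a shared time axis. The Python sketch below renders that idea as text:
- child spans are indented under their parent,
- each bar is offset by the span's start time and scaled by its duration, and
- spans flagged as errors are drawn with a different character.

The Span type and sample data are hypothetical and unrelated to the Perses or Tempo data models.

```python
"""Render a trace as a text Gantt chart (illustrative sketch).

Hypothetical: the Span type and sample trace are made up for this example.
"""
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks a root span
    name: str
    start_ms: int
    duration_ms: int
    error: bool = False


def gantt(spans: list[Span], width: int = 40) -> list[str]:
    trace_start = min(s.start_ms for s in spans)
    trace_end = max(s.start_ms + s.duration_ms for s in spans)
    scale = width / max(trace_end - trace_start, 1)  # columns per ms
    children: dict[Optional[str], list[Span]] = {}
    for s in spans:
        children.setdefault(s.parent_id, []).append(s)
    lines: list[str] = []

    def walk(parent: Optional[str], depth: int) -> None:
        # Draw children in start order, indented one level under the parent.
        for s in sorted(children.get(parent, []), key=lambda x: x.start_ms):
            offset = int((s.start_ms - trace_start) * scale)
            length = max(1, int(s.duration_ms * scale))
            bar = " " * offset + ("!" if s.error else "#") * length
            label = ("  " * depth + s.name).ljust(20)
            lines.append(f"{label}|{bar.ljust(width)}| {s.duration_ms} ms")
            walk(s.span_id, depth + 1)

    walk(None, 0)
    return lines


if __name__ == "__main__":
    trace = [
        Span("a", None, "GET /checkout", 0, 120),
        Span("b", "a", "auth", 5, 20),
        Span("c", "a", "db query", 30, 80, error=True),
    ]
    print("\n".join(gantt(trace)))
```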
Other distributed tracing updates include the ability to configure temporary access to AWS S3 via AWS Security Token Service (STS) and improved TLS configuration for service annotations in Tempo.
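With STS-based access to S3, the object storage secret carries an IAM role ARN instead of static access keys. The following Python sketch builds such a Secret and checks the ARN's shape. The bucket, region, and role_arn key names are assumptions drawn from the Tempo Operator documentation. The example values are hypothetical.

```python
"""Object storage Secret for TempoStack using AWS STS (sketch).

Assumptions: the bucket/region/role_arn keys follow the Tempo Operator
docs; all example values are hypothetical.
"""
import json
import re

# IAM role ARNs look like arn:<partition>:iam::<12-digit account>:role/<name>.
ROLE_ARN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")


def sts_storage_secret(name: str, namespace: str, bucket: str,
                       region: str, role_arn: str) -> dict:
    if not ROLE_ARN.match(role_arn):
        raise ValueError(f"not an IAM role ARN: {role_arn!r}")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": name, "namespace": namespace},
        # No static access keys: temporary credentials come from STS.
        "stringData": {"bucket": bucket, "region": region, "role_arn": role_arn},
    }


if __name__ == "__main__":
    secret = sts_storage_secret(
        "tempo-s3-sts", "tracing-system", "my-traces-bucket", "us-east-1",
        "arn:aws:iam::123456789012:role/tempo-s3-access",
    )
    print(json.dumps(secret, indent=2))
```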
Simplified Network Observability
Network Observability 1.6 introduces two key features for easier network monitoring:
- Network On-Demand: The new CLI tool (oc netobserv) simplifies network flow and packet visualization. It deploys a NetObserv eBPF agent and a flowlogs-pipeline on your cluster, letting users capture network flows or packets with a single command, without needing to configure operators.
- Lightweight Network Observability Operator: This version eliminates the requirement for the Loki log aggregation system, enabling users to gain network insights without additional storage components. This lightweight setup has its own benefits and trade-offs, which we’ve outlined in the related article.
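A FlowCollector that does not use Loki needs only a few fields. The following Python sketch makes three assumptions: the flows.netobserv.io/v1beta2 schema, the requirement that the resource be named cluster, and the spec.loki.enable switch. The namespace and sampling value are illustrative.

```python
"""FlowCollector for Network Observability without Loki (sketch).

Assumptions: flows.netobserv.io/v1beta2 schema, the required "cluster"
name, and spec.loki.enable; verify against your operator version.
"""
import json


def flow_collector(sampling: int = 50) -> dict:
    return {
        "apiVersion": "flows.netobserv.io/v1beta2",
        "kind": "FlowCollector",
        "metadata": {"name": "cluster"},  # assumed required resource name
        "spec": {
            "namespace": "netobserv",
            "deploymentModel": "Direct",
            # eBPF agent; sampling=N keeps roughly 1 of every N packets.
            "agent": {"type": "eBPF", "ebpf": {"sampling": sampling}},
            # No Loki: flow logs are not stored; insight comes from metrics.
            "loki": {"enable": False},
        },
    }


if __name__ == "__main__":
    print(json.dumps(flow_collector(), indent=2))
```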
Looking Ahead for OpenShift Observability
The future of OpenShift Observability is bright. Soon, you’ll be able to perform Application Performance Monitoring (APM) directly from the OpenShift console. We’re also working toward a more seamless integration experience between cloud platforms, observability vendors, and technologies, ensuring that OpenShift remains the “default to open” platform.
Additionally, we’re continuing to improve analytics features and UI elements within the Cluster Observability Operator, with future releases focused on refining the integration between observability signals and simplifying the user experience.
Conclusion
Ready to explore these new observability features? Visit redhat.com/observability and the OpenShift documentation pages for more information and to get started with the latest tools for monitoring and troubleshooting your OpenShift clusters. The Red Hat Developers Observability page also provides additional resources to help you implement and enhance observability within your OpenShift environment.