Note: this page shows the Feature-Based Change Log for a release
When this image was assembled, these features were not yet completed. Therefore, only the Jira Cards included here are part of this release
Goal:
Graduate Gateway API with Istio to GA (full support) to unify the management of cluster ingress under a common, open, expressive, and extensible API.
Description:
Gateway API is the evolution of upstream Kubernetes Ingress APIs. The upstream project is part of Kubernetes, working under SIG-NETWORK. OpenShift is contributing to the development, building a leadership position, and preparing OpenShift to support Gateway API, with Istio as our supported implementation.
The pluggable nature of Gateway API implementations enables support for additional, optional third-party ingress technologies.
Gateway API is the next generation of the Ingress API in upstream Kubernetes.
OpenShift Service Mesh (OSSM) and several other offerings of ours, such as Kuadrant, MicroShift, and OpenShift AI, all have critical dependencies on Gateway API's API resources. However, even though Gateway API is an official Kubernetes project, its API resources are not available in the core API (like Ingress is) and instead require the installation of Custom Resource Definitions (CRDs).
OCP will be fully in charge of managing the life-cycle of the Gateway API CRDs going forward. This will make the Gateway API a "core-like" API on OCP. If the CRDs are already present on a cluster when it upgrades to the new version where they are managed, the cluster admin is responsible for the safety of existing Gateway API implementations. The Cluster Ingress Operator (CIO) enacts a process called "CRD Management Succession" to ensure the transfer of control occurs safely, which includes multiple pre-upgrade checks and CIO startup checks.
The organization as a whole needs to be made aware of this as new projects will continue to pop up with Gateway API support over the years. This includes (but is not limited to)
Importantly, our cluster infrastructure work with Cluster API (CAPI) is working through similar dilemmas for the CAPI CRDs, so we need to work directly with that team, as they have already broken a lot of ground here. Here are the relevant docs with the work they've done so far:
From OCP 4.19 onward we will ensure the Gateway API CRDs are present at a specific version, guarded by a dedicated feature gate that defaults to true. If we cannot ensure the CRDs are present at the expected version, we will mark the cluster degraded.
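For illustration, a minimal sketch of the Gateway API resources that become usable once the CRDs are managed by the cluster. Names are placeholders, and the controllerName shown is the upstream Istio value; the OpenShift-managed implementation may register a different one:

  apiVersion: gateway.networking.k8s.io/v1
  kind: GatewayClass
  metadata:
    name: example-gatewayclass
  spec:
    controllerName: istio.io/gateway-controller   # upstream Istio value; the managed name may differ
  ---
  apiVersion: gateway.networking.k8s.io/v1
  kind: Gateway
  metadata:
    name: example-gateway
    namespace: openshift-ingress
  spec:
    gatewayClassName: example-gatewayclass
    listeners:
    - name: http
      protocol: HTTP
      port: 80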
See the description of NE-1898.
The Cluster Ingress Operator (CIO) currently provides some logic around handling the Gateway API CRDs, and a chunk of this work is simply updating that. The CIO should:
See some of the current CRD management logic here.
Problem: As an administrator, I would like to securely expose cluster resources to remote clients and services while providing a self-service experience to application developers.
GA: A feature is implemented as GA so that developers can issue an update to the Tech Preview MVP and:
Dependencies (internal and external)
Gateway API and Istio logs are not included in the must-gather reports.
Add Gateway API resources and possibly OSSM resources to the operator's relatedObjects field.
Use cases:
This Epic is a placeholder for stories regarding e2e and unit tests that are missing for old features, and for determining whether OSSM 3.x TP2 bugs affect us before they are fixed in GA. There is already one epic for DNS, and test cases should be added for any new features in the release.
Write and run test cases that are currently missing.
and https://github.com/openshift/api?tab=readme-ov-file#defining-featuregate-e2e-tests
The tests that would be covered in Origin are:
Add a test to cluster-ingress-operator's E2E tests to verify that Istio is configured not to allow manual deployment.
Cgroup v1 was deprecated in OCP 4.16. RHEL will remove support for cgroup v1 in RHEL 10, so we will remove it in OCP 4.19.
Goal
For clusters running cgroup v1 on OpenShift 4.18 or earlier, upgrading to OpenShift 4.19 will be blocked. To proceed with the upgrade, clusters on OpenShift 4.18 must first switch from cgroup v1 to cgroup v2. Once this transition is complete, the cluster upgrade to OpenShift 4.19 can be performed.
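A minimal sketch of the configuration change involved, using the existing cluster-scoped nodes.config resource; switching cgroupMode to "v2" before upgrading is the documented path:

  apiVersion: config.openshift.io/v1
  kind: Node
  metadata:
    name: cluster
  spec:
    cgroupMode: "v2"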
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
Remove the CgroupModeV1 config option from the openshift/api repository
Ref: https://github.com/openshift/api/blob/master/config/v1/types_node.go#L84
Add a CRD validation check on the CgroupMode field of the nodes.config spec to prevent updates to "v1" and only allow "v2" and "" as valid values.
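A sketch of what the tightened validation could look like in the rendered CRD schema; this is illustrative only, and the actual change may instead be expressed as a kubebuilder enum marker or a CEL rule on the field:

  # Fragment of the nodes.config CRD openAPIV3Schema (illustrative)
  cgroupMode:
    type: string
    enum:
    - ""
    - "v2"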
Latest update:
Raise a PR with the updated enhancement proposal to handle the removal of cgroupsv1
Enable OpenShift to be deployed on Confidential VMs on GCP using Intel TDX technology
Users deploying OpenShift on GCP can choose to deploy Confidential VMs using Intel TDX technology to rely on confidential computing to secure the data in use
As a user, I can choose OpenShift Nodes to be deployed with the Confidential VM capability on GCP using Intel TDX technology at install time
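A rough sketch of what this could look like in install-config.yaml. The GCP confidentialCompute and onHostMaintenance fields already exist for machine pools, but the exact value that selects Intel TDX and the machine type shown are assumptions to confirm against the installer documentation:

  controlPlane:
    name: master
    platform:
      gcp:
        type: c3-standard-4            # assumed TDX-capable machine type
        onHostMaintenance: Terminate
        confidentialCompute: Enabled   # the TDX-specific selector value is an assumption
  compute:
  - name: worker
    platform:
      gcp:
        type: c3-standard-4
        onHostMaintenance: Terminate
        confidentialCompute: Enabled
  platform:
    gcp:
      projectID: my-project
      region: us-central1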
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
This is a piece of a higher-level effort to secure data in use with OpenShift on every platform
Documentation on how to use this new option must be added as usual
LUKS encryption is required for certain customer environments, e.g. to be PCI compliant, and the current implementation with network-based LUKS encryption is (a) complex and (b) not reliable or secure. We need to give our customers a way to have the root device encrypted securely, with an IBM hardware-based HSM protecting the LUKS key. This is similar to the TPM approach of storing the LUKS key while fencing it off from the user.
Hardware-based LUKS encryption requires injecting the reading of secure keys into Clevis at boot time.
Provide hardware-based root volume encryption
Provide hardware-based root volume encryption with LUKS
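For context, today's network-bound flow is expressed through a Butane config that binds the root-device LUKS key to a Tang server; a minimal sketch is shown below. The HSM-backed binding this feature adds would replace the Tang pin, and its Clevis configuration is not shown here because it is not yet defined:

  variant: openshift
  version: 4.18.0
  metadata:
    name: 99-worker-root-luks
    labels:
      machineconfiguration.openshift.io/role: worker
  boot_device:
    luks:
      tang:
      - url: http://tang.example.com
        thumbprint: REPLACE-WITH-TANG-THUMBPRINT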
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | Y |
Classic (standalone cluster) | Y |
Hosted control planes | Y |
Multi node, Compact (three node), or Single node (SNO), or all | Y |
Connected / Restricted Network | Y |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | IBM Z |
Operator compatibility | n/a |
Backport needed (list applicable versions) | n/a |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | n/a |
Other (please specify) |
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
<your text here>
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
High-level list of items that are out of scope. Initial completion during Refinement status.
<your text here>
Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.
<your text here>
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
<your text here>
Once Ignition spec 3.5 stabilizes, we should switch to using spec 3.5 as the default in the MCO to enable additional features in RHCOS.
(example: https://issues.redhat.com/browse/MULTIARCH-3776 needs 3.5)
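A minimal sketch of what a MachineConfig would look like once the MCO accepts spec 3.5; the file path and contents are placeholders:

  apiVersion: machineconfiguration.openshift.io/v1
  kind: MachineConfig
  metadata:
    name: 99-worker-example
    labels:
      machineconfiguration.openshift.io/role: worker
  spec:
    config:
      ignition:
        version: 3.5.0
      storage:
        files:
        - path: /etc/example.conf
          mode: 0644
          contents:
            source: data:,example%0A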
This story covers all the needed work from the code side that needs to be done to support the 3.5 ignition spec.
To support 3.5 we need to, from a high level perspective:
Done When:
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
Add a new topology metric in https://github.com/openshift/cluster-kube-apiserver-operator/blob/master/pkg/operator/configmetrics/configmetrics.go#L16-L44
This was discussed and recommended in the OCP Arch Call
As a developer of TNF, I need:
Acceptance Criteria
As a developer of 2NO, I need:
Acceptance Criteria
In order to add TNF support to the authentication operator, it would be best to do the dependency update in a separate PR, to avoid mixing behavior differences from dependency changes with the TNF change.
As a developer of TNF, I need:
Acceptance Criteria
As a developer of 2NO, I need:
Acceptance Criteria
As a developer of 2NO, I need:
Acceptance Criteria
This feature aims to comprehensively refactor and standardize various components across HCP, ensuring consistency, maintainability, and reliability. The overarching goal is to increase customer satisfaction by increasing speed to market and to save engineering budget by reducing incidents/bugs. This will be achieved by reducing technical debt, improving code quality, and simplifying the developer experience across multiple areas, including CLI consistency, NodePool upgrade mechanisms, networking flows, and more. By addressing these areas holistically, the project aims to create a more sustainable and scalable codebase that is easier to maintain and extend.
Over time, the HyperShift project has grown organically, leading to areas of redundancy, inconsistency, and technical debt. This comprehensive refactor and standardization effort is a response to these challenges, aiming to improve the project's overall health and sustainability. By addressing multiple components in a coordinated way, the goal is to set a solid foundation for future growth and development.
Ensure all relevant project documentation is updated to reflect the refactored components, new abstractions, and standardized workflows.
This overarching feature is designed to unify and streamline the HCP project, delivering a more consistent, maintainable, and reliable platform for developers, operators, and users.
Goal
Refactor and modularize controllers and other components to improve maintainability, scalability, and ease of use.
As a (user persona), I want to be able to:
https://issues.redhat.com//browse/HOSTEDCP-1801 introduced a new abstraction to be used by ControlPlane components. We need to refactor every component to use this abstraction.
Description of criteria:
All ControlPlane Components are refactored:
Example PR to refactor cloud-credential-operator: https://github.com/openshift/hypershift/pull/5203
docs: https://github.com/openshift/hypershift/blob/main/support/controlplane-component/README.md
Move bash kas bootstrapping into testable binary
An elevator pitch (value statement) that describes the Feature in a clear, concise way. Complete during New status.
Upgrade the OCP console to PatternFly 6.
The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.
The core OCP Console should be upgraded to PF 6 and the Dynamic Plugin Framework should add support for PF6 and deprecate PF4.
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
Console, Dynamic Plugin Framework, Dynamic Plugin Template, and Examples all should be upgraded to PF6 and all PF4 code should be removed.
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.
As a company, we have all agreed to make our products look and feel the same. The current baseline is PF6.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
<your text here>
Some of the PatternFly releases in https://github.com/openshift/console/pull/14621 are prereleases. Once final releases are available (v6.2.0 is scheduled for the end of March), we should update to them.
Also update https://github.com/openshift/console/blob/900c19673f6f3cebc1b57b6a0a9cadd1573950d9/dynamic-demo-plugin/package.json#L21-L24 to the same versions.
Most of the *-theme-dark classes defined in the console code base were for PF5 and are likely unnecessary in PF6 (although the version number was updated). We should evaluate each class and determine if it is still necessary. If it is not, we should remove it.
Console is adopting PF6 and removing PF4 support. This creates many UI issues in the Developer Console that we need to fix.
Fix all the UI issues in the ODC related to PF6 upgrade
An elevator pitch (value statement) that describes the Feature in a clear, concise way. Complete during New status.
<your text here>
The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.
<your text here>
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
<enter general Feature acceptance here>
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
<your text here>
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
High-level list of items that are out of scope. Initial completion during Refinement status.
<your text here>
Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.
<your text here>
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
<your text here>
Hypershift currently allows NodePools to be up to three minor versions behind the HostedCluster control plane (y-3), by virtue of referencing the floating upstream docs (which changed from n-2 to n-3), but only tests configurations up to two minor versions behind at best (y-2).
This feature will align the allowed NodePool skew with the tested and supported versions to improve stability and prevent users from deploying unsupported configurations.
Hypershift currently allows for NodePool minor version skew based on the upstream Kubernetes skew policy. However, our testing capacity only allows us to fully validate up to y-2 skew at best. This mismatch creates a potential risk for users deploying unsupported configurations.
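For illustration, a sketch of the maximum skew that remains allowed under this feature, using placeholder release pullspecs (a 4.19 control plane with a 4.17 NodePool, i.e. y-2):

  apiVersion: hypershift.openshift.io/v1beta1
  kind: HostedCluster
  metadata:
    name: example
    namespace: clusters
  spec:
    release:
      image: quay.io/openshift-release-dev/ocp-release:4.19.0-x86_64
  ---
  apiVersion: hypershift.openshift.io/v1beta1
  kind: NodePool
  metadata:
    name: example-workers
    namespace: clusters
  spec:
    clusterName: example
    replicas: 2
    management:
      upgradeType: Replace
    release:
      image: quay.io/openshift-release-dev/ocp-release:4.17.0-x86_64   # y-2 relative to the control plane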
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | both |
Classic (standalone cluster) | |
Hosted control planes | yes |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
Customers who have deployed NodePools with a skew greater than y-2 may need to upgrade their NodePools before upgrading the HostedCluster control plane in the future.
The HCP documentation on NodePool versioning and upgrading needs to be updated to reflect the new supported skew limits.
Impacts ROSA/ARO HCP
The goal of this feature is to align the allowed NodePool minor version skew with the tested and supported versions (y-2) to improve stability and prevent users from deploying unsupported configurations. This feature ensures that only configurations that have been fully validated and tested are deployed, reducing the risk of instability or issues with unsupported version skews.
This is important because the current mismatch between the allowed NodePool skew (which allows up to y-3) and the actual tested configurations (which only support up to y-2) creates a risk for users deploying unsupported configurations. These unsupported configurations could lead to untested or unstable deployments, causing potential issues or failures within the cluster. By enforcing a stricter version skew policy, this change will:
Main Success Scenario:
Alternative Flow Scenario:
What items must be delivered by other teams/groups to enable delivery of this epic.
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
As a developer, I want to be able to:
Description of criteria:
Goal
Support for more than one disk in the MachineSet API for the vSphere provider
Feature description
Customers using vSphere should be able to create machines with more than one disk. This is already available for other cloud and on-prem providers.
Why do customers need this?
To have a proper disk layout that better addresses their needs. Some examples are using the Local Storage Operator or ODF.
Affected packages or components
RHCOS, Machine API, Cluster Infrastructure, CAPV.
User Story:
As an OpenShift administrator, I need to be able to configure my OpenShift cluster to have additional disks on each vSphere VM so that I can use the new data disks for various OS needs.
Description:
The goal of this epic is to allow the cluster administrator to install, and to configure after installation, new machines with additional disks attached to each virtual machine for various OS needs.
Required:
Nice to Have:
Acceptance Criteria:
Notes:
USER STORY:
As an OpenShift administrator, I want to be able to configure thin provisioning for my new data disks so that I can adjust behavior that may differ from my default storage policy.
DESCRIPTION:
Currently, the Machine API changes force the thin-provisioned flag to true. We need to add a flag to allow admins to configure this. The default behavior will be to not set the flag and to use the default storage policy.
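A rough sketch of how this could surface in a vSphere MachineSet providerSpec; the dataDisks field and the provisioning-mode values shown are assumptions based on this epic, not the final API:

  providerSpec:
    value:
      apiVersion: machine.openshift.io/v1beta1
      kind: VSphereMachineProviderSpec
      diskGiB: 120
      dataDisks:
      - name: lso-disk
        sizeGiB: 200
        provisioningMode: Thin   # assumed values: Thin, Thick, or unset to follow the storage policy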
ACCEPTANCE CRITERIA:
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
To align with the 4.19 release, dependencies need to be updated to 1.30. This should be done by rebasing/updating as appropriate for the repository
We need to maintain our dependencies across all the libraries we use in order to stay in compliance.
The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.
<your text here>
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
<enter general Feature acceptance here>
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
<your text here>
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
High-level list of items that are out of scope. Initial completion during Refinement status.
<your text here>
Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.
<your text here>
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
<your text here>
An epic we can duplicate for each release to ensure we have a place to catch things we ought to be doing regularly but can tend to fall by the wayside.
Add unit tests for the Timestamp component to prevent regressions like https://issues.redhat.com/browse/OCPBUGS-51202
AC:
As a user, I do not want to load polyfills for browsers that OCP console no longer supports.
Note: This feature will be a TechPreview in 4.16 since the newly introduced API must graduate to v1.
Overarching Goal
Customers should be able to update and boot a cluster without a container registry in disconnected environments. This feature is for bare-metal disconnected clusters.
Background
This epic describes the work required to GA a minimal viable version of the Machine Config Node feature to enable the subsequent GAing of the Pinned Image Sets feature. The GAing of status reporting as well as any further enhancements for the Machine Config Node feature will be tracked in MCO-1506.
Related Items:
Done when:
The first step in GAing the MCN API is finalizing the v1alpha1 API. This will allow for testing of the final API design before the API is graduated to v1. Since a fair number of changes are likely to be made to the MCN API, making our changes in v1alpha1 first follows the API team's preference that v1 API graduations involve only minor changes.
Done when:
In order for Managed OpenShift Hosted Control Planes to run as part of Azure Red Hat OpenShift, it is necessary to support the new AKS design for secrets/identities.
Hosted Cluster components use the secrets/identities provided/referenced in the Hosted Cluster resources creation.
All OpenShift Hosted Cluster components running with the appropriate managed or workload identity.
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | Managed |
Classic (standalone cluster) | No |
Hosted control planes | Yes |
Multi node, Compact (three node), or Single node (SNO), or all | All supported ARO/HCP topologies |
Connected / Restricted Network | All supported ARO/HCP topologies |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | All supported ARO/HCP topologies |
Operator compatibility | All core operators |
Backport needed (list applicable versions) | OCP 4.18.z |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | No |
Other (please specify) |
This is a follow-up to OCPSTRAT-979 required by an AKS sweeping change to how identities need to be handled.
Should only affect ARO/HCP documentation rather than Hosted Control Planes documentation.
Does not affect ROSA or any of the supported self-managed Hosted Control Planes platforms
As an ARO HCP user, I want to be able to:
so that
Description of criteria:
Updating any external OpenShift components that run in the HCP
This does not require a design proposal.
This does not require a feature gate.
As an ARO HCP user, I want to be able to:
so that
Description of criteria:
Updating any HyperShift-only components that run in the HCP
This does not require a design proposal.
This does not require a feature gate.
The installation process for the OpenShift Virtualization Engine (OVE) has been identified as a critical area for improvement to address customer concerns regarding its complexity compared to competitors like VMware, Nutanix, and Proxmox. Customers often struggle with disconnected environments, operator configuration, and managing external dependencies, making the initial deployment challenging and time-consuming.
To resolve these issues, the goal is to deliver a streamlined, opinionated installation workflow that leverages existing tools like the Agent-Based Installer, the Assisted Installer, and the OpenShift Appliance (all sharing the same underlying technology) while pre-configuring essential operators and minimizing dependencies, especially the need for an image registry before installation.
By focusing on enterprise customers, particularly VMware administrators working in isolated networks, this effort aims to provide a user-friendly, UI-based installation experience that simplifies cluster setup and ensures quick time-to-value.
VMware administrators transitioning to OpenShift Virtualization in isolated/disconnected environments.
The first area of focus is a disconnected environment. We target these environments with the Agent-Based Installer.
The current docs for installing in disconnected environments are very long and hard to follow.
The image registry is required in disconnected installations before the installation process can start. We must simplify this point so that users can start the installation with one image, without having to explicitly install one.
This isn't a new requirement; in the past we analyzed options for this and even did a POC. We could revisit this point; see Deploy OpenShift without external registry in disconnected environments.
The OpenShift Appliance can in fact be installed without a registry.
Additionally, we started work in this direction AGENT-262 (Strategy to complete installations where there isn't a pre-existing registry).
We also had the field (Brandon Jozsa) doing a POC which was promising:
https://gist.github.com/v1k0d3n/cbadfb78d45498b79428f5632853112a
The type of users coming from VMware vSphere expect a UI. They aren't used to writing YAML files and this has been identified as a blocker for some of them. We must provide a simple UI to stand up a cluster.
https://miro.com/app/board/uXjVLja4xXQ=/
Recently the appliance allowed using an internal registry (see https://github.com/openshift/appliance/pull/349).
Modify the script to use that (instead of the external one), and test the installation workflow.
Currently the builder script embeds the agent-setup-tui.service in the ignition files, but the script itself goes directly into the ISO. For consistency, the script should also be placed inside the ISO ignition.
The OpenShift Virtualization team has marked the following (see https://issues.redhat.com/browse/OCPSTRAT-1874 and https://issues.redhat.com/browse/RFE-6327):
Node Health Check Operator
Fence Agents Remediation Operator
Node Maintenance Operator
Migration Toolkit for Virtualization
Kube Descheduler Operator
NMState Operator
Self Node Remediation Operator
as the MVP operators for the virtualization bundle.
MTV and k8s nmstate are already supported by AI.
The operators can be installed in AI and are part of the virtualization bundle.
yes
Enable OpenShift to be deployed on Confidential VMs on GCP using AMD SEV-SNP technology
Users deploying OpenShift on GCP can choose to deploy Confidential VMs using AMD SEV-SNP technology to rely on confidential computing to secure the data in use
As a user, I can choose OpenShift Nodes to be deployed with the Confidential VM capability on GCP using AMD SEV-SNP technology at install time
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
This is a piece of a higher-level effort to secure data in use with OpenShift on every platform
Documentation on how to use this new option must be added as usual
Goal
Add Nutanix platform integration support to the Agent-based Installer
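A minimal install-config.yaml sketch of the Nutanix platform section that the Agent-based Installer would consume; values are placeholders and field names should be checked against the installer documentation:

  platform:
    nutanix:
      apiVIPs:
      - 192.168.111.5
      ingressVIPs:
      - 192.168.111.7
      prismCentral:
        endpoint:
          address: prismcentral.example.com
          port: 9440
        username: admin
        password: REPLACE_ME
      prismElements:
      - name: pe1
        endpoint:
          address: prismelement.example.com
          port: 9440
      subnetUUIDs:
      - 11111111-2222-3333-4444-555555555555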
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
Implement Migration core for MAPI to CAPI for AWS
When customers use CAPI, there must be no negative effect from switching over to CAPI: migration of Machine resources should be seamless, and the fields in MAPI/CAPI should reconcile from both CRDs.
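A sketch of the kind of switch the migration is expected to expose on Machine API resources; the authoritativeAPI field name and its values are assumptions taken from the migration enhancement direction and must be confirmed against the merged API:

  apiVersion: machine.openshift.io/v1beta1
  kind: MachineSet
  metadata:
    name: example-worker-us-east-1a
    namespace: openshift-machine-api
  spec:
    replicas: 2
    # Assumed field: selects which API (MachineAPI or ClusterAPI) is authoritative
    # for this resource during and after the migration.
    authoritativeAPI: ClusterAPI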
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
When converting CAPI2MAPI, we convert CAPA's `AdditionalSecurityGroups` into the security groups for MAPA. While this looks correct, there are also fields like `SecurityGroupOverrides` which, when present, currently cause an error.
We need to understand how security groups work today in MAPA, compare that to CAPA, and be certain that we are correctly handling the conversion here.
Is CAPA doing anything else under the hood? Is it currently applying extra security groups that are standard that would otherwise cause issues?
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
Presently, the mapi2capi and capi2mapi code cannot handle translations of owner references.
We need to be able to map CAPI/MAPI machines to their correct CAPI/MAPI MachineSet/CPMS and have the owner references correctly set.
This requires identifying the correct owner and determining the correct UID to set.
This will likely mean extending the conversion utils to be able to make API calls to identify the correct owners.
Owner references for non-MachineSet types should still cause an error.
To enable CAPI MachineSets to still mirror MAPI MachineSets accurately, and to enable MAPI MachineSets to be implemented by CAPI MachineSets in the future, we need to implement a way to convert CAPI Machines back into MAPI Machines.
These steps assume that the CAPI Machine is authoritative, or that there are no MAPI Machines.
PERSONAS:
The following personas, used in the user stories below, are borrowed from the Hypershift docs.
USER STORY:
ACCEPTANCE CRITERIA:
What is "done", and how do we measure it? You might need to duplicate this a few times.
Given a
When b
Then c
CUSTOMER EXPERIENCE:
Only fill this out for Product Management / customer-driven work. Otherwise, delete it.
BREADCRUMBS:
Where can SREs look for additional information? Mark with "N/A" if these items do not exist yet so Functional Teams know they need to create them.
NOTES:
If there's anything else to add.
As a hypershift CLI user, I want to be able to disable the image registry capability when creating hosted clusters via `hypershift create cluster`.
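A sketch of what the CLI flag would presumably render in the HostedCluster spec, assuming the API exposes an optional-capabilities list; the field names here are illustrative, not confirmed:

  apiVersion: hypershift.openshift.io/v1beta1
  kind: HostedCluster
  metadata:
    name: example
    namespace: clusters
  spec:
    capabilities:
      disabled:
      - ImageRegistry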
Mark with an X when done; strikethrough for non-applicable items. All items
must be considered before closing this issue.
[ ] Ensure all pull request (PR) checks, including ci & e2e, are passing
[ ] Document manual test steps and results
[ ] Manual test steps executed by someone other than the primary implementer or a test artifact such as a recording are attached
[ ] All PRs are merged
[ ] Ensure necessary actions to take during this change's release are communicated and documented
[ ] Troubleshooting Guides (TSGs), ADRs, or other documents are updated as necessary
TBD
GROOMING CHECKLIST:
You can find out more information about ARO workflow, including roles and responsibilities here. Some items in the list should be left for Team Leads (TL) and Region Leads (RL) to perform. Otherwise, all other fields should be populated.
USER STORY:
What are we attempting to achieve? You might need to duplicate this a few times.
As a/an a
I want b
So that c
ACCEPTANCE CRITERIA:
What is "done", and how do we measure it? You might need to duplicate this a few times.
Given a
When b
Then c
CUSTOMER EXPERIENCE:
Only fill this out for Product Management / customer-driven work. Otherwise, delete it.
BREADCRUMBS:
Where can SREs look for additional information? Mark with "N/A" if these items do not exist yet so Functional Teams know they need to create them.
NOTES:
If there's anything else to add.
This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were not completed when this image was assembled
All images using cachito on Brew should also work with cachi2 on Konflux. https://issues.redhat.com/browse/ART-11902 outlines the ART automation that will support these changes, but ARTists can start testing by adding the annotations to the PipelineRun directly.
If an image build fails on Konflux and requires changes to the Dockerfile, an OCPBUGS ticket should be raised. The process doc (which is attached to this ticket) should also be attached to the bug ticket. ARTists will work with the image owners to hash out any issues until the image builds successfully on both Konflux and Brew.
CAPI Agent Control Plane Provider and CAPI Bootstrap Provider will provide an easy way to install clusters through CAPI.
Those providers will not be generic OpenShift providers, as they are geared towards bare metal. They will leverage the Assisted Installer ZTP flow and will benefit bare-metal users by avoiding the need to provision a bootstrap node (as opposed to a regular OpenShift install, where the bootstrap node is required), while complying better with the CAPI interface.
milestones:
Yes
We should leverage the on-prem data collection system to identify when a cluster has been installed with the CAPI provider.
Deprecate high_availability_mode as it was replaced by control_plane_count
high_availability_mode is no longer used in our code
Yes
When an Assisted Service SaaS user performs the creation of a new OpenShift cluster, nmstate operator should be installed bundled together with other operators when virtualization capability is requested. Operator bundling is out of the scope of this epic, it is only for enabling nmstate operator installation in the assisted installer, without any UI support.
nmstate operator is one of the enablers of virtualization platforms
Assisted installer should not define any resource requirements for the operators unless specifically stated in their official installation instructions.
Following our migration to konflux in MGMT-18343, we will use this epic for future tasks related to konflux.
More and more tasks are becoming mandatory in the Konflux pipeline.
Konflux used to have automation that opened PRs to add those tasks. It seems it is not triggered anymore, so we have to add those tasks manually.
As of today, it raises a warning in the IntegrationTest pipeline that is very likely not seen by anyone. (The build pipeline is not raising any warning.)
In the short term we have to add those tasks to all pipelines (maybe only the product one? I haven't checked).
In the long term, if we can't get the Konflux PR automation back, we should have some automation that detects the warning and informs us that we have to update the pipelines.
Slack thread: https://redhat-internal.slack.com/archives/C04PZ7H0VA8/p1741091688194839
PR example: https://github.com/openshift/assisted-service/pull/7358
Add support for syncing CA bundle to the credentials generated by Cloud Credential Operator.
It is generally necessary to provide a CA file to OpenStack clients in order to communicate with a cloud that uses self-signed certificates. The cloud-credential-operator syncs clouds.yaml files to various namespaces so that services running in those namespaces are able to communicate with the cloud, but it does not sync the CA file. Instead, this must be managed using another mechanism. This has led to some odd situations, such as the Cinder CSI driver operator inspecting cloud-provider configuration to pull out this file.
We should start syncing not only the clouds.yaml file but also the CA file to anyone that requests it via a CredentialsRequest. Once we've done this, we can modify other components such as the Installer, CSI Driver Operator, Hypershift, and CCM Operator to pull the CA file from the same secrets that they pull the clouds.yaml from, rather than the litany of places they currently use.
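A sketch of the intended shape, assuming the CA bundle is added as an extra key (the key name cacert is an assumption) in the secret that CCO already syncs for a CredentialsRequest:

  apiVersion: cloudcredential.openshift.io/v1
  kind: CredentialsRequest
  metadata:
    name: openstack-example
    namespace: openshift-cloud-credential-operator
  spec:
    secretRef:
      name: openstack-cloud-credentials
      namespace: openshift-example
    providerSpec:
      apiVersion: cloudcredential.openshift.io/v1
      kind: OpenStackProviderSpec
  ---
  # Desired synced secret (sketch): the CA bundle travels alongside clouds.yaml.
  apiVersion: v1
  kind: Secret
  metadata:
    name: openstack-cloud-credentials
    namespace: openshift-example
  stringData:
    clouds.yaml: |
      clouds:
        openstack:
          auth:
            auth_url: https://openstack.example.com:13000/v3
    cacert: |
      -----BEGIN CERTIFICATE-----
      ...
      -----END CERTIFICATE-----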
None.
None.
None.
The Installer creates the initial version of the root credential secret at kube-system / openstack-credentials, which cloud-credential-operator (CCO) will consume. Once we have support in CCO for consuming a CA cert from this root credential, we should modify the Installer to start populating the CA cert field. We should also stop adding the CA cert to the openshift-cloud-controller-manager / cloud-conf config map since the Cloud Config Operator (and CSI Drivers) will be able to start consuming the CA cert from the secret instead. This may need to be done separately depending on the order that patches land in.
This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were completed when this image was assembled
Description of problem:
Azure creates a NIC in the "provisioning failed" state, and the code is not checking the provisioning status.
Version-Release number of selected component (if applicable):
4.12
How reproducible:
100%
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
https://github.com/openshift/machine-api-provider-azure/blob/main/pkg/cloud/azure/actuators/machine/reconciler.go https://pkg.go.dev/github.com/Azure/azure-sdk-for-go@v68.0.0+incompatible/services/network/mgmt/2021-02-01/network#InterfacePropertiesFormat
Description of problem:
When debugging a node using the OpenShift Console, the logs of the <NodeName>-debug pod are not accessible from either the Console UI or the CLI. However, when debugging the node via CLI (oc debug node/<node_name>), the logs are accessible as expected.
Version-Release number of selected component (if applicable):
OpenShift Versions Tested: 4.8.14, 4.8.18, 4.9.0 ... so 4.12
How reproducible:
always
Steps to Reproduce:
1. Open OpenShift Console.
2. Navigate to Compute → Node → <node_name> → Terminal.
3. Run any command in the terminal.
4. A new <NodeName>-debug pod is created in a dynamically generated namespace (openshift-debug-node-xxx).
5. Try to access logs:
   Console UI: Workloads → Pod → <NodeName>-debug → Logs → Logs not visible.
   CLI: Run oc logs <NodeName-debug_pod> -n <openshift-debug-node-xxx> → No logs available.
Actual results:
Logs of the <NodeName>-debug pod are not available in either the Console UI or CLI when debugging via Console.
Expected results:
The <NodeName>-debug pod logs should be accessible in both the Console UI and CLI, similar to the behavior observed when debugging via oc debug node/<node_name>.
Additional info:
Debugging via CLI (oc debug node/<node_name>) creates the debug pod in the current namespace (e.g., <project_name>). Logs are accessible via:
  $ oc logs -n <project_name> -f <NodeName-debug_pod>
Debugging via Console creates the pod in a new dynamic namespace (openshift-debug-node-xxx), and logs are not accessible.
Possible Cause: Namespace issue - the debug pod is created in openshift-debug-node-xxx, which may not be configured to expose logs correctly.
Description of problem:
The OpenShift installer does not validate whether the apiVIPs and ingressVIPs are specified when the load balancer is configured as UserManaged, and it falls back to the default behaviour of picking the 5th and 7th IPs of the machine network.
Version-Release number of selected component (if applicable):
4.18
How reproducible:
100%
Steps to Reproduce:
1. Create an install-config.yaml file with the following content:
   $ cat ocp4/install-config.yaml
   apiVersion: v1
   baseDomain: mydomain.test
   compute:
   - name: worker
     platform:
       openstack:
         type: m1.xlarge
     replicas: 3
   controlPlane:
     name: master
     platform:
       openstack:
         type: m1.xlarge
     replicas: 3
   metadata:
     name: mycluster
   networking:
     clusterNetwork:
     - cidr: 10.128.0.0/14
       hostPrefix: 23
     machineNetwork:
     - cidr: 192.168.10.0/24
   platform:
     openstack:
       loadBalancer:
         type: UserManaged
2. Run the following command to generate manifests:
   $ openshift-installer create manifests --dir ocp4
3. Check the generated cluster-config.yaml:
   $ cat ocp4/manifests/cluster-config.yaml
4. Observe the following unexpected output:
   platform:
     openstack:
       cloud: openstack
       externalDNS: null
       apiVIPs:
       - 192.168.10.5
       ingressVIPs:
       - 192.168.10.7
       loadBalancer:
         type: UserManaged
Actual results:
The apiVIPs and ingressVIPs fields are unexpectedly added to cluster-config.yaml.
Expected results:
The apiVIPs and ingressVIPs fields should not be automatically assigned.
Additional info:
Description of problem:
the CIS "plugin did not respond" blocked the public install
Version-Release number of selected component (if applicable):
4.18.0-0.nightly-2025-03-14-195326
How reproducible:
Always
Steps to Reproduce:
1. Create a public IPI cluster on the IBM Cloud platform.
Actual results:
level=info msg=Creating infrastructure resources...
msg=Error: Plugin did not respond
...
msg=panic: runtime error: invalid memory address or nil pointer dereference
msg=[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x24046dc]
level=error
level=error msg=goroutine 2090 [running]:
level=error msg=github.com/IBM-Cloud/terraform-provider-ibm/ibm/service/cis.ResourceIBMCISDnsRecordRead(0xc003573900, {0x4ed2fa0?, 0xc00380c008?})
Expected results:
Cluster creation succeeds.
Additional info:
https://github.com/IBM-Cloud/terraform-provider-ibm/issues/6066 ibm_cis_dns_record leads to plugin crash
Description of problem:
Fix the labels for the OpenTelemetry allow list; currently all labels have an exporter/receiver postfix. This is incorrect, because the name of the exporter/receiver does not contain such a postfix.
Description of problem:
The TestControllerEventuallyReconciles within the e2e-gcp-op-ocl test suite fails very often, which prevents the rest of the job from running. This causes reduced confidence in the test suite and lowers the overall quality signal for OCL.
Version-Release number of selected component (if applicable):
N/A
How reproducible:
Often.
Steps to Reproduce:
Run the e2e-gcp-op-ocl job by opening a PR. The job will eventually fail on this test.
Actual results:
The test, TestControllerEventuallyReconciles fails on a fairly consistent basis.
Expected results:
The test should pass.
Additional info:
I suspect that part of the problem is that the "success" criteria between the Build Controller and the e2e test suite are not the same. As part of the potential fix I've found, I exported the success criteria function so that it can be reused with the e2e test suite and I've also set certain hard-coded values as constants instead so that they can be adjusted from one central place.
This enabled MachineSet preflights by default: https://github.com/kubernetes-sigs/cluster-api/pull/11228
We want to disable this functionality in HCP for the following reasons:
MachineSetPreflightCheckKubeadmVersionSkew
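One possible way to opt out, assuming we rely on the upstream CAPI skip annotation rather than a manager flag; the annotation and the check name below come from upstream CAPI and should be verified against the vendored version:

  apiVersion: cluster.x-k8s.io/v1beta1
  kind: MachineSet
  metadata:
    name: example-machineset
    annotations:
      # "All" or a comma-separated list of preflight check names can be skipped.
      machineset.cluster.x-k8s.io/skip-preflight-checks: KubeadmVersionSkew
  spec:
    clusterName: example
    selector:
      matchLabels:
        cluster.x-k8s.io/cluster-name: example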
Description of problem:
platform.powervs.clusterOSImage is still required and should not be removed from the install-config
Version-Release number of selected component (if applicable):
4.19.0
Steps to Reproduce:
1. Specify OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE and try to deploy.
2. The deploy does not use the override value.
Actual results:
The value of platform.powervs.clusterOSImage will be ignored.
Expected results:
The deploy uses the overridden value of OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE.
Additional info:
Description of problem:
The position of the play/pause button in the events page is different when there are no events vs when there are events
Version-Release number of selected component (if applicable):
4.19.0
How reproducible:
always
Steps to Reproduce:
1. Open the events page.
2. Observe the play/pause button position shift.
Actual results:
the button moves
Expected results:
no shift
Additional info:
We need to bump the Kubernetes version to the latest API version OCP is using.
This is what was done last time:
https://github.com/openshift/cluster-samples-operator/pull/409
Find latest stable version from here: https://github.com/kubernetes/api
This is described in wiki: https://source.redhat.com/groups/public/appservices/wiki/cluster_samples_operator_release_activities
Description of problem:
/k8s/all-namespaces/volumesnapshots returns 404 Page Not Found
Version-Release number of selected component (if applicable):
4.18.0-0.nightly-2025-03-17-135359
How reproducible:
Always
Steps to Reproduce:
1. Navigate to Storage -> VolumeSnapshots, make sure 'All Projects' is selected.
2. Click on the 'Create VolumeSnapshot' button; the user will be redirected to the /k8s/ns/default/volumesnapshots/~new/form page and the project selection will be changed to 'default'.
3. Open the project selector dropdown and change the project to 'All Projects' again.
   $ oc get volumesnapshots -A
   No resources found
Actual results:
3. URL path will be changed to /k8s/all-namespaces/volumesnapshots and we will see error 404: Page Not Found
   The server doesn't have a resource type "volumesnapshots". Try refreshing the page if it was recently added.
Expected results:
3. Should display volumesnapshots in all projects; volumesnapshots resources can be successfully listed/queried:
   $ oc get volumesnapshots -A
   No resources found
Additional info:
HyperShift currently seems to only maintain one version at a time in status on a FeatureGate resource. For example, in a HostedControlPlane that had been installed a while back, and recently done 4.14.37 > 4.14.38 > 4.14.39, the only version in FeatureGate was 4.14.39:
$ jq -r '.status.featureGates[].version' featuregates.yaml
4.14.39
Compare that with standalone clusters, where FeatureGates status is appended with each release. For example, in this 4.18.0-rc.0 to 4.18.0-rc.1 CI run:
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/release-openshift-origin-installer-e2e-aws-upgrade/1865110488958898176/artifacts/e2e-aws-upgrade/must-gather.tar | tar -xOz quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-b7fd0a8ff4df55c00e9e4e676d8c06fad2222fe83282fbbea3dad3ff9aca1ebb/cluster-scoped-resources/config.openshift.io/featuregates/cluster.yaml | yaml2json | jq -r '.status.featureGates[].version'
4.18.0-rc.1
4.18.0-rc.0
The append approach allows consumers to gracefully transition over time, as they each update from the outgoing version to the incoming version. With the current HyperShift logic, there's a race between the FeatureGate status bump and the consuming component bumps:
In this bug, I'm asking for HyperShift to adopt the standalone approach of appending to FeatureGate status instead of dropping the outgoing version, to avoid that kind of race window. At least until there's some assurance that the update to the incoming version has completely rolled out. Standalone pruning removes versions that no longer exist in ClusterVersion history. Checking a long-lived standalone cluster I have access to, I see:
$ oc get -o json featuregate cluster | jq -r '.status.featureGates[].version'
4.18.0-ec.4
4.18.0-ec.3
...
4.14.0-ec.1
4.14.0-ec.0
$ oc get -o json featuregate cluster | jq -r '.status.featureGates[].version' | wc -l
27
so it seems like pruning is currently either non-existent, or pretty relaxed.
Seen in a 4.14.38 to 4.14.39 HostedCluster update. May or may not apply to more recent 4.y.
Unclear
Steps to Reproduce
When vB is added to FeatureGate status, vA is dropped.
If the CPO gets stuck during the transition, some management-cluster-side pods (cloud-network-config-controller, cluster-network-operator, ingress-operator, cluster-storage-operator, etc.) crash loop with logs like:
E1211 15:43:58.314619 1 simple_featuregate_reader.go:290] cluster failed with : unable to determine features: missing desired version "4.14.38" in featuregates.config.openshift.io/cluster
E1211 15:43:58.635080 1 simple_featuregate_reader.go:290] cluster failed with : unable to determine features: missing desired version "4.14.38" in featuregates.config.openshift.io/cluster
vB is added to FeatureGate status early in the update, and vA is preserved through much of the update, and only removed when it seems like there might not be any more consumers (when a version is dropped from ClusterVersion history, if you want to match the current standalone handling on this).
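A sketch of the desired FeatureGate status shape during the transition, with both versions present (gate names are placeholders):

  apiVersion: config.openshift.io/v1
  kind: FeatureGate
  metadata:
    name: cluster
  status:
    featureGates:
    - version: 4.14.39        # incoming version (vB)
      enabled:
      - name: ExampleGate
      disabled: []
    - version: 4.14.38        # outgoing version (vA), kept until no consumers reference it
      enabled:
      - name: ExampleGate
      disabled: []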
None yet.
Description of problem:
"Export as CSV" on "Observe"->"Alerting" page is not marked for i18n.
Version-Release number of selected component (if applicable):
4.18.0-0.nightly-2024-12-12-133926
How reproducible:
Always
Steps to Reproduce:
1.Check "Export as CSV" on "Observe"->"Alerting" page. 2. 3.
Actual results:
1. It's not marked for i18n
Expected results:
1. Should be marked for i18n.
Additional info:
"Export as CSV" also need i18n for each languages.
Description of problem:
When trying to use the ImageSetConfig as described below, I see that oc-mirror gets killed abruptly.
  kind: ImageSetConfiguration
  apiVersion: mirror.openshift.io/v2alpha1
  mirror:
    platform:
      channels:
      - name: stable-4.16 # Version of OpenShift to be mirrored
        minVersion: 4.16.30 # Minimum version of OpenShift to be mirrored
        maxVersion: 4.16.30 # Maximum version of OpenShift to be mirrored
        shortestPath: true
        type: ocp
      graph: true
    operators:
    - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.16
      full: false
    - catalog: registry.redhat.io/redhat/certified-operator-index:v4.16
      full: false
    - catalog: registry.redhat.io/redhat/community-operator-index:v4.16
      full: false
    helm: {}
Version-Release number of selected component (if applicable):
4.18
How reproducible:
Always
Steps to Reproduce:
1. Use the ImageSetConfig as above 2. Run command `oc-mirror -c /tmp/config.yaml file://test --v2` 3.
Actual results:
The oc-mirror command gets killed even with about 24 GB of RAM and a 12-core CPU; for some customers, even with 64 GB of RAM it never worked.
2025/03/03 10:40:01 [INFO] : :mag: collecting operator images...
2025/03/03 10:40:01 [DEBUG] : [OperatorImageCollector] setting copy option o.Opts.MultiArch=all when collecting operator images
2025/03/03 10:40:01 [DEBUG] : [OperatorImageCollector] copying operator image registry.redhat.io/redhat/redhat-operator-index:v4.16
(24s) Collecting catalog registry.redhat.io/redhat/redhat-operator-index:v4.16
2025/03/03 10:40:26 [DEBUG] : [OperatorImageCollector] manifest 2be15a52aa4978d9134dfb438e51c01b77c9585578244b97b8ba1d4f5e6c0ea1
(5m59s) Collecting catalog registry.redhat.io/redhat/redhat-operator-index:v4.16
2025/03/03 10:46:01 [WARN] : error parsing image registry.redhat.io/openshift4/ose-kube-rbac-proxy-rhel9 : registry.redhat.io/openshift4/ose-kube-rbac-proxy-rhel9 unable to parse image correctly : tag and dige
✓ (5m59s) Collecting catalog registry.redhat.io/redhat/redhat-operator-index:v4.16
2025/03/03 10:46:01 [DEBUG] : [OperatorImageCollector] copying operator image registry.redhat.io/redhat/certified-operator-index:v4.16
⠦ (2s) Collecting catalog registry.redhat.io/redhat/certified-operator-index:v4.16
2025/03/03 10:46:03 [DEBUG] : [OperatorImageCollector] manifest 816c65bcab1086e3fa158e2391d84c67cf96916027c59ab8fe44cf68a1bfe57a
2025/03/03 10:46:03 [DEBUG] : [OperatorImageCollector] label /configs
✓ (51s) Collecting catalog registry.redhat.io/redhat/certified-operator-index:v4.16
2025/03/03 10:46:53 [DEBUG] : [OperatorImageCollector] copying operator image registry.redhat.io/redhat/community-operator-index:v4.16
⠇ (11s) Collecting catalog registry.redhat.io/redhat/community-operator-index:v4.16
2025/03/03 10:47:04 [DEBUG] : [OperatorImageCollector] manifest 7a8cb7df2447b26c43b274f387197e0789c6ccc55c18b48bf0807ee00286550d
⠹ (34m26s) Collecting catalog registry.redhat.io/redhat/community-operator-index:v4.16
Killed
Expected results:
oc-mirror process should not get killed abruptly.
Additional info:
More info in the link here: https://redhat-internal.slack.com/archives/C02JW6VCYS1/p1740783474190069
This fix updates OpenShift 4.19 to Kubernetes v1.32.3, incorporating the latest upstream changes and fixes.
For details on the changes included in this update, see the Kubernetes changelog:
Description of problem:
While debugging the ocp-42855 failure, the hostedcluster condition Degraded is True.
Version-Release number of selected component (if applicable):
quay.io/openshift-release-dev/ocp-release:4.12.0-rc.6-x86_64
How reproducible:
follow ocp-42855 test steps
Steps to Reproduce:
1. Create a basic hosted cluster using the hypershift tool. 2. Check the hostedcluster conditions.
Actual results:
[hmx@ovpn-12-45 hypershift]$ oc get pods -n clusters-mihuanghy
NAME                                                  READY   STATUS             RESTARTS         AGE
aws-ebs-csi-driver-controller-9c46694f-mqrlc          7/7     Running            0                55m
aws-ebs-csi-driver-operator-5d7867bc9f-hqzd5          1/1     Running            0                55m
capi-provider-6df855dbb5-tcmvq                        2/2     Running            0                58m
catalog-operator-7544b8d6d8-dk4hh                     2/2     Running            0                57m
certified-operators-catalog-7f8f6598b5-2blv4          0/1     CrashLoopBackOff   15 (4m20s ago)   57m
cloud-network-config-controller-545fcfc797-mgszj      3/3     Running            0                55m
cluster-api-54c7f7c477-kgvzn                          1/1     Running            0                58m
cluster-autoscaler-658756f99-vr2hk                    1/1     Running            0                58m
cluster-image-registry-operator-84d84dbc9f-zpcsq      3/3     Running            0                57m
cluster-network-operator-9b6985cc8-sd7d7              1/1     Running            0                57m
cluster-node-tuning-operator-65c8f6fbb9-xzpws         1/1     Running            0                57m
cluster-policy-controller-b5c76cf58-b4rth             1/1     Running            0                57m
cluster-storage-operator-7474f76c99-9chl7             1/1     Running            0                57m
cluster-version-operator-646d97ccc9-l72m5             1/1     Running            0                57m
community-operators-catalog-774fdb48fc-z6s4d          1/1     Running            0                57m
control-plane-operator-5bc8c4c996-4nz8c               2/2     Running            0                58m
csi-snapshot-controller-5b7d6bb685-vf8rf              1/1     Running            0                55m
csi-snapshot-controller-operator-6f74db85c6-89bts     1/1     Running            0                57m
csi-snapshot-webhook-57c5bd7f85-lqnwf                 1/1     Running            0                55m
dns-operator-767c5bbdd8-rb7fl                         1/1     Running            0                57m
etcd-0                                                2/2     Running            0                58m
hosted-cluster-config-operator-88b9d49b7-2gvbt        1/1     Running            0                57m
ignition-server-949d9fd8c-cgtxb                       1/1     Running            0                58m
ingress-operator-5c6f5d4f48-gh7fl                     3/3     Running            0                57m
konnectivity-agent-79c5ff9585-pqctc                   1/1     Running            0                58m
konnectivity-server-65956d468c-lpwfv                  1/1     Running            0                58m
kube-apiserver-d9f887c4b-xwdcx                        5/5     Running            0                58m
kube-controller-manager-64b6f757f9-6qszq              2/2     Running            0                52m
kube-scheduler-58ffcdf789-fch2n                       1/1     Running            0                57m
machine-approver-559d66d4d6-2v64w                     1/1     Running            0                58m
multus-admission-controller-8695985fbc-hjtqb          2/2     Running            0                55m
oauth-openshift-6b9695fc7f-pf4j6                      2/2     Running            0                55m
olm-operator-bf694b84-gvz6x                           2/2     Running            0                57m
openshift-apiserver-55c69bc497-x8bft                  2/2     Running            0                52m
openshift-controller-manager-8597c66d58-jb7w2         1/1     Running            0                57m
openshift-oauth-apiserver-674cd6df6d-ckg55            1/1     Running            0                57m
openshift-route-controller-manager-76d78f897c-9mfmj   1/1     Running            0                57m
ovnkube-master-0                                      7/7     Running            0                55m
packageserver-7988d8ddfc-wnh6l                        2/2     Running            0                57m
redhat-marketplace-catalog-77547cc685-hnh65           0/1     CrashLoopBackOff   15 (4m15s ago)   57m
redhat-operators-catalog-7784d45f54-58lgg             1/1     Running            0                57m

{
  "lastTransitionTime": "2022-12-31T18:45:28Z",
  "message": "[certified-operators-catalog deployment has 1 unavailable replicas, redhat-marketplace-catalog deployment has 1 unavailable replicas]",
  "observedGeneration": 3,
  "reason": "UnavailableReplicas",
  "status": "True",
  "type": "Degraded"
},
Expected results:
Degraded is False
Additional info:
$ oc describe pod certified-operators-catalog-7f8f6598b5-2blv4 -n clusters-mihuanghy Name: certified-operators-catalog-7f8f6598b5-2blv4 Namespace: clusters-mihuanghy Priority: 100000000 Priority Class Name: hypershift-control-plane Node: ip-10-0-202-149.us-east-2.compute.internal/10.0.202.149 Start Time: Sun, 01 Jan 2023 02:47:03 +0800 Labels: app=certified-operators-catalog hypershift.openshift.io/control-plane-component=certified-operators-catalog hypershift.openshift.io/hosted-control-plane=clusters-mihuanghy olm.catalogSource=certified-operators pod-template-hash=7f8f6598b5 Annotations: hypershift.openshift.io/release-image: quay.io/openshift-release-dev/ocp-release:4.12.0-rc.6-x86_64 k8s.v1.cni.cncf.io/network-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.131.0.38" ], "default": true, "dns": {} }] k8s.v1.cni.cncf.io/networks-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.131.0.38" ], "default": true, "dns": {} }] openshift.io/scc: restricted-v2 seccomp.security.alpha.kubernetes.io/pod: runtime/default Status: Running IP: 10.131.0.38 IPs: IP: 10.131.0.38 Controlled By: ReplicaSet/certified-operators-catalog-7f8f6598b5 Containers: registry: Container ID: cri-o://f32b8d4c31b729c1b7deef0da622ddd661d840428aa4847968b1b2b3bf76b6cf Image: registry.redhat.io/redhat/certified-operator-index:v4.11 Image ID: registry.redhat.io/redhat/certified-operator-index@sha256:93f667597eee33b9bdbc9a61af60978b414b6f6df8e7c5f496c4298c1dfe9b62 Port: 50051/TCP Host Port: 0/TCP State: Waiting Reason: CrashLoopBackOff Last State: Terminated Reason: Error Exit Code: 1 Started: Sun, 01 Jan 2023 03:39:44 +0800 Finished: Sun, 01 Jan 2023 03:39:44 +0800 Ready: False Restart Count: 15 Requests: cpu: 10m memory: 160Mi Liveness: exec [grpc_health_probe -addr=:50051] delay=10s timeout=1s period=10s #success=1 #failure=3 Readiness: exec [grpc_health_probe -addr=:50051] delay=5s timeout=5s period=10s #success=1 #failure=3 Startup: exec [grpc_health_probe -addr=:50051] delay=0s timeout=1s period=10s #success=1 #failure=15 Environment: <none> Mounts: <none> Conditions: Type Status Initialized True Ready False ContainersReady False PodScheduled True Volumes: <none> QoS Class: Burstable Node-Selectors: <none> Tolerations: hypershift.openshift.io/cluster=clusters-mihuanghy:NoSchedule hypershift.openshift.io/control-plane=true:NoSchedule node.kubernetes.io/memory-pressure:NoSchedule op=Exists node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 54m default-scheduler Successfully assigned clusters-mihuanghy/certified-operators-catalog-7f8f6598b5-2blv4 to ip-10-0-202-149.us-east-2.compute.internal Normal AddedInterface 53m multus Add eth0 [10.131.0.38/23] from openshift-sdn Normal Pulling 53m kubelet Pulling image "registry.redhat.io/redhat/certified-operator-index:v4.11" Normal Pulled 53m kubelet Successfully pulled image "registry.redhat.io/redhat/certified-operator-index:v4.11" in 40.628843349s Normal Pulled 52m (x3 over 53m) kubelet Container image "registry.redhat.io/redhat/certified-operator-index:v4.11" already present on machine Normal Created 52m (x4 over 53m) kubelet Created container registry Normal Started 52m (x4 over 53m) kubelet Started container registry Warning BackOff 3m59s (x256 over 53m) kubelet Back-off restarting failed container $ oc describe pod redhat-marketplace-catalog-77547cc685-hnh65 -n clusters-mihuanghy Name: 
redhat-marketplace-catalog-77547cc685-hnh65 Namespace: clusters-mihuanghy Priority: 100000000 Priority Class Name: hypershift-control-plane Node: ip-10-0-202-149.us-east-2.compute.internal/10.0.202.149 Start Time: Sun, 01 Jan 2023 02:47:03 +0800 Labels: app=redhat-marketplace-catalog hypershift.openshift.io/control-plane-component=redhat-marketplace-catalog hypershift.openshift.io/hosted-control-plane=clusters-mihuanghy olm.catalogSource=redhat-marketplace pod-template-hash=77547cc685 Annotations: hypershift.openshift.io/release-image: quay.io/openshift-release-dev/ocp-release:4.12.0-rc.6-x86_64 k8s.v1.cni.cncf.io/network-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.131.0.40" ], "default": true, "dns": {} }] k8s.v1.cni.cncf.io/networks-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.131.0.40" ], "default": true, "dns": {} }] openshift.io/scc: restricted-v2 seccomp.security.alpha.kubernetes.io/pod: runtime/default Status: Running IP: 10.131.0.40 IPs: IP: 10.131.0.40 Controlled By: ReplicaSet/redhat-marketplace-catalog-77547cc685 Containers: registry: Container ID: cri-o://7afba8993dac8f1c07a2946d8b791def3b0c80ce62d5d6160770a5a9990bf922 Image: registry.redhat.io/redhat/redhat-marketplace-index:v4.11 Image ID: registry.redhat.io/redhat/redhat-marketplace-index@sha256:074498ac11b5691ba8975e8f63fa04407ce11bb035dde0ced2f439d7a4640510 Port: 50051/TCP Host Port: 0/TCP State: Waiting Reason: CrashLoopBackOff Last State: Terminated Reason: Error Exit Code: 1 Started: Sun, 01 Jan 2023 03:39:49 +0800 Finished: Sun, 01 Jan 2023 03:39:49 +0800 Ready: False Restart Count: 15 Requests: cpu: 10m memory: 340Mi Liveness: exec [grpc_health_probe -addr=:50051] delay=10s timeout=1s period=10s #success=1 #failure=3 Readiness: exec [grpc_health_probe -addr=:50051] delay=5s timeout=5s period=10s #success=1 #failure=3 Startup: exec [grpc_health_probe -addr=:50051] delay=0s timeout=1s period=10s #success=1 #failure=15 Environment: <none> Mounts: <none> Conditions: Type Status Initialized True Ready False ContainersReady False PodScheduled True Volumes: <none> QoS Class: Burstable Node-Selectors: <none> Tolerations: hypershift.openshift.io/cluster=clusters-mihuanghy:NoSchedule hypershift.openshift.io/control-plane=true:NoSchedule node.kubernetes.io/memory-pressure:NoSchedule op=Exists node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 55m default-scheduler Successfully assigned clusters-mihuanghy/redhat-marketplace-catalog-77547cc685-hnh65 to ip-10-0-202-149.us-east-2.compute.internal Normal AddedInterface 55m multus Add eth0 [10.131.0.40/23] from openshift-sdn Normal Pulling 55m kubelet Pulling image "registry.redhat.io/redhat/redhat-marketplace-index:v4.11" Normal Pulled 54m kubelet Successfully pulled image "registry.redhat.io/redhat/redhat-marketplace-index:v4.11" in 40.862526792s Normal Pulled 53m (x3 over 54m) kubelet Container image "registry.redhat.io/redhat/redhat-marketplace-index:v4.11" already present on machine Normal Created 53m (x4 over 54m) kubelet Created container registry Normal Started 53m (x4 over 54m) kubelet Started container registry Warning BackOff 21s (x276 over 54m) kubelet Back-off restarting failed container $ oc describe deployment redhat-marketplace-catalog -n clusters-mihuanghy Name: redhat-marketplace-catalog Namespace: clusters-mihuanghy CreationTimestamp: Sun, 01 Jan 2023 02:47:03 +0800 
Labels: hypershift.openshift.io/managed-by=control-plane-operator Annotations: deployment.kubernetes.io/revision: 1 Selector: olm.catalogSource=redhat-marketplace Replicas: 1 desired | 1 updated | 1 total | 0 available | 1 unavailable StrategyType: RollingUpdate MinReadySeconds: 0 RollingUpdateStrategy: 25% max unavailable, 25% max surge Pod Template: Labels: app=redhat-marketplace-catalog hypershift.openshift.io/control-plane-component=redhat-marketplace-catalog hypershift.openshift.io/hosted-control-plane=clusters-mihuanghy olm.catalogSource=redhat-marketplace Annotations: hypershift.openshift.io/release-image: quay.io/openshift-release-dev/ocp-release:4.12.0-rc.6-x86_64 Containers: registry: Image: registry.redhat.io/redhat/redhat-marketplace-index:v4.11 Port: 50051/TCP Host Port: 0/TCP Requests: cpu: 10m memory: 340Mi Liveness: exec [grpc_health_probe -addr=:50051] delay=10s timeout=1s period=10s #success=1 #failure=3 Readiness: exec [grpc_health_probe -addr=:50051] delay=5s timeout=5s period=10s #success=1 #failure=3 Startup: exec [grpc_health_probe -addr=:50051] delay=0s timeout=1s period=10s #success=1 #failure=15 Environment: <none> Mounts: <none> Volumes: <none> Priority Class Name: hypershift-control-plane Conditions: Type Status Reason ---- ------ ------ Available False MinimumReplicasUnavailable Progressing False ProgressDeadlineExceeded OldReplicaSets: <none> NewReplicaSet: redhat-marketplace-catalog-77547cc685 (1/1 replicas created) Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal ScalingReplicaSet 22m deployment-controller Scaled up replica set redhat-marketplace-catalog-77547cc685 to 1 [hmx@ovpn-12-45 hypershift]$ oc get hostedcluster -A NAMESPACE NAME VERSION KUBECONFIG PROGRESS AVAILABLE PROGRESSING MESSAGE clusters mihuanghy 4.12.0-rc.6 mihuanghy-admin-kubeconfig Completed True False The hosted control plane is available $ oc describe deployment certified-operators-catalog -n clusters-mihuanghy Name: certified-operators-catalog Namespace: clusters-mihuanghy CreationTimestamp: Sun, 01 Jan 2023 02:47:03 +0800 Labels: hypershift.openshift.io/managed-by=control-plane-operator Annotations: deployment.kubernetes.io/revision: 1 Selector: olm.catalogSource=certified-operators Replicas: 1 desired | 1 updated | 1 total | 0 available | 1 unavailable StrategyType: RollingUpdate MinReadySeconds: 0 RollingUpdateStrategy: 25% max unavailable, 25% max surge Pod Template: Labels: app=certified-operators-catalog hypershift.openshift.io/control-plane-component=certified-operators-catalog hypershift.openshift.io/hosted-control-plane=clusters-mihuanghy olm.catalogSource=certified-operators Annotations: hypershift.openshift.io/release-image: quay.io/openshift-release-dev/ocp-release:4.12.0-rc.6-x86_64 Containers: registry: Image: registry.redhat.io/redhat/certified-operator-index:v4.11 Port: 50051/TCP Host Port: 0/TCP Requests: cpu: 10m memory: 160Mi Liveness: exec [grpc_health_probe -addr=:50051] delay=10s timeout=1s period=10s #success=1 #failure=3 Readiness: exec [grpc_health_probe -addr=:50051] delay=5s timeout=5s period=10s #success=1 #failure=3 Startup: exec [grpc_health_probe -addr=:50051] delay=0s timeout=1s period=10s #success=1 #failure=15 Environment: <none> Mounts: <none> Volumes: <none> Priority Class Name: hypershift-control-plane Conditions: Type Status Reason ---- ------ ------ Available False MinimumReplicasUnavailable Progressing False ProgressDeadlineExceeded OldReplicaSets: <none> NewReplicaSet: certified-operators-catalog-7f8f6598b5 (1/1 replicas 
created) Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal ScalingReplicaSet 21m deployment-controller Scaled up replica set certified-operators-catalog-7f8f6598b5 to 1
When a (Fibre Channel) multipath disk is discovered by the assisted-installer-agent, the wwn field is not included:
{ "bootable": true, "by_id": "/dev/disk/by-id/wwn-0xdeadbeef", "drive_type": "Multipath", "has_uuid": true, "holders": "dm-3,dm-5,dm-7", "id": "/dev/disk/by-id/wwn-0xdeadbeef", "installation_eligibility": { "eligible": true, "not_eligible_reasons": null }, "name": "dm-2", "path": "/dev/dm-2", "size_bytes": 549755813888 },
Thus there is no way to match this disk with a wwn: root device hint. Since assisted does not allow installing directly to a fibre channel disk (without multipath) until 4.19 with MGMT-19631, and there is no /dev/disk/by-path/ symlink for a multipath device, this means that when there are multiple multipath disks in the system there is no way to select between them other than by size.
When ghw lists the disks, it fills in the WWN field from the ID_WWN_WITH_EXTENSION or ID_WWN udev values. It's not clear to me how udev is creating the /dev/disk/by-id/ symlink without those fields. There is a separate DM_WWN field (DM = Device Mapper), but I don't see it used in udev rules for whole disks, only for partitions. I don't have access to any hardware so it's impossible to say what the data in /run/udev/data looks like.
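As one way to check this on affected hardware, here is a small exploratory Go sketch (not the ghw implementation; the dm-2 device name and the major:minor lookup are assumptions) that dumps the udev properties recorded under /run/udev/data and shows whether ID_WWN, ID_WWN_WITH_EXTENSION, or DM_WWN are present:

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func udevProperties(devPath string) (map[string]string, error) {
	// /run/udev/data keys block devices as "b<major>:<minor>"; read the number
	// from sysfs, e.g. /sys/class/block/dm-2/dev -> "253:2".
	name := strings.TrimPrefix(devPath, "/dev/")
	devno, err := os.ReadFile("/sys/class/block/" + name + "/dev")
	if err != nil {
		return nil, err
	}
	f, err := os.Open("/run/udev/data/b" + strings.TrimSpace(string(devno)))
	if err != nil {
		return nil, err
	}
	defer f.Close()

	props := map[string]string{}
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		// Property lines look like "E:ID_WWN=0xdeadbeef".
		if line, ok := strings.CutPrefix(sc.Text(), "E:"); ok {
			if k, v, found := strings.Cut(line, "="); found {
				props[k] = v
			}
		}
	}
	return props, sc.Err()
}

func main() {
	props, err := udevProperties("/dev/dm-2")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, k := range []string{"ID_WWN", "ID_WWN_WITH_EXTENSION", "DM_WWN"} {
		fmt.Printf("%s=%q\n", k, props[k])
	}
}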
Description of problem:
In a cluster with custom service endpoints, ssh to the created bastion and master VMs fails.
Version-Release number of selected component (if applicable):
4.19 pre-merge main@de563b96, merging: #9523 f1119b4a, #9397 487587cf, #9385 e365e12c
How reproducible:
Always
Steps to Reproduce:
1. Create an install-config with a custom service endpoint:
   serviceEndpoints:
   - name: COS
     url: https://s3.direct.jp-tok.cloud-object-storage.appdomain.cloud
2. Create the cluster.
Actual results:
Cluster creation failed, and ssh to the bootstrap and master VMs failed.
Expected results:
Cluster creation succeeds.
Additional info:
the VNC console of ci-op-lgk38x3xaa049-hk2z5-bootstrap:
Mar 05 11:36:34 ignition[783]: error at $.ignition.config.replace.source, line 1 col 1542: unable to parse url
Mar 05 11:36:34 ignition[783]: error at $.ignition.config.replace.httpHeaders, line 1 col 50: unable to parse url
Mar 05 11:36:34 ignition[783]: failed to fetch config: config is not valid
Mar 05 11:36:34 ignition[783]: failed to acquire config: config is not valid
Mar 05 11:36:34 systemd[1]: ignition-fetch-offline.service: Main process exited, code=exited, status=1/FAILURE
Mar 05 11:36:34 ignition[783]: Ignition failed: config is not valid
Mar 05 11:36:34 systemd[1]: ignition-fetch-offline.service: Failed with result 'exit-code'.
Mar 05 11:36:34 systemd[1]: Failed to start Ignition (fetch-offline).
Mar 05 11:36:34 systemd[1]: ignition-fetch-offline.service: Triggering OnFailure dependencies.
Generating "/run/initramfs/rdsosreport.txt"
the VNC console of ci-op-lgk38x3xaa049-hk2z5-master-0:
[ 2284.471078] ignition[840]: GET https://api-int.ci-op-lgk38x3xaa049.private-ibmcloud-1.qe.devcluster.openshift.com:22623/config/master: attempt #460
[ 2284.477585] ignition[840]: GET error: Get "https://api-int.ci-op-lgk38x3xaa049.private-ibmcloud-1.qe.devcluster.openshift.com:22623/config/master": EOF
warning React Hook React.useMemo has a missing dependency: 'hasRevealableContent'
Description of problem:
We should use the resource kind HelmChartRepository on the details page, the action items, and the breadcrumb link.
Version-Release number of selected component (if applicable):
4.19.0-0.nightly-2025-03-09-063419
How reproducible:
Always
Steps to Reproduce:
1. Navigate to the Helm -> Repositories page and click on a HelmChartRepository. 2. Check the details page heading, the breadcrumb link name, and the action item names.
Actual results:
Details page heading is: Helm Chart Repository
Breadcrumb link name is: Repositories -> Helm Chart Repository details
Two action items are: Edit Helm Chart Repository and Delete Helm Chart Repository
Expected results:
We should use HelmChartRepository (no spaces between words) in these places.
Additional info:
Description of problem:
The console shows a timeout error when trying to edit a deployment with the annotation `image.openshift.io/triggers: ''`.
Version-Release number of selected component (if applicable):
4.12
How reproducible:
Always
Steps to Reproduce:
1. Install a 4.12 cluster. 2. Create a deployment with the annotation `image.openshift.io/triggers: ''`. 3. Select Edit Deployment in the console. 4. The console gives a timeout error.
Actual results:
The console gives a timeout error.
Expected results:
Console should be able to handle bad values
Additional info:
The issue is observed from the Actions section: Deployment -> <name_of_deployment> -> Actions -> Edit Deployment. The page gives the error "Oh no! Something went wrong" when the annotation is present. When the annotation is removed, the deployment is shown.
Description of problem:
A webhook error should be returned when marketType is invalid, as is done for other fields, for example:

liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml
Error from server (Forbidden): error when creating "ms1.yaml": admission webhook "validation.machineset.machine.openshift.io" denied the request: providerSpec.networkInterfaceType: Invalid value: "1": Valid values are: ENA, EFA and omitted
Version-Release number of selected component (if applicable):
4.19.0-0.nightly-2025-03-05-160850
How reproducible:
always
Steps to Reproduce:
1.Install an AWS cluster liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.19.0-0.nightly-2025-03-05-160850 True False 5h37m Cluster version is 4.19.0-0.nightly-2025-03-05-160850 2.Create a machineset with invalid marketType, for example, marketType: "1", the machine stuck in Provisioning, although I can see some messages in the machine providerStatus and machine-controller log, I think we should give explicit webhook prompt to be consistent with other features. huliu-aws36a-6bslb-worker-us-east-2aa-f89jk Provisioning 8m42s providerStatus: conditions: - lastTransitionTime: "2025-03-06T07:49:51Z" message: invalid MarketType "1" reason: MachineCreationFailed status: "False" type: MachineCreation E0306 08:01:07.645341 1 actuator.go:72] huliu-aws36a-6bslb-worker-us-east-2aa-f89jk error: huliu-aws36a-6bslb-worker-us-east-2aa-f89jk: reconciler failed to Create machine: failed to launch instance: invalid MarketType "1" W0306 08:01:07.645377 1 controller.go:409] huliu-aws36a-6bslb-worker-us-east-2aa-f89jk: failed to create machine: huliu-aws36a-6bslb-worker-us-east-2aa-f89jk: reconciler failed to Create machine: failed to launch instance: invalid MarketType "1" E0306 08:01:07.645427 1 controller.go:341] "msg"="Reconciler error" "error"="huliu-aws36a-6bslb-worker-us-east-2aa-f89jk: reconciler failed to Create machine: failed to launch instance: invalid MarketType \"1\"" "controller"="machine-controller" "name"="huliu-aws36a-6bslb-worker-us-east-2aa-f89jk" "namespace"="openshift-machine-api" "object"={"name":"huliu-aws36a-6bslb-worker-us-east-2aa-f89jk","namespace":"openshift-machine-api"} "reconcileID"="e3aeeeda-2537-4e83-a787-2cbcf9926646" I0306 08:01:07.645499 1 recorder.go:104] "msg"="huliu-aws36a-6bslb-worker-us-east-2aa-f89jk: reconciler failed to Create machine: failed to launch instance: invalid MarketType \"1\"" "logger"="events" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"huliu-aws36a-6bslb-worker-us-east-2aa-f89jk","uid":"a7ef8a7b-87d5-4569-93a4-47a7a2d16325","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"118757"} "reason"="FailedCreate" "type"="Warning" liuhuali@Lius-MacBook-Pro huali-test % oc get machineset huliu-aws36a-6bslb-worker-us-east-2aa -oyaml apiVersion: machine.openshift.io/v1beta1 kind: MachineSet metadata: annotations: capacity.cluster-autoscaler.kubernetes.io/labels: kubernetes.io/arch=amd64 machine.openshift.io/GPU: "0" machine.openshift.io/memoryMb: "16384" machine.openshift.io/vCPU: "4" creationTimestamp: "2025-03-06T07:49:50Z" generation: 1 labels: machine.openshift.io/cluster-api-cluster: huliu-aws36a-6bslb name: huliu-aws36a-6bslb-worker-us-east-2aa namespace: openshift-machine-api resourceVersion: "118745" uid: 65e94786-6c1a-42b8-9bf3-9fe0d3f4adf3 spec: replicas: 1 selector: matchLabels: machine.openshift.io/cluster-api-cluster: huliu-aws36a-6bslb machine.openshift.io/cluster-api-machineset: huliu-aws36a-6bslb-worker-us-east-2aa template: metadata: labels: machine.openshift.io/cluster-api-cluster: huliu-aws36a-6bslb machine.openshift.io/cluster-api-machine-role: worker machine.openshift.io/cluster-api-machine-type: worker machine.openshift.io/cluster-api-machineset: huliu-aws36a-6bslb-worker-us-east-2aa spec: lifecycleHooks: {} metadata: {} providerSpec: value: ami: id: ami-0e763ecd8ccccbc99 apiVersion: machine.openshift.io/v1beta1 blockDevices: - ebs: encrypted: true iops: 0 kmsKey: arn: "" volumeSize: 120 volumeType: gp3 capacityReservationId: "" 
credentialsSecret: name: aws-cloud-credentials deviceIndex: 0 iamInstanceProfile: id: huliu-aws36a-6bslb-worker-profile instanceType: m6i.xlarge kind: AWSMachineProviderConfig marketType: "1" metadata: creationTimestamp: null metadataServiceOptions: {} placement: availabilityZone: us-east-2a region: us-east-2 securityGroups: - filters: - name: tag:Name values: - huliu-aws36a-6bslb-node - filters: - name: tag:Name values: - huliu-aws36a-6bslb-lb subnet: filters: - name: tag:Name values: - huliu-aws36a-6bslb-subnet-private-us-east-2a tags: - name: kubernetes.io/cluster/huliu-aws36a-6bslb value: owned userDataSecret: name: worker-user-data status: fullyLabeledReplicas: 1 observedGeneration: 1 replicas: 1 liuhuali@Lius-MacBook-Pro huali-test %
Actual results:
The machine is stuck in Provisioning, with some error messages shown in the machine providerStatus and the machine-controller log.
Expected results:
An explicit webhook error should be given, consistent with other fields, for example:

liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml
Error from server (Forbidden): error when creating "ms1.yaml": admission webhook "validation.machineset.machine.openshift.io" denied the request: providerSpec.networkInterfaceType: Invalid value: "1": Valid values are: ENA, EFA and omitted
Additional info:
New feature testing for https://issues.redhat.com/browse/OCPCLOUD-2780
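For illustration, a minimal sketch of the kind of admission-time check being requested, using the apimachinery field validation helpers; the set of valid marketType values and the function wiring are assumptions and do not reflect the actual machine-api webhook code:

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/validation/field"
)

// validMarketTypes is an assumed set of accepted values, for illustration only.
var validMarketTypes = []string{"OnDemand", "Spot", "CapacityBlock"}

// validateMarketType rejects unknown values at admission time instead of
// letting the machine fail later in the actuator.
func validateMarketType(marketType string, fldPath *field.Path) field.ErrorList {
	if marketType == "" {
		return nil // omitted is allowed
	}
	for _, v := range validMarketTypes {
		if marketType == v {
			return nil
		}
	}
	return field.ErrorList{field.Invalid(fldPath, marketType,
		fmt.Sprintf("valid values are: %v and omitted", validMarketTypes))}
}

func main() {
	errs := validateMarketType("1", field.NewPath("providerSpec", "marketType"))
	// Prints a field-style error analogous to the networkInterfaceType message above.
	fmt.Println(errs.ToAggregate())
}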
Description of the problem:
Create a cluster booted from iSCSI multipath.
When the node was discovered, the multipath (mpath) paths were up.
I took one of the paths offline by adding blackhole routing:
ip route add blackhole 192.168.145.1/32
The disk validation caught it, but the message exposes an internal function name.
It looks like the address that is set is:
Iface IPaddress: [default]
Probably we will have to change the validation message to something more general like:
"iSCSI IPv4 address is not routable" (and not expose ParseAddr).
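A rough sketch of the suggested wording, assuming the address check is based on net/netip parsing (the actual assisted-service validation code may differ), which keeps the parser error out of the user-facing message:

package main

import (
	"fmt"
	"net/netip"
)

// validateISCSIHostAddress returns a generic, user-facing message for any
// unset or unparsable iSCSI iface address instead of surfacing ParseAddr.
func validateISCSIHostAddress(addr string) error {
	ip, err := netip.ParseAddr(addr)
	if err != nil || !ip.Is4() || ip.IsUnspecified() {
		// "default" (the unset iface address) and parse failures get the same message.
		return fmt.Errorf("iSCSI IPv4 address %q is not set or not routable", addr)
	}
	return nil
}

func main() {
	fmt.Println(validateISCSIHostAddress("default")) // the "[default]" case from this report
	fmt.Println(validateISCSIHostAddress("192.168.145.10"))
}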
Description of problem:
Create cluster on instance type Standard_M8-4ms, installer failed to provision machines. install-config: ================ controlPlane: architecture: amd64 hyperthreading: Enabled name: master platform: azure: type: Standard_M8-4ms Create cluster: ===================== $ ./openshift-install create cluster --dir ipi3 INFO Waiting up to 15m0s (until 2:31AM UTC) for machines [jimainstance01-h45wv-bootstrap jimainstance01-h45wv-master-0 jimainstance01-h45wv-master-1 jimainstance01-h45wv-master-2] to provision... ERROR failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: control-plane machines were not provisioned within 15m0s: client rate limiter Wait returned an error: context deadline exceeded INFO Shutting down local Cluster API controllers... INFO Stopped controller: Cluster API WARNING process cluster-api-provider-azure exited with error: signal: killed INFO Stopped controller: azure infrastructure provider INFO Stopped controller: azureaso infrastructure provider INFO Shutting down local Cluster API control plane... INFO Local Cluster API system has completed operation In openshift-install.log, all machines were created failed with below error: ================= time="2024-09-20T02:17:07Z" level=debug msg="I0920 02:17:07.757980 1747698 recorder.go:104] \"failed to reconcile AzureMachine: failed to reconcile AzureMachine service virtualmachine: failed to get desired parameters for resource jimainstance01-h45wv-rg/jimainstance01-h45wv-bootstrap (service: virtualmachine): reconcile error that cannot be recovered occurred: failed to validate the memory capability: failed to parse string '218.75' as int64: strconv.ParseInt: parsing \\\"218.75\\\": invalid syntax. Object will not be requeued\" logger=\"events\" type=\"Warning\" object={\"kind\":\"AzureMachine\",\"namespace\":\"openshift-cluster-api-guests\",\"name\":\"jimainstance01-h45wv-bootstrap\",\"uid\":\"d67a2010-f489-44b4-9be9-88d7b136a45b\",\"apiVersion\":\"infrastructure.cluster.x-k8s.io/v1beta1\",\"resourceVersion\":\"1530\"} reason=\"ReconcileError\"" ... time="2024-09-20T02:17:12Z" level=debug msg="Checking that machine jimainstance01-h45wv-bootstrap has provisioned..." time="2024-09-20T02:17:12Z" level=debug msg="Machine jimainstance01-h45wv-bootstrap has not yet provisioned: Failed" time="2024-09-20T02:17:12Z" level=debug msg="Checking that machine jimainstance01-h45wv-master-0 has provisioned..." time="2024-09-20T02:17:12Z" level=debug msg="Machine jimainstance01-h45wv-master-0 has not yet provisioned: Failed" time="2024-09-20T02:17:12Z" level=debug msg="Checking that machine jimainstance01-h45wv-master-1 has provisioned..." time="2024-09-20T02:17:12Z" level=debug msg="Machine jimainstance01-h45wv-master-1 has not yet provisioned: Failed" time="2024-09-20T02:17:12Z" level=debug msg="Checking that machine jimainstance01-h45wv-master-2 has provisioned..." time="2024-09-20T02:17:12Z" level=debug msg="Machine jimainstance01-h45wv-master-2 has not yet provisioned: Failed" ... 
Also see same error in .clusterapi_output/Machine-openshift-cluster-api-guests-jimainstance01-h45wv-bootstrap.yaml =================== $ yq-go r Machine-openshift-cluster-api-guests-jimainstance01-h45wv-bootstrap.yaml 'status' noderef: null nodeinfo: null lastupdated: "2024-09-20T02:17:07Z" failurereason: CreateError failuremessage: 'Failure detected from referenced resource infrastructure.cluster.x-k8s.io/v1beta1, Kind=AzureMachine with name "jimainstance01-h45wv-bootstrap": failed to reconcile AzureMachine service virtualmachine: failed to get desired parameters for resource jimainstance01-h45wv-rg/jimainstance01-h45wv-bootstrap (service: virtualmachine): reconcile error that cannot be recovered occurred: failed to validate the memory capability: failed to parse string ''218.75'' as int64: strconv.ParseInt: parsing "218.75": invalid syntax. Object will not be requeued' addresses: [] phase: Failed certificatesexpirydate: null bootstrapready: false infrastructureready: false observedgeneration: 1 conditions: - type: Ready status: "False" severity: Error lasttransitiontime: "2024-09-20T02:17:07Z" reason: Failed message: 0 of 2 completed - type: InfrastructureReady status: "False" severity: Error lasttransitiontime: "2024-09-20T02:17:07Z" reason: Failed message: 'virtualmachine failed to create or update. err: failed to get desired parameters for resource jimainstance01-h45wv-rg/jimainstance01-h45wv-bootstrap (service: virtualmachine): reconcile error that cannot be recovered occurred: failed to validate the memory capability: failed to parse string ''218.75'' as int64: strconv.ParseInt: parsing "218.75": invalid syntax. Object will not be requeued' - type: NodeHealthy status: "False" severity: Info lasttransitiontime: "2024-09-20T02:16:27Z" reason: WaitingForNodeRef message: "" From above error, seems unable to parse the memory of instance type Standard_M8-4ms, which is a decimal, not an integer. $ az vm list-skus --size Standard_M8-4ms --location southcentralus | jq -r '.[].capabilities[] | select(.name=="MemoryGB")' { "name": "MemoryGB", "value": "218.75" }
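For reference, a small sketch of the parsing difference; memoryGBToMiB is a hypothetical helper, not the actual CAPZ code. Parsing the MemoryGB capability as a float accepts fractional values such as "218.75" that strconv.ParseInt rejects:

package main

import (
	"fmt"
	"strconv"
)

// memoryGBToMiB converts an Azure SKU MemoryGB capability string to MiB,
// accepting fractional values like "218.75" as well as integers like "16".
func memoryGBToMiB(capability string) (int64, error) {
	gb, err := strconv.ParseFloat(capability, 64)
	if err != nil {
		return 0, fmt.Errorf("failed to parse MemoryGB capability %q: %w", capability, err)
	}
	return int64(gb * 1024), nil
}

func main() {
	mib, err := memoryGBToMiB("218.75")
	fmt.Println(mib, err) // 224000 <nil>
}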
Version-Release number of selected component (if applicable):
4.17.0-0.nightly-2024-09-16-082730
How reproducible:
Always
Steps to Reproduce:
1. Set the controlPlane type to Standard_M8-4ms in the install-config. 2. Create the cluster.
Actual results:
Installation failed
Expected results:
Installation succeeded
Additional info:
Description of problem:
Two favorite icons are shown on the same page (the Operator details page with a CR).
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Install the Red Hat Serverless operator. 2. Navigate to the Operator details > Knative Serving page.
Actual results:
Two star icons appear on the same page.
Expected results:
Only one star icon should be present on the page.
Additional info:
This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were not completed when this image was assembled
We recently hit a limit in our subscription where we could no longer create role assignments for service principals.
This is because we are not deleting role assignments made during our CI runs. We previously thought we didn't have to delete those, but it turns out we need to.
Although we haven't seen this failure in the periodic jobs yet, it seems the CAPI jobs are broken for release-ocm-2.13: https://prow.ci.openshift.org/job-history/gs/test-platform-results/pr-logs/directory/pull-ci-openshift-assisted-service-release-ocm-2.13-e2e-ai-operator-disconnected-capi.
Description of problem:
When adding a node with `oc adm node-image`, the command is unable to pull the release image container and fails to generate the new node ISO.
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Deploy an OpenShift cluster with a private registry in an offline environment. 2. Create the nodes-config.yaml for the new nodes. 3. Run "oc adm node-image create --dir=/tmp/assets".
Actual results:
The command fails with an error saying that it cannot pull from quay.io/openshift-release-dev/ocp-release@shaXXXXX.
Expected results:
The command generates an ISO used to add the new worker nodes.
Additional info:
When creating the initial agent ISO using the "openshift-install agent create image" command, we can see in the output that a sub-command is run: "oc adm release extract". When the install-config.yaml contains an ImageContentSourcePolicy or ImageDigestMirrorSet section, a flag (--icsp or --idms) is added to "oc adm release extract" containing the mappings from quay.io to the private registry. The oc command does not have a top-level icsp or idms flag. The "oc adm node-image" command needs an icsp or idms flag so that it understands it should pull the release image from the private registry instead of quay.io. Without this flag, the oc command has no way to know that it should be pulling container images from a private registry.
Description of the problem:
For some hardware, particularly Simply NUC (https://edge.simplynuc.com/), it was found that when the motherboard serial number is not set it defaults to "-". Since this is treated as a valid string in the UUID generation in https://github.com/openshift/assisted-installer-agent/blob/master/src/scanners/machine_uuid_scanner.go#L96-L107, all hosts end up with the same UUID, causing installation failures.
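A minimal sketch of one possible guard, assuming a list of known placeholder serials (the real scanner in assisted-installer-agent may handle this differently): treat "-" and similar values as unset and fall back to another per-host identifier before deriving the UUID.

package main

import (
	"fmt"
	"strings"

	"github.com/google/uuid"
)

// placeholderSerials lists values some vendors report when the serial is unset;
// the exact list is an assumption for illustration.
var placeholderSerials = map[string]bool{
	"":                       true,
	"-":                      true,
	"none":                   true,
	"to be filled by o.e.m.": true,
}

func serialIsUsable(serial string) bool {
	return !placeholderSerials[strings.ToLower(strings.TrimSpace(serial))]
}

// hostUUID derives a stable UUID from the motherboard serial when it is usable,
// otherwise from a fallback identifier that is unique per host (e.g. a MAC).
func hostUUID(motherboardSerial, fallback string) uuid.UUID {
	seed := motherboardSerial
	if !serialIsUsable(motherboardSerial) {
		seed = fallback
	}
	return uuid.NewMD5(uuid.NameSpaceOID, []byte(seed))
}

func main() {
	fmt.Println(hostUUID("-", "52:54:00:12:34:56")) // uses the MAC, not "-"
}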
We're just getting a regexp search bar and then a blank chart. Using the browser dev tools console, we see this error:
Uncaught SyntaxError: import declarations may only appear at top level of a module timelines-chart:1:1
Uncaught ReferenceError: TimelinesChart is not defined
    renderChart https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-vsphere-ovn-csi/1902920985443569664/artifacts/e2e-vsphere-ovn-csi/openshift-e2e-test/artifacts/junit/e2e-timelines_spyglass_20250321-040532.html:33606
    <anonymous> https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-vsphere-ovn-csi/1902920985443569664/artifacts/e2e-vsphere-ovn-csi/openshift-e2e-test/artifacts/junit/e2e-timelines_spyglass_20250321-040532.html:33650
Seems to be hitting 4.18 as well, not sure when it started exactly.
The user-ca-bundle on our managed cluster contains two copies of all the entries from the parent cluster's trusted ca bundle, resulting in a massive user-ca-bundle, around 300 entries.
Our ConfigMap on the hub cluster that contains registries.conf and ca-bundle.crt only has one CA cert; our understanding is that this should be the only CA cert transferred into the new managed cluster.
We may be missing configuration somewhere, but we are unable to find anything and do not know where that would be configured. Our agentServiceConfig only specifies our one configMap.
We are deploying the cluster on bare metal using the ZTP cluster-instance pattern.
This is causing us to be unable to deploy more clusters from our hub cluster due to the ignition file being too large.