Jump to: Incomplete Features | Incomplete Epics | Other Complete | Other Incomplete |
Note: This page shows the Feature-Based Change Log for a release.
When this image was assembled, these features were not yet completed; therefore, only the Jira cards included here are part of this release.
Goal:
Graduate Gateway API with Istio to GA (full support) to unify the management of cluster ingress under a common, open, expressive, and extensible API.
Description:
Gateway API is the evolution of upstream Kubernetes Ingress APIs. The upstream project is part of Kubernetes, working under SIG-NETWORK. OpenShift is contributing to the development, building a leadership position, and preparing OpenShift to support Gateway API, with Istio as our supported implementation.
The pluggable nature of the Gateway API implementation enables support for additional, optional third-party ingress technologies.
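As an illustrative sketch of how an administrator opts in, a GatewayClass hands cluster ingress to the Istio-based implementation managed by the Cluster Ingress Operator; the controllerName value below follows the Tech Preview documentation and should be treated as an assumption for GA:

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: openshift-default
spec:
  controllerName: openshift.io/gateway-controller   # assumed controller name; confirm against the release docs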
Problem: As an administrator, I would like to securely expose cluster resources to remote clients and services while providing a self-service experience to application developers.
GA: A feature is implemented as GA so that developers can issue an update to the Tech Preview MVP and:
Dependencies (internal and external)
Gateway API and Istio logs are not included in the must-gather reports.
Add Gateway API resources and possibly OSSM resources to the operator's relatedObjects field.
Use cases:
This Epic is a placeholder for stories covering e2e and unit tests that are missing for older features, and for determining whether OSSM 3.x TP2 bugs affect us before they are fixed in GA. There is already one epic for DNS, and test cases should be added for any new features in the release.
Write and run test cases that are currently missing.
See also https://github.com/openshift/api?tab=readme-ov-file#defining-featuregate-e2e-tests.
The tests that would be covered in Origin are:
Add a test to cluster-ingress-operator's E2E tests to verify that Istio is configured not to allow manual deployment.
Gateway API is the next generation of the Ingress API in upstream Kubernetes.
OpenShift Service Mesh (OSSM) and several other offerings of ours, such as Kuadrant, MicroShift, and OpenShift AI, all have critical dependencies on Gateway API's API resources. However, even though Gateway API is an official Kubernetes project, its API resources are not available in the core API (like Ingress) and instead require the installation of Custom Resource Definitions (CRDs).
OCP will be fully in charge of managing the life-cycle of the Gateway API CRDs going forward. This will make the Gateway API a "core-like" API on OCP. If the CRDs are already present on a cluster when it upgrades to the new version where they are managed, the cluster admin is responsible for the safety of existing Gateway API implementations. The Cluster Ingress Operator (CIO) enacts a process called "CRD Management Succession" to ensure the transfer of control occurs safely, which includes multiple pre-upgrade checks and CIO startup checks.
The organization as a whole needs to be made aware of this as new projects will continue to pop up with Gateway API support over the years. This includes (but is not limited to)
Importantly our cluster infrastructure work with Cluster API (CAPI) is working through similar dilemmas for CAPI CRDs, and so we need to make sure to work directly with them as they've already broken a lot of ground here. Here are the relevant docs with the work they've done so far:
From OCP 4.19 onward, we will ensure the Gateway API CRDs are present at a specific version, behind its own feature gate that defaults to true. If we cannot ensure the CRDs are present at the expected version, we will mark the cluster degraded.
See the description of NE-1898.
The Cluster Ingress Operator (CIO) currently provides some logic around handling the Gateway API CRDs, and a chunk of this work is simply updating that. The CIO should:
See some of the current CRD management logic here.
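For context, a sketch of what that management check looks at: the bundle-version annotation stamped on the upstream Gateway API CRDs. The version value below is an example, not the version pinned by this release:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: gateways.gateway.networking.k8s.io
  annotations:
    gateway.networking.k8s.io/bundle-version: v1.2.1   # example value; the CIO compares this against its expected version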
Cgroup v1 was deprecated in OCP 4.16. RHEL will remove support for cgroup v1 in RHEL 10, so we will remove it in OCP 4.19.
Goal
For clusters running cgroup v1 on OpenShift 4.18 or earlier, upgrading to OpenShift 4.19 will be blocked. To proceed with the upgrade, clusters on OpenShift 4.18 must first switch from cgroup v1 to cgroup v2. Once this transition is complete, the cluster upgrade to OpenShift 4.19 can be performed.
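A minimal sketch of that switch on the 4.18 cluster, assuming the standard nodes.config API (the MachineConfigPools must finish rolling out before the upgrade is attempted):

apiVersion: config.openshift.io/v1
kind: Node
metadata:
  name: cluster
spec:
  cgroupMode: "v2"   # switch away from "v1" before upgrading to 4.19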
Remove the CgroupModeV1 config option from the openshift/api repository
Ref: https://github.com/openshift/api/blob/master/config/v1/types_node.go#L84
Add a CRD validation check on the CgroupMode field of the nodes.config spec to prevent updates to "v1" and allow only "v2" and "" as valid values.
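A sketch of the resulting schema constraint, shown as an illustrative excerpt of the generated CRD (the exact marker used to produce it is an implementation detail):

# Illustrative excerpt of the nodes.config CRD schema after the change:
# only "" and "v2" remain valid, so updates to "v1" are rejected at admission.
cgroupMode:
  type: string
  enum:
    - ""
    - "v2"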
Latest update:
Raise a PR with the updated enhancement proposal to handle the removal of cgroup v1.
OVN-Kubernetes BGP support as a routing protocol for User-Defined Network (segmentation) pod and VM addressability.
OVN-Kubernetes BGP support enables dynamically exposing cluster-scoped network entities into a provider’s network, as well as programming BGP-learned routes from the provider’s network into OVN.
OVN-Kubernetes currently has no native routing protocol integration, and relies on a Geneve overlay for east/west traffic, as well as third party operators to handle external network integration into the cluster. This enhancement adds support for BGP as a supported routing protocol with OVN-Kubernetes. The extent of this support will allow OVN-Kubernetes to integrate into different BGP user environments, enabling it to dynamically expose cluster scoped network entities into a provider’s network, as well as program BGP learned routes from the provider’s network into OVN. In a follow-on release, this enhancement will provide support for EVPN, which is a common data center networking fabric that relies on BGP.
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
Importing Routes from the Provider Network
Today in OpenShift there is no API for a user to configure routes into OVN. To change how egress traffic leaving the cluster is routed, the user leverages local gateway mode, which forces egress traffic to hop through the Linux host's networking stack, where the user can configure routes inside the host via NMState. This manual configuration would need to be performed and maintained across nodes and VRFs within each node.
Additionally, if a user chooses not to manage routes within the host and uses local gateway mode, then by default traffic is always sent to the default gateway. The only other way to affect egress routing is to use the Multiple External Gateways (MEG) feature, with which the user may choose multiple different egress gateways per namespace to send traffic to.
As an alternative, configuring BGP peers and which route-targets to import would eliminate the need to manually configure routes in the host, and would allow dynamic routing updates based on changes in the provider’s network.
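As a hedged sketch of what configuring a BGP peer to import provider routes could look like with the frr-k8s integration (the API shape follows the upstream FRRConfiguration resource and the values are examples, not this feature's final user experience):

apiVersion: frrk8s.metallb.io/v1beta1
kind: FRRConfiguration
metadata:
  name: receive-from-provider
  namespace: openshift-frr-k8s        # assumed namespace for the CNO-managed frr-k8s instance
spec:
  bgp:
    routers:
      - asn: 64512                    # example cluster-side ASN
        neighbors:
          - address: 192.0.2.1        # example provider-network peer
            asn: 64512
            toReceive:
              allowed:
                mode: all             # import every route advertised by the peer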
Exporting Routes into the Provider Network
There exists a need for provider networks to learn routes directly to services and pods today in Kubernetes. MetalLB is already one solution whereby load balancer IPs are advertised by BGP to provider networks, and this feature development does not intend to duplicate or replace the function of MetalLB. MetalLB should be able to interoperate with OVN-Kubernetes, and be responsible for advertising services to a provider’s network.
However, there is an alternative need to advertise pod IPs on the provider network. One use case is integration with 3rd party load balancers, where they terminate a load balancer and then send packets directly to OCP nodes with the destination IP address being the pod IP itself. Today these load balancers rely on custom operators to detect which node a pod is scheduled to and then add routes into its load balancer to send the packet to the right node.
By integrating BGP and advertising the pod subnets/addresses directly on the provider network, load balancers and other entities on the network would be able to reach the pod IPs directly.
Extending OVN-Kubernetes VRFs into the Provider Network
This is the most powerful motivation for bringing support of EVPN into OVN-Kubernetes. A previous development effort enabled the ability to create a network per namespace (VRF) in OVN-Kubernetes, allowing users to create multiple isolated networks for namespaces of pods. However, the VRFs terminate at node egress, and routes are leaked from the default VRF so that traffic is able to route out of the OCP node. With EVPN, we can now extend the VRFs into the provider network using a VPN. This unlocks the ability to have L3VPNs that extend across the provider networks.
Utilizing the EVPN Fabric as the Overlay for OVN-Kubernetes
In addition to extending VRFs to the outside world for ingress and egress, we can also leverage EVPN to handle extending VRFs into the fabric for east/west traffic. This is useful in EVPN DC deployments where EVPN is already being used in the TOR network, and there is no need to use a Geneve overlay. In this use case, both layer 2 (MAC-VRFs) and layer 3 (IP-VRFs) can be advertised directly to the EVPN fabric. One advantage of doing this is that with Layer 2 networks, broadcast, unknown-unicast and multicast (BUM) traffic is suppressed across the EVPN fabric. Therefore the flooding domain in L2 networks for this type of traffic is limited to the node.
Multi-homing, Link Redundancy, Fast Convergence
Extending the EVPN fabric to OCP nodes brings other added benefits that are not present in OCP natively today. In this design there are at least 2 physical NICs and links leaving the OCP node to the EVPN leaves. This provides link redundancy, and when coupled with BFD and mass withdrawal, it can also provide fast failover. Additionally, the links can be used by the EVPN fabric to utilize ECMP routing.
Additional information on each of the above items can be found here: Networking Definition of Planned
Description of problem:
I deployed an OCP 4.19 cluster on bare metal with 22 worker nodes and 2 infra nodes using 4.19.0-ec.3, then applied the OVN-Kubernetes BGP image built from the PR build 4.19,openshift/ovn-kubernetes#2239.
After 4 to 5 hours, I see some cluster operators becoming degraded:
[root@e33-h03-000-r650 debug_oc]# oc get co | grep "False True"
authentication 4.19.0-ec.3 False False True 37h OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.vkommadieip29.rdu2.scalelab.redhat.com/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
console 4.19.0-ec.3 False False True 37h RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.vkommadieip29.rdu2.scalelab.redhat.com): Get "https://console-openshift-console.apps.vkommadieip29.rdu2.scalelab.redhat.com": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
ingress 4.19.0-ec.3 True False True 3d13h The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing. Last 1 error messages:...
insights 4.19.0-ec.3 False False True 33h Failed to upload data: unable to build request to connect to Insights server: Post "https://console.redhat.com/api/ingress/v1/upload": dial tcp 23.40.100.203:443: i/o timeout
kube-controller-manager 4.19.0-ec.3 True False True 3d13h GarbageCollectorDegraded: error fetching rules: client_error: client error: 401
The ingress-operator shows the following error in its logs:
2025-03-24T12:12:20.904051049Z 2025-03-24T12:12:20.903Z ERROR operator.canary_controller wait/backoff.go:226 error performing canary route check {"error": "error sending canary HTTP Request: Timeout: Get \"https://canary-openshift-ingress-canary.apps.vkommadieip29.rdu2.scalelab.redhat.com\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
This does not happen when the BGP patch is not applied. It is blocking BGP testing on the bare-metal deployment because Prometheus is down.
Version-Release number of selected component (if applicable):
4.19.0-ec.3
OVNK image from build 4.19,openshift/ovn-kubernetes#2239
How reproducible:
Always
Steps to Reproduce:
1. Deploy 4.19.0-ec.3 on baremetal with 24 workers.
2. oc patch featuregate cluster --type=merge -p='{"spec":{"featureSet":"TechPreviewNoUpgrade"}}'
3. oc patch Network.operator.openshift.io cluster --type=merge -p='{"spec":{"additionalRoutingCapabilities": {"providers": ["FRR"]}, "defaultNetwork":{"ovnKubernetesConfig":{"routeAdvertisements":"Enabled"}}}}'
4. Label 2 nodes as infra and move ingress, registry, and Prometheus to the infra nodes
5. oc scale --replicas=0 deploy/cluster-version-operator -n openshift-cluster-version
oc -n openshift-network-operator set env deployment.apps/network-operator OVN_IMAGE=quay.io/vkommadi/bgppr2239ovnk:latest
6. git clone -b ovnk-bgp https://github.com/jcaamano/frr-k8s
cd frr-k8s/hack/demo/
./demo.sh
7. oc apply -f ~/frr-k8s/hack/demo/configs/receive_all.yaml
8. cat ~/ra.yaml
apiVersion: k8s.ovn.org/v1
kind: RouteAdvertisements
metadata:
  name: default
spec:
  networkSelector:
    matchLabels:
      k8s.ovn.org/default-network: ""
  advertisements:
    - "PodNetwork"
    - "EgressIP"
oc apply -f ~/ra.yaml
9. Wait for 5 to 6 hours; some operators become degraded because of health check failures (mainly ingress, Prometheus, authentication, console).
Actual results:
Health checks for operators are failing because route access fails when BGP is enabled. We cannot conduct scale tests because Prometheus is down.
Expected results:
Health checks for operators should not fail.
Additional info:
Enable OpenShift to be deployed on Confidential VMs on GCP using Intel TDX technology
Users deploying OpenShift on GCP can choose to deploy Confidential VMs using Intel TDX technology to rely on confidential computing to secure the data in use
As a user, I can choose OpenShift Nodes to be deployed with the Confidential VM capability on GCP using Intel TDX technology at install time
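A hedged install-config sketch, assuming the existing GCP confidentialCompute machine-pool setting is what gets extended for TDX; the exact value used to select TDX and the machine type shown are assumptions:

# Illustrative install-config.yaml excerpt; values are examples, not confirmed enums.
controlPlane:
  platform:
    gcp:
      confidentialCompute: Enabled     # TDX selection may use a dedicated value
      onHostMaintenance: Terminate     # Confidential VMs cannot be live-migrated
      type: c3-standard-4              # example TDX-capable machine type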
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
This is a piece of a higher-level effort to secure data in use with OpenShift on every platform
Documentation on how to use this new option must be added as usual
LUKS encryption is required in certain customer environments, e.g. for PCI compliance, and the current implementation with network-based LUKS encryption is (a) complex and (b) not reliable and secure. We need to give our customers a way to encrypt the root device securely with an IBM hardware-based HSM protecting the LUKS key. This is similar to the TPM approach of storing the LUKS key while fencing it off from the user.
Hardware-based LUKS encryption requires injecting the reading of the secure keys into Clevis at boot time.
Provide hardware-based root volume encryption
Provide hardware-based root volume encryption with LUKS
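A heavily hedged Butane sketch of what hardware-backed root-volume encryption on IBM Z could look like; the cex fields reflect the Ignition/Butane support for CEX-protected LUKS and are assumptions here, not this feature's confirmed interface:

# Illustrative Butane config; field names under boot_device.luks are assumptions.
variant: openshift
version: 4.19.0
metadata:
  name: master-root-luks-cex
  labels:
    machineconfiguration.openshift.io/role: master
boot_device:
  layout: s390x-eckd          # example disk layout for IBM Z DASD
  luks:
    device: /dev/dasda        # example root device
    cex:
      enabled: true           # bind the LUKS key to the Crypto Express (CEX) HSM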
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | Y |
Classic (standalone cluster) | Y |
Hosted control planes | Y |
Multi node, Compact (three node), or Single node (SNO), or all | Y |
Connected / Restricted Network | Y |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | IBM Z |
Operator compatibility | n/a |
Backport needed (list applicable versions) | n/a |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | n/a |
Other (please specify) |
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
<your text here>
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
High-level list of items that are out of scope. Initial completion during Refinement status.
<your text here>
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
<your text here>
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
<your text here>
Once ignition spec 3.5 stabilizes, we should switch to using spec 3.5 as the default in the MCO to enable additional features in RHCOS.
(example: https://issues.redhat.com/browse/MULTIARCH-3776 needs 3.5)
This story covers all the needed work from the code side that needs to be done to support the 3.5 ignition spec.
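For orientation, a minimal MachineConfig sketch that assumes spec 3.5 is accepted once this work lands (the embedded config is an empty placeholder):

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-ignition-3-5-example
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.5.0   # accepted only once the MCO supports spec 3.5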
To support 3.5 we need to, from a high level perspective:
Done When:
This epic is used to track tasks/stories delivered from the OCP core engineering control plane group.
As a developer of TNF, I need:
Acceptance Criteria
As a developer of 2NO, I need:
Acceptance Criteria
As a developer of TNF, I need:
Acceptance Criteria
As a developer of 2NO, I need:
Acceptance Criteria
As a developer of 2NO, I need:
Acceptance Criteria
Add a new topology metric in https://github.com/openshift/cluster-kube-apiserver-operator/blob/master/pkg/operator/configmetrics/configmetrics.go#L16-L44
This was discussed and recommended in the OCP Arch Call
As a developer of TNF, I need:
Acceptance Criteria
In order to add TNF support to the authentication operator, it would be best to do the dependency update in a separate PR to avoid mixing behavior differences from dependency changes with the TNF change.
As a developer of 2NO, I need:
Acceptance Criteria
This feature aims to comprehensively refactor and standardize various components across HCP, ensuring consistency, maintainability, and reliability. The overarching goal is to increase customer satisfaction by increasing speed to market and to save engineering budget by reducing incidents/bugs. This will be achieved by reducing technical debt, improving code quality, and simplifying the developer experience across multiple areas, including CLI consistency, NodePool upgrade mechanisms, networking flows, and more. By addressing these areas holistically, the project aims to create a more sustainable and scalable codebase that is easier to maintain and extend.
Over time, the HyperShift project has grown organically, leading to areas of redundancy, inconsistency, and technical debt. This comprehensive refactor and standardization effort is a response to these challenges, aiming to improve the project's overall health and sustainability. By addressing multiple components in a coordinated way, the goal is to set a solid foundation for future growth and development.
Ensure all relevant project documentation is updated to reflect the refactored components, new abstractions, and standardized workflows.
This overarching feature is designed to unify and streamline the HCP project, delivering a more consistent, maintainable, and reliable platform for developers, operators, and users.
Goal
Refactor and modularize controllers and other components to improve maintainability, scalability, and ease of use.
Move the bash KAS bootstrapping into a testable binary
As a (user persona), I want to be able to:
https://issues.redhat.com//browse/HOSTEDCP-1801 introduced a new abstraction to be used by ControlPlane components. We need to refactor every component to use this abstraction.
Description of criteria:
All ControlPlane Components are refactored:
Example PR to refactor cloud-credential-operator: https://github.com/openshift/hypershift/pull/5203
docs: https://github.com/openshift/hypershift/blob/main/support/controlplane-component/README.md
An elevator pitch (value statement) that describes the Feature in a clear, concise way. Complete during New status.
Upgrade the OCP console to PatternFly 6.
The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.
The core OCP Console should be upgraded to PF 6 and the Dynamic Plugin Framework should add support for PF6 and deprecate PF4.
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
Console, Dynamic Plugin Framework, Dynamic Plugin Template, and Examples all should be upgraded to PF6 and all PF4 code should be removed.
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
As a company, we have all agreed to make our products look and feel the same. The current level is PF6.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
<your text here>
Some of the PatternFly releases in https://github.com/openshift/console/pull/14621 are prereleases. Once final releases are available (v6.2.0 is scheduled for the end of March), we should update to them.
Also update https://github.com/openshift/console/blob/900c19673f6f3cebc1b57b6a0a9cadd1573950d9/dynamic-demo-plugin/package.json#L21-L24 to the same versions.
Most of the *-theme-dark classes defined in the console code base were for PF5 and are likely unnecessary in PF6 (although the version number was updated). We should evaluate each class and determine if it is still necessary. If it is not, we should remove it.
The console is adopting PF6 and removing PF4 support. This creates many UI issues in the Developer Console that we need to fix.
Fix all the UI issues in ODC related to the PF6 upgrade
An elevator pitch (value statement) that describes the Feature in a clear, concise way. Complete during New status.
<your text here>
The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.
<your text here>
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
<enter general Feature acceptance here>
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
<your text here>
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
High-level list of items that are out of scope. Initial completion during Refinement status.
<your text here>
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
<your text here>
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
<your text here>
Hypershift currently allows NodePools to be up to three minor versions behind the HostedCluster control plane (y-3), by virtue of referencing the floating upstream docs (which changed from n-2 to n-3), but only tests configurations up to two minor versions behind at best (y-2).
This feature will align the allowed NodePool skew with the tested and supported versions to improve stability and prevent users from deploying unsupported configurations.
Hypershift currently allows for NodePool minor version skew based on the upstream Kubernetes skew policy. However, our testing capacity only allows us to fully validate up to y-2 skew at best. This mismatch creates a potential risk for users deploying unsupported configurations.
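As an illustration of the skew being enforced, a NodePool pinned within y-2 of a 4.19 control plane; the release image tag is an example, not a real pullspec:

# Illustrative NodePool; 4.17 is within the tested y-2 skew of a 4.19 control plane,
# while anything older would be rejected once this feature lands.
apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  name: example-nodepool
  namespace: clusters
spec:
  clusterName: example
  replicas: 2
  management:
    upgradeType: Replace
  platform:
    type: AWS
  release:
    image: quay.io/openshift-release-dev/ocp-release:4.17.0-x86_64   # example tag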
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | both |
Classic (standalone cluster) | |
Hosted control planes | yes |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
Customers who have deployed NodePools with a skew greater than y-2 may need to upgrade their NodePools before upgrading the HostedCluster control plane in the future.
The HCP documentation on NodePool versioning and upgrading needs to be updated to reflect the new supported skew limits.
Impacts ROSA/ARO HCP
The goal of this feature is to align the allowed NodePool minor version skew with the tested and supported versions (y-2) to improve stability and prevent users from deploying unsupported configurations. This feature ensures that only configurations that have been fully validated and tested are deployed, reducing the risk of instability or issues with unsupported version skews.
This is important because the current mismatch between the allowed NodePool skew (which allows up to y-3) and the actual tested configurations (which only support up to y-2) creates a risk for users deploying unsupported configurations. These unsupported configurations could lead to untested or unstable deployments, causing potential issues or failures within the cluster. By enforcing a stricter version skew policy, this change will:
Main Success Scenario:
Alternative Flow Scenario:
What items must be delivered by other teams/groups to enable delivery of this epic.
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
As a developer, I want to be able to:
Description of criteria:
Goal
Support for more than one disk in machineset API for vSphere provider
Feature description
Customers using vSphere should be able to create machines with more than one disk. This is already available for other cloud and on-prem providers.
Why do customers need this?
To have a proper disk layout that better addresses their needs. Examples include using the Local Storage Operator or ODF.
Affected packages or components
RHCOS, Machine API, Cluster Infrastructure, CAPV.
User Story:
As an OpenShift administrator, I need to be able to configure my OpenShift cluster to have additional disks on each vSphere VM so that I can use the new data disks for various OS needs.
Description:
The goal of this epic is to allow the cluster administrator, both at install time and after installation, to configure new machines with additional disks attached to each virtual machine for various OS needs.
Required:
Nice to Have:
Acceptance Criteria:
Notes:
USER STORY:
As an OpenShift administrator, I want to be able to configure thin provisioning for my new data disks so that I can adjust behavior that may differ from my default storage policy.
DESCRIPTION:
Currently, the Machine API changes force the thin-provisioned flag to true. We need to add a flag that allows the admin to configure this. The default behavior will be to not set the flag and use the default storage policy.
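A hedged sketch of what the providerSpec addition could look like; the dataDisks field and the provisioningMode values are assumptions drawn from the in-progress API, not a confirmed final shape:

# Illustrative vSphere providerSpec excerpt; field names are assumptions.
providerSpec:
  value:
    apiVersion: machine.openshift.io/v1beta1
    kind: VSphereMachineProviderSpec
    dataDisks:
      - name: data1
        sizeGiB: 100
        provisioningMode: Thin   # proposed knob; leaving it unset would defer to the default storage policy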
ACCEPTANCE CRITERIA:
To align with the 4.19 release, dependencies need to be updated to 1.30. This should be done by rebasing/updating as appropriate for the repository
We need to maintain our dependencies across all the libraries we use in order to stay in compliance.
The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.
<your text here>
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
<enter general Feature acceptance here>
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
<your text here>
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
High-level list of items that are out of scope. Initial completion during Refinement status.
<your text here>
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
<your text here>
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
<your text here>
An epic we can duplicate for each release to ensure we have a place to catch things we ought to be doing regularly but that tend to fall by the wayside.
As a user, I do not want to load polyfills for browsers that OCP console no longer supports.
Add unit tests for the Timestamp component to prevent regressions like https://issues.redhat.com/browse/OCPBUGS-51202
AC:
Note: This feature will be a TechPreview in 4.16 since the newly introduced API must graduate to v1.
Overarching Goal
Customers should be able to update and boot a cluster without a container registry in disconnected environments. This feature is for bare-metal disconnected clusters.
Background
This epic describes the work required to GA a minimal viable version of the Machine Config Node feature to enable the subsequent GAing of the Pinned Image Sets feature. The GAing of status reporting as well as any further enhancements for the Machine Config Node feature will be tracked in MCO-1506.
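For context, a hedged sketch of the Pinned Image Sets API this work unblocks; the apiVersion and field names reflect the Tech Preview shape and are assumptions here, and the image reference is a placeholder:

apiVersion: machineconfiguration.openshift.io/v1alpha1
kind: PinnedImageSet
metadata:
  name: example-pinned-images
spec:
  pinnedImages:
    - name: registry.example.com/sample/app@sha256:0000000000000000000000000000000000000000000000000000000000000000   # placeholder digest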
Related Items:
Done when:
The first step in GAing the MCN API is finalizing the v1alpha1 API. This will allow testing of the final API design before the API is graduated to v1. Since a fair number of changes are likely to be made to the MCN API, making our changes in v1alpha1 first follows the API team's preference that v1 API graduations include only minor changes.
Done when:
In order for Managed OpenShift Hosted Control Planes to run as part of Azure Red Hat OpenShift, it is necessary to support the new AKS design for secrets/identities.
Hosted Cluster components use the secrets/identities provided/referenced in the Hosted Cluster resources creation.
All OpenShift Hosted Cluster components running with the appropriate managed or workload identity.
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | Managed |
Classic (standalone cluster) | No |
Hosted control planes | Yes |
Multi node, Compact (three node), or Single node (SNO), or all | All supported ARO/HCP topologies |
Connected / Restricted Network | All supported ARO/HCP topologies |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | All supported ARO/HCP topologies |
Operator compatibility | All core operators |
Backport needed (list applicable versions) | OCP 4.18.z |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | No |
Other (please specify) |
This is a follow-up to OCPSTRAT-979 required by an AKS sweeping change to how identities need to be handled.
Should only affect ARO/HCP documentation rather than Hosted Control Planes documentation.
Does not affect ROSA or any of the supported self-managed Hosted Control Planes platforms
As an ARO HCP user, I want to be able to:
so that
Description of criteria:
Updating any HyperShift-only components that run in the HCP
This does not require a design proposal.
This does not require a feature gate.
As an ARO HCP user, I want to be able to:
so that
Description of criteria:
Updating any external OpenShift components that run in the HCP
This does not require a design proposal.
This does not require a feature gate.
The installation process for the OpenShift Virtualization Engine (OVE) has been identified as a critical area for improvement to address customer concerns regarding its complexity compared to competitors like VMware, Nutanix, and Proxmox. Customers often struggle with disconnected environments, operator configuration, and managing external dependencies, making the initial deployment challenging and time-consuming.
To resolve these issues, the goal is to deliver a streamlined, opinionated installation workflow that leverages existing tools like the Agent-Based Installer, the Assisted Installer, and the OpenShift Appliance (all sharing the same underlying technology) while pre-configuring essential operators and minimizing dependencies, especially the need for an image registry before installation.
By focusing on enterprise customers, particularly VMware administrators working in isolated networks, this effort aims to provide a user-friendly, UI-based installation experience that simplifies cluster setup and ensures quick time-to-value.
VMware administrators transitioning to OpenShift Virtualization in isolated/disconnected environments.
The first area of focus is a disconnected environment. We target these environments with the Agent-Based Installer.
The current docs for installing in disconnected environments are very long and hard to follow.
The image registry is required in disconnected installations before the installation process can start. We must simplify this so that users can start the installation from a single image, without having to explicitly deploy a registry first.
This isn't a new requirement; in the past we analyzed options for this and even did a POC. We could revisit this point; see Deploy OpenShift without external registry in disconnected environments.
The OpenShift Appliance can in fact be installed without a registry.
Additionally, we started work in this direction AGENT-262 (Strategy to complete installations where there isn't a pre-existing registry).
We also had the field (Brandon Jozsa) doing a POC which was promising:
https://gist.github.com/v1k0d3n/cbadfb78d45498b79428f5632853112a
The type of users coming from VMware vSphere expect a UI. They aren't used to writing YAML files and this has been identified as a blocker for some of them. We must provide a simple UI to stand up a cluster.
https://miro.com/app/board/uXjVLja4xXQ=/
Currently the builder script embeds agent-setup-tui.service in the Ignition files, but places the script directly in the ISO. For consistency, the script should also be placed inside the ISO Ignition.
Recently the appliance allowed using an internal registry (see https://github.com/openshift/appliance/pull/349).
Modify the script to use that (instead of the external one), and test the installation workflow.
Since CCM was moved out-of-tree for Azure, the 'azurerm_user_assigned_identity' resource the Installer creates is no longer required. To make sure the Installer only creates the minimum permissions required to deploy OpenShift on Azure, this resource created at install time needs to be removed.
The Installer no longer creates the 'azurerm_user_assigned_identity' resource, which is no longer required for the nodes.
The Installer only creates the minimum permissions required to deploy OpenShift on Azure
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
Since CCM was moved out-of-tree, this permission is no longer required. We are implementing this change in 4.19 and backporting it to 4.18.z.
At the same time, for customers running previous OpenShift releases, we will test upgrades between EUS releases (4.14.z - 4.16.z - 4.18.z) with the `azurerm_user_assigned_identity` resource removed beforehand, to ensure the upgrade process works with no issues and OpenShift does not report any issues because of this change.
A KCS will be created for customers running previous OpenShift releases who want to remove this resource
The new permissions requirements will be documented
Enable OpenShift to be deployed on Confidential VMs on GCP using AMD SEV-SNP technology
Users deploying OpenShift on GCP can choose to deploy Confidential VMs using AMD SEV-SNP technology to rely on confidential computing to secure the data in use
As a user, I can choose OpenShift Nodes to be deployed with the Confidential VM capability on GCP using AMD SEV-SNP technology at install time
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
This is a piece of a higher-level effort to secure data in use with OpenShift on every platform
Documentation on how to use this new option must be added as usual
Goal
Add Nutanix platform integration support to the Agent-based Installer
Implement Migration core for MAPI to CAPI for AWS
When customers use CAPI, there must be no negative effect from switching over to CAPI: Machine resources migrate seamlessly, and the fields in MAPI/CAPI should reconcile from both CRDs.
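A hedged sketch of how a MachineSet could be opted into CAPI authority once the migration core lands; the authoritativeAPI field name and its values are assumptions based on the in-progress design:

# Illustrative excerpt; the authoritativeAPI field is an assumption from the migration design.
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: example-machineset
  namespace: openshift-machine-api
spec:
  authoritativeAPI: ClusterAPI   # hand authority to the mirrored CAPI MachineSet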
When converting CAPI to MAPI, we convert CAPA's `AdditionalSecurityGroups` into the security groups for MAPA. While this looks correct, there are also fields like `SecurityGroupOverrides` which, when present, currently cause an error.
We need to understand how security groups work today in MAPA, compare that to CAPA, and be certain that we are correctly handling the conversion here.
Is CAPA doing anything else under the hood? Is it currently applying extra security groups that are standard that would otherwise cause issues?
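For reference, a sketch of the CAPA field the conversion consumes, shown on an AWSMachineTemplate (values are examples):

# Illustrative CAPA excerpt; the security group ID is an example.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
metadata:
  name: example
  namespace: openshift-cluster-api
spec:
  template:
    spec:
      instanceType: m6i.xlarge
      additionalSecurityGroups:
        - id: sg-0123456789abcdef0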
To enable CAPI MachineSets to still mirror MAPI MachineSets accurately, and to enable MAPI MachineSets to be implemented by CAPI MachineSets in the future, we need to implement a way to convert CAPI Machines back into MAPI Machines.
These steps assume that the CAPI Machine is authoritative, or that there are no MAPI Machines.
Presently, the mapi2capi and capi2mapi code cannot handle translations of owner references.
We need to be able to map CAPI/MAPI machines to their correct CAPI/MAPI MachineSet/CPMS and have the owner references correctly set.
This requires identifying the correct owner and determining the correct UID to set.
This will likely mean extending the conversion utils to be able to make API calls to identify the correct owners.
Owner references for non-MachineSet types should still cause an error.
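For illustration, the kind of ownerReference a converted Machine would need to carry; the names and UID are placeholders that must be looked up from the live owning object:

metadata:
  ownerReferences:
    - apiVersion: machine.openshift.io/v1beta1
      kind: MachineSet
      name: example-machineset
      uid: 00000000-0000-0000-0000-000000000000   # placeholder; the real UID comes from the owning MachineSet
      controller: true
      blockOwnerDeletion: true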
PERSONAS:
The following personas are borrowed from Hypershift docs used in the user stories below.
USER STORY:
ACCEPTANCE CRITERIA:
What is "done", and how do we measure it? You might need to duplicate this a few times.
Given a
When b
Then c
CUSTOMER EXPERIENCE:
Only fill this out for Product Management / customer-driven work. Otherwise, delete it.
BREADCRUMBS:
Where can SREs look for additional information? Mark with "N/A" if these items do not exist yet so Functional Teams know they need to create them.
NOTES:
If there's anything else to add.
As a hypershift CLI user, I want to be able to disable the image registry capability when creating hosted clusters via `hypershift create cluster`.
Mark with an X when done; strikethrough for non-applicable items. All items
must be considered before closing this issue.
[ ] Ensure all pull request (PR) checks, including ci & e2e, are passing
[ ] Document manual test steps and results
[ ] Manual test steps executed by someone other than the primary implementer or a test artifact such as a recording are attached
[ ] All PRs are merged
[ ] Ensure necessary actions to take during this change's release are communicated and documented
[ ] Troubleshooting Guides (TSGs), ADRs, or other documents are updated as necessary
TBD
GROOMING CHECKLIST:
You can find out more information about ARO workflow, including roles and responsibilities here. Some items in the list should be left for Team Leads (TL) and Region Leads (RL) to perform. Otherwise, all other fields should be populated.
USER STORY:
What are we attempting to achieve? You might need to duplicate this a few times.
As a/an a
I want b
So that c
ACCEPTANCE CRITERIA:
What is "done", and how do we measure it? You might need to duplicate this a few times.
Given a
When b
Then c
CUSTOMER EXPERIENCE:
Only fill this out for Product Management / customer-driven work. Otherwise, delete it.
BREADCRUMBS:
Where can SREs look for additional information? Mark with "N/A" if these items do not exist yet so Functional Teams know they need to create them.
NOTES:
If there's anything else to add.
This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were not completed when this image was assembled
All images using cachito on Brew should also work with cachi2 on Konflux. https://issues.redhat.com/browse/ART-11902 outlines the ART automation that will support these changes, but ARTists can start testing by adding the annotations to the PipelineRun directly.
If an image build fails on Konflux and requires changes to the Dockerfile, an OCPBUGS ticket should be raised. The process doc (which is attached to this ticket) should also be attached to the bug ticket. ARTists will work with the image owners to hash out any issues until the image builds successfully on both Konflux and Brew.
Deprecate high_availability_mode as it was replaced by control_plane_count
high_availability_mode is no longer used in our code
Yes
Following our migration to konflux in MGMT-18343, we will use this epic for future tasks related to konflux.
More and more tasks are becoming mandatory in the Konflux pipeline.
Konflux used to have an automation that opened PRs to add those tasks. It seems it is not triggered anymore, so we have to add those tasks manually.
As of today, this raises a warning in the IntegrationTest pipeline that is very likely not seen by anyone. (The build pipeline does not raise any warning.)
In the short term we have to add those tasks to all pipelines (maybe only the product one? I haven't checked).
In the long term, if we can't get the Konflux PR automation back, we should have some automation that detects the warning and informs us that we have to update the pipelines.
Slack thread: https://redhat-internal.slack.com/archives/C04PZ7H0VA8/p1741091688194839
PR example: https://github.com/openshift/assisted-service/pull/7358
Add support for syncing CA bundle to the credentials generated by Cloud Credential Operator.
It is generally necessary to provide a CA file to OpenStack clients in order to communicate with a cloud that uses self-signed certificates. The cloud-credential-operator syncs clouds.yaml files to various namespaces so that services running in those namespaces are able to communicate with the cloud, but it does not sync the CA file. Instead, this must be managed using another mechanism. This has led to some odd situations, such as the Cinder CSI driver operator inspecting cloud-provider configuration to pull out this file.
We should start syncing not only the clouds.yaml file but also the CA file to anyone that requests it via a CredentialsRequest. Once we've done this, we can modify other components such as the Installer, CSI Driver Operator, Hypershift, and CCM Operator to pull the CA file from the same secrets that they pull the clouds.yaml from, rather than the litany of places they currently use.
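For context, a sketch of the kind of CredentialsRequest that receives the synced clouds.yaml (and, after this change, the CA bundle as well); the names are examples:

apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: example-openstack-credentials
  namespace: openshift-cloud-credential-operator
spec:
  secretRef:
    name: example-openstack-credentials    # secret synced into the consumer namespace
    namespace: example-namespace
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: OpenStackProviderSpec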
None.
None.
None.
The Installer creates the initial version of the root credential secret at kube-system / openstack-credentials, which cloud-credential-operator (CCO) will consume. Once we have support in CCO for consuming a CA cert from this root credential, we should modify the Installer to start populating the CA cert field. We should also stop adding the CA cert to the openshift-cloud-controller-manager / cloud-conf config map since the Cloud Config Operator (and CSI Drivers) will be able to start consuming the CA cert from the secret instead. This may need to be done separately depending on the order that patches land in.
Epic Goal*
What is our purpose in implementing this? What new capability will be available to customers?
Why is this important? (mandatory)
What are the benefits to the customer or Red Hat? Does it improve security, performance, supportability, etc? Why is work a priority?
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Feature gates must demonstrate completeness and reliability.
As per https://github.com/openshift/api?tab=readme-ov-file#defining-featuregate-e2e-tests:
If your FeatureGate lacks automated testing, there is an exception process that allows QE to sign off on the promotion by commenting on the PR.
The introduced functionality is not that complex. The only newly introduced ability is to modify the CVO log level using the API. However, we should still introduce an e2e test or tests to demonstrate that the CVO correctly reconciles the new configuration API.
The tests may be:
Definition of Done:
This epic is part of the 4.18 initiatives we discussed; it includes:
The annotation code in origin and k8s-tests should be removed and replaced, or refactored to at least not inject the annotations into the test names themselves. After TRT-1840 and TRT-1852 you can skip based on labels and other criteria. Skip information should be decided at run-time, and should not require revendoring.
This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were completed when this image was assembled
warning React Hook React.useMemo has a missing dependency: 'hasRevealableContent'
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This fix updates OpenShift 4.19 to Kubernetes v1.32.3, incorporating the latest upstream changes and fixes.
For details on the changes included in this update, see the Kubernetes changelog:
Description of problem:
The TestControllerEventuallyReconciles within the e2e-gcp-op-ocl test suite fails very often, which prevents the rest of the job from running. This causes reduced confidence in the test suite and lowers the overall quality signal for OCL.
Version-Release number of selected component (if applicable):
N/A
How reproducible:
Often.
Steps to Reproduce:
Run the e2e-gcp-op-ocl job by opening a PR. The job will eventually fail on this test.
Actual results:
The test, TestControllerEventuallyReconciles fails on a fairly consistent basis.
Expected results:
The test should pass.
Additional info:
I suspect that part of the problem is that the "success" criteria used by the Build Controller and by the e2e test suite are not the same. As part of a potential fix, I exported the success-criteria function so that it can be reused by the e2e test suite, and I also turned certain hard-coded values into constants so that they can be adjusted from one central place.
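A hypothetical sketch of that refactor, with illustrative names rather than the actual MCO symbols:

package buildcontroller

import "time"

const (
	// Central place to tune timing used by both the controller and the e2e tests.
	BuildSuccessPollInterval = 10 * time.Second
	BuildSuccessTimeout      = 15 * time.Minute
)

// BuildStatus is a minimal stand-in for the controller's view of a build.
type BuildStatus struct {
	Succeeded      bool
	ImagePushed    bool
	ConfigObserved bool
}

// IsBuildSuccessful is the single source of truth for "success"; the e2e test
// suite calls this instead of re-implementing its own criteria.
func IsBuildSuccessful(s BuildStatus) bool {
	return s.Succeeded && s.ImagePushed && s.ConfigObserved
}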
Description of problem:
Azure creates a NIC in the "provisioning failed" state, and the code does not check the provisioning status.
Version-Release number of selected component (if applicable):
4.12
How reproducible:
100%
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
https://github.com/openshift/machine-api-provider-azure/blob/main/pkg/cloud/azure/actuators/machine/reconciler.go
https://pkg.go.dev/github.com/Azure/azure-sdk-for-go@v68.0.0+incompatible/services/network/mgmt/2021-02-01/network#InterfacePropertiesFormat
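A possible shape for the missing check, using the SDK types linked above; where exactly it would hook into reconciler.go is an assumption:

package machine

import (
	"fmt"

	"github.com/Azure/azure-sdk-for-go/services/network/mgmt/2021-02-01/network"
)

// ensureNICProvisioned returns an error when the interface exists but Azure
// reports a failed (or still in-progress) provisioning state.
func ensureNICProvisioned(nic network.Interface) error {
	if nic.InterfacePropertiesFormat == nil {
		return fmt.Errorf("network interface has no properties")
	}
	switch nic.InterfacePropertiesFormat.ProvisioningState {
	case network.ProvisioningStateSucceeded:
		return nil
	case network.ProvisioningStateFailed:
		return fmt.Errorf("network interface %s is in ProvisioningState Failed", derefString(nic.Name))
	default:
		return fmt.Errorf("network interface %s is not yet provisioned (state %q)",
			derefString(nic.Name), nic.InterfacePropertiesFormat.ProvisioningState)
	}
}

func derefString(s *string) string {
	if s == nil {
		return ""
	}
	return *s
}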
Description of problem:
When debugging a node using the OpenShift Console, the logs of the <NodeName>-debug pod are not accessible from either the Console UI or the CLI. However, when debugging the node via CLI (oc debug node/<node_name>), the logs are accessible as expected.
Version-Release number of selected component (if applicable):
OpenShift Versions Tested: 4.8.14, 4.8.18, 4.9.0 ... so 4.12
How reproducible:
always
Steps to Reproduce:
1. Open the OpenShift Console.
2. Navigate to Compute → Node → <node_name> → Terminal.
3. Run any command in the terminal.
4. A new <NodeName>-debug pod is created in a dynamically generated namespace (openshift-debug-node-xxx).
5. Try to access logs:
   Console UI: Workloads → Pod → <NodeName>-debug → Logs → Logs not visible.
   CLI: Run oc logs <NodeName-debug_pod> -n <openshift-debug-node-xxx> → No logs available.
Actual results:
Logs of the <NodeName>-debug pod are not available in either the Console UI or CLI when debugging via Console.
Expected results:
The <NodeName>-debug pod logs should be accessible in both the Console UI and CLI, similar to the behavior observed when debugging via oc debug node/<node_name>.
Additional info:
Debugging via CLI (oc debug node/<node_name>) creates the debug pod in the current namespace (e.g., <project_name>), and logs are accessible via: $ oc logs -n <project_name> -f <NodeName-debug_pod>
Debugging via Console creates the pod in a new dynamic namespace (openshift-debug-node-xxx), and logs are not accessible.
Possible cause: namespace issue - the debug pod is created in openshift-debug-node-xxx, which may not be configured to expose logs correctly.
Description of problem:
The OpenShift installer does not validate whether apiVIPs and ingressVIPs are specified when the load balancer is configured as UserManaged, and instead falls back to the default behaviour of picking the 5th and 7th IPs of the machine network.
Version-Release number of selected component (if applicable):
4.18
How reproducible:
100%
Steps to Reproduce:
1. Create an install-config.yaml file with the following content:
$ cat ocp4/install-config.yaml
apiVersion: v1
baseDomain: mydomain.test
compute:
- name: worker
  platform:
    openstack:
      type: m1.xlarge
  replicas: 3
controlPlane:
  name: master
  platform:
    openstack:
      type: m1.xlarge
  replicas: 3
metadata:
  name: mycluster
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 192.168.10.0/24
platform:
  openstack:
    loadBalancer:
      type: UserManaged
2. Run the following command to generate manifests:
$ openshift-install create manifests --dir ocp4
3. Check the generated cluster-config.yaml:
$ cat ocp4/manifests/cluster-config.yaml
4. Observe the following unexpected output:
platform:
  openstack:
    cloud: openstack
    externalDNS: null
    apiVIPs:
    - 192.168.10.5
    ingressVIPs:
    - 192.168.10.7
    loadBalancer:
      type: UserManaged
Actual results:
The apiVIPs and ingressVIPs fields are unexpectedly added to cluster-config.yaml.
Expected results:
The apiVIPs and ingressVIPs fields should not be automatically assigned.
Additional info:
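A sketch of the expected defaulting behaviour, with placeholder types rather than the real installer code paths:

package defaults

type OpenStackPlatform struct {
	APIVIPs      []string
	IngressVIPs  []string
	LoadBalancer string // e.g. "OpenShiftManagedDefault" or "UserManaged"
}

// SetPlatformDefaults only picks the 5th and 7th machine-network IPs when the
// internal load balancer is managed by OpenShift; with UserManaged, the VIPs
// are left for the user (and their external load balancer/DNS) to define.
func SetPlatformDefaults(p *OpenStackPlatform, machineNetworkIPs []string) {
	if p.LoadBalancer == "UserManaged" {
		return // do not inject apiVIPs/ingressVIPs into cluster-config.yaml
	}
	if len(p.APIVIPs) == 0 && len(machineNetworkIPs) > 4 {
		p.APIVIPs = []string{machineNetworkIPs[4]} // 5th IP
	}
	if len(p.IngressVIPs) == 0 && len(machineNetworkIPs) > 6 {
		p.IngressVIPs = []string{machineNetworkIPs[6]} // 7th IP
	}
}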
Description of problem:
Create cluster on instance type Standard_M8-4ms, installer failed to provision machines. install-config: ================ controlPlane: architecture: amd64 hyperthreading: Enabled name: master platform: azure: type: Standard_M8-4ms Create cluster: ===================== $ ./openshift-install create cluster --dir ipi3 INFO Waiting up to 15m0s (until 2:31AM UTC) for machines [jimainstance01-h45wv-bootstrap jimainstance01-h45wv-master-0 jimainstance01-h45wv-master-1 jimainstance01-h45wv-master-2] to provision... ERROR failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: control-plane machines were not provisioned within 15m0s: client rate limiter Wait returned an error: context deadline exceeded INFO Shutting down local Cluster API controllers... INFO Stopped controller: Cluster API WARNING process cluster-api-provider-azure exited with error: signal: killed INFO Stopped controller: azure infrastructure provider INFO Stopped controller: azureaso infrastructure provider INFO Shutting down local Cluster API control plane... INFO Local Cluster API system has completed operation In openshift-install.log, all machines were created failed with below error: ================= time="2024-09-20T02:17:07Z" level=debug msg="I0920 02:17:07.757980 1747698 recorder.go:104] \"failed to reconcile AzureMachine: failed to reconcile AzureMachine service virtualmachine: failed to get desired parameters for resource jimainstance01-h45wv-rg/jimainstance01-h45wv-bootstrap (service: virtualmachine): reconcile error that cannot be recovered occurred: failed to validate the memory capability: failed to parse string '218.75' as int64: strconv.ParseInt: parsing \\\"218.75\\\": invalid syntax. Object will not be requeued\" logger=\"events\" type=\"Warning\" object={\"kind\":\"AzureMachine\",\"namespace\":\"openshift-cluster-api-guests\",\"name\":\"jimainstance01-h45wv-bootstrap\",\"uid\":\"d67a2010-f489-44b4-9be9-88d7b136a45b\",\"apiVersion\":\"infrastructure.cluster.x-k8s.io/v1beta1\",\"resourceVersion\":\"1530\"} reason=\"ReconcileError\"" ... time="2024-09-20T02:17:12Z" level=debug msg="Checking that machine jimainstance01-h45wv-bootstrap has provisioned..." time="2024-09-20T02:17:12Z" level=debug msg="Machine jimainstance01-h45wv-bootstrap has not yet provisioned: Failed" time="2024-09-20T02:17:12Z" level=debug msg="Checking that machine jimainstance01-h45wv-master-0 has provisioned..." time="2024-09-20T02:17:12Z" level=debug msg="Machine jimainstance01-h45wv-master-0 has not yet provisioned: Failed" time="2024-09-20T02:17:12Z" level=debug msg="Checking that machine jimainstance01-h45wv-master-1 has provisioned..." time="2024-09-20T02:17:12Z" level=debug msg="Machine jimainstance01-h45wv-master-1 has not yet provisioned: Failed" time="2024-09-20T02:17:12Z" level=debug msg="Checking that machine jimainstance01-h45wv-master-2 has provisioned..." time="2024-09-20T02:17:12Z" level=debug msg="Machine jimainstance01-h45wv-master-2 has not yet provisioned: Failed" ... 
Also see same error in .clusterapi_output/Machine-openshift-cluster-api-guests-jimainstance01-h45wv-bootstrap.yaml =================== $ yq-go r Machine-openshift-cluster-api-guests-jimainstance01-h45wv-bootstrap.yaml 'status' noderef: null nodeinfo: null lastupdated: "2024-09-20T02:17:07Z" failurereason: CreateError failuremessage: 'Failure detected from referenced resource infrastructure.cluster.x-k8s.io/v1beta1, Kind=AzureMachine with name "jimainstance01-h45wv-bootstrap": failed to reconcile AzureMachine service virtualmachine: failed to get desired parameters for resource jimainstance01-h45wv-rg/jimainstance01-h45wv-bootstrap (service: virtualmachine): reconcile error that cannot be recovered occurred: failed to validate the memory capability: failed to parse string ''218.75'' as int64: strconv.ParseInt: parsing "218.75": invalid syntax. Object will not be requeued' addresses: [] phase: Failed certificatesexpirydate: null bootstrapready: false infrastructureready: false observedgeneration: 1 conditions: - type: Ready status: "False" severity: Error lasttransitiontime: "2024-09-20T02:17:07Z" reason: Failed message: 0 of 2 completed - type: InfrastructureReady status: "False" severity: Error lasttransitiontime: "2024-09-20T02:17:07Z" reason: Failed message: 'virtualmachine failed to create or update. err: failed to get desired parameters for resource jimainstance01-h45wv-rg/jimainstance01-h45wv-bootstrap (service: virtualmachine): reconcile error that cannot be recovered occurred: failed to validate the memory capability: failed to parse string ''218.75'' as int64: strconv.ParseInt: parsing "218.75": invalid syntax. Object will not be requeued' - type: NodeHealthy status: "False" severity: Info lasttransitiontime: "2024-09-20T02:16:27Z" reason: WaitingForNodeRef message: "" From above error, seems unable to parse the memory of instance type Standard_M8-4ms, which is a decimal, not an integer. $ az vm list-skus --size Standard_M8-4ms --location southcentralus | jq -r '.[].capabilities[] | select(.name=="MemoryGB")' { "name": "MemoryGB", "value": "218.75" }
Version-Release number of selected component (if applicable):
4.17.0-0.nightly-2024-09-16-082730
How reproducible:
Always
Steps to Reproduce:
1. set controlPlane type as Standard_M8-4ms in install-config 2. create cluster 3.
Actual results:
Installation failed
Expected results:
Installation succeeded
Additional info:
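A sketch of a more tolerant parse that would handle this SKU; where CAPZ performs the validation is not shown, and the helper name is illustrative. Azure reports MemoryGB for some SKUs (e.g. Standard_M8-4ms) as a decimal string such as "218.75", so parsing with ParseFloat avoids the strconv.ParseInt failure above:

package capabilities

import (
	"fmt"
	"math"
	"strconv"
)

// parseMemoryGB accepts both integral ("256") and decimal ("218.75") values.
func parseMemoryGB(value string) (int64, error) {
	if v, err := strconv.ParseInt(value, 10, 64); err == nil {
		return v, nil
	}
	f, err := strconv.ParseFloat(value, 64)
	if err != nil {
		return 0, fmt.Errorf("failed to parse memory capability %q: %w", value, err)
	}
	return int64(math.Floor(f)), nil
}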
We need to bump the Kubernetes version to the latest API version OCP is using.
This is what was done last time:
https://github.com/openshift/cluster-samples-operator/pull/409
Find latest stable version from here: https://github.com/kubernetes/api
This is described in wiki: https://source.redhat.com/groups/public/appservices/wiki/cluster_samples_operator_release_activities
When installing into a new stateroot, if the image was already unencapsulated and is found on the filesystem, ostree panics and the installation fails.
Description of problem:
The Console shows a timeout error when trying to edit a deployment with the annotation `image.openshift.io/triggers: ''`.
Version-Release number of selected component (if applicable):
4.12
How reproducible:
Always
Steps to Reproduce:
1. Install a 4.12 cluster 2. Create a deployment with the annotation `image.openshift.io/triggers: ''` 3. Select Edit Deployment in the console 4. The console gives a timeout error
Actual results:
Console gives time out error
Expected results:
Console should be able to handle bad values
Additional info:
The issue is observed when checking from the Actions section: Deployment -> <name_of_deployment> -> Actions -> Edit Deployment. The page gives the error "Oh no! Something went wrong" when the annotation is present. When the annotation is removed, the deployment is shown.
Description of problem:
The position of the play/pause button in the events page is different when there are no events vs when there are events
Version-Release number of selected component (if applicable):
4.19.0
How reproducible:
always
Steps to Reproduce:
1. open the events page 2. observe play/pause button position shift
Actual results:
the button moves
Expected results:
no shift
Additional info:
Description of problem:
/k8s/all-namespaces/volumesnapshots returns 404 Page Not Found
Version-Release number of selected component (if applicable):
4.18.0-0.nightly-2025-03-17-135359
How reproducible:
Always
Steps to Reproduce:
1. Navigate to Storage -> VolumeSnapshots and make sure 'All Projects' is selected
2. Click on the 'Create VolumeSnapshot' button; the user is redirected to the /k8s/ns/default/volumesnapshots/~new/form page and the project selection changes to `default`
3. Open the project selector dropdown and change the project to 'All Projects' again
$ oc get volumesnapshots -A
No resources found
Actual results:
3. URL path will be changed to /k8s/all-namespaces/volumesnapshots and we will see error 404: Page Not Found The server doesn't have a resource type "volumesnapshots". Try refreshing the page if it was recently added.
Expected results:
3. should display volumesnapshots in all projects, volumesnapshots resources can be successfully listed/queried $ oc get volumesnapshots -A No resources found
Additional info:
Description of problem:
We should use the resource kind HelmChartRepository on the details page, in action items, and in the breadcrumb link
Version-Release number of selected component (if applicable):
4.19.0-0.nightly-2025-03-09-063419
How reproducible:
Always
Steps to Reproduce:
1. Navigate to the Helm -> Repositories page, click on a HelmChartRepository 2. Check the details page heading name, breadcrumb link name, and action item names 3.
Actual results:
Details page heading is: Helm Chart Repository
Breadcrumb link name is: Repositories -> Helm Chart Repository details
Two action items are: Edit Helm Chart Repository and Delete Helm Chart Repository
Expected results:
We should use HelmChartRepository(no space between words) in these places
Additional info:
Description of problem:
While debugging an ocp-42855 failure, the hostedcluster condition Degraded is True
Version-Release number of selected component (if applicable):
quay.io/openshift-release-dev/ocp-release:4.12.0-rc.6-x86_64
How reproducible:
follow ocp-42855 test steps
Steps to Reproduce:
1.Create a basic hosted cluster using hypershift tool 2.check hostedcluster conditions
Actual results:
[hmx@ovpn-12-45 hypershift]$ oc get pods -n clusters-mihuanghy NAME READY STATUS RESTARTS AGE aws-ebs-csi-driver-controller-9c46694f-mqrlc 7/7 Running 0 55m aws-ebs-csi-driver-operator-5d7867bc9f-hqzd5 1/1 Running 0 55m capi-provider-6df855dbb5-tcmvq 2/2 Running 0 58m catalog-operator-7544b8d6d8-dk4hh 2/2 Running 0 57m certified-operators-catalog-7f8f6598b5-2blv4 0/1 CrashLoopBackOff 15 (4m20s ago) 57m cloud-network-config-controller-545fcfc797-mgszj 3/3 Running 0 55m cluster-api-54c7f7c477-kgvzn 1/1 Running 0 58m cluster-autoscaler-658756f99-vr2hk 1/1 Running 0 58m cluster-image-registry-operator-84d84dbc9f-zpcsq 3/3 Running 0 57m cluster-network-operator-9b6985cc8-sd7d7 1/1 Running 0 57m cluster-node-tuning-operator-65c8f6fbb9-xzpws 1/1 Running 0 57m cluster-policy-controller-b5c76cf58-b4rth 1/1 Running 0 57m cluster-storage-operator-7474f76c99-9chl7 1/1 Running 0 57m cluster-version-operator-646d97ccc9-l72m5 1/1 Running 0 57m community-operators-catalog-774fdb48fc-z6s4d 1/1 Running 0 57m control-plane-operator-5bc8c4c996-4nz8c 2/2 Running 0 58m csi-snapshot-controller-5b7d6bb685-vf8rf 1/1 Running 0 55m csi-snapshot-controller-operator-6f74db85c6-89bts 1/1 Running 0 57m csi-snapshot-webhook-57c5bd7f85-lqnwf 1/1 Running 0 55m dns-operator-767c5bbdd8-rb7fl 1/1 Running 0 57m etcd-0 2/2 Running 0 58m hosted-cluster-config-operator-88b9d49b7-2gvbt 1/1 Running 0 57m ignition-server-949d9fd8c-cgtxb 1/1 Running 0 58m ingress-operator-5c6f5d4f48-gh7fl 3/3 Running 0 57m konnectivity-agent-79c5ff9585-pqctc 1/1 Running 0 58m konnectivity-server-65956d468c-lpwfv 1/1 Running 0 58m kube-apiserver-d9f887c4b-xwdcx 5/5 Running 0 58m kube-controller-manager-64b6f757f9-6qszq 2/2 Running 0 52m kube-scheduler-58ffcdf789-fch2n 1/1 Running 0 57m machine-approver-559d66d4d6-2v64w 1/1 Running 0 58m multus-admission-controller-8695985fbc-hjtqb 2/2 Running 0 55m oauth-openshift-6b9695fc7f-pf4j6 2/2 Running 0 55m olm-operator-bf694b84-gvz6x 2/2 Running 0 57m openshift-apiserver-55c69bc497-x8bft 2/2 Running 0 52m openshift-controller-manager-8597c66d58-jb7w2 1/1 Running 0 57m openshift-oauth-apiserver-674cd6df6d-ckg55 1/1 Running 0 57m openshift-route-controller-manager-76d78f897c-9mfmj 1/1 Running 0 57m ovnkube-master-0 7/7 Running 0 55m packageserver-7988d8ddfc-wnh6l 2/2 Running 0 57m redhat-marketplace-catalog-77547cc685-hnh65 0/1 CrashLoopBackOff 15 (4m15s ago) 57m redhat-operators-catalog-7784d45f54-58lgg 1/1 Running 0 57m { "lastTransitionTime": "2022-12-31T18:45:28Z", "message": "[certified-operators-catalog deployment has 1 unavailable replicas, redhat-marketplace-catalog deployment has 1 unavailable replicas]", "observedGeneration": 3, "reason": "UnavailableReplicas", "status": "True", "type": "Degraded" },
Expected results:
Degraded is False
Additional info:
$ oc describe pod certified-operators-catalog-7f8f6598b5-2blv4 -n clusters-mihuanghy Name: certified-operators-catalog-7f8f6598b5-2blv4 Namespace: clusters-mihuanghy Priority: 100000000 Priority Class Name: hypershift-control-plane Node: ip-10-0-202-149.us-east-2.compute.internal/10.0.202.149 Start Time: Sun, 01 Jan 2023 02:47:03 +0800 Labels: app=certified-operators-catalog hypershift.openshift.io/control-plane-component=certified-operators-catalog hypershift.openshift.io/hosted-control-plane=clusters-mihuanghy olm.catalogSource=certified-operators pod-template-hash=7f8f6598b5 Annotations: hypershift.openshift.io/release-image: quay.io/openshift-release-dev/ocp-release:4.12.0-rc.6-x86_64 k8s.v1.cni.cncf.io/network-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.131.0.38" ], "default": true, "dns": {} }] k8s.v1.cni.cncf.io/networks-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.131.0.38" ], "default": true, "dns": {} }] openshift.io/scc: restricted-v2 seccomp.security.alpha.kubernetes.io/pod: runtime/default Status: Running IP: 10.131.0.38 IPs: IP: 10.131.0.38 Controlled By: ReplicaSet/certified-operators-catalog-7f8f6598b5 Containers: registry: Container ID: cri-o://f32b8d4c31b729c1b7deef0da622ddd661d840428aa4847968b1b2b3bf76b6cf Image: registry.redhat.io/redhat/certified-operator-index:v4.11 Image ID: registry.redhat.io/redhat/certified-operator-index@sha256:93f667597eee33b9bdbc9a61af60978b414b6f6df8e7c5f496c4298c1dfe9b62 Port: 50051/TCP Host Port: 0/TCP State: Waiting Reason: CrashLoopBackOff Last State: Terminated Reason: Error Exit Code: 1 Started: Sun, 01 Jan 2023 03:39:44 +0800 Finished: Sun, 01 Jan 2023 03:39:44 +0800 Ready: False Restart Count: 15 Requests: cpu: 10m memory: 160Mi Liveness: exec [grpc_health_probe -addr=:50051] delay=10s timeout=1s period=10s #success=1 #failure=3 Readiness: exec [grpc_health_probe -addr=:50051] delay=5s timeout=5s period=10s #success=1 #failure=3 Startup: exec [grpc_health_probe -addr=:50051] delay=0s timeout=1s period=10s #success=1 #failure=15 Environment: <none> Mounts: <none> Conditions: Type Status Initialized True Ready False ContainersReady False PodScheduled True Volumes: <none> QoS Class: Burstable Node-Selectors: <none> Tolerations: hypershift.openshift.io/cluster=clusters-mihuanghy:NoSchedule hypershift.openshift.io/control-plane=true:NoSchedule node.kubernetes.io/memory-pressure:NoSchedule op=Exists node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 54m default-scheduler Successfully assigned clusters-mihuanghy/certified-operators-catalog-7f8f6598b5-2blv4 to ip-10-0-202-149.us-east-2.compute.internal Normal AddedInterface 53m multus Add eth0 [10.131.0.38/23] from openshift-sdn Normal Pulling 53m kubelet Pulling image "registry.redhat.io/redhat/certified-operator-index:v4.11" Normal Pulled 53m kubelet Successfully pulled image "registry.redhat.io/redhat/certified-operator-index:v4.11" in 40.628843349s Normal Pulled 52m (x3 over 53m) kubelet Container image "registry.redhat.io/redhat/certified-operator-index:v4.11" already present on machine Normal Created 52m (x4 over 53m) kubelet Created container registry Normal Started 52m (x4 over 53m) kubelet Started container registry Warning BackOff 3m59s (x256 over 53m) kubelet Back-off restarting failed container $ oc describe pod redhat-marketplace-catalog-77547cc685-hnh65 -n clusters-mihuanghy Name: 
redhat-marketplace-catalog-77547cc685-hnh65 Namespace: clusters-mihuanghy Priority: 100000000 Priority Class Name: hypershift-control-plane Node: ip-10-0-202-149.us-east-2.compute.internal/10.0.202.149 Start Time: Sun, 01 Jan 2023 02:47:03 +0800 Labels: app=redhat-marketplace-catalog hypershift.openshift.io/control-plane-component=redhat-marketplace-catalog hypershift.openshift.io/hosted-control-plane=clusters-mihuanghy olm.catalogSource=redhat-marketplace pod-template-hash=77547cc685 Annotations: hypershift.openshift.io/release-image: quay.io/openshift-release-dev/ocp-release:4.12.0-rc.6-x86_64 k8s.v1.cni.cncf.io/network-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.131.0.40" ], "default": true, "dns": {} }] k8s.v1.cni.cncf.io/networks-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.131.0.40" ], "default": true, "dns": {} }] openshift.io/scc: restricted-v2 seccomp.security.alpha.kubernetes.io/pod: runtime/default Status: Running IP: 10.131.0.40 IPs: IP: 10.131.0.40 Controlled By: ReplicaSet/redhat-marketplace-catalog-77547cc685 Containers: registry: Container ID: cri-o://7afba8993dac8f1c07a2946d8b791def3b0c80ce62d5d6160770a5a9990bf922 Image: registry.redhat.io/redhat/redhat-marketplace-index:v4.11 Image ID: registry.redhat.io/redhat/redhat-marketplace-index@sha256:074498ac11b5691ba8975e8f63fa04407ce11bb035dde0ced2f439d7a4640510 Port: 50051/TCP Host Port: 0/TCP State: Waiting Reason: CrashLoopBackOff Last State: Terminated Reason: Error Exit Code: 1 Started: Sun, 01 Jan 2023 03:39:49 +0800 Finished: Sun, 01 Jan 2023 03:39:49 +0800 Ready: False Restart Count: 15 Requests: cpu: 10m memory: 340Mi Liveness: exec [grpc_health_probe -addr=:50051] delay=10s timeout=1s period=10s #success=1 #failure=3 Readiness: exec [grpc_health_probe -addr=:50051] delay=5s timeout=5s period=10s #success=1 #failure=3 Startup: exec [grpc_health_probe -addr=:50051] delay=0s timeout=1s period=10s #success=1 #failure=15 Environment: <none> Mounts: <none> Conditions: Type Status Initialized True Ready False ContainersReady False PodScheduled True Volumes: <none> QoS Class: Burstable Node-Selectors: <none> Tolerations: hypershift.openshift.io/cluster=clusters-mihuanghy:NoSchedule hypershift.openshift.io/control-plane=true:NoSchedule node.kubernetes.io/memory-pressure:NoSchedule op=Exists node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 55m default-scheduler Successfully assigned clusters-mihuanghy/redhat-marketplace-catalog-77547cc685-hnh65 to ip-10-0-202-149.us-east-2.compute.internal Normal AddedInterface 55m multus Add eth0 [10.131.0.40/23] from openshift-sdn Normal Pulling 55m kubelet Pulling image "registry.redhat.io/redhat/redhat-marketplace-index:v4.11" Normal Pulled 54m kubelet Successfully pulled image "registry.redhat.io/redhat/redhat-marketplace-index:v4.11" in 40.862526792s Normal Pulled 53m (x3 over 54m) kubelet Container image "registry.redhat.io/redhat/redhat-marketplace-index:v4.11" already present on machine Normal Created 53m (x4 over 54m) kubelet Created container registry Normal Started 53m (x4 over 54m) kubelet Started container registry Warning BackOff 21s (x276 over 54m) kubelet Back-off restarting failed container $ oc describe deployment redhat-marketplace-catalog -n clusters-mihuanghy Name: redhat-marketplace-catalog Namespace: clusters-mihuanghy CreationTimestamp: Sun, 01 Jan 2023 02:47:03 +0800 
Labels: hypershift.openshift.io/managed-by=control-plane-operator Annotations: deployment.kubernetes.io/revision: 1 Selector: olm.catalogSource=redhat-marketplace Replicas: 1 desired | 1 updated | 1 total | 0 available | 1 unavailable StrategyType: RollingUpdate MinReadySeconds: 0 RollingUpdateStrategy: 25% max unavailable, 25% max surge Pod Template: Labels: app=redhat-marketplace-catalog hypershift.openshift.io/control-plane-component=redhat-marketplace-catalog hypershift.openshift.io/hosted-control-plane=clusters-mihuanghy olm.catalogSource=redhat-marketplace Annotations: hypershift.openshift.io/release-image: quay.io/openshift-release-dev/ocp-release:4.12.0-rc.6-x86_64 Containers: registry: Image: registry.redhat.io/redhat/redhat-marketplace-index:v4.11 Port: 50051/TCP Host Port: 0/TCP Requests: cpu: 10m memory: 340Mi Liveness: exec [grpc_health_probe -addr=:50051] delay=10s timeout=1s period=10s #success=1 #failure=3 Readiness: exec [grpc_health_probe -addr=:50051] delay=5s timeout=5s period=10s #success=1 #failure=3 Startup: exec [grpc_health_probe -addr=:50051] delay=0s timeout=1s period=10s #success=1 #failure=15 Environment: <none> Mounts: <none> Volumes: <none> Priority Class Name: hypershift-control-plane Conditions: Type Status Reason ---- ------ ------ Available False MinimumReplicasUnavailable Progressing False ProgressDeadlineExceeded OldReplicaSets: <none> NewReplicaSet: redhat-marketplace-catalog-77547cc685 (1/1 replicas created) Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal ScalingReplicaSet 22m deployment-controller Scaled up replica set redhat-marketplace-catalog-77547cc685 to 1 [hmx@ovpn-12-45 hypershift]$ oc get hostedcluster -A NAMESPACE NAME VERSION KUBECONFIG PROGRESS AVAILABLE PROGRESSING MESSAGE clusters mihuanghy 4.12.0-rc.6 mihuanghy-admin-kubeconfig Completed True False The hosted control plane is available $ oc describe deployment certified-operators-catalog -n clusters-mihuanghy Name: certified-operators-catalog Namespace: clusters-mihuanghy CreationTimestamp: Sun, 01 Jan 2023 02:47:03 +0800 Labels: hypershift.openshift.io/managed-by=control-plane-operator Annotations: deployment.kubernetes.io/revision: 1 Selector: olm.catalogSource=certified-operators Replicas: 1 desired | 1 updated | 1 total | 0 available | 1 unavailable StrategyType: RollingUpdate MinReadySeconds: 0 RollingUpdateStrategy: 25% max unavailable, 25% max surge Pod Template: Labels: app=certified-operators-catalog hypershift.openshift.io/control-plane-component=certified-operators-catalog hypershift.openshift.io/hosted-control-plane=clusters-mihuanghy olm.catalogSource=certified-operators Annotations: hypershift.openshift.io/release-image: quay.io/openshift-release-dev/ocp-release:4.12.0-rc.6-x86_64 Containers: registry: Image: registry.redhat.io/redhat/certified-operator-index:v4.11 Port: 50051/TCP Host Port: 0/TCP Requests: cpu: 10m memory: 160Mi Liveness: exec [grpc_health_probe -addr=:50051] delay=10s timeout=1s period=10s #success=1 #failure=3 Readiness: exec [grpc_health_probe -addr=:50051] delay=5s timeout=5s period=10s #success=1 #failure=3 Startup: exec [grpc_health_probe -addr=:50051] delay=0s timeout=1s period=10s #success=1 #failure=15 Environment: <none> Mounts: <none> Volumes: <none> Priority Class Name: hypershift-control-plane Conditions: Type Status Reason ---- ------ ------ Available False MinimumReplicasUnavailable Progressing False ProgressDeadlineExceeded OldReplicaSets: <none> NewReplicaSet: certified-operators-catalog-7f8f6598b5 (1/1 replicas 
created) Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal ScalingReplicaSet 21m deployment-controller Scaled up replica set certified-operators-catalog-7f8f6598b5 to 1
Description of problem:
The "Tell us about your experience" modal has a light theme graphic in dark mode, making it hard to find the close button
Version-Release number of selected component (if applicable):
4.19.0
How reproducible:
always
Steps to Reproduce:
1. Open the feedback modal in dark theme 2. 3.
Actual results:
the graphic is not dark mode
Expected results:
the graphic is dark mode
Additional info:
When a (Fibre Channel) multipath disk is discovered by the assisted-installer-agent, the wwn field is not included:
{ "bootable": true, "by_id": "/dev/disk/by-id/wwn-0xdeadbeef", "drive_type": "Multipath", "has_uuid": true, "holders": "dm-3,dm-5,dm-7", "id": "/dev/disk/by-id/wwn-0xdeadbeef", "installation_eligibility": { "eligible": true, "not_eligible_reasons": null }, "name": "dm-2", "path": "/dev/dm-2", "size_bytes": 549755813888 },
Thus there is no way to match this disk with a wwn: root device hint. Since assisted does not allow installing directly to a fibre channel disk (without multipath) until 4.19 with MGMT-19631, and there is no /dev/disk/by-path/ symlink for a multipath device, this means that when there are multiple multipath disks in the system there is no way to select between them other than by size.
When ghw lists the disks, it fills in the WWN field from the ID_WWN_WITH_EXTENSION or ID_WWN udev values. It's not clear to me how udev is creating the /dev/disk/by-id/ symlink without those fields. There is a separate DM_WWN field (DM = Device Mapper), but I don't see it used in udev rules for whole disks, only for partitions. I don't have access to any hardware so it's impossible to say what the data in /run/udev/data looks like.
Description of problem:
"Export as CSV" on "Observe"->"Alerting" page is not marked for i18n.
Version-Release number of selected component (if applicable):
4.18.0-0.nightly-2024-12-12-133926
How reproducible:
Always
Steps to Reproduce:
1.Check "Export as CSV" on "Observe"->"Alerting" page. 2. 3.
Actual results:
1. It's not marked for i18n
Expected results:
1. Should be marked for i18n
Additional info:
"Export as CSV" also need i18n for each languages.
Description of problem:
Two favorite icons are shown on the same page: the Operator details page with a CR.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. install Red Hat Serverless operator 2. navigate to Operator details > knative serving page
Actual results:
Two star icons on the same page
Expected results:
Only one star icon should be present on a page
Additional info:
HyperShift currently seems to only maintain one version at a time in status on a FeatureGate resource. For example, in a HostedControlPlane that had been installed a while back, and recently done 4.14.37 > 4.14.38 > 4.14.39, the only version in FeatureGate was 4.14.39:
$ jq -r '.status.featureGates[].version' featuregates.yaml 4.14.39
Compare that with standalone clusters, where FeatureGates status is appended with each release. For example, in this 4.18.0-rc.0 to 4.18.0-rc.1 CI run:
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/release-openshift-origin-installer-e2e-aws-upgrade/1865110488958898176/artifacts/e2e-aws-upgrade/must-gather.tar | tar -xOz quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-b7fd0a8ff4df55c00e9e4e676d8c06fad2222fe83282fbbea3dad3ff9aca1ebb/cluster-scoped-resources/config.openshift.io/featuregates/cluster.yaml | yaml2json | jq -r '.status.featureGates[].version' 4.18.0-rc.1 4.18.0-rc.0
The append approach allows consumers to gracefully transition over time, as they each update from the outgoing version to the incoming version. With the current HyperShift logic, there's a race between the FeatureGate status bump and the consuming component bumps:
In this bug, I'm asking for HyperShift to adopt the standalone approach of appending to FeatureGate status instead of dropping the outgoing version, to avoid that kind of race window, at least until there's some assurance that the update to the incoming version has completely rolled out. Standalone pruning removes versions that no longer exist in ClusterVersion history. Checking a long-lived standalone cluster I have access to, I see:
$ oc get -o json featuregate cluster | jq -r '.status.featureGates[].version'
4.18.0-ec.4
4.18.0-ec.3
...
4.14.0-ec.1
4.14.0-ec.0
$ oc get -o json featuregate cluster | jq -r '.status.featureGates[].version' | wc -l
27
so it seems like pruning is currently either non-existent, or pretty relaxed.
Seen in a 4.14.38 to 4.14.39 HostedCluster update. May or may not apply to more recent 4.y.
Unclear
Steps to Reproduce
When vB is added to FeatureGate status, vA is dropped.
If the CPO gets stuck during the transition, some management-cluster-side pods (cloud-network-config-controller, cluster-network-operator, ingress-operator, cluster-storage-operator, etc.) crash loop with logs like:
E1211 15:43:58.314619 1 simple_featuregate_reader.go:290] cluster failed with : unable to determine features: missing desired version "4.14.38" in featuregates.config.openshift.io/cluster E1211 15:43:58.635080 1 simple_featuregate_reader.go:290] cluster failed with : unable to determine features: missing desired version "4.14.38" in featuregates.config.openshift.io/cluster
vB is added to FeatureGate status early in the update, and vA is preserved through much of the update, and only removed when it seems like there might not be any more consumers (when a version is dropped from ClusterVersion history, if you want to match the current standalone handling on this).
None yet.
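A rough sketch of the append-then-prune semantics being requested, using the openshift/api FeatureGate status types as I understand them; HyperShift's actual reconciler will differ in detail, and the pruning rule simply mirrors the standalone behaviour described above:

package featuregates

import configv1 "github.com/openshift/api/config/v1"

// upsertFeatureGateDetails prepends details for the incoming version while
// keeping entries for any version still present in ClusterVersion history,
// so consumers on the outgoing version never see "missing desired version".
func upsertFeatureGateDetails(current []configv1.FeatureGateDetails,
	incoming configv1.FeatureGateDetails, knownVersions map[string]bool) []configv1.FeatureGateDetails {

	out := []configv1.FeatureGateDetails{incoming}
	for _, d := range current {
		if d.Version == incoming.Version {
			continue // replaced by the incoming entry
		}
		if knownVersions[d.Version] {
			out = append(out, d) // still referenced somewhere; keep it
		}
	}
	return out
}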
We're just getting a regexp search bar and then a blank chart. Using the browser dev tools console, we see this error:
Uncaught SyntaxError: import declarations may only appear at top level of a module timelines-chart:1:1 Uncaught ReferenceError: TimelinesChart is not defined renderChart https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-vsphere-ovn-csi/1902920985443569664/artifacts/e2e-vsphere-ovn-csi/openshift-e2e-test/artifacts/junit/e2e-timelines_spyglass_20250321-040532.html:33606 <anonymous> https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-vsphere-ovn-csi/1902920985443569664/artifacts/e2e-vsphere-ovn-csi/openshift-e2e-test/artifacts/junit/e2e-timelines_spyglass_20250321-040532.html:33650
Seems to be hitting 4.18 as well, not sure when it started exactly.
Description of problem:
A webhook prompt should be given when marketType is invalid, as is done for other features:
liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml
Error from server (Forbidden): error when creating "ms1.yaml": admission webhook "validation.machineset.machine.openshift.io" denied the request: providerSpec.networkInterfaceType: Invalid value: "1": Valid values are: ENA, EFA and omitted
Version-Release number of selected component (if applicable):
4.19.0-0.nightly-2025-03-05-160850
How reproducible:
always
Steps to Reproduce:
1.Install an AWS cluster liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.19.0-0.nightly-2025-03-05-160850 True False 5h37m Cluster version is 4.19.0-0.nightly-2025-03-05-160850 2.Create a machineset with invalid marketType, for example, marketType: "1", the machine stuck in Provisioning, although I can see some messages in the machine providerStatus and machine-controller log, I think we should give explicit webhook prompt to be consistent with other features. huliu-aws36a-6bslb-worker-us-east-2aa-f89jk Provisioning 8m42s providerStatus: conditions: - lastTransitionTime: "2025-03-06T07:49:51Z" message: invalid MarketType "1" reason: MachineCreationFailed status: "False" type: MachineCreation E0306 08:01:07.645341 1 actuator.go:72] huliu-aws36a-6bslb-worker-us-east-2aa-f89jk error: huliu-aws36a-6bslb-worker-us-east-2aa-f89jk: reconciler failed to Create machine: failed to launch instance: invalid MarketType "1" W0306 08:01:07.645377 1 controller.go:409] huliu-aws36a-6bslb-worker-us-east-2aa-f89jk: failed to create machine: huliu-aws36a-6bslb-worker-us-east-2aa-f89jk: reconciler failed to Create machine: failed to launch instance: invalid MarketType "1" E0306 08:01:07.645427 1 controller.go:341] "msg"="Reconciler error" "error"="huliu-aws36a-6bslb-worker-us-east-2aa-f89jk: reconciler failed to Create machine: failed to launch instance: invalid MarketType \"1\"" "controller"="machine-controller" "name"="huliu-aws36a-6bslb-worker-us-east-2aa-f89jk" "namespace"="openshift-machine-api" "object"={"name":"huliu-aws36a-6bslb-worker-us-east-2aa-f89jk","namespace":"openshift-machine-api"} "reconcileID"="e3aeeeda-2537-4e83-a787-2cbcf9926646" I0306 08:01:07.645499 1 recorder.go:104] "msg"="huliu-aws36a-6bslb-worker-us-east-2aa-f89jk: reconciler failed to Create machine: failed to launch instance: invalid MarketType \"1\"" "logger"="events" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"huliu-aws36a-6bslb-worker-us-east-2aa-f89jk","uid":"a7ef8a7b-87d5-4569-93a4-47a7a2d16325","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"118757"} "reason"="FailedCreate" "type"="Warning" liuhuali@Lius-MacBook-Pro huali-test % oc get machineset huliu-aws36a-6bslb-worker-us-east-2aa -oyaml apiVersion: machine.openshift.io/v1beta1 kind: MachineSet metadata: annotations: capacity.cluster-autoscaler.kubernetes.io/labels: kubernetes.io/arch=amd64 machine.openshift.io/GPU: "0" machine.openshift.io/memoryMb: "16384" machine.openshift.io/vCPU: "4" creationTimestamp: "2025-03-06T07:49:50Z" generation: 1 labels: machine.openshift.io/cluster-api-cluster: huliu-aws36a-6bslb name: huliu-aws36a-6bslb-worker-us-east-2aa namespace: openshift-machine-api resourceVersion: "118745" uid: 65e94786-6c1a-42b8-9bf3-9fe0d3f4adf3 spec: replicas: 1 selector: matchLabels: machine.openshift.io/cluster-api-cluster: huliu-aws36a-6bslb machine.openshift.io/cluster-api-machineset: huliu-aws36a-6bslb-worker-us-east-2aa template: metadata: labels: machine.openshift.io/cluster-api-cluster: huliu-aws36a-6bslb machine.openshift.io/cluster-api-machine-role: worker machine.openshift.io/cluster-api-machine-type: worker machine.openshift.io/cluster-api-machineset: huliu-aws36a-6bslb-worker-us-east-2aa spec: lifecycleHooks: {} metadata: {} providerSpec: value: ami: id: ami-0e763ecd8ccccbc99 apiVersion: machine.openshift.io/v1beta1 blockDevices: - ebs: encrypted: true iops: 0 kmsKey: arn: "" volumeSize: 120 volumeType: gp3 capacityReservationId: "" 
credentialsSecret: name: aws-cloud-credentials deviceIndex: 0 iamInstanceProfile: id: huliu-aws36a-6bslb-worker-profile instanceType: m6i.xlarge kind: AWSMachineProviderConfig marketType: "1" metadata: creationTimestamp: null metadataServiceOptions: {} placement: availabilityZone: us-east-2a region: us-east-2 securityGroups: - filters: - name: tag:Name values: - huliu-aws36a-6bslb-node - filters: - name: tag:Name values: - huliu-aws36a-6bslb-lb subnet: filters: - name: tag:Name values: - huliu-aws36a-6bslb-subnet-private-us-east-2a tags: - name: kubernetes.io/cluster/huliu-aws36a-6bslb value: owned userDataSecret: name: worker-user-data status: fullyLabeledReplicas: 1 observedGeneration: 1 replicas: 1 liuhuali@Lius-MacBook-Pro huali-test %
Actual results:
The machine is stuck in Provisioning, and some messages are shown in the machine providerStatus and the machine-controller log
Expected results:
Give an explicit webhook prompt, consistent with other features, for example:
liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml
Error from server (Forbidden): error when creating "ms1.yaml": admission webhook "validation.machineset.machine.openshift.io" denied the request: providerSpec.networkInterfaceType: Invalid value: "1": Valid values are: ENA, EFA and omitted
Additional info:
New feature testing for https://issues.redhat.com/browse/OCPCLOUD-2780
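A sketch of the kind of admission check being asked for, modeled on the existing networkInterfaceType validation; the helper name and the set of valid marketType values are assumptions, not the actual machine-api webhook code:

package webhooks

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/validation/field"
)

var validMarketTypes = []string{"OnDemand", "Spot", "CapacityBlock"} // assumed valid values

// validateMarketType rejects unknown values at admission time instead of
// letting the machine fail later during reconciliation.
func validateMarketType(marketType string, fldPath *field.Path) *field.Error {
	if marketType == "" {
		return nil // omitted is allowed
	}
	for _, v := range validMarketTypes {
		if marketType == v {
			return nil
		}
	}
	return field.Invalid(fldPath, marketType,
		fmt.Sprintf("valid values are: %v and omitted", validMarketTypes))
}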
Description of problem:
Fix the labels for the OpenTelemetry allow list; currently all labels have an exporter/receiver postfix on them. This is incorrect, because the name of the exporter/receiver does not contain such a postfix.
Description of problem:
the CIS "plugin did not respond" blocked the public install
Version-Release number of selected component (if applicable):
4.18.0-0.nightly-2025-03-14-195326
How reproducible:
Always
Steps to Reproduce:
1.create public ipi cluster on IBMCloud platform 2. 3.
Actual results:
level=info msg=Creating infrastructure resources...
msg=Error: Plugin did not respond
...
msg=panic: runtime error: invalid memory address or nil pointer dereference
msg=[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x24046dc]
level=error
level=error msg=goroutine 2090 [running]:
level=error msg=github.com/IBM-Cloud/terraform-provider-ibm/ibm/service/cis.ResourceIBMCISDnsRecordRead(0xc003573900, {0x4ed2fa0?, 0xc00380c008?})
Expected results:
Cluster creation succeeds.
Additional info:
https://github.com/IBM-Cloud/terraform-provider-ibm/issues/6066 ibm_cis_dns_record leads to plugin crash
Description of problem:
On a cluster with custom service endpoints, SSH to the created bastion and master VMs fails
Version-Release number of selected component (if applicable):
4.19 pre-merge main@de563b96, merging: #9523 f1119b4a, #9397 487587cf, #9385 e365e12c
How reproducible:
Always
Steps to Reproduce:
1. Create an install-config with a custom service endpoint:
serviceEndpoints:
- name: COS
  url: https://s3.direct.jp-tok.cloud-object-storage.appdomain.cloud
2. Create the cluster
3.
Actual results:
Cluster creation failed. SSH to the bootstrap and master VMs failed
Expected results:
Cluster creation succeeds.
Additional info:
the VNC console of ci-op-lgk38x3xaa049-hk2z5-bootstrap:
Mar 05 11:36:34 ignition[783]: error at $.ignition.config.replace.source, line 1 col 1542: unable to parse url
Mar 05 11:36:34 ignition[783]: error at $.ignition.config.replace.httpHeaders, line 1 col 50: unable to parse url
Mar 05 11:36:34 ignition[783]: failed to fetch config: config is not valid
Mar 05 11:36:34 ignition[783]: failed to acquire config: config is not valid
Mar 05 11:36:34 systemd[1]: ignition-fetch-offline.service: Main process exited, code=exited, status=1/FAILURE
Mar 05 11:36:34 ignition[783]: Ignition failed: config is not valid
Mar 05 11:36:34 systemd[1]: ignition-fetch-offline.service: Failed with result 'exit-code'.
Mar 05 11:36:34 systemd[1]: Failed to start Ignition (fetch-offline).
Mar 05 11:36:34 systemd[1]: ignition-fetch-offline.service: Triggering OnFailure dependencies.
Generating "/run/initramfs/rdsosreport.txt"
the VNC console of ci-op-lgk38x3xaa049-hk2z5-master-0:
[ 2284.471078] ignition[840]: GET https://api-int.ci-op-lgk38x3xaa049.private-ibmcloud-1.qe.devcluster.openshift.com:22623/config/master: attempt #460
[ 2284.477585] ignition[840]: GET error: Get "https://api-int.ci-op-lgk38x3xaa049.private-ibmcloud-1.qe.devcluster.openshift.com:22623/config/master": EOF
Description of problem:
When trying to use the ImageSetConfig as described below, I see that oc-mirror gets killed abruptly.
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v2alpha1
mirror:
  platform:
    channels:
    - name: stable-4.16 # Version of OpenShift to be mirrored
      minVersion: 4.16.30 # Minimum version of OpenShift to be mirrored
      maxVersion: 4.16.30 # Maximum version of OpenShift to be mirrored
      shortestPath: true
      type: ocp
    graph: true
  operators:
  - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.16
    full: false
  - catalog: registry.redhat.io/redhat/certified-operator-index:v4.16
    full: false
  - catalog: registry.redhat.io/redhat/community-operator-index:v4.16
    full: false
  helm: {}
Version-Release number of selected component (if applicable):
4.18
How reproducible:
Always
Steps to Reproduce:
1. Use the ImageSetConfig as above 2. Run command `oc-mirror -c /tmp/config.yaml file://test --v2` 3.
Actual results:
oc-mirror command gets killed even after having about 24GB of Ram and 12 core cpu, for some customers even after having 64GB of ram it looks like it never worked. 2025/03/03 10:40:01 [INFO] : :mag: collecting operator images... 2025/03/03 10:40:01 [DEBUG] : [OperatorImageCollector] setting copy option o.Opts.MultiArch=all when collecting operator images 2025/03/03 10:40:01 [DEBUG] : [OperatorImageCollector] copying operator image registry.redhat.io/redhat/redhat-operator-index:v4.16 (24s) Collecting catalog registry.redhat.io/redhat/redhat-operator-index:v4.16 2025/03/03 10:40:26 [DEBUG] : [OperatorImageCollector] manifest 2be15a52aa4978d9134dfb438e51c01b77c9585578244b97b8ba1d4f5e6c0ea1 (5m59s) Collecting catalog registry.redhat.io/redhat/redhat-operator-index:v4.16 2025/03/03 10:46:01 [WARN] : error parsing image registry.redhat.io/openshift4/ose-kube-rbac-proxy-rhel9 : registry.redhat.io/openshift4/ose-kube-rbac-proxy-rhel9 unable to parse image correctly : tag and dige ✓ (5m59s) Collecting catalog registry.redhat.io/redhat/redhat-operator-index:v4.16 2025/03/03 10:46:01 [DEBUG] : [OperatorImageCollector] copying operator image registry.redhat.io/redhat/certified-operator-index:v4.16 ⠦ (2s) Collecting catalog registry.redhat.io/redhat/certified-operator-index:v4.16 2025/03/03 10:46:03 [DEBUG] : [OperatorImageCollector] manifest 816c65bcab1086e3fa158e2391d84c67cf96916027c59ab8fe44cf68a1bfe57a 2025/03/03 10:46:03 [DEBUG] : [OperatorImageCollector] label /configs ✓ (51s) Collecting catalog registry.redhat.io/redhat/certified-operator-index:v4.16 2025/03/03 10:46:53 [DEBUG] : [OperatorImageCollector] copying operator image registry.redhat.io/redhat/community-operator-index:v4.16 ⠇ (11s) Collecting catalog registry.redhat.io/redhat/community-operator-index:v4.16 2025/03/03 10:47:04 [DEBUG] : [OperatorImageCollector] manifest 7a8cb7df2447b26c43b274f387197e0789c6ccc55c18b48bf0807ee00286550d ⠹ (34m26s) Collecting catalog registry.redhat.io/redhat/community-operator-index:v4.16 Killed
Expected results:
oc-mirror process should not get killed abruptly.
Additional info:
More info in the link here: https://redhat-internal.slack.com/archives/C02JW6VCYS1/p1740783474190069
Description of problem:
platform.powervs.clusterOSImage is still required and should not be removed from the install-config
Version-Release number of selected component (if applicable):
4.19.0
Steps to Reproduce:
1. Specify OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE and try to deploy 2. The deploy does not use the override value
Actual results:
The value of platform.powervs.clusterOSImage will be ignored.
Expected results:
The deploy uses the overridden value of OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE
Additional info:
This enabled MachineSet preflight checks by default: https://github.com/kubernetes-sigs/cluster-api/pull/11228
We want to disable this functionality in HCP for the following reasons:
MachineSetPreflightCheckKubeadmVersionSkew
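If the per-MachineSet skip annotation from upstream CAPI is the mechanism we use, a minimal sketch could look like the following; the annotation key and check name are taken from upstream CAPI as I understand it and should be verified:

package hcp

import clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"

// Assumed upstream CAPI annotation for skipping specific MachineSet preflight checks.
const skipPreflightChecksAnnotation = "machineset.cluster.x-k8s.io/skip-preflight-checks"

// skipKubeadmVersionSkewPreflight opts a MachineSet out of the KubeadmVersionSkew
// check, which is meaningless for HCP since hosted control planes do not use kubeadm.
func skipKubeadmVersionSkewPreflight(ms *clusterv1.MachineSet) {
	if ms.Annotations == nil {
		ms.Annotations = map[string]string{}
	}
	ms.Annotations[skipPreflightChecksAnnotation] = "KubeadmVersionSkew"
}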
This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were not completed when this image was assembled
Description of problem:
When adding a node with `oc adm node-image`, the command is unable to pull the release image container and fails to generate the new node ISO.
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Deploy an OpenShift cluster with a private registry in an offline environment 2. Create the nodes-config.yaml for the new nodes 3. Run "oc adm node-image create --dir=/tmp/assets"
Actual results:
Command fails with error saying that it cannot pull from quay.io/openshift-release-dev/ocp-release@shaXXXXX
Expected results:
Command generates an ISO used to add the new worker nodes
Additional info:
When creating the initial agent ISO using "openshift-install agent create image" command, we can see in the output that a sub command is run, "oc adm release extract". When the install-config.yaml contains the ImageContentSourcePolicy section, or ImageDigestMirrorSet section, a flag is added to "oc adm release extract --icsp or idms" which contains the mappings from quay.io to the private registry. The oc command does not have a top level icsp or idms flag. The oc adm node-image command needs to have a flag for icsp or idms such that it is able to understand that instead of pulling the release image from quay.io it should pull the image from the private registry. Without this flag, the oc command has no way to know that it should be pulling container images from a private registry.
We recently hit a limit in our subscription where we could no longer assign role assignments to service principals.
This is because we are not deleting role assignments made during our CI runs. We previously thought we didn't have to delete those, but it turns out we need to.
Description of the problem:
For some hardware, particularly simplynuc (https://edge.simplynuc.com/), it was found that when the motherboard serial number is not set, it defaults to "-". Since this is treated as a valid string in the UUID generation in https://github.com/openshift/assisted-installer-agent/blob/master/src/scanners/machine_uuid_scanner.go#L96-L107, it results in all hosts having the same UUID, causing installation failures.
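A sketch of the kind of guard that would avoid the collision, with illustrative names rather than the actual assisted-installer-agent code: treat placeholder serials such as "-" as unusable and fall back to a random UUID.

package scanners

import (
	"strings"

	"github.com/google/uuid"
)

// placeholder values some vendors report instead of a real serial number.
var invalidSerials = map[string]bool{
	"": true, "-": true, "none": true, "n/a": true,
	"to be filled by o.e.m.": true, "default string": true,
}

func hostUUIDFromSerial(motherboardSerial string) uuid.UUID {
	s := strings.ToLower(strings.TrimSpace(motherboardSerial))
	if invalidSerials[s] {
		return uuid.New() // no stable identifier available; avoid duplicate UUIDs
	}
	// Derive a stable UUID from the serial (the namespace choice here is arbitrary).
	return uuid.NewSHA1(uuid.NameSpaceOID, []byte(s))
}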
USER STORY:
As a developer, I need to remove all feature gates around vSphere CPMS support one release after GA so that the feature gate logic is removed and functions no longer need feature gate protection.
DESCRIPTION:
This story will clean up all of the feature gate logic for vSphere CPMS. Currently, several projects check whether the feature gate is enabled before performing the logic. As part of being GA, the code is enabled by default. You can still force-disable it in the install-config if you wish, which is why we left it in GA+0, but in GA+1 we are to remove it, assuming all major bugs are fixed.
ACCEPTANCE CRITERIA:
All components referencing the vSphere CPMS feature gate have been updated to no longer use it and point to a version of API where the feature gate no longer exists.
ENGINEERING DETAILS:
TBD