Jump to: Incomplete Features | Incomplete Epics | Other Complete | Other Incomplete
Note: this page shows the Feature-Based Change Log for a release
When this image was assembled, these features were not yet completed. Therefore, only the Jira cards included here are part of this release.
Provide a simple way to get a VM-friendly networking setup, without having to configure the underlying physical network.
Primary user-defined networks can be managed from the UI, and the user flow is seamless.
As a result of HashiCorp's license change to BSL, Red Hat OpenShift needs to remove the use of HashiCorp's Terraform from the installer, specifically for IPI deployments, which currently use Terraform to set up the infrastructure.
To avoid an increased support overhead once the license changes at the end of the year, we want to provision IBM Cloud VPC infrastructure without the use of Terraform.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
Provisioning bootstrap and control plane machines using CAPI.
1. Pre-installation:
2. Installation:
3. Update:
4. Uninstallation/Deletion:
5. Disconnected Environments for High-Security Workloads:
6. [Tech Preview] Signature Validation for Secure Workflows:
All the expected user outcomes and the acceptance criteria in the engineering epics are covered.
OLM: Gateway to the OpenShift Ecosystem
Operator Lifecycle Manager (OLM) has been a game-changer for OpenShift Container Platform (OCP) 4. Since its launch in 2019, OLM has fostered a rich ecosystem, expanding from a curated set of 25 operators to over 100 officially supported Red Hat operators and hundreds more from certified ISVs and the community.
OLM empowers users to manage diverse technologies with ease, including ACM, ACS, Quay, GitOps, Pipelines, Service Mesh, Serverless, and Virtualization. It has also facilitated the introduction of groundbreaking operators for entirely new workloads, like Nvidia GPU, PTP, Windows Machine Config, SR-IOV networking, and more. Today, a staggering 91% of our connected customers leverage OLM's capabilities.
OLM v0: A Stepping Stone
While OLM v0 has been instrumental, it has limitations. The API design, not fully GitOps-friendly or entirely declarative, presents a steeper learning curve due to its complexity. Furthermore, OLM v0 was designed with the assumption of namespace-scoped CRDs (Custom Resource Definitions), allowing for independent operator installations and parallel versions within a single cluster. However, this functionality never materialized in core Kubernetes, and OLM v0's attempt to simulate it has introduced limitations and bugs.
The Operator Framework Team: Building the Future
The Operator Framework team is the cornerstone of the OpenShift ecosystem. They build and manage OLM, the Operator SDK, operator catalog formats, and tooling (opm, file-based catalogs). Their work directly impacts how operators are developed, packaged, delivered, and managed by users and SRE teams on OpenShift clusters.
A Streamlined Future with OLM v1
The Operator Framework team has undergone significant restructuring to focus on the next generation of OLM – OLM v1. This transition includes moving the Operator SDK to a feature-complete state with ongoing maintenance for compatibility with the latest Kubernetes and controller-runtime libraries. This strategic shift allows the team to dedicate resources to completely revamping OLM's API and management concepts for catalog content delivery.
Leveraging learnings and customer feedback since OCP 4's inception, OLM v1 is designed to be a major overhaul, and it will be shipped as a Generally Available (GA) feature in OpenShift 4.17.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
1. Pre-installation:
2. Installation:
3. Update:
4. Uninstallation/Deletion:
1. Pre-installation:
2. Installation:
3. Update:
4. Uninstallation/Deletion:
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
Update cluster-olm-operator manifests to be in the payload by default.
A/C:
- Removed "release.openshift.io/feature-set: TechPreviewNoUpgrade" annotation
- Ensure the following cluster profiles are targeted by all manifests:
- include.release.openshift.io/hypershift: "true"
- include.release.openshift.io/ibm-cloud-managed: "true"
- include.release.openshift.io/self-managed-high-availability: "true"
- include.release.openshift.io/single-node-developer: "true"
- No installation related annotations are present in downstream operator-controller and catalogd manifests
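As a sketch, the acceptance criteria above translate into manifest metadata along these lines (the Deployment name and namespace here are illustrative assumptions, not copied from the actual repository):

```yaml
# Illustrative metadata for a cluster-olm-operator payload manifest.
# Resource name/namespace are assumptions, not the real downstream files.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-olm-operator
  namespace: openshift-cluster-olm-operator
  annotations:
    # The "release.openshift.io/feature-set: TechPreviewNoUpgrade" annotation
    # is removed, and all four cluster profiles are targeted so the manifest
    # ships in the payload by default.
    include.release.openshift.io/hypershift: "true"
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
```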
OpenShift offers a "capabilities" mechanism to allow users to select which components to include in the cluster at install time.
It was decided the capability name should be: OperatorLifecycleManagerV1 [ref
A/C:
- ClusterVersion resource updated with OLM v1 capability
- cluster-olm-operator manifests updated with capability.openshift.io/name=OperatorLifecycleManagerV1 annotation
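For reference, a cluster admin opting in to the capability at install time would end up with a ClusterVersion along these lines (a sketch based on the existing capabilities API, not the final implementation):

```yaml
apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  name: version
spec:
  capabilities:
    # Start from no optional capabilities, then enable OLM v1 explicitly.
    baselineCapabilitySet: None
    additionalEnabledCapabilities:
      - OperatorLifecycleManagerV1
```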
Customers who deploy a large number of OpenShift on OpenStack clusters want to minimise the resource requirements of their cluster control planes.
Customers deploying RHOSO (Red Hat OpenStack Services on OpenShift, i.e. an OpenStack control plane on bare-metal OpenShift) already have a bare metal management cluster capable of serving Hosted Control Planes.
We should enable self-hosted (i.e. on-prem) Hosted Control Planes to serve Hosted Control Planes to OpenShift on OpenStack clusters, with a specific focus of serving Hosted Control Planes from the RHOSO management cluster.
As an enterprise IT department and OpenStack customer, I want to provide self-managed OpenShift clusters to my internal customers with minimum cost to the business.
As an internal customer of said enterprise, I want to be able to provision an OpenShift cluster for myself using the business's existing OpenStack infrastructure.
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
TBD
Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
<your text here>
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
High-level list of items that are out of scope. Initial completion during Refinement status.
<your text here>
Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.
<your text here>
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
<your text here>
In an HCP deployment, the hosted-cluster-config-operator is responsible for deploying other operators, such as the cluster-storage-operator. We need to modify this operator to deploy cluster-storage-operator and enable the openstack-cinder-csi-driver-operator when deployed in an OpenStack environment.
In an HCP deployment, the hosted-cluster-config-operator is responsible for deploying other operators, such as the cluster-storage-operator. We need to modify this operator to deploy cluster-storage-operator and enable the openstack-manila-csi-driver-operator when deployed in an OpenStack environment.
In OSASINFRA-3608, we merged the openshift/openstack-cinder-csi-driver-operator repository into openshift/csi-operator and modified it to take advantage of the new generator framework provided therein. Now, we want to build on this, adding Hypershift-specific assets and tweaking whatever else is needed.
Implement authorization to secure API access for different user personas/actors in the agent-based installer.
User Personas:
This is
The agent-based installer APIs have implemented basic security measures through authentication, as covered in AGENT-145.
To further enhance security, it is crucial to implement user persona/actor-based authorization, allowing for differentiated access control, such as read-only or read-write permissions, based on the user's role.
The goal of this implementation is to provide a more robust and secure API framework, ensuring that users can only perform actions appropriate to their role.
As an ABI user, I want to be able to:
so that I can achieve
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
Improve the cluster expansion with the agent workflow added in OpenShift 4.16 (TP) and OpenShift 4.17 (GA) with:
Improve the user experience and functionality of the commands to add nodes to clusters using the image creation functionality.
Currently, all the *.iso files generated by the node-joiner tool are copied back to the user. Since node-joiner also creates the node-config unconditionally, it is copied even when not requested, which is confusing for the end user.
Currently, the oc node-image create command does not report any relevant information that could help the user understand where each element was retrieved from (for example, the SSH key), making it more difficult to troubleshoot a potential issue.
For this reason, it would be useful for the node-joiner tool to produce a proper JSON file reporting all the details about the relevant resources fetched to generate the image. The oc command should be able to expose them when required (i.e., via a command flag).
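A hypothetical shape for such a report file (every field name and value below is an illustration, not an agreed-upon format) could be:

```json
{
  "resources": [
    {
      "name": "sshKey",
      "source": "machineconfig/99-worker-ssh",
      "note": "hypothetical example: key embedded in the generated ISO"
    },
    {
      "name": "imageDigestSources",
      "source": "cluster-wide ImageDigestMirrorSet",
      "note": "hypothetical example: mirrors applied to the node image"
    }
  ]
}
```

The oc node-image create command could then surface this file on demand via a flag, helping users trace where each embedded setting came from.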
Make the output of the two commands more similar by using the recently introduced base command logger.
As part of being a first-party Azure offering, ARO HCP needs to adhere to Microsoft's secure software supply chain requirements. In order to do this, we require setting a label on all pods that run in the hosted cluster namespace.
Implement Mechanism for Labeling Hosted Cluster Control Plane Pods
kubernetes.azure.com/managedby: sub_1d3378d3-5a3f-4712-85a1-2485495dfc4b
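In practice, the mechanism would ensure every control plane pod in the hosted cluster namespace carries that label, e.g. (pod name and namespace below are illustrative placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  # Name and namespace are illustrative; only the label is from the requirement.
  name: kube-apiserver-0
  namespace: clusters-example-hcp
  labels:
    kubernetes.azure.com/managedby: sub_1d3378d3-5a3f-4712-85a1-2485495dfc4b
```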
As a (user persona), I want to be able to:
so that I can achieve
Description of criteria:
This does not require a design proposal.
This does not require a feature gate.
An elevator pitch (value statement) that describes the Feature in a clear, concise way. Complete during New status.
<your text here>
The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.
<your text here>
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
<enter general Feature acceptance here>
Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
<your text here>
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
High-level list of items that are out of scope. Initial completion during Refinement status.
<your text here>
Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.
<your text here>
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
<your text here>
link back to OCPSTRAT-1644 somehow
Epic Goal*
What is our purpose in implementing this? What new capability will be available to customers?
Why is this important? (mandatory)
What are the benefits to the customer or Red Hat? Does it improve security, performance, supportability, etc? Why is work a priority?
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Add OpenStackLoadBalancerParameters and add an option for setting the load-balancer IP address for only those platforms where it can be implemented.
As a user of on-prem OpenShift, I need to manage DNS for my OpenShift cluster manually. I can already specify an IP address for the API server, but I cannot do this for Ingress. This means that I have to:
I would like to simplify this workflow to:
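A sketch of what the proposed API could look like on an IngressController, modeled on the existing providerParameters pattern for other clouds (the OpenStack type and its floatingIP field are the proposal here, not an existing API):

```yaml
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  endpointPublishingStrategy:
    type: LoadBalancerService
    loadBalancer:
      scope: External
      providerParameters:
        type: OpenStack
        openstack:
          # Proposed field: a pre-allocated IP the user has already
          # published in DNS, so Ingress comes up at a known address.
          floatingIP: 192.0.2.10
```

With this in place, the user can create the DNS record for Ingress before installing the cluster, mirroring what is already possible for the API server IP.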
Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
Allow to configure a pre-created floating IP when creating HostedClusters which will set the Service.Spec.FloatingIP for the router-default.
We need to do a lot of R&D and fix some known issues (e.g., see linked BZs).
R&D targeted at 4.16 and productisation of this feature in 4.17
Goal
To make the current implementation of the HAProxy config manager the default configuration.
Objectives
The goal of this user story is to combine the code from the smoke test user story and results from the spike into an implementation PR.
Since multiple gaps were discovered, a feature gate will be needed to ensure the stability of OCP before the feature can be enabled by default.
https://issues.redhat.com/browse/NE-1788 describes 3 gaps in the implementation of DAC:
Additional gaps were discovered along the way:
This story aims at fixing those gaps.
Add support for the Installer to configure IPV4Subnet to customize internal OVN network in BYO VPC.
As an OpenShift user I'm able to provide IPv4 subnets to the Installer so I can customize the OVN networks at install time
The Installer will allow the user to provide the information via the install config manifest and this information will be used at install time to configure the OVN network and deploy the cluster into an existing VPC provided by the user.
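As a hedged sketch, the install-config surface might look like this (the ovnKubernetesConfig stanza and its field names below are hypothetical, pending the actual API design):

```yaml
# Hypothetical install-config.yaml fragment -- field names are illustrative.
networking:
  networkType: OVNKubernetes
  machineNetwork:
  - cidr: 10.0.0.0/16
  # Hypothetical stanza: lets the user pick the internal OVN subnets so they
  # do not collide with CIDRs already in use in the existing (BYO) VPC.
  ovnKubernetesConfig:
    ipv4:
      internalJoinSubnet: 100.99.0.0/16
      internalTransitSwitchSubnet: 100.88.0.0/16
```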
This is a requirement for ROSA, ARO and OSD
As any other option for the Installer this will be documented as usual.
Terraform is used for creating or referencing VPCs
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
Configure IPV4Subnet to customize internal OVN network in BYOVPC
Users are able to successfully provide IPV4Subnets through the install config that are used to customize the OVN networks.
ROSA, ARO, and OSD need this for their products.
-
Other cloud platforms except AWS
-
-
-
-
Done Checklist
Goal Summary
This feature aims to make sure that the HyperShift operator and the control plane it deploys use Managed Service Identities (MSI) and have access to scoped credentials (potentially also via access to AKS's image gallery). Additionally, operators deployed in customer accounts (system components) would be scoped with Azure workload identities.
The image registry can authenticate with Service Principal backed by a certificate stored in an Azure Key Vault. The Secrets CSI driver will be used to mount the certificate as a volume on the image registry deployment in a hosted control plane.
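Sketching the Key Vault certificate mount with the Secrets Store CSI driver's Azure provider (Key Vault name, object name, tenant ID, and namespace below are placeholders, not values from this feature):

```yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: image-registry-sp-cert
  namespace: clusters-example-hcp   # placeholder hosted control plane namespace
spec:
  provider: azure
  parameters:
    keyvaultName: example-keyvault
    tenantId: 00000000-0000-0000-0000-000000000000
    objects: |
      array:
        - |
          objectName: registry-sp-certificate
          # objectType "secret" returns the certificate together with its
          # private key, which a Service Principal login requires.
          objectType: secret
```

The image registry deployment would then reference this class through a csi volume with driver secrets-store.csi.k8s.io, making the certificate available as a file in the pod.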
Azure SDK
What degree of coverage should run on AKS e2e vs. on existing e2es?
CI - Existing CI is running, tests are automated and merged.
CI - AKS CI is running, tests are automated and merged.
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>
As an ARO HCP user, I want to be able to:
so that I can
Description of criteria:
N/A
This requires/does not require a design proposal.
This requires/does not require a feature gate.
Goal:
As an administrator, I would like to use my own managed DNS solution instead of only specific openshift-install supported DNS services (such as AWS Route53, Google Cloud DNS, etc...) for my OpenShift deployment.
Problem:
While cloud-based DNS services provide convenient hostname management, there are a number of regulatory (ITAR) and operational constraints customers face that prohibit the use of those DNS hosting services on public cloud providers.
Why is this important:
Dependencies (internal and external):
Prioritized epics + deliverables (in scope / not in scope):
Estimate (XS, S, M, L, XL, XXL):
Previous Work:
Open questions:
Link to Epic: https://docs.google.com/document/d/1OBrfC4x81PHhpPrC5SEjixzg4eBnnxCZDr-5h3yF2QI/edit?usp=sharing
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
Update MCO to start in-cluster CoreDNS pods for AWS when userProvisionedDNS is configured. Use the GCP implementation https://github.com/openshift/machine-config-operator/pull/4018 for reference.
As a (user persona), I want to be able to:
so that I can achieve
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were not completed when this image was assembled
Content-Security-Policy (CSP) header provides a defense-in-depth measure in client-side security, as a second layer of protection against Cross-site Scripting (XSS) and clickjacking attacks.
It is not yet implemented in the OpenShift web console; however, there are some other related security headers present in the OpenShift console that cover some aspects of CSP functionality:
This is a follow-up to CONSOLE-4263
As part of handling CSP violation events in Console, we should send the relevant CSP report data to telemetry service.
AC:
Epic Goal
Through this epic, we will update our CI to use an available agent-based workflow instead of the libvirt openshift-installer, allowing us to eliminate the use of Terraform in our deployments.
Why is this important?
There is an active initiative in openshift to remove terraform from the openshift installer.
Acceptance Criteria
Done Checklist
Context thread.
Description of problem:
Monitoring the 4.18 agent-based installer CI job for s390x (https://github.com/openshift/release/pull/50293), I discovered unexpected behavior once the installation triggers the reboot-into-disk step for the 2nd and 3rd control plane nodes. (The first control plane node is rebooted last because it is also the bootstrap node.) Instead of rebooting successfully as expected, the node fails to find the OSTree and drops to dracut, stalling the installation.
Version-Release number of selected component (if applicable):
OpenShift 4.18 on s390x only; discovered using agent installer
How reproducible:
Try to install OpenShift 4.18 using agent-based installer on s390x
Steps to Reproduce:
1. Boot nodes with XML (see attached)
2. Wait for the installation to get to the reboot phase.
Actual results:
Control plane nodes fail to reboot.
Expected results:
Control plane nodes reboot and installation progresses.
Additional info:
See attached logs.
The history of this epic starts with this PR, which triggered a lengthy conversation around the workings of the image API with respect to importing imagestream images as single-manifest vs. manifest-listed. Today, imagestreams default to `importMode: Legacy` to avoid breaking the behavior of existing clusters in the field. This makes sense for single-arch clusters deployed with a single-arch payload, but when users migrate to the multi payload, more often than not their intent is to add nodes of other architecture types. When this happens, it gives rise to problems when using imagestreams with the default behavior of importing a single-manifest image. The oc commands do have a new flag to toggle the importMode, but this breaks functionality for existing users who just want to create an imagestream and use it with existing commands.
There was a discussion with David Eads and other staff engineers, and it was decided that the approach to be taken is to default imagestreams' importMode to `PreserveOriginal` if the cluster is installed with or upgraded to a multi payload. So a few things need to happen to achieve this:
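For context, the per-tag knob that would be defaulted looks like this today (image name and stream name are illustrative):

```yaml
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: example
spec:
  tags:
  - name: latest
    from:
      kind: DockerImage
      name: quay.io/example/app:latest
    importPolicy:
      # Legacy imports a single sub-manifest for one architecture;
      # PreserveOriginal keeps the manifest list so all architectures
      # remain available to heterogeneous clusters.
      importMode: PreserveOriginal
```

The same toggle is exposed on the CLI via the --import-mode flag on oc import-image.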
Some open questions:
For the apiserver operator to figure out the payload type and set the import mode defaults, the CVO needs to expose that value through the status field. This information is available today in the conditions list, but it is not easy to extract and infer the payload type from, as it is contained in the message string. The way to do it today is shown here. It would be better for the CVO to expose it as a separate field that can be easily consumed by any controller and also be used for telemetry in the future.
Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF).
Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.
(Using separate cards for each driver because these updates can be more complicated)
Update all OCP and kubernetes libraries in storage operators to the appropriate version for OCP release.
Please wait until openshift/api, openshift/library-go, and openshift/client-go are updated to the newest Kubernetes release! There may be non-trivial changes in these libraries.
This includes (but is not limited to):
Operators:
(please cross-check with *-operator + vsphere-problem-detector in our tracking sheet)
EOL, do not upgrade:
The following operators were migrated to csi-operator, do not update these obsolete repos:
tools/library-bump.py and tools/bump-all may be useful. For 4.16, this was enough:
mkdir 4.16-bump
cd 4.16-bump
../library-bump.py --debug --web <file with repo list> STOR-1574 --run "$PWD/../bump-all github.com/google/cel-go@v0.17.7" --commit-message "Bump all deps for 4.16"
4.17 perhaps needs an older prometheus:
../library-bump.py --debug --web <file with repo list> STOR-XXX --run "$PWD/../bump-all github.com/google/cel-go@v0.17.8 github.com/prometheus/common@v0.44.0 github.com/prometheus/client_golang@v1.16.0 github.com/prometheus/client_model@v0.4.0 github.com/prometheus/procfs@v0.10.1" --commit-message "Bump all deps for 4.17"
4.18 special:
Add "spec.unhealthyEvictionPolicy: AlwaysAllow" to all PodDisruptionBudget objects of all our operators + operands. See WRKLDS-1490 for details
There has been a change in the library-go function `WithReplicasHook`. See https://github.com/openshift/library-go/pull/1796.
This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were completed when this image was assembled
Description of problem:
During the integration of Manila into csi-operator, a new controller was added to csi-operator that checks whether a precondition is valid in order to trigger all the other controllers. The precondition defined for Manila checks that Manila shares exist and, if that is the case, it syncs the CSI driver and the storage classes. We need to handle the error returned in case those syncs fail.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
The finally tasks do not get removed and remain in the pipeline.
Version-Release number of selected component (if applicable):
In all supported OCP version
How reproducible:
Always
Steps to Reproduce:
1. Create a finally task in a pipeline in pipeline builder
2. Save pipeline
3. Edit pipeline and remove the finally task in pipeline builder
4. Save pipeline
5. Observe that the finally task has not been removed
Actual results:
The finally tasks do not get removed and remain in the pipeline.
Expected results:
Finally task gets removed from pipeline when removing the finally tasks and saving the pipeline in the "pipeline builder" mode.
Additional info:
Description of problem:
`tag:UntagResources` is required for the AWS SDK call [UntagResourcesWithContext](https://github.com/openshift/installer/blob/master/pkg/destroy/aws/shared.go#L121) when removing the "shared" tag from the IAM profile.
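The missing permission would be granted with a policy statement along these lines (a minimal sketch; real installer policies typically scope resources more tightly):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["tag:UntagResources"],
      "Resource": "*"
    }
  ]
}
```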
Version-Release number of selected component (if applicable):
4.17+
How reproducible:
always
Steps to Reproduce:
1. 2. 3.
Actual results:
time="2024-11-19T12:22:19Z" level=debug msg="search for IAM instance profiles" time="2024-11-19T12:22:19Z" level=debug msg="Search for and remove tags in us-east-1 matching kubernetes.io/cluster/ci-op-y8wbktiq-e515e-q6kvb: shared" time="2024-11-19T12:22:19Z" level=debug msg="Nothing to clean for shared iam resource" arn="arn:aws:iam::460538899914:instance-profile/ci-op-y8wbktiq-e515e-byo-profile-worker" time="2024-11-19T12:22:19Z" level=debug msg="Nothing to clean for shared iam resource" arn="arn:aws:iam::460538899914:instance-profile/ci-op-y8wbktiq-e515e-byo-profile-master" time="2024-11-19T12:22:19Z" level=info msg="untag shared resources: AccessDeniedException: User: arn:aws:iam::460538899914:user/ci-op-y8wbktiq-e515e-minimal-perm is not authorized to perform: tag:UntagResources because no identity-based policy allows the tag:UntagResources action\n\tstatus code: 400, request id: 464de6ab-3de5-496d-a163-594dade11619" See: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/58833/rehearse-58833-pull-ci-openshift-installer-release-4.18-e2e-aws-ovn-custom-iam-profile/1858807924600606720
Expected results:
The perm is added to the required list when BYO IAM profile and the "shared" tag is removed from the profiles.
Additional info:
Component Readiness has found a potential regression in the following test:
install should succeed: infrastructure
installer fails with:
time="2024-10-20T04:34:57Z" level=error msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: control-plane machines were not provisioned within 15m0s: client rate limiter Wait returned an error: context deadline exceeded"
Significant regression detected.
Fishers Exact probability of a regression: 99.96%.
Test pass rate dropped from 98.94% to 89.29%.
Sample (being evaluated) Release: 4.18
Start Time: 2024-10-14T00:00:00Z
End Time: 2024-10-21T23:59:59Z
Success Rate: 89.29%
Successes: 25
Failures: 3
Flakes: 0
Base (historical) Release: 4.17
Start Time: 2024-09-01T00:00:00Z
End Time: 2024-10-01T23:59:59Z
Success Rate: 98.94%
Successes: 93
Failures: 1
Flakes: 0
Description of problem:
If zones are not specified in the install-config.yaml, the installer will discover all the zones available for the region. Then it will try to filter those zones based on the instance type, which requires the `ec2:DescribeInstanceTypeOfferings` permission.
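For context, this discovery-and-filter path is only taken when zones are omitted. A partial install-config.yaml sketch that pins zones explicitly (zone names, region, and pool name are placeholders) would be expected to bypass the zone discovery described above:

```yaml
controlPlane:
  platform:
    aws:
      zones:
      - us-east-1a
      - us-east-1b
compute:
- name: worker
  platform:
    aws:
      zones:
      - us-east-1a
      - us-east-1b
platform:
  aws:
    region: us-east-1
```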
Version-Release number of selected component (if applicable):
4.16+
How reproducible:
Always, by not specifying zones in the install-config.yaml and installing a cluster with a minimal-permissions user.
Steps to Reproduce:
1.
2.
3.
Actual results:
TBA
Expected results:
A failure message indicating that `ec2:DescribeInstanceTypeOfferings` is needed when zones are not set.
Additional info:
This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were not completed when this image was assembled
In payloads 4.18.0-0.ci-2024-11-01-110334 and 4.18.0-0.nightly-2024-11-01-101707 we observed GCP install failures:
Container test exited with code 3, reason Error --- ails:
level=error msg=[
level=error msg= {
level=error msg= "@type": "type.googleapis.com/google.rpc.ErrorInfo",
level=error msg= "domain": "googleapis.com",
level=error msg= "metadatas": {
level=error msg= "consumer": "projects/711936183532",
level=error msg= "quota_limit": "ListRequestsFilterCostOverheadPerMinutePerProject",
level=error msg= "quota_limit_value": "75",
level=error msg= "quota_location": "global",
level=error msg= "quota_metric": "compute.googleapis.com/filtered_list_cost_overhead",
level=error msg= "service": "compute.googleapis.com"
level=error msg= },
level=error msg= "reason": "RATE_LIMIT_EXCEEDED"
level=error msg= },
level=error msg= {
level=error msg= "@type": "type.googleapis.com/google.rpc.Help",
level=error msg= "links": [
level=error msg= {
level=error msg= "description": "The request exceeds API Quota limit, please see help link for suggestions.",
level=error msg= "url": "https://cloud.google.com/compute/docs/api/best-practices#client-side-filter"
level=error msg= }
level=error msg= ]
level=error msg= }
level=error msg=]
level=error msg=, rateLimitExceeded
Patrick Dillon noted that ListRequestsFilterCostOverheadPerMinutePerProject cannot have its quota limit increased.
The problem subsided over the weekend, presumably because fewer jobs were run, but has started to appear again. Opening this to track the ongoing issue and potential workarounds.
This contributes to the following test failures for GCP
install should succeed: configuration
install should succeed: overall
IBI and IBU use different labels for the var-lib-containers partition.
This results in a failure to mount the partition when the labels mismatch (`var-lib-containers` vs `varlibcontainers`).
We should always use `var-lib-containers` as the label
See more details in the slack thread
https://redhat-internal.slack.com/archives/C05JHD9QYTC/p1731542185936629
Installer part
LCA part (less interesting, as the config will be generated in the installer)
https://github.com/openshift-kni/lifecycle-agent/blob/main/api/ibiconfig/ibiconfig.go#L20
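To illustrate the intended standardization, here is a hedged Butane-style storage fragment that uses the `var-lib-containers` partition label consistently for both the partition and the mount. The device path and partition sizing are placeholders; the real configuration is generated by the installer, not hand-written like this.

```yaml
storage:
  disks:
  - device: /dev/disk/by-id/example-install-disk
    partitions:
    - label: var-lib-containers
      start_mib: 250000
      size_mib: 0
  filesystems:
  - device: /dev/disk/by-partlabel/var-lib-containers
    format: xfs
    path: /var/lib/containers
    with_mount_unit: true
```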
Description of problem:
[vmware-vsphere-csi-driver-operator] driver controller/node/webhook update events repeat pathologically
Version-Release number of selected component (if applicable):
4.18.0-0.nightly-2024-11-03-161006
How reproducible:
Always
Steps to Reproduce:
1. Install an OpenShift cluster on vSphere with a 4.17 nightly build.
2. Upgrade the cluster to a 4.18 nightly build.
3. Check that the driver controller/node/webhook update events do not repeat pathologically.
CI failure record -> https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.18-upgrade-from-stable-4.17-e2e-vsphere-ovn-upgrade/1854191939318976512
Actual results:
In step 3: the driver controller/node/webhook update events repeat pathologically
Expected results:
In step 3: the driver controller/node/webhook update events should not repeat pathologically
Additional info: