Back to index

4.19.0-0.nightly-2024-11-22-174136

Jump to: Incomplete Features | Incomplete Epics | Other Complete | Other Incomplete |

Changes from 4.18.0-ec.4

Note: this page shows the Feature-Based Change Log for a release

Incomplete Features

When this image was assembled, these features were not yet completed. Therefore, only the Jira Cards included here are part of this release.

Provide a simple way to get a VM-friendly networking setup, without having to configure the underlying physical network.

Goal

Primary user-defined networks can be managed from the UI, and the user flow is seamless.

User Stories

  • As a cluster admin,
    I want to use the UI to define a ClusterUserDefinedNetwork, assigned with a namespace selector.
  • As a project admin,
    I want to use the UI to define a UserDefinedNetwork in my namespace (see the sketch after this list).
  • As a project admin,
    I want to be prompted to create a UserDefinedNetwork before I create any Pods/VMs in my new project.
  • As a project admin running VMs in a namespace with UDN defined,
    I expect the "pod network" to be called "user-defined primary network",
    and I expect that when using it, the proper network binding is used.
  • As a project admin,
    I want to use the UI to request a specific IP for my VM connected to UDN.
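
A minimal sketch of the two resources the user stories above revolve around; the API group/version and field names (topology, role, subnets, namespaceSelector) are assumptions based on the OVN-Kubernetes user-defined network APIs and may differ from the shipped schema:

apiVersion: k8s.ovn.org/v1
kind: UserDefinedNetwork
metadata:
  name: primary-udn
  namespace: my-project          # project admin creates it in their own namespace
spec:
  topology: Layer2               # assumption: a Layer2 primary network
  layer2:
    role: Primary
    subnets:
      - 10.100.0.0/24
---
apiVersion: k8s.ovn.org/v1
kind: ClusterUserDefinedNetwork
metadata:
  name: tenant-udn               # cluster admin creates it cluster-wide
spec:
  namespaceSelector:             # assumption: selects the namespaces that get this network
    matchLabels:
      tenant: blue
  network:
    topology: Layer2
    layer2:
      role: Primary
      subnets:
        - 10.200.0.0/16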

UX doc

https://docs.google.com/document/d/1WqkTPvpWMNEGlUIETiqPIt6ZEXnfWKRElBsmAs9OVE0/edit?tab=t.0#heading=h.yn2cvj2pci1l

Non-Requirements

  • <List of things not included in this epic, to alleviate any doubt raised during the grooming process.>

Notes

Feature Overview (aka. Goal Summary)

As a result of Hashicorp's license change to BSL, Red Hat OpenShift needs to remove the use of Hashicorp's Terraform from the installer – specifically for IPI deployments which currently use Terraform for setting up the infrastructure.

To avoid an increased support overhead once the license changes at the end of the year, we want to provision IBM Cloud VPC infrastructure without the use of Terraform.

Requirements (aka. Acceptance Criteria):

  • The IBM Cloud VPC IPI Installer no longer contains or uses Terraform.
  • The new provider should aim to provide the same results and have parity with the existing IBM Cloud VPC Terraform provider. Specifically, we should aim for feature parity against the install config and the cluster it creates to minimize impact on existing customers' UX.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

Epic Goal

  • Replace Terraform infrastructure and machine (bootstrap, control plane) provisioning with CAPI-based approach.

Feature Overview (aka. Goal Summary)  

  • With this next-gen OLM GA release (graduated from ‘Tech Preview’), customers can: 
    • discover collections of k8s extension/operator contents released in the FBC format with richer visibility into their release channels, versions, update graphs, and the deprecation information (if any) to make informed decisions about installation and/or update them.
    • install a k8s extension/operator declaratively and potentially automate with GitOps to ensure predictable and reliable deployments.
    • update a k8s extension/operator to a desired target version or keep it updated within a specific version range for security fixes without breaking changes.
    • remove a k8s extension/operator declaratively and entirely including cleaning up its CRDs and other relevant on-cluster resources (with a way to opt out of this coming up in a later release).
  • To address the security needs of 30% of our customers who run clusters in disconnected environments, the GA release will include cluster extension lifecycle management functionality for offline environments.
  • [Tech Preview] (Cluster)Extension lifecycle management can handle runtime signature validation for container images to support OpenShift’s integration with the rising Sigstore project for secure validation of cloud-native artifacts.

Goals (aka. expected user outcomes)

1. Pre-installation:

  • Customers can access a collection of k8s extension contents from a set of default catalogs leveraging the existing catalog images shipped with OpenShift (in the FBC format) with the new Catalog API from the OLM v1 GA release.
  • With the new GAed Catalog API, customers get richer package content visibility in their release channels, versions, update graphs, and the deprecation information (if any) to help make informed decisions about installation and/or update.
  • With the new GAed Catalog API, customers can render the catalog content in their clusters with fewer resources in terms of CPU and memory usage and faster performance.
  • Customers can filter the available packages based on the package name and see the relevant information from the metadata shipped within the package. 

2. Installation:

  • Customers using a ServiceAccount with sufficient permissions can install a k8s extension/operator with a desired target version or the latest version within a specific version range (from the associated channel) to get the latest security fixes (see the sketch after this list).
  • Customers can easily automate the installation flow declaratively with GitOps to ensure predictable and reliable deployments.
  • Customers get protection from having two conflicting k8s extensions/operators owning the same API objects, i.e., no conflicting ownership, ensuring cluster stability.
  • Customers can access the metadata of the installed k8s extension/operator to see essential information such as its provided APIs, example YAMLs of its provided APIs, descriptions, infrastructure features, valid subscriptions, etc.
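
A minimal sketch of the declarative install flow described above, using the OLM v1 ClusterExtension resource; the group/version and field names shown here are illustrative and may differ from the GA schema:

apiVersion: olm.operatorframework.io/v1
kind: ClusterExtension
metadata:
  name: my-operator
spec:
  namespace: my-operator-ns            # namespace the extension is installed into
  serviceAccount:
    name: my-operator-installer        # ServiceAccount with sufficient permissions
  source:
    sourceType: Catalog
    catalog:
      packageName: my-operator         # package discovered via the Catalog API
      version: ">=1.2.0 <2.0.0"        # pin to a version range to pick up security fixes only

Because a GitOps tool can apply this manifest like any other Kubernetes object, the same YAML covers the "automate with GitOps" outcome.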

3. Update:

  • Customers can see what updates are available for their k8s extension/operators in the form of immediate target versions and the associated update channels.
  • Customers can trigger the update of a k8s extension/operator with a desired target version or the latest version within a specific version range (from the associated channel) to get the latest security fixes.
  • Customers get protection from workload or k8s extension/operator breakage due to CustomResourceDefinition (CRD) being upgraded to a backward incompatible version during an update.
  • During OpenShift cluster update, customers get informed when installed k8s extensions/operators do not support the next OpenShift version (when annotated by the package author/provider). Customers must update those k8s extensions/operators to a newer/compatible version before OLM unblocks the OpenShift cluster update.

4. Uninstallation/Deletion:

  • Customers can cleanly remove an installed k8s extension/operator including deleting CustomResourceDefinitions (CRDs), custom resource objects (CRs) of the CRDs, and other relevant resources to revert the cluster to its original state before the installation declaratively.

5. Disconnected Environments for High-Security Workloads:

  • Approximately 30% of our customers prioritize high security by running their clusters in internet-disconnected environments, especially for mission-critical production workloads. To benefit these users, our supported GA release needs to include cluster extension lifecycle management functionality that functions within these disconnected environments.

6. [Tech Preview] Signature Validation for Secure Workflows:

  • The Red Hat-sponsored Sigstore project is gaining traction in the Kubernetes community, aiming to simplify the signing of cloud-native artifacts. OpenShift leverages Sigstore tooling to enable scalable and flexible signature validation, including support for disconnected environments. This functionality will be available as a Tech Preview in 4.17 and is targeted for Tech Preview Phase 2 in the upcoming 4.18 release. To fully support this integration as a Tech Preview release, the (cluster)extension lifecycle management needs to (be prepared to) handle runtime validation of Sigstore signatures for container images.

Requirements (aka. Acceptance Criteria):

All the expected user outcomes and the acceptance criteria in the engineering epics are covered.

Background

OLM: Gateway to the OpenShift Ecosystem

Operator Lifecycle Manager (OLM) has been a game-changer for OpenShift Container Platform (OCP) 4.  Since its launch in 2019, OLM has fostered a rich ecosystem, expanding from a curated set of 25 operators to over 100 officially supported Red Hat operators and hundreds more from certified ISVs and the community.

OLM empowers users to manage diverse technologies with ease, including ACM, ACS, Quay, GitOps, Pipelines, Service Mesh, Serverless, and Virtualization.  It has also facilitated the introduction of groundbreaking operators for entirely new workloads, like Nvidia GPU, PTP, Windows Machine Config, SR-IOV networking, and more.  Today, a staggering 91% of our connected customers leverage OLM's capabilities.

OLM v0: A Stepping Stone

While OLM v0 has been instrumental, it has limitations.  The API design, not fully GitOps-friendly or entirely declarative, presents a steeper learning curve due to its complexity.  Furthermore, OLM v0 was designed with the assumption of namespace-scoped CRDs (Custom Resource Definitions), allowing for independent operator installations and parallel versions within a single cluster.  However, this functionality never materialized in core Kubernetes, and OLM v0's attempt to simulate it has introduced limitations and bugs.

The Operator Framework Team: Building the Future

The Operator Framework team is the cornerstone of the OpenShift ecosystem.  They build and manage OLM, the Operator SDK, operator catalog formats, and tooling (opm, file-based catalogs).  Their work directly impacts how operators are developed, packaged, delivered, and managed by users and SRE teams on OpenShift clusters.

A Streamlined Future with OLM v1

The Operator Framework team has undergone significant restructuring to focus on the next generation of OLM – OLM v1.  This transition includes moving the Operator SDK to a feature-complete state with ongoing maintenance for compatibility with the latest Kubernetes and controller-runtime libraries.  This strategic shift allows the team to dedicate resources to completely revamping OLM's API and management concepts for catalog content delivery.  

Leveraging learnings and customer feedback since OCP 4's inception, OLM v1 is designed to be a major overhaul, and it will be shipped as a Generally Available (GA) feature in OpenShift 4.17.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

<your text here>

Documentation Considerations

1. Pre-installation:

  • [GA release] Docs provide instructions on how to add Red Hat-provided Operator catalogs with the pull secret for catalogs hosted on a secure registry.
  • [GA release] Docs provide instructions on how to discover the Operator packages from a catalog.
  • [GA release] Docs provide instructions on how to query and inspect the metadata of Operator bundles and find feasible ones to be installed with the OLM v1.

2. Installation:

  • [GA release] Docs provide instructions on how to use a ServiceAccount with sufficient permissions to install a k8s extension/operator with a desired target version or the latest version within a specific version range to get the latest security fixes.
  • [GA release] Docs provide instructions on how to automate the installation flow declaratively with GitOps to ensure predictable and reliable deployments.
  • [GA release] Docs mention the OLM v1’s protection from having two conflicting k8s extensions/operators owning the same API objects, i.e., no conflicting ownership, ensuring cluster stability.
  • [GA release] Docs provide instructions on how to access the metadata of the installed k8s extension/operator to see essential information such as its provided APIs, example YAMLs of its provided APIs, descriptions, infrastructure features, valid subscriptions, etc.
  • [GA release] Docs explain how to create RBACs from a CRD to grant cluster users access to the installed k8s extension/operator's provided APIs.

3. Update:

  • [GA release] Docs provide instructions on how to see what updates are available for their k8s extension/operators in the form of immediate target versions and the associated update channels.
  • [GA release] Docs provide instructions on how to trigger the update of a k8s extension/operator with a desired target version or the latest version within a specific version range to get the latest security fixes.
  • [GA release] Docs mention OLM v1’s protection from workload or k8s extension/operator breakage due to CustomResourceDefinition (CRD) being upgraded to a backward incompatible version during an update.
  • [GA release] Docs mention OLM v1 will block the OpenShift cluster update if installed k8s extensions/operators do not support the next OpenShift version (when annotated by the package author/provider).  Provide instructions on how to find and update to a newer/compatible version before OLM unblocks the OpenShift cluster update.

4. Uninstallation/Deletion:

  • [GA release] Docs provide instructions on how to cleanly remove an installed k8s extension/operator including deleting CustomResourceDefinitions (CRDs), custom resource objects (CRs) of the CRDs, and other relevant resources.
  • [GA release] Docs provide instructions to verify the cluster has been reverted to its original state after uninstalling a k8s extension/operator.

Relevant upstream CNCF OLM v1 requirements, engineering brief, and epics:

1. Pre-installation:

2. Installation:

3. Update:

4. Uninstallation/Deletion:

Relevant documents:

 

OCP/Telco Definition of Done
Epic Template descriptions and documentation.


Epic Goal

  • Once OLM v1.0.0 is feature complete and the team feels comfortable enabling it by default, we should remove the OLM v1 feature flag and deploy it on all clusters by default.
  • We should also introduce OLMv1 behind a CVO capability to give customers the option of leaving it disabled in their clusters.

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • OLMv1 is enabled by default in OCP
  • OLMv1 can be fully disabled at install/upgrade time using CVO capabilities

 

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>
  1. We are encountering toleration misses in origin azure tests which are preventing our components from stabilizing.
  2. cluster-olm-operator is spamming api server with condition lastUpdateTimes
  3. disconnected environment in CI/origin is different from OLMv1 expectations (but we do feel that v1 disconnected functionality is getting enough validation elsewhere to be confident).  Created OCPBUGS-44810 to align expectations of the disconnected environments
  4.  

 

Update cluster-olm-operator manifests to be in the payload by default.

 

A/C:

 - Removed "release.openshift.io/feature-set: TechPreviewNoUpgrade" annotation
 - Ensure the following cluster profiles are targeted by all manifests:
    - include.release.openshift.io/hypershift: "true"
    - include.release.openshift.io/ibm-cloud-managed: "true"
    - include.release.openshift.io/self-managed-high-availability: "true"
    - include.release.openshift.io/single-node-developer: "true"
 - No installation related annotations are present in downstream operator-controller and catalogd manifests

OpenShift offers a "capabilities" to allow users to select which components to include in the cluster at install time.

It was decided that the capability name should be: OperatorLifecycleManagerV1

A/C:

 - ClusterVersion resource updated with OLM v1 capability
 - cluster-olm-operator manifests updated with capability.openshift.io/name=OperatorLifecycleManagerV1 annotation
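
A minimal install-config sketch of how a customer could leave OLM v1 disabled, assuming the existing cluster capabilities mechanism; the set of additionally enabled capabilities listed here is purely illustrative:

# install-config.yaml excerpt
capabilities:
  baselineCapabilitySet: None          # start from an explicit, minimal capability set
  additionalEnabledCapabilities:
    - MachineAPI
    - Ingress
    # OperatorLifecycleManagerV1 intentionally omitted to keep OLM v1 disabled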

Feature Overview (aka. Goal Summary)  

Customers who deploy a large number of OpenShift on OpenStack clusters want to minimise the resource requirements of their cluster control planes.

Customers deploying RHOSO (OpenShift services for OpenStack, i.e. OpenStack control plane on bare metal OpenShift) already have a bare metal management cluster capable of serving Hosted Control Planes.

We should enable self-hosted (i.e. on-prem) Hosted Control Planes to serve Hosted Control Planes to OpenShift on OpenStack clusters, with a specific focus of serving Hosted Control Planes from the RHOSO management cluster.

Goals (aka. expected user outcomes)

As an enterprise IT department and OpenStack customer, I want to provide self-managed OpenShift clusters to my internal customers with minimum cost to the business.

As an internal customer of said enterprise, I want to be able to provision an OpenShift cluster for myself using the business's existing OpenStack infrastructure.

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

TBD
 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both  
Classic (standalone cluster)  
Hosted control planes  
Multi node, Compact (three node), or Single node (SNO), or all  
Connected / Restricted Network  
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x)  
Operator compatibility  
Backport needed (list applicable versions)  
UI need (e.g. OpenShift Console, dynamic plugin, OCM)  
Other (please specify)  

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

<your text here>

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

<your text here>

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

<your text here>

Background

Provide any additional context needed to frame the feature.  Initial completion during Refinement status.

<your text here>

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

<your text here>

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

<your text here>

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

<your text here>

Goal

  • Ability to run cinder and manila operators as controller Pods in a hosted control plane
  • Ability to run the Node DaemonSet in guest clusters

Why is this important?

  • Continue supporting usage of CSI drivers for the guest cluster, just as is possible with standalone OpenShift clusters.

Scenarios


  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. https://github.com/openshift/enhancements/blob/master/enhancements/storage/storage-hypershift.md
  2. https://issues.redhat.com/browse/OCPSTRAT-210
  3.  

Open questions::

In an HCP deployment, the hosted-cluster-config-operator is responsible for deploying other operators, such as the cluster-storage-operator. We need to modify this operator to deploy cluster-storage-operator and enable the openstack-cinder-csi-driver-operator when deployed in an OpenStack environment.

In an HCP deployment, the hosted-cluster-config-operator is responsible for deploying other operators, such as the cluster-storage-operator. We need to modify this operator to deploy cluster-storage-operator and enable the openstack-manila-csi-driver-operator when deployed in an OpenStack environment.

In OSASINFRA-3608, we merged the openshift/openstack-cinder-csi-driver-operator repository into openshift/csi-operator and modified it to take advantage of the new generator framework provided therein. Now, we want to build on this, adding Hypershift-specific assets and tweaking whatever else is needed.

Feature Overview

Ability to install OpenShift on Nutanix with nodes having multiple NICs (multiple subnets) from IPI and for autoscaling with MachineSets.

 


Feature Overview

Implement authorization to secure API access for different user personas/actors in the agent-based installer.

User Personas:

  • Read-Only Access: For "wait-for" and "monitor-add-nodes" commands.
  • Read-Write Access: For systemd services and the agent service.


Goals

The agent-based installer APIs have implemented basic security measures through authentication, as covered in AGENT-145.

To further enhance security, it is crucial to implement user persona/actor-based authorization, allowing for differentiated access control, such as read-only or read-write permissions, based on the user's role.

The goal of this implementation is to provide a more robust and secure API framework, ensuring that users can only perform actions appropriate to their role.

Epic Goal

  • Implement authorization to secure API access for different user personas/actors in the agent-based installer.
  • User Personas:
    • Read-Only Access: For "wait-for" and "monitor-add-nodes" commands.
    • Read-Write Access: For systemd services and the agent service.

Why is this important?

  • The agent-based installer APIs have implemented basic security measures through authentication, as covered in AGENT-145. To further enhance security, it is crucial to implement user persona/actor-based authorization, allowing for differentiated access control, such as read-only or read-write permissions, based on the user's role. This approach will provide a more robust and secure API framework, ensuring that users can only perform actions appropriate to their role. 

Scenarios

  1. Users running the wait-for or monitor-add-nodes commands should have read-only permissions. They should not be able to write to the API. If they attempt to perform write operations, appropriate error messages could be displayed, indicating that they are not authorized to write.
  2. Users associated with running systemd services should have both read and write permissions.
  3. Users associated with running the agent service should also have read and write permissions.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1.  

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As an ABI user, I want to be able to:

  • add worker nodes on day 2 when the authorization implementation creates 3 separate auth tokens for each user persona
  • save the 3 auth tokens generated when creating the nodes ISO into the cluster as a secret
  • regenerate the auth tokens and refresh the asset store if the tokens stored in the cluster secret are expired.

so that I can achieve

  • successful installation
  • adding workers to a cluster
  •  

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Feature Overview

Improve the cluster expansion with the agent workflow added in OpenShift 4.16 (TP) and OpenShift 4.17 (GA) with:

  • Caching the RHCOS image for faster node addition (i.e., no extraction of the image every time)
  • Add a single node with just one command, with no need to write config files describing the node
  • Support creating PXE artifacts 

Goals

Improve the user experience and functionality of the commands to add nodes to clusters using the image creation functionality.

Epic Goal

  • Cleanup/carryover work from AGENT-682 and WRKLDS-937 that were non-urgent for GA of the day 2 implementation

Currently, all the *.iso files generated by the node-joiner tool are copied back to the user. Since the node-joiner also creates the node config unconditionally, it is copied back even when not requested, which is confusing for the end user.

Currently, the oc node-image create command does not report any relevant information that could help the user understand where each element was retrieved from (for example, the SSH key), making it more difficult to troubleshoot a potential issue.

For this reason, it could be useful for the node-joiner tool to produce a proper JSON file reporting all the details about the relevant resources fetched to generate the image. The oc command should be able to expose them when required (i.e., via a command flag).

Make the output of the two commands more consistent by using the recently introduced base command logger.

Background

As part of being a first party Azure offering, ARO HCP needs to adhere to Microsoft secure supply chain software requirements. In order to do this, we require setting a label on all pods that run in the hosted cluster namespace.

Goal

Implement Mechanism for Labeling Hosted Cluster Control Plane Pods

Use-cases

  • Adherence to Microsoft 1p Resource Provider Requirements

Components

  • Any pods that hypershift deploys or that run in the hosted cluster namespace.

Goal

  • Hypershift has a mechanism for Labeling Control Plane Pods
  • Cluster service should be able to set the label for a given hosted cluster

Why is this important?

  • As part of being a first party Azure offering, ARO HCP needs to adhere to Microsoft secure supply chain software requirements. In order to do this, we require setting a label on all pods that run in the hosted cluster namespace.
    See Documentation: https://eng.ms/docs/more/containers-secure-supply-chain/other

Scenarios

  1. Given a subscriptionID of "1d3378d3-5a3f-4712-85a1-2485495dfc4b", there needs to be the following label on all pods hosted on behalf of the customer (see the sketch below):
    kubernetes.azure.com/managedby: sub_1d3378d3-5a3f-4712-85a1-2485495dfc4b
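
A sketch of what this could look like on a control plane pod, assuming the label is applied verbatim as described in the scenario above (pod name, namespace, and image are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver-abc123                  # illustrative pod name
  namespace: clusters-my-hosted-cluster        # hosted control plane namespace (illustrative)
  labels:
    kubernetes.azure.com/managedby: sub_1d3378d3-5a3f-4712-85a1-2485495dfc4b
spec:
  containers:
    - name: kube-apiserver
      image: quay.io/example/kube-apiserver:latest   # illustrative image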

Acceptance Criteria

  • Dev - Has a valid enhancement if necessary
  • CI - MUST be running successfully with tests automated
  • QE - covered in Polarion test plan and tests implemented
  • Release Technical Enablement - Must have TE slides
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Technical Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As a (user persona), I want to be able to:

  • specify a set of labels for all Control Plane Pods in the HCP namespace.

so that I can achieve

  • Compliance tracking

Acceptance Criteria:

Description of criteria:

  • The change is added to all components.
  • The change is also added to v2 components.

Engineering Details:

  •  

This does not require a design proposal.
This does not require a feature gate.

Feature Overview (aka. Goal Summary)  

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

<your text here>

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.

<your text here>

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

<enter general Feature acceptance here>

 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both  
Classic (standalone cluster)  
Hosted control planes  
Multi node, Compact (three node), or Single node (SNO), or all  
Connected / Restricted Network  
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x)  
Operator compatibility  
Backport needed (list applicable versions)  
UI need (e.g. OpenShift Console, dynamic plugin, OCM)  
Other (please specify)  

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

<your text here>

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

<your text here>

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

<your text here>

Background

Provide any additional context needed to frame the feature.  Initial completion during Refinement status.

<your text here>

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

<your text here>

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

<your text here>

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

<your text here>

link back to OCPSTRAT-1644 somehow

 

Epic Goal*

What is our purpose in implementing this?  What new capability will be available to customers?

 
Why is this important? (mandatory)

What are the benefits to the customer or Red Hat? Does it improve security, performance, supportability, etc.? Why is this work a priority?

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1.  

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Feature Overview (aka. Goal Summary)  

Add OpenStackLoadBalancerParameters and add an option for setting the load-balancer IP address for only those platforms where it can be implemented.

Goals (aka. expected user outcomes)

As a user of on-prem OpenShift, I need to manage DNS for my OpenShift cluster manually. I can already specify an IP address for the API server, but I cannot do this for Ingress. This means that I have to:

  1. Manually create the API endpoint IP
  2. Add DNS for the API endpoint
  3. Create the cluster
  4. Discover the created Ingress endpoint
  5. Add DNS for the Ingress endpoint

I would like to simplify this workflow to the following (see the configuration sketch after this list):

  1. Manually create the API and Ingress endpoint IPs
  2. Add DNS for the API and Ingress endpoints
  3. Create the cluster
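
A hedged sketch of how the Ingress endpoint IP might be specified up front, assuming an OpenStackLoadBalancerParameters type with a floating IP option on the IngressController; the exact field names are assumptions:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  endpointPublishingStrategy:
    type: LoadBalancerService
    loadBalancer:
      scope: External
      providerParameters:
        type: OpenStack
        openstack:
          floatingIP: 203.0.113.10       # assumption: pre-created floating IP already registered in DNS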

Requirements (aka. Acceptance Criteria):

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both  
Classic (standalone cluster)  
Hosted control planes  
Multi node, Compact (three node), or Single node (SNO), or all  
Connected / Restricted Network  
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x)  
Operator compatibility  
Backport needed (list applicable versions)  
UI need (e.g. OpenShift Console, dynamic plugin, OCM)  
Other (please specify)  

Questions to Answer (Optional):

Out of Scope

  • Although the Service API's loadBalancerIP API field was defined to be platform-agnostic, it wasn't consistently supported across platforms, and Kubernetes 1.24 has even deprecated it for this reason: https://github.com/kubernetes/kubernetes/pull/107235. We would not want to add a generic option to set loadBalancerIP given that it is deprecated and that it would work only on some platforms and not on others.

Background

  • This request is similar to RFE-843 (for AWS), RFE-2238 (for GCP), RFE-2824 (for AWS and MetalLB, and maybe others), RFE-2884 (for AWS, Azure, and GCP), and RFE-3498 (for AWS). However, it might make sense to keep this RFE specifically for OpenStack.

Customer Considerations

Documentation Considerations

Interoperability Considerations

Goal

  • Make Ingress work on day 1 without an extra step for the customer
  •  

Why is this important?

Scenarios


  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Allow configuring a pre-created floating IP when creating HostedClusters, which will set the Service.Spec.FloatingIP for the router-default.

We need to do a lot of R&D and fix some known issues (e.g., see linked BZs). 

 

R&D targetted at 4.16 and productisation of this feature in 4.17

 

Goal
To make the current implementation of the HAProxy config manager the default configuration.

Objectives

  • Disable pre-allocation route blueprints
  • Limit dynamic server allocation
  • Provide customer opt-out
    • Offer customers a handler to opt out of the default config manager implementation.

 

The goal of this user story is to combine the code from the smoke test user story and results from the spike into an implementation PR.

Since multiple gaps were discovered, a feature gate will be needed to ensure the stability of OCP before the feature can be enabled by default.

https://issues.redhat.com/browse/NE-1788 describes 3 gaps in the implementation of DAC:

  • Idled services are woken up by the health check from the servers set by DAC (server-template).
  • ALPN TLS extension is not enabled for reencrypt routes.
  • Dynamic servers produce dummy metrics.

Additional gaps were discovered along the way:

This story aims at fixing those gaps.

Feature Overview (aka. Goal Summary)  

Add support for the Installer to configure IPV4Subnet to customize internal OVN network in BYO VPC.

Goals (aka. expected user outcomes)

As an OpenShift user I'm able to provide IPv4 subnets to the Installer so I can customize the OVN networks at install time

Requirements (aka. Acceptance Criteria):

The Installer will allow the user to provide the information via the install config manifest and this information will be used at install time to configure the OVN network and deploy the cluster into an existing VPC provided by the user. 
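
A hypothetical install-config excerpt illustrating the idea; the placement and name of the OVN subnet knob are assumptions, not the final schema:

# install-config.yaml excerpt (field names are illustrative)
networking:
  networkType: OVNKubernetes
  machineNetwork:
    - cidr: 10.0.0.0/16                # CIDR of the existing BYO VPC
  ovnKubernetesConfig:                 # assumption: knob for customizing internal OVN subnets
    ipv4:
      internalJoinSubnet: 100.99.0.0/16
platform:
  aws:
    subnets:                           # pre-existing subnets in the BYO VPC
      - subnet-0123456789abcdef0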

Background

This is a requirement for ROSA, ARO and OSD

Documentation Considerations

As any other option for the Installer this will be documented as usual.

Implementation Considerations

Terraform is used for creating or referencing VPCs

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

Feature Overview (aka. Goal Summary)  

Configure IPV4Subnet to customize internal OVN network in BYOVPC

Goals (aka. expected user outcomes)

Users are able to successfully provide IPV4Subnets through the install config that are used to customize the OVN networks.

Requirements (aka. Acceptance Criteria):

  • Install config parameter is added to accept user input.
  • The input is provided to the OVN network configuration during installation and is used when installing the cluster onto the BYO VPC

Use Cases (Optional):

ROSA, ARO, and OSD need this for their products.

Questions to Answer (Optional):

-

Out of Scope

Other cloud platforms except AWS

Background

-

Customer Considerations

-

Documentation Considerations

-

Interoperability Considerations

-

 

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Goal Summary

This feature aims to make sure that the HyperShift operator and the control plane it deploys use Managed Service Identities (MSI) and have access to scoped credentials (potentially also via access to AKS's image gallery). Additionally, operators deployed in the customer's account (system components) would be scoped with Azure workload identities.

Epic Goal

The image registry can authenticate with Service Principal backed by a certificate stored in an Azure Key Vault. The Secrets CSI driver will be used to mount the certificate as a volume on the image registry deployment in a hosted control plane.
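
A minimal sketch of the mounting mechanism, assuming the upstream Azure Key Vault provider for the Secrets Store CSI Driver; all names, the namespace, and the identity details are illustrative:

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: image-registry-sp-cert         # illustrative
  namespace: clusters-my-hcp           # hosted control plane namespace (illustrative)
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    keyvaultName: my-keyvault          # Key Vault holding the Service Principal certificate
    tenantId: <tenant-id>
    objects: |
      array:
        - |
          objectName: registry-sp-certificate
          objectType: secret           # fetch the certificate and private key

The image registry deployment would then mount a csi volume using the secrets-store.csi.k8s.io driver that references this SecretProviderClass, making the certificate available to the registry for Azure authentication.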

Why is this important?

  • This is needed to enable authentication with Service Principal with backing certificates for ARO HCP.

Acceptance Criteria

  • Image registry is able to authenticate with Azure in ARO HCP using Service Principal with a backing certificate.
  • Updated documentation
  • ARO HCP CI coverage

Dependencies (internal and external)

Azure SDK

Previous Work (Optional):

IR-460

Open questions:

What degree of coverage should run on AKS e2e vs. on existing e2es?

Done Checklist

CI - Existing CI is running, tests are automated and merged.
CI - AKS CI is running, tests are automated and merged.
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As an ARO HCP user, I want to be able to:

  • mount certificates from Key Vault using the Secrets Store CSI Driver on AKS

so that I can

  • use certificates to authenticate with Azure API for image registry on the HCP

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • HyperShift PR with the changes to mount the certificate to the image registry deployment using the Secrets Store CSI driver

(optional) Out of Scope:

N/A

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Goal:

As an administrator, I would like to use my own managed DNS solution instead of only specific openshift-install supported DNS services (such as AWS Route53, Google Cloud DNS, etc...) for my OpenShift deployment.

 

Problem:

While cloud-based DNS services provide convenient hostname management, there are a number of regulatory (ITAR) and operational constraints customers face that prohibit the use of those DNS hosting services on public cloud providers.

 

Why is this important:

  • Provides customers with the flexibility to leverage their own custom managed ingress DNS solutions already in use within their organizations.
  • Required for regions like AWS GovCloud in which many customers may not be able to use the Route53 service (only for commercial customers) for either internal or ingress DNS.
  • OpenShift managed internal DNS solution ensures cluster operation and nothing breaks during updates.

 

Dependencies (internal and external):

 

Prioritized epics + deliverables (in scope / not in scope):

  • Ability to bootstrap cluster without an OpenShift managed internal DNS service running yet
  • Scalable, cluster (internal) DNS solution that’s not dependent on the operation of the control plane (in case it goes down)
  • Ability to automatically propagate DNS record updates to all nodes running the DNS service within the cluster
  • Option for connecting the cluster to the customer's ingress DNS solution already in place within their organization

 

Estimate (XS, S, M, L, XL, XXL):

 

Previous Work:

 

Open questions:

 

Link to Epic: https://docs.google.com/document/d/1OBrfC4x81PHhpPrC5SEjixzg4eBnnxCZDr-5h3yF2QI/edit?usp=sharing

OCP/Telco Definition of Done
Epic Template descriptions and documentation.


Epic Goal

  • Render CoreDNS pod definition and Corefile using information in the Infrastructure CR. Start CoreDNS pods on the bootstrap and Control plane nodes.

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Update MCO to start in-cluster CoreDNS pods for AWS when userProvisionedDNS is configured. Use the GCP implementation https://github.com/openshift/machine-config-operator/pull/4018 for reference.
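
A hedged install-config sketch of what enabling this could look like on AWS, mirroring the GCP field referenced above; the exact field placement and any feature gating are assumptions:

# install-config.yaml excerpt (field placement is an assumption)
featureSet: TechPreviewNoUpgrade       # assumption: gated while the feature matures
platform:
  aws:
    region: us-gov-west-1
    userProvisionedDNS: Enabled        # customer manages the cluster DNS records themselves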

User Story:

As a (user persona), I want to be able to:

  • Capability 1
  • Capability 2
  • Capability 3

so that I can achieve

  • Outcome 1
  • Outcome 2
  • Outcome 3

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Incomplete Epics

This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were not completed when this image was assembled.

Motivation:

Content-Security-Policy (CSP) header provides a defense-in-depth measure in client-side security, as a second layer of protection against Cross-site Scripting (XSS) and clickjacking attacks.

It is not yet implemented in the OpenShift web console; however, there are some other related security headers present in the OpenShift console that cover some aspects of CSP functionality:

  • X-Frame-Options: When set to DENY, this disallows attempts to iframe the site (related CSP directive: `frame-ancestors`)
  • X-XSS-Protection: Protects against reflected XSS attacks in Chrome and Internet Explorer (related CSP directive: `unsafe-inline`)
  • X-Content-Type-Options: Protects against loading of external scripts and stylesheets unless the server indicates the correct MIME type, which can lead to some types of XSS attacks.

Epic Goal

  • Implement CSP in two phases:
    • 1. Report-only, in which the console will report the violations to the user and to telemetry, so developers can have a better idea of what types of violations are appearing.
      • This phase should ideally take place over several releases, at least two, during which data about the observed violations would be gathered through telemetry. This will give plugin creators time to adapt to the change.
      • During this phase an API needs to be added to the ConsolePlugin CRD, to give plugin creators an option for providing the list of their content sources, which the console will register and take into account (see the sketch after this list).
      • Also, the console itself should remove any CSP violations which it is causing.
    • 2. Enforcing, in which the console will start enforcing the CSP, which will prevent plugins from loading assets from non-registered sources.
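
A hypothetical sketch of the ConsolePlugin extension mentioned in phase 1; the contentSecurityPolicy stanza and its field names are assumptions, not the final API:

apiVersion: console.openshift.io/v1
kind: ConsolePlugin
metadata:
  name: my-plugin
spec:
  displayName: My Plugin
  backend:
    type: Service
    service:
      name: my-plugin
      namespace: my-plugin-ns
      port: 9443
      basePath: /
  contentSecurityPolicy:               # assumption: per-directive allow-list of content sources
    - directive: ScriptSrc
      values:
        - https://cdn.example.com
    - directive: ImgSrc
      values:
        - https://images.example.com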

Why is this important?

  • Add additional security level to console

Acceptance Criteria

  • Implement 1 phase of the CSP
    • report only mode
    • surface the violations to the user
    • extend the ConsolePlugin API to give plugin creators an option to extend the plugin's sources
    • extend CI to catch any CSP violations
    • update README and announce this feature so developers are aware of it

Open questions::

  1. TBD

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

This is a follow-up to CONSOLE-4263

As part of handling CSP violation events in Console, we should send the relevant CSP report data to telemetry service.

AC:

  • Console collects and computes important CSP data (including whether the violation originated from a dynamic plugin) and sends it to telemetry service by the onCSPViolation hook.
  • Console does not spam telemetry service with large volumes of redundant data. CSP violations will be stored in the browser's localStorage, based on the plugin name.
  • Console should only report violations in production mode.
  • Make CMO config parsing/unmarshalling more strict
  • Implement the validate logic (make sure it uses the same code blocks as when CMO tries to apply the config, to avoid divergence)
  • Expose the webhook on CMO.
  • Provide an opt-out mechanism

Epic Goal
Through this epic, we will update our CI to use an available agent-based workflow instead of the libvirt openshift-installer, allowing us to eliminate the use of Terraform in our deployments.

Why is this important?
There is an active initiative in OpenShift to remove Terraform from the OpenShift installer.

Acceptance Criteria

  • All tasks within the epic are completed.

Done Checklist

  • CI - For new features (non-enablement), existing Multi-Arch CI jobs are not broken by the Epic
  • All the stories, tasks, sub-tasks and bugs that belong to this epic need to have been completed and indicated by a status of 'Done'.

Context thread.

Description of problem:

     Monitoring the 4.18 agent-based installer CI job for s390x (https://github.com/openshift/release/pull/50293), I discovered unexpected behavior once the installation triggers the reboot-into-disk step for the 2nd and 3rd control plane nodes. (The first control plane node is rebooted last because it's also the bootstrap node.) Instead of rebooting successfully as expected, it fails to find the OSTree and drops to dracut, stalling the installation.

Version-Release number of selected component (if applicable):

    OpenShift 4.18 on s390x only; discovered using agent installer

How reproducible:

    Try to install OpenShift 4.18 using agent-based installer on s390x

Steps to Reproduce:

    1. Boot nodes with XML (see attached)
    2. Wait for installation to get to reboot phase.
    

Actual results:

    Control plane nodes fail to reboot.

Expected results:

    Control plane nodes reboot and installation progresses.

Additional info:

    See attached logs.

The history of this epic starts with this PR which triggered a lengthy conversation around the workings of the image  API with respect to importing imagestreams  images as single vs manifestlisted. The imagestreams today by default have the `importMode` flag set to `Legacy` to avoid breaking behavior of existing clusters in the field. This makes sense for single arch clusters deployed with a single  arch payload, but when users migrate to use the multi payload, more often than not, their intent is to add nodes of other architecture types. When this happens - it gives rise to problems when using imagestreams with the default behavior of importing a single manifest image. The oc commands do have a new flag to toggle the importMode, but this breaks functionality of existing users who just want to create an imagestream and use it with existing commands.
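
For illustration, a sketch of the per-tag knob that exists today and that the proposal would effectively default to PreserveOriginal on clusters installed with the multi payload; the tag source is illustrative:

apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: my-app
spec:
  tags:
    - name: latest
      from:
        kind: DockerImage
        name: quay.io/example/my-app:latest   # illustrative image reference
      importPolicy:
        importMode: PreserveOriginal          # keep the whole manifest list instead of a single manifest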

There was a discussion with David Eads and other staff engineers and it was decided that the approach to be taken is to default imagestreams' importMode to `preserveOriginal` if the cluster is installed with/ upgraded to a multi payload. So a few things need to happen to achieve this:

  • CVO would need to expose a field in the status section indicative of the type of payload in the cluster (single vs multi)
  • cluster-openshift-apiserver-operator would read this field and add it to the apiserver configmap. openshift-apiserver would use this value to determine the setting of importMode value.
  • Document clearly that the behavior of imagestreams in a cluster with multi payload is different from the traditional single payload

Some open questions:

  • What happens to existing imagestreams on upgrades
  • How do we handle CVO managed imagestreams (IMO, CVO managed imagestreams should always set importMode to preserveOriginal as the images are associated with the payload)

 

For the apiserver operator to figure out the payload type and set the import mode defaults, the CVO needs to expose that value through the status field. This information is available today in the conditions list, but it's not pretty to extract it and infer the payload type as it is contained in the message string. The way to do it today is shown here. It would be better for CVO to expose it as a separate field which can be easily consumed by any controller and also be used for telemetry in the future.
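
A hypothetical sketch of how such a status field might surface; the field name and placement are assumptions, not the decided API:

apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  name: version
status:
  desired:
    version: 4.18.0
    architecture: Multi        # assumption: payload type exposed as a dedicated status field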

 

Epic Goal

  • Update all images that we ship with OpenShift to the latest upstream releases and libraries.
  • Exact content of what needs to be updated will be determined as new images are released upstream, which is not known at the beginning of OCP development work. We don't know what new features will be included and should be tested and documented. Especially new CSI drivers releases may bring new, currently unknown features. We expect that the amount of work will be roughly the same as in the previous releases. Of course, QE or docs can reject an update if it's too close to deadline and/or looks too big.

Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF).

Why is this important?

  • We want to ship the latest software that contains new features and bugfixes.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Update the driver to the latest upstream release. Notify QE and docs of any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

https://github.com/openshift/gcp-pd-csi-driver

Update all OCP and Kubernetes libraries in storage operators to the appropriate version for the OCP release.
Please wait until openshift/api, openshift/library-go, and openshift/client-go are updated to the newest Kubernetes release! There may be non-trivial changes in these libraries.

This includes (but is not limited to):

  • Kubernetes:
    • client-go
    • controller-runtime
  • OCP:
    • library-go
    • openshift/api
    • openshift/client-go
    • operator-sdk

Operators:

  • csi-operator
  • openstack-cinder-csi-driver-operator
  • gcp-pd-csi-driver-operator
  • gcp-filestore-csi-driver-operator
  • csi-driver-manila-operator
  • vmware-vsphere-csi-driver-operator
  • ibm-vpc-block-csi-driver-operator
  • ibm-powervs-block-csi-driver-operator
  • secrets-store-csi-driver-operator
  • cluster-storage-operator
  • cluster-csi-snapshot-controller-operator
  • local-storage-operator
  • vsphere-problem-detector

(please cross-check with *-operator + vsphere-problem-detector in our tracking sheet)

EOL, do not upgrade:

  • github.com/oVirt/csi-driver-operator
  • github.com/openshift/alibaba-disk-csi-driver-operator
  • github.com/openshift/csi-driver-shared-resource-operator

The following operators were migrated to csi-operator, do not update these obsolete repos:

  • github.com/openshift/aws-efs-csi-driver-operator
  • github.com/openshift/azure-disk-csi-driver-operator
  • github.com/openshift/azure-file-csi-driver-operator

tools/library-bump.py and tools/bump-all may be useful. For 4.16, this was enough:

mkdir 4.16-bump
cd 4.16-bump
../library-bump.py --debug --web <file with repo list> STOR-1574 --run "$PWD/../bump-all github.com/google/cel-go@v0.17.7" --commit-message "Bump all deps for 4.16" 

4.17 may need an older Prometheus:

../library-bump.py --debug --web <file with repo list> STOR-XXX --run "$PWD/../bump-all github.com/google/cel-go@v0.17.8 github.com/prometheus/common@v0.44.0 github.com/prometheus/client_golang@v1.16.0 github.com/prometheus/client_model@v0.4.0 github.com/prometheus/procfs@v0.10.1" --commit-message "Bump all deps for 4.17" 

4.18 special:

Add "spec.unhealthyEvictionPolicy: AlwaysAllow" to all PodDisruptionBudget objects of all our operators + operands. See WRKLDS-1490 for details

There has been a change in the library-go function `WithReplicasHook`. See https://github.com/openshift/library-go/pull/1796.

Other Complete

This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were completed when this image was assembled

Description of problem:

During the integration of Manila into csi-operator, a new controller was added to csi-operator that checks whether a precondition is valid before triggering all the other controllers. The precondition defined for Manila checks that Manila shares exist and, if that is the case, it syncs the CSI Driver and the Storage Classes. We need to handle the errors returned if those syncs fail.
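A minimal sketch of the shape of the fix, not the actual csi-operator code; syncCSIDriver and syncStorageClasses are hypothetical stand-ins for the syncs triggered once the precondition holds:

package example

import (
	"context"
	"fmt"

	utilerrors "k8s.io/apimachinery/pkg/util/errors"
)

func syncAfterPrecondition(ctx context.Context, sharesExist bool) error {
	if !sharesExist {
		// Precondition not met: nothing to do, the other controllers stay idle.
		return nil
	}
	var errs []error
	if err := syncCSIDriver(ctx); err != nil {
		errs = append(errs, fmt.Errorf("syncing CSIDriver: %w", err))
	}
	if err := syncStorageClasses(ctx); err != nil {
		errs = append(errs, fmt.Errorf("syncing StorageClasses: %w", err))
	}
	// Surface every failure so the operator retries and reports it instead of
	// silently succeeding.
	return utilerrors.NewAggregate(errs)
}

// Hypothetical placeholders for the real sync functions.
func syncCSIDriver(ctx context.Context) error      { return nil }
func syncStorageClasses(ctx context.Context) error { return nil }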

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

The finally tasks do not get removed and remain in the pipeline.    

Version-Release number of selected component (if applicable):

    In all supported OCP versions

How reproducible:

    Always

Steps to Reproduce:

1. Create a finally task in a pipeline in pipeline builder
2. Save pipeline
3. Edit pipeline and remove finally task in pipeline builder
4. Save pipeline
5. Observe that the finally task has not been removed

Actual results:

The finally tasks do not get removed and remain in the pipeline.    

Expected results:

The finally task is removed from the pipeline when it is deleted and the pipeline is saved in "pipeline builder" mode.

Additional info:

    

Description of problem:

    `tag:UntagResources` is required for the AWS SDK call [UntagResourcesWithContext](https://github.com/openshift/installer/blob/master/pkg/destroy/aws/shared.go#L121) when removing the "shared" tag from the IAM profile.
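For context, a minimal sketch of the kind of untag call involved, using the aws-sdk-go resourcegroupstaggingapi client; the ARN and tag key are placeholders. This is the call that needs tag:UntagResources in the caller's IAM policy:

package example

import (
	"context"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/resourcegroupstaggingapi"
)

// untagSharedProfile removes the "shared" cluster tag from a BYO IAM
// instance profile during destroy.
func untagSharedProfile(ctx context.Context, sess *session.Session) error {
	client := resourcegroupstaggingapi.New(sess)
	_, err := client.UntagResourcesWithContext(ctx, &resourcegroupstaggingapi.UntagResourcesInput{
		ResourceARNList: []*string{aws.String("arn:aws:iam::123456789012:instance-profile/example-byo-profile-worker")},
		TagKeys:         []*string{aws.String("kubernetes.io/cluster/example-infra-id")},
	})
	// Without tag:UntagResources, this fails with the AccessDeniedException
	// shown in the actual results below.
	return err
}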

Version-Release number of selected component (if applicable):

    4.17+

How reproducible:

    always

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    time="2024-11-19T12:22:19Z" level=debug msg="search for IAM instance profiles"
time="2024-11-19T12:22:19Z" level=debug msg="Search for and remove tags in us-east-1 matching kubernetes.io/cluster/ci-op-y8wbktiq-e515e-q6kvb: shared"
time="2024-11-19T12:22:19Z" level=debug msg="Nothing to clean for shared iam resource" arn="arn:aws:iam::460538899914:instance-profile/ci-op-y8wbktiq-e515e-byo-profile-worker"
time="2024-11-19T12:22:19Z" level=debug msg="Nothing to clean for shared iam resource" arn="arn:aws:iam::460538899914:instance-profile/ci-op-y8wbktiq-e515e-byo-profile-master"
time="2024-11-19T12:22:19Z" level=info msg="untag shared resources: AccessDeniedException: User: arn:aws:iam::460538899914:user/ci-op-y8wbktiq-e515e-minimal-perm is not authorized to perform: tag:UntagResources because no identity-based policy allows the tag:UntagResources action\n\tstatus code: 400, request id: 464de6ab-3de5-496d-a163-594dade11619"

See: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/58833/rehearse-58833-pull-ci-openshift-installer-release-4.18-e2e-aws-ovn-custom-iam-profile/1858807924600606720

Expected results:

    The permission is added to the required list when using a BYO IAM profile, and the "shared" tag is removed from the profiles.

Additional info:

    

Component Readiness has found a potential regression in the following test:

install should succeed: infrastructure

The installer fails with:

time="2024-10-20T04:34:57Z" level=error msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: control-plane machines were not provisioned within 15m0s: client rate limiter Wait returned an error: context deadline exceeded" 

Significant regression detected.
Fishers Exact probability of a regression: 99.96%.
Test pass rate dropped from 98.94% to 89.29%.

Sample (being evaluated) Release: 4.18
Start Time: 2024-10-14T00:00:00Z
End Time: 2024-10-21T23:59:59Z
Success Rate: 89.29%
Successes: 25
Failures: 3
Flakes: 0

Base (historical) Release: 4.17
Start Time: 2024-09-01T00:00:00Z
End Time: 2024-10-01T23:59:59Z
Success Rate: 98.94%
Successes: 93
Failures: 1
Flakes: 0

View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?Aggregation=none&Architecture=amd64&FeatureSet=default&Installer=ipi&Network=ovn&NetworkAccess=default&Platform=azure&Scheduler=default&SecurityMode=default&Suite=serial&Topology=ha&Upgrade=none&baseEndTime=2024-10-01%2023%3A59%3A59&baseRelease=4.17&baseStartTime=2024-09-01%2000%3A00%3A00&capability=Other&columnGroupBy=Architecture%2CNetwork%2CPlatform&component=Installer%20%2F%20openshift-installer&confidence=95&dbGroupBy=Platform%2CArchitecture%2CNetwork%2CTopology%2CFeatureSet%2CUpgrade%2CSuite%2CInstaller&environment=amd64%20default%20ipi%20ovn%20azure%20serial%20ha%20none&ignoreDisruption=true&ignoreMissing=false&includeVariant=Architecture%3Aamd64&includeVariant=CGroupMode%3Av2&includeVariant=ContainerRuntime%3Arunc&includeVariant=FeatureSet%3Adefault&includeVariant=Installer%3Aipi&includeVariant=Installer%3Aupi&includeVariant=Owner%3Aeng&includeVariant=Platform%3Aaws&includeVariant=Platform%3Aazure&includeVariant=Platform%3Agcp&includeVariant=Platform%3Ametal&includeVariant=Platform%3Avsphere&includeVariant=Topology%3Aha&minFail=3&passRateAllTests=0&passRateNewTests=95&pity=5&sampleEndTime=2024-10-21%2023%3A59%3A59&sampleRelease=4.18&sampleStartTime=2024-10-14%2000%3A00%3A00&testId=cluster%20install%3A3e14279ba2c202608dd9a041e5023c4c&testName=install%20should%20succeed%3A%20infrastructure

Description of problem:

    If zones are not specified in the install-config.yaml, the installer will discover all the zones available for the region. Then it will try to filter those zones based on the instance type, which requires the `ec2:DescribeInstanceTypeOfferings` permission.
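A minimal sketch of the zone/instance-type offering lookup involved, using the aws-sdk-go ec2 client; the instance type is a placeholder and pagination is omitted. This is the call that requires ec2:DescribeInstanceTypeOfferings:

package example

import (
	"context"
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// zonesOfferingType returns the availability zones in the session's region
// that offer the given instance type.
func zonesOfferingType(ctx context.Context, sess *session.Session, instanceType string) ([]string, error) {
	client := ec2.New(sess)
	out, err := client.DescribeInstanceTypeOfferingsWithContext(ctx, &ec2.DescribeInstanceTypeOfferingsInput{
		LocationType: aws.String(ec2.LocationTypeAvailabilityZone),
		Filters: []*ec2.Filter{{
			Name:   aws.String("instance-type"),
			Values: []*string{aws.String(instanceType)},
		}},
	})
	if err != nil {
		// With a minimal-permissions user, this is where the missing
		// permission should be surfaced clearly, per the expected results below.
		return nil, fmt.Errorf("describing instance type offerings: %w", err)
	}
	var zones []string
	for _, offering := range out.InstanceTypeOfferings {
		zones = append(zones, aws.StringValue(offering.Location))
	}
	return zones, nil
}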

Version-Release number of selected component (if applicable):

    4.16+

How reproducible:

    Always, by not specifying zones in the install-config.yaml and installing the cluster with a minimal-permissions user.

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    TBA

Expected results:

    A failure message indicating that `ec2:DescribeInstanceTypeOfferings` is needed when zones are not set.

Additional info:

    

Other Incomplete

This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were not completed when this image was assembled

In payloads 4.18.0-0.ci-2024-11-01-110334 and 4.18.0-0.nightly-2024-11-01-101707 we observed GCP install failures

 Container test exited with code 3, reason Error
---
ails:
level=error msg=[
level=error msg=  {
level=error msg=    "@type": "type.googleapis.com/google.rpc.ErrorInfo",
level=error msg=    "domain": "googleapis.com",
level=error msg=    "metadatas": {
level=error msg=      "consumer": "projects/711936183532",
level=error msg=      "quota_limit": "ListRequestsFilterCostOverheadPerMinutePerProject",
level=error msg=      "quota_limit_value": "75",
level=error msg=      "quota_location": "global",
level=error msg=      "quota_metric": "compute.googleapis.com/filtered_list_cost_overhead",
level=error msg=      "service": "compute.googleapis.com"
level=error msg=    },
level=error msg=    "reason": "RATE_LIMIT_EXCEEDED"
level=error msg=  },
level=error msg=  {
level=error msg=    "@type": "type.googleapis.com/google.rpc.Help",
level=error msg=    "links": [
level=error msg=      {
level=error msg=        "description": "The request exceeds API Quota limit, please see help link for suggestions.",
level=error msg=        "url": "https://cloud.google.com/compute/docs/api/best-practices#client-side-filter"
level=error msg=      }
level=error msg=    ]
level=error msg=  }
level=error msg=]
level=error msg=, rateLimitExceeded 

Patrick Dillon noted that ListRequestsFilterCostOverheadPerMinutePerProject cannot have its quota limit increased.

The problem subsided over the weekend, presumably because fewer jobs were run, but it has started to appear again. Opening this to track the ongoing issue and potential workarounds.
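For reference, a minimal sketch of the client-side filtering that the error's help link suggests, using google.golang.org/api/compute/v1: list without a server-side Filter (server-side filtering is what counts against filtered_list_cost_overhead) and filter in the client. Project, zone, and prefix are placeholders:

package example

import (
	"context"
	"strings"

	compute "google.golang.org/api/compute/v1"
)

// instancesWithPrefix lists instances in a zone and filters them client-side
// by name prefix instead of passing a server-side filter expression.
func instancesWithPrefix(ctx context.Context, project, zone, prefix string) ([]string, error) {
	svc, err := compute.NewService(ctx)
	if err != nil {
		return nil, err
	}
	var names []string
	// Note: no .Filter("name = <prefix>*") on the call; filtering happens below.
	err = svc.Instances.List(project, zone).Pages(ctx, func(page *compute.InstanceList) error {
		for _, inst := range page.Items {
			if strings.HasPrefix(inst.Name, prefix) {
				names = append(names, inst.Name)
			}
		}
		return nil
	})
	return names, err
}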

This contributes to the following test failures for GCP

install should succeed: configuration
install should succeed: overall

IBI and IBU use different labels for the var-lib-containers partition.
This results in a failure to mount the partition when the labels do not match (var-lib-containers vs varlibcontainers).
We should always use `var-lib-containers` as the label.

See more details in the slack thread
https://redhat-internal.slack.com/archives/C05JHD9QYTC/p1731542185936629

 

Installer part

https://github.com/openshift/installer/blob/master/pkg/asset/imagebased/image/imagebased_config.go#L33

https://github.com/openshift/installer/blob/master/pkg/types/imagebased/imagebased_config_types.go#L65

 

LCA part (less interesting, as the config will be generated in the installer)

https://github.com/openshift-kni/lifecycle-agent/blob/main/api/ibiconfig/ibiconfig.go#L20

Description of problem:

[vmware-vsphere-csi-driver-operator] driver controller/node/webhook update events repeat pathologically    

Version-Release number of selected component (if applicable):

4.18.0-0.nightly-2024-11-03-161006    

How reproducible:

Always    

Steps to Reproduce:

    1. Install an OpenShift cluster on vSphere using a 4.17 nightly build.
    2. Upgrade the cluster to a 4.18 nightly build.
    3. Check that the driver controller/node/webhook update events do not repeat pathologically.

CI failure record -> https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.18-upgrade-from-stable-4.17-e2e-vsphere-ovn-upgrade/1854191939318976512  

Actual results:

 In step 3: the driver controller/node/webhook update events repeat pathologically   

Expected results:

 In step 3: the driver controller/node/webhook update events should not repeat pathologically    

Additional info: