Back to index

4.9.0-0.ci-2023-09-10-173339

Jump to: Complete Features | Incomplete Features | Complete Epics | Incomplete Epics | Other Complete | Other Incomplete |

Changes from 4.8.57

Note: this page shows the Feature-Based Change Log for a release

Complete Features

These features were completed when this image was assembled

Currently, the "Get started with on-premise host inventory" quickstart is delivered in the core console. If we are going to keep it there, we need to add the MCE or ACM operator as a prerequisite; otherwise it is very confusing.

Feature Overview

  • This Section: High-level description of the feature, i.e. an executive summary.
  • Note: A Feature is a capability or a well-defined set of functionality that delivers business value. Features can include additions or changes to existing functionality. Features can easily span multiple teams and multiple releases.

 

Goals

  • This Section: Provide a high-level goal statement, giving user context and the expected user outcome(s) for this feature.

 

Requirements

  • This Section: A list of specific needs or objectives that must be delivered to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP requirement gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.

 

Requirement | Notes | isMvp?
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES
Release Technical Enablement | Provide necessary release enablement details and documents. | YES

 

(Optional) Use Cases

This Section: 

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

 

Questions to answer…

  • ...

 

Out of Scope

 

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

 

Assumptions

  • ...

 

Customer Considerations

  • ...

 

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?  
  • New Content, Updates to existing content,  Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

Problem Alignment

The Problem

Customers typically run more than one cluster and/or applications deployed across different regions. In such a hybrid cloud environment, aggregating metrics is a key requirement, so that admins and/or application owners do not have to drop into individual clusters to troubleshoot specific problems. And since Red Hat does not offer a standalone metrics aggregation service, customers have started to use existing, home-grown technologies based on, for example, InfluxDB or Kafka to achieve that.

In summary:

  • OpenShift Monitoring is optimized for short-term retention only.
  • Red Hat does not offer a central metrics aggregation service yet.
  • Customers use existing, home-grown technologies to distribute information across other stakeholders in their company.

High-Level Approach

Expose Prometheus remote-write configuration via our OpenShift Monitoring (Cluster and User Workload) ConfigMap to allow customers to push time-series data to a remote location.

Please note that we do not plan to support certain third party “receivers” with this solution. Customers will be responsible for ensuring that an appropriate receiving component implementing the “remote-write” API is up and running. Here is a list of possible “receiver” plugins.

Goal & Success

  • Introduce some “ease of use” features for configuring certain parts of remote-write, to decrease possible misconfigurations.
  • Allow customers to push metrics off the cluster to allow aggregation use cases and more options for our partners to integrate into OpenShift - e.g. to allow long-term retention or security/analytics scenarios.

Solution Alignment

Key Capabilities

  • As an OpenShift administrator, I want to configure remote-write for both the OOTB infrastructure bundle and the user workload stack, so that time-series data will be available on the system of my choice.
  • As an OpenShift administrator, I want to easily build an allow list of metrics that should be pushed externally.

Key Flows

User configures one of the available ConfigMaps to allow node_cpu_seconds_total to be written into a remote Thanos system.

  • Administrator opens the cluster-monitoring-config ConfigMap.
  • They add a new field to configure remote write.
  • They add the node_cpu_seconds_total metric to the allow list.
  • They add the remote URL for the Thanos receiver.
  • They add a Secret to configure authentication against the remote service.
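A minimal sketch of what such a configuration could look like, assuming the cluster-monitoring-config ConfigMap in the openshift-monitoring namespace and an allow list expressed as write relabel configs; the receiver URL is a hypothetical placeholder and the authentication fields are only indicated, since they depend on the API we expose.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      remoteWrite:
        # Hypothetical Thanos receive endpoint; replace with the real remote URL.
        - url: "https://thanos-receive.example.com/api/v1/receive"
          # Allow list: forward only node_cpu_seconds_total.
          writeRelabelConfigs:
            - sourceLabels: [__name__]
              regex: node_cpu_seconds_total
              action: keep
          # Authentication against the remote service (e.g. TLS client certificates
          # or basic auth) would reference a Secret here.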

Additional resources

Remote write allows replicating time-series data to a remote location. This is important for several scenarios, such as using "remote-write enabled" systems (e.g. InfluxDB) for long-term storage and historical analysis, as well as aggregating metrics across multiple clusters.

Currently, remote-write is in an experimental stage in Prometheus[1], but the chances are high that it will become stable some time this year. Furthermore, we already use remote-write extensively for Telemetry, and ACM will rely on it in the near future. With that in mind, we think we are in a perfect spot to move what we already have[2] from dev preview to at least tech preview.

Acceptance criteria

  • mTLS support (important for positioning Red Hat's Advanced Cluster Manager (ACM), as it will need it for pushing metrics from OpenShift clusters into its central management solution backed by Observatorium).
  • Default configurations coming from Red Hat (such as Telemetry and ACM) should not be overridden. ACM for example may inject their configuration automatically post installation (mechanism to be discussed).

Non-goals

  • Configuration isolation for cluster and user workload monitoring ConfigMap to allow separating remote-write configuration per "tenant" or "user".
  • Remote write for Thanos Ruler (this isn't supported yet, see https://github.com/thanos-io/thanos/issues/1724).

Open questions

  • Do we want to expose a different API to make configuring an allow list easier for everyone, rather than exposing the relabeling configuration directly? The reason is that we want to avoid handling "syntax" validation requests via BZs or internal channels.

Documentation

  • New section inside the configuration chapter that describes how to set up remote-write, with an example of what it looks like for a standard remote-write implementation. For both CMO and UWM.
  • A small note about the implications of setting up remote-write for the overall Prometheus cluster.
  • How to configure security/auth (e.g. (m)TLS).
  • The API.
  • Tuning.
  • Proxy configuration (if not supported, then we need a statement).

Other resources

[1] https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#prometheusspec - "If specified, the remote_write spec. This is an experimental feature, it may change in any upcoming release in a breaking way." The experimental flag was removed.

We'll want to give users the option to add remote_write configs to both cluster monitoring and UWM.
AC:

  • decide what features we want to give users
  • decide what API we want to expose to users, i.e. basic rw config, low-level relabel-config, or high-level-streamlined API
  • implement the API

https://issues.redhat.com/browse/MON-1069?focusedCommentId=16252560&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16252560

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

As a cluster administrator,

I want OpenShift to include a recent CoreDNS version,

so that I have the latest available performance and security fixes.

 

We should strive to follow upstream CoreDNS releases by bumping openshift/coredns with every OpenShift 4.y release, so that OpenShift benefits from upstream performance and security fixes, and so that we avoid large version-number jumps when an urgently needed change necessitates bumping CoreDNS to the latest upstream release. This bump should happen as early as possible in the OpenShift release cycle, so as to maximize soak time.

 

For OpenShift 4.9, this means bumping from CoreDNS 1.8.1 to 1.8.3, or possibly a later release should one ship before we do the bump.

 

Note that CoreDNS upstream does not maintain release branches; that is, once CoreDNS 1.9 is released, there will be no further 1.8.z releases. So we may be better off updating to 1.9 as soon as it is released, rather than staying on the 1.8 series, which would then be unmaintained.

 

We may consider bumping CoreDNS again if upstream ships additional releases during the 4.9 development cycle. However, should that contingency arise, we will need to weigh the risks and the remaining soak time in the release schedule before doing so.

 

Feature Overview

As an OpenShift administrator, I would like a solution that allows me to upgrade from one EUS version to another with very few steps and only minimal disruption to application workloads, while still allowing new application services to be deployed.

Goals

4.8

  • Spike, Design, and Scope
  • Begin foundational development if possible

4.9

  • Foundational items delivered and back ported as necessary

4.10

  • Remaining delivery artifacts complete
  • Documentation and enablement complete
  • Full testing complete

Requirements

Functional requirements break down into the following prioritized list:

 

  1. Make serial upgrades safe
    1. Prevent upgrades before the core components are ready (version skewing, incompatible APIs)
    2. Prevent upgrades before operators are ready
      1. Ensure Operators have a way to express a maximum supported OpenShift version (see the sketch after this list)
      2. Ensure OLM policy is clear on what happens if a max version is not specified
    3. Make back-pressure items (reasons you cannot upgrade) clear to administrators, along with the actions needed to resolve them
    4. CI MUST be running with test automation
    5. Note: Forcing an upgrade is still possible
  2. Make updates faster
    1. Optimize where possible to increase speed of upgrade for core components (SDN/Daemonsets)
  3. Reduce the amount of workload disruption
    1. Workload disruption is not just reboots; it is any disruption to workloads during the upgrade, of which a reboot is likely the worst-case scenario. This may also include things like rescheduling of workloads.
    2. We will not change the model of how components are deployed; changes to the host still require a reboot.
    3. Discover and document any guidelines needed to reduce, where possible, the number of items developed that would cause a reboot between EUS releases (4.8, 4.9).
    4. As a stretch goal, discover if it is possible to reduce the reboots between 4.6 and 4.7 
  4. Should take into consideration clusters with RHEL workers
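As referenced in requirement 1.b.i above, a hedged sketch of how an operator could express a maximum supported OpenShift version in its ClusterServiceVersion, assuming the olm.maxOpenShiftVersion property is the mechanism chosen; the operator name and version values are illustrative.

apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: example-operator.v1.2.3        # illustrative operator
  annotations:
    # Signals to OLM that cluster upgrades beyond OpenShift 4.9 should be blocked
    # until a compatible operator version is installed.
    olm.properties: '[{"type": "olm.maxOpenShiftVersion", "value": "4.9"}]'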

 

Non-Functional Requirements

Requirement | Notes | isMvp?
Release Technical Enablement | Provide necessary release enablement details and documents. | YES
Documentation | This is a requirement for ALL end user facing features. | YES

Questions to answer…

Out of Scope

  • It is not intended to support version skews that fall outside the upstream version skew policy
  • It is not intended to eliminate all reboots
  • It is not intended to skip releases at this time

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

EUS to EUS Focus Area Discussion: https://docs.google.com/document/d/17I1Wd7-R1wRxmboyv1jUFHFkqQcBTorJccdGi1ZqjQE/edit?usp=sharing

EUS Feature: https://issues.redhat.com/browse/OCPPLAN-5484

Epic Goal

  • Ensure the user experience for upgrades in console supports EUS -> EUS upgrades.

Why is this important?

  • This is a product-wide initiative.

Scenarios

  1. The console cluster settings page should inform administrators of upgrade requirements prior to the first upgrade step.
  2. The console cluster settings page should report problems during an upgrade.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Dependencies (internal and external)

  1. CVO - Sufficient APIs (ClusterVersion, Alerts) for console to show requirements before an upgrade and problems during an upgrade to an administrator.

Previous Work (Optional):

Open questions:

  1. We have an R&D story to investigate what the console experience should be and what APIs might be necessary.

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Use case
As an Admin, one of my operators reports that it can't be upgraded. An action is required on my part, as I will be unable to upgrade to the next .y minor release until I fix the problem.
 
Possible Design Solution 
Create a message saying you can upgrade to .z patch releases even when one of your cluster operators says it's not upgradeable.

Ideally, the message string on the condition explains what the admin needs to resolve, and until they resolve the issue they can only update within their current z-stream.
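For illustration, a sketch of the condition this refers to, assuming the standard Upgradeable condition reported on a ClusterOperator; the operator name, reason, and message text are hypothetical.

apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  name: example-operator               # hypothetical operator
status:
  conditions:
    - type: Upgradeable
      status: "False"
      reason: ActionRequired           # hypothetical reason
      message: >-
        Minor version updates are blocked until the administrator resolves the
        reported issue; z-stream updates within the current 4.y remain possible.
      lastTransitionTime: "2021-06-01T00:00:00Z"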

 
Questions
Need to do a little R&D to find out when this happens and what happens when you're in this state.

Designs (WIP)
Doc: https://docs.google.com/document/d/1iUZlHbv5nTYtb7Cq4rn_bYPqD4Jtie59xIogxN-2Eyc/edit#heading=h.5eoflxvaj1m4

Summary (PM+lead)

Configure audit logging to capture login, logout and login failure details

Motivation (PM+lead)

TODO(PM): update this

A customer needs login, logout, and login failure details inside OpenShift Container Platform.
They have checked for this on a test cluster, but the audit logs do not contain any user name indicating login or logout details. For logins and logouts, both the CLI and the OpenShift console only show messages such as 'Login successful' or 'Invalid credentials'.

Expected results: Login, logout and login failures should be captured in audit logging.

Goals (lead)

  1. Login, logout and login failures should be captured in audit logs

Non-Goals (lead)

  1. Don't attempt to log login failures in the IdP login flow beyond a timeout, if the information is not available in explicit oauth-server requests (e.g. a GitHub password login error).
  2. Logout does not involve oauth-server (but is a simple API object deletion in oauth-apiserver). Hence, the audit log discussed here won't include logout.

Deliverables

  1. Changes to oauth-server to log into /var/log/oauth-server/audit.log on the master nodes.
  2. Documentation

Proposal (lead)

The apiserver pods today have `/var/log/<kube|oauth|openshift>-apiserver` mounted from the host and create audit files there using the upstream audit event format (JSON lines following https://github.com/kubernetes/apiserver/blob/92392ef22153d75b3645b0ae339f89c12767fb52/pkg/apis/audit/v1/types.go#L72). These events are apiserver specific, but as oauth authentication flow events are also requests, we can use the apiserver event format to log logins, login failures and logouts. Hence, we propose to make the oauth-server create /var/log/oauth-server/audit.log files on the master nodes using that format.

When the login flow does not finish within a certain time (e.g. 10min), we can artificially create an event to show a login failure in the audit logs.
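A hypothetical sketch of what one such audit line could contain, following the upstream audit Event schema linked above (shown as YAML for readability; the actual log is JSON lines). The username, request URI, and annotation key are illustrative only.

apiVersion: audit.k8s.io/v1
kind: Event
level: Metadata
auditID: "<generated-per-request-uuid>"
stage: ResponseComplete
requestURI: /oauth/authorize            # illustrative oauth-server login request
verb: get
user:
  username: testuser                    # illustrative
sourceIPs:
  - 10.0.0.1
responseStatus:
  code: 302
annotations:
  # Hypothetical annotation recording the authentication decision.
  openshift.io/authentication-decision: allow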

User Stories (PM)

Dependencies (internal and external, lead)

Previous Work (lead)

Open questions (lead)

  1. ...

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

 

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • ...

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

OCP/Telco Definition of Done
Feature Template descriptions and documentation.

Feature Overview.

Early customer feedback is that they see SNO as a great solution covering smaller-footprint deployments, but they are wondering what evolution story OpenShift is going to provide when more capacity or high availability is needed in the future.

While migration tooling (moving workload/config to a new cluster) could be a mid-term solution, customers do not want extra hardware to be involved in this process.

For Telecommunications Providers at the Far Edge, the intent is to start small and then grow. Many of these operators will start with an SNO-based DU deployment as an initial investment. But as DUs evolve, different segments of the radio spectrum are added, various radio hardware is provisioned, and features are delivered to the Far Edge, and the Telecommunications Providers want their Far Edge deployments to be able to scale up from 1 node to 2 nodes to n nodes. On the opposite side of the spectrum from SNO is MMIMO, where there is a robust cluster and workloads use HPA.

Goals

  • Provide the capability to expand a single-replica control plane topology to host more workload capacity by adding workers
  • Provide the capability to expand a single-replica control plane to be a highly available control plane
  • To satisfy MMIMO, Telecommunications providers will want the ability to scale an SNO to a multi-node cluster that can support HPA.
  • Telecommunications providers do not want workload (DU specifically) downtime when migrating from SNO to a multi-node cluster.
  • Telecommunications providers wish to be able to scale from one to two or more nodes to support a variety of radio hardware.
  • Support CP scaling (CP HA) for 2-node, 3-node, and n-node clusters. As the number of nodes in the cluster increases, so does the failure domain of the cluster. The cluster is now supporting more cell sectors and therefore has a greater need for HA and resiliency, including for the cluster CP.

Requirements

  • TBD
Requirement | Notes | isMvp?
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES
Release Technical Enablement | Provide necessary release enablement details and documents. | YES

(Optional) Use Cases

This Section:

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

Questions to answer…

  • ...

Out of Scope

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

Assumptions

  • ...

Customer Considerations

  • ...

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

Epic Goal

  • Documented and supported flow for adding 1, 2, 3, or more workers to a Single Node OpenShift (SNO) deployment without requiring cluster downtime, with the understanding that this action will not make the cluster itself highly available.

Why is this important?

  • Telecommunications and Edge scenarios where HA is handled via failover to another site but single site capacity may vary or need to be expanded over time.
  • Similar scenarios exist for some ISV vendors where OpenShift is an implementation detail of how they deliver their solution on top of another platform (e.g. VMware).

Scenarios

  1. Adding a worker to a single node openshift cluster.
  2. Adding a second worker to a single node openshift cluster.
  3. Adding a third worker to a single node openshift cluster.
  4. Removing a worker node from a single node openshift cluster that has had 1 or more workers added.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • Customer facing documentation of the add worker flow for SNO.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

  1. Presumably there is a scale limit on how many workers could be added to an SNO control plane, and it is lower than the limit for a "normal" 3-node control plane. It is not anticipated that this limit will be established in this epic. The intent is to focus on small-scale sites where adding 1-3 worker nodes would be beneficial.

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Epic Goal

  • Make it possible to disable the console operator at install time, while still having a supported+upgradeable cluster.

Why is this important?

  • It's possible to disable console itself using spec.managementState in the console operator config. There is no way to remove the console operator, though. For clusters where an admin wants to completely remove console, we should give the option to disable the console operator as well.
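For context, a minimal sketch of the existing mechanism mentioned above, assuming the console operator config object named cluster; this removes the console workloads but, as noted, the console operator itself keeps running.

apiVersion: operator.openshift.io/v1
kind: Console
metadata:
  name: cluster
spec:
  # Disables the console itself; it does not remove the console operator.
  managementState: Removed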

Scenarios

  1. I'm an administrator who wants to minimize my OpenShift cluster footprint and who does not want the console installed on my cluster

Acceptance Criteria

  • It is possible at install time to opt-out of having the console operator installed. Once the cluster comes up, the console operator is not running.

Dependencies (internal and external)

  1. Composable cluster installation

Previous Work (Optional):

  1. https://docs.google.com/document/d/1srswUYYHIbKT5PAC5ZuVos9T2rBnf7k0F1WV2zKUTrA/edit#heading=h.mduog8qznwz
  2. https://docs.google.com/presentation/d/1U2zYAyrNGBooGBuyQME8Xn905RvOPbVv3XFw3stddZw/edit#slide=id.g10555cc0639_0_7

Open questions:

  1. The console operator manages the downloads deployment as well. Do we disable the downloads deployment? Long term we want to move to CLI manager: https://github.com/openshift/enhancements/blob/6ae78842d4a87593c63274e02ac7a33cc7f296c3/enhancements/oc/cli-manager.md

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

In the console-operator repo we need to add the `capability.openshift.io/console` annotation to all the manifests that the operator either contains or creates on the fly.

 

Manifests are currently present in /bindata and /manifest directories.
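A hedged sketch of what an annotated manifest could look like, using the annotation key named above; the exact key/value convention should be confirmed against the enhancement doc and the insights-operator example referenced below, and the Deployment shown is illustrative.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: console
  namespace: openshift-console
  annotations:
    # Annotation key as referenced in this card; the value convention is an assumption.
    capability.openshift.io/console: "true"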

 

Here is an example of the insights-operator change.

Here is the overall enhancement doc.

 

We need to continue to maintain specific areas within storage; this feature captures that effort and tracks it across releases.

Goals

  • To allow OCP users and cluster admins to detect problems early and with as little interaction with Red Hat as possible.
  • When Red Hat is involved, make sure we have all the information we need from the customer, i.e. in metrics / telemetry / must-gather.
  • Reduce storage test flakiness so we can spot real bugs in our CI.

Requirements

Requirement | Notes | isMvp?
Telemetry | | No
Certification | | No
API metrics | | No

Out of Scope

n/a

Background, and strategic fit
With the expected scale of our customer base, we want to keep the load of customer tickets / BZs low.

Assumptions

Customer Considerations

Documentation Considerations

  • Target audience: internal
  • Updated content: none at this time.

Notes

In progress:

  • CI flakes:
    • Configurable timeouts for e2e tests
      • Azure is slow and times out often
      • Cinder times out formatting volumes
      • AWS resize test times out

 

High prio:

  • Env. check tool for VMware - users often mis-configure permissions there and blame OpenShift. If we had a tool they could run, it might report better errors.
    • Should it be part of the installer?
    • Spike exists
  • Add / use cloud API call metrics
    • Helps customers to understand why things are slow
    • Helps build cop to understand a flake
      • With a post-install step that filters data from Prometheus that’s still running in the CI job.
    • Ideas:
      • Cloud is throttling X% of API calls longer than Y seconds
      • Attach / detach / provisioning / deletion / mount / unmount / resize takes longer than X seconds?
    • Capture metrics of operations that are stuck and won’t finish.
      • Sweep operation map from executioner???
      • Report operation metric into the highest bucket after the bucket threshold (i.e. if 10minutes is the last bucket, report an operation into this bucket after 10 minutes and don’t wait for its completion)?
      • Ask the monitoring team?
    • Include in CSI drivers too.
      • With alerts too

Unsorted

  • As the number of storage operators grows, it would be useful to have a Grafana board for storage operators
    • CSI driver metrics (from CSI sidecars + the driver itself  + its operator?)
    • CSI migration?
  • Get aggregated logs in cluster
    • They're rotated too soon
    • No logs from dead / restarted pods
    • No tools to combine logs from multiple pods (e.g. 3 controller managers)
  • What storage issues do customers have? It was 22% of all issues.
    • Insufficient docs?
    • Probably garbage
  • Document basic storage troubleshooting for our support teams
    • What logs are useful when, what log level to use
    • This has been discussed during the GSS weekly team meeting; however, it would be beneficial to have this documented.
  • Common vSphere errors, their debugging and fixing. 
  • Document sig-storage flake handling - not all failed [sig-storage] tests are ours

Epic Goal

  • Update all images that we ship with OpenShift to the latest upstream releases and libraries.
  • Exact content of what needs to be updated will be determined as new images are released upstream, which is not known at the beginning of OCP development work. We don't know what new features will be included and should be tested and documented. Especially new CSI drivers releases may bring new, currently unknown features. We expect that the amount of work will be roughly the same as in the previous releases. Of course, QE or docs can reject an update if it's too close to deadline and/or looks too big.

Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF). Trying no-feature-freeze in 4.12. We will try to do as much as we can before FF, but we're quite sure something will slip past FF as usual.

Why is this important?

  • We want to ship the latest software that contains new features and bugfixes.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Update all OCP and kubernetes libraries in storage operators to the appropriate version for OCP release.

This includes (but is not limited to):

  • Kubernetes:
    • client-go
    • controller-runtime
  • OCP:
    • library-go
    • openshift/api
    • openshift/client-go
    • operator-sdk

Operators:

  • aws-ebs-csi-driver-operator 
  • aws-efs-csi-driver-operator
  • azure-disk-csi-driver-operator
  • azure-file-csi-driver-operator
  • openstack-cinder-csi-driver-operator
  • gcp-pd-csi-driver-operator
  • gcp-filestore-csi-driver-operator
  • manila-csi-driver-operator
  • ovirt-csi-driver-operator
  • vmware-vsphere-csi-driver-operator
  • alibaba-disk-csi-driver-operator
  • ibm-vpc-block-csi-driver-operator
  • csi-driver-shared-resource-operator

 

  • cluster-storage-operator
  • csi-snapshot-controller-operator
  • local-storage-operator
  • vsphere-problem-detector

Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

Feature Overview

OpenShift console supports new features and elevated experience for Operator Lifecycle Manager (OLM) Operators and Cluster Operators.

Goal:

OCP Console improves the controls and visibility for managing vendor-provided software in customers’ infrastructure and making these solutions available for customers' internal users.

 

To achieve this, 

  • Operator Lifecycle Manager (OLM) teams have been introducing new features aiming towards simplification and ease of use for both developers and cluster admins.
  • On the Cluster Operators side, the console iteratively improves visibility into the resources associated with the Operators to improve the overall management experience.

We want to make sure OLM’s and Cluster Operators' new features are exposed in the console so admin console users can benefit from them.

Benefits:

  • Cluster admin/Operator consumers:
    • Able to see, learn about, and interact with OLM-managed and/or Cluster Operator-associated resources in the OpenShift console.

Requirements

Requirement | Notes | isMvp?
OCP console supports the latest OLM APIs and features | This is a requirement for ALL features. | YES
OCP console improves visibility to Cluster Operators related resources and features. | This is a requirement for ALL features. | YES

 


(Optional) Use Cases
<--- Remove this text when creating a Feature in Jira, only for reference --->
* Main success scenarios - high-level user stories
* Alternate flow/scenarios - high-level user stories
* ...

Questions to answer...
How will the user interact with this feature?
Which users will use this and when will they use it?
Is this feature used as part of the current user interface?

Out of Scope
<--- Remove this text when creating a Feature in Jira, only for reference --->
# List of non-requirements or things not included in this feature
# ...

Background, and strategic fit
<--- Remove this text when creating a Feature in Jira, only for reference --->
What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

Assumptions
<--- Remove this text when creating a Feature in Jira, only for reference --->
* Are there assumptions being made regarding prerequisites and dependencies?
* Are there assumptions about hardware, software or people resources?
* ...

Customer Considerations
<--- Remove this text when creating a Feature in Jira, only for reference --->
* Are there specific customer environments that need to be considered (such as working with existing h/w and software)?
...

Documentation Considerations
<--- Remove this text when creating a Feature in Jira, only for reference --->
Questions to be addressed:
* What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
* Does this feature have doc impact?
* New Content, Updates to existing content, Release Note, or No Doc Impact
* If unsure and no Technical Writer is available, please contact Content Strategy.
* What concepts do customers need to understand to be successful in [action]?
* How do we expect customers will use the feature? For what purpose(s)?
* What reference material might a customer want/need to complete [action]?
* Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
* What is the doc impact (New Content, Updates to existing content, or Release Note)?

Epic Goal

  • OCP console supports developers in more easily focusing on and creating Operand/CR instances on the creation form page.
  • OCP console supports cluster admins in better seeing/understanding the Operator installation status on the OperatorHub page.

Why is this important?

  • OperatorHub page currently shows an Operator as Installed as long as a Subscription object exists for that operator in the current namespace, which can be misleading because the installation could be stalled or require additional interactions from the user (e.g. "manual upgrade approval") in order to complete the installation.
  • Some Operator-managed services use advanced JSONSchema validation properties in their CRD validation schema, but the current form generator in the console ignores/skips them. Hence, those fields are missing from the creation form.

Scenarios

  1. As a user of OperatorHub, I'd like an improved "status display" for Operators I have installed before, so I can better understand whether those Operators were actually installed successfully or require additional actions from me to complete the installation.
  2. As a user of the OCP console, I'd like an Operand/CR creation form that covers advanced JSONSchema validation properties, so I can create a CR instance solely with the form view.

Acceptance Criteria

  • Console improves the visibility of Operator installation status on OperatorHub page
  • Console operand creation form adds support for the `allOf`, `anyOf`, `oneOf`, and `additionalProperties` JSONSchema validation keywords, so the creation form UI can render them rather than skipping those properties/fields (see the schema sketch after this list).
  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...
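For illustration, a sketch of the kind of CRD validation schema fragment the form generator needs to handle; the field names are hypothetical.

# Fragment of a hypothetical CRD openAPIV3Schema
openAPIV3Schema:
  type: object
  properties:
    spec:
      type: object
      # The form should render this as "exactly one of the two references is required".
      oneOf:
        - required: ["configMapRef"]
        - required: ["secretRef"]
      properties:
        configMapRef:
          type: string
        secretRef:
          type: string
        extraSettings:
          type: object
          # Arbitrary key/value pairs the form should also expose.
          additionalProperties:
            type: string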

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>


 

OLM is adding a property to the CSV to signal that the operator should clean up the operand on operator uninstall. See https://github.com/operator-framework/enhancements/pull/46

Console will need to add a checkbox to the UI to ask the user whether the operand should be cleaned up (with strong warnings about what this means). On delete, console should set the `spec.cleanup` property on the CSV to indicate whether cleanup should happen.

Additionally, console needs to be able to show proper status for CSVs that are terminating in the UI so it's clear the operator is being deleted and cleanup is in progress. If there are errors with cleanup, those should be surfaced back through the UI.
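A hedged sketch of what setting that property could look like, assuming spec.cleanup takes the boolean-style shape discussed in the linked enhancement; the CSV name and the nested field are illustrative until the OLM API lands.

apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: example-operator.v1.0.0        # illustrative
spec:
  # Hypothetical shape: console would set this when the user opts into operand
  # cleanup in the uninstall dialog.
  cleanup:
    enabled: true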

Depends on OLM-1733

cc Ali Mobrem, Tony Wu, Daniel Messer, Peter Kreuser

User Story

As a user of OperatorHub, I'd like an improved "status display" for Operators I have installed before, so I can better understand whether those Operators were actually installed successfully or require additional actions from me to complete the installation.

Desired Outcome

Improve visibility of Operator installation status on OperatorHub page

Why this is important?

OperatorHub page currently shows an Operator as Installed as long as a Subscription object exists for that operator in the current namespace.

This can be misleading because the installation could be stalled or require additional interactions from the user (e.g. "manual upgrade approval") in order to complete the installation.

The console could potentially have some indication of an "in-between" or "requires attention" state for Operators that are in these states, plus links to the actual "Installed Operators" page for more details.
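For illustration, a sketch of a Subscription sitting in such an "in-between" state, assuming OLM's manual install plan approval; the operator and namespace names are hypothetical.

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: example-operator               # hypothetical
  namespace: openshift-operators
spec:
  channel: stable
  name: example-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual          # an admin must approve the InstallPlan
status:
  state: UpgradePending                # installation stalls here until approval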

Related Info:

1. BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1899359
2. RFE: https://issues.redhat.com/browse/RFE-1691

Feature Overview

  • As an infrastructure owner, I want a repeatable method to quickly deploy the initial OpenShift cluster.
  • As an infrastructure owner, I want to install the first (management, hub, “cluster 0”) cluster to manage other (standalone, hub, spoke, hub of hubs) clusters.

Goals

  • Enable customers and partners to successfully deploy a single “first” cluster in disconnected, on-premises settings

Requirements

4.11 MVP Requirements

  • Customers and partners need to be able to download the installer
  • Enable customers and partners to deploy a single “first” cluster (cluster 0) using single node, compact, or highly available topologies in disconnected, on-premises settings
  • Installer must support advanced network settings such as static IP assignments, VLANs and NIC bonding for on-premises metal use cases, as well as DHCP and PXE provisioning environments.
  • Installer needs to support automation, including integration with third-party deployment tools, as well as user-driven deployments.
  • In the MVP automation has higher priority than interactive, user-driven deployments.
  • For bare metal deployments, we cannot assume that users will provide us the credentials to manage hosts via their BMCs.
  • Installer should prioritize support for platforms None, baremetal, and VMware.
  • The installer will focus on a single version of OpenShift, and a different build artifact will be produced for each different version.
  • The installer must not depend on a connected registry; however, the installer can optionally use a previously mirrored registry within the disconnected environment.

Use Cases

  • As a Telco partner engineer (Site Engineer, Specialist, Field Engineer), I want to deploy an OpenShift cluster in production with limited or no additional hardware and don’t intend to deploy more OpenShift clusters [Isolated edge experience].
  • As an Enterprise infrastructure owner, I want to manage the lifecycle of multiple clusters in 1 or more sites by first installing the first (management, hub, “cluster 0”) cluster to manage other (standalone, hub, spoke, hub of hubs) clusters [Cluster before your cluster].
  • As a Partner, I want to package OpenShift for large scale and/or distributed topology with my own software and/or hardware solution.
  • As a large enterprise customer or Service Provider, I want to install a “HyperShift Tugboat” OpenShift cluster in order to offer a hosted OpenShift control plane at scale to my consumers (DevOps Engineers, tenants) that allows for fleet-level provisioning for low CAPEX and OPEX, much like AKS or GKE [Hypershift].
  • As a new, novice to intermediate user (Enterprise Admin/Consumer, Telco Partner integrator, RH Solution Architect), I want to quickly deploy a small OpenShift cluster for Poc/Demo/Research purposes.

Questions to answer…

  •  

Out of Scope

Out of scope use cases (that are part of the Kubeframe/factory project):

  • As a Partner (OEMs, ISVs), I want to install and pre-configure OpenShift with my hardware/software in my disconnected factory, while allowing further (minimal) reconfiguration of a subset of capabilities later at a different site by different set of users (end customer) [Embedded OpenShift].
  • As an Infrastructure Admin at an Enterprise customer with multiple remote sites, I want to pre-provision OpenShift centrally prior to shipping and activating the clusters in remote sites.

Background, and strategic fit

  • This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

Assumptions

  1. The user only has access to the target nodes that will form the cluster and will boot them with the image presented locally via a USB stick. This scenario is common in sites with restricted access, such as government infrastructure, where only users with security clearance can interact with the installation, and where software is allowed to enter the premises (on a USB, DVD, SD card, etc.) but never allowed to come back out. Users can't bring in supporting devices such as laptops or phones.
  2. The user has access to the target nodes remotely to their BMCs (e.g. iDrac, iLo) and can map an image as virtual media from their computer. This scenario is common in data centers where the customer provides network access to the BMCs of the target nodes.
  3. We cannot assume that we will have access to a computer to run an installer or installer helper software.

Customer Considerations

  • ...

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

 

References

 

 

Epic Goal

  • As an OpenShift infrastructure owner, I want to specify static networking inputs to the installer so that hosts can receive their network settings without relying on DHCP, in disconnected, on-premises settings.

 

Why is this important?

  • Customers want to specify static network configurations, such as Static IPs, VLANs, and NICs bonding when deploying OpenShift in disconnected, on-premises settings (and DHCP servers are not available for security reasons).
  • Partners need a way to feed in their static network configurations, such as Static IPs, VLANs, and NICs bonding for automated OpenShift deployments in disconnected, on-premises settings (and DHCP servers are not available for security reasons).

Acceptance Criteria

  • Bonds/LACP/Nic Teaming, VLANs and Static IP must work
  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Previous Work (Optional)

  1. https://github.com/openshift/enhancements/blob/master/enhancements/network/baremetal-ipi-network-configuration.md
  2. https://github.com/openshift/assisted-service

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

 

References

We currently support static IPs on Node 0, and this is required in order to get the common IP for the other nodes. We also need to support configuration of static IPs on all of the nodes even though they could also use DHCP for their addresses.

https://github.com/openshift/assisted-service/blob/0e229dea8672ef2e5275563c493a42867ea70985/internal/controller/controllers/infraenv_controller.go#L365

The infraenv controller fetches the NMStateConfigs from the kube-api. Since we don't have the kube-api, we need to read them from the manifests and incorporate them into the InfraEnvCreateParams to create the InfraEnv.

In the MVP, the user must provide at least one static ip configuration for node0. If more are provided, one will be chosen.

Acceptance criteria

Node0 choice is consistent across every installation in the same environment with the same inputs.

User Story:

As an admin, I want to be able to:

  • Provide 1 or more NMState configurations for the nodes

so that I can achieve

  • All the nodes having a persistent network interface configuration that matches the provided NMStateConfig

 

The agent-based installation for Zero Touch Provisioning has a Custom Resource defined to configure the static networking of the nodes that will be provisioned. E.g.:

 

 

apiVersion: agent-install.openshift.io/v1beta1
kind: NMStateConfig
metadata:
  name: mgmt-spoke1
  namespace: mgmt-spoke1
  labels:
    cluster-name: mgmt-spoke1
spec:
  config:
    interfaces:
      - name: bond0
        type: bond
        link-aggregation:
          mode: active-backup
          options:
            miimon: "140"
          slaves:
            - eth0
            - eth1
        state: up
        ipv4:
          enabled: true
          address:
            - ip: 192.168.123.151
              prefix-length: 24
          dhcp: false
        ipv6:
          enabled: false
    dns-resolver:
      config:
        server:
          - 192.168.1.1
    routes:
      config:
        - destination: 0.0.0.0/0
          next-hop-address: 192.168.1.1
          next-hop-interface: bond0
          table-id: 254
  interfaces:
    - name: "eth0"
      macAddress: "00:00:00:00:00:00"
    - name: "eth1"
      macAddress: "00:00:00:00:00:11"

The NMState team is currently working on a Rust library that includes the gc command that assisted-service uses to generate all the configs and then load the one that matches the interfaces. We should reach out to Nick Carboni to check on assisted-service's progress in integrating the new library, and leverage the same code to make sure our ISO can use the same network configuration mechanism.

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • ZTP network config (NMStateConfig) can be passed to the CLI tool and ends up in the right nodes
  • Test coverage for providing NMStateConfig for all nodes
  • Test coverage for providing NMStateConfig for just the ephemeral provisioning service node.

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Epic Goal

  • Be able to run agent based installation without needing an external node (in disconnected environments, an external image registry must be provided)
  • Be able to deploy the following configuration:
    • SNO
    • Compact cluster (3 masters)
    • Highly available cluster (3 masters and at least 2 worker nodes)

Why is this important?

  • Customers require a way to deploy that does not need external machines after the Installation image is generated
  • Co-location of assisted-service, bootstrap and agent is necessary to be able to deploy SNO and compact clusters

Scenarios

  1. SNO
    1. ISO is booted on the node and after the reboots necessary for the installation, it must become a single node OpenShift
  2. Compact Cluster (3 masters)
    1. ISO is booted on the 3 nodes. Node A is chosen to be the bootstrap node and to run assisted-service
    2. Nodes B and C form the target cluster
    3. Node A reboots to join the target cluster
  3. Highly available cluster (3 masters and 2+ workers) - Can run as the compact case

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Dependencies (internal and external)

  1. ISO generation that contains all the components

Previous Work (Optional):

  1. Bootstrap in place for SNO in cloud.redhat.com Assisted Installer

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Currently, assisted-service chooses one of the nodes that reaches out to it to be the bootstrap node. We need to understand the choice mechanism and make it reliably choose the node that we want node0 to be.

 

The bootstrap node already waits for the other nodes before rebooting; we need to make sure that this wait is sufficient for assisted-service as well. Prevent the assisted-service from rebooting the node it is running on until the following conditions are true:

  1. Installation is complete on all other hosts
  2. The cluster control plane is up and accessible

We can try having it reboot into bootstrap while making sure that assisted-service runs after the reboot, but ideally we'd want the node to start bootstrapping without needing the reboot (as per customer/PM demands to minimize reboots).

In the context of METAL-10 there was a proposal to add a file that the agent would check for, such that the presence of this file would inhibit a reboot. We could possibly use the same mechanism here to avoid the need for large-scale changes to how assisted-service itself works (assisted-service would still need to delete the file at the appropriate time, but that is a less-invasive change). However, there are timeouts that have to be considered, so changes to the state machine may be required.

Note that we do want to continue to install to disk on the assisted-service host in parallel with the others, since this is on the critical path slowing down all deployments. Only the reboot should be delayed.

Single-node deployments are an exception to this.

Ability to perform disconnected first cluster installation in the automated flow

Epic Goal

  • Generate an ISO that uses a disconnected mirror and can be fully deployed without access to quay

Why is this important?

  • A lot of secure environments do not allow connectivity to Red Hat / Quay registries. In order to enable customers with such environments to deploy their first cluster, we need to allow them to install from a mirror.

Scenarios

  1. User sets up a mirror containing the release and any operator they wish to deploy after installation. The user sets up the input to use the mirror registry, then generates the ISO with openshift-install agent create image. Finally, the user boots the systems with the generated ISO and gets a successful OCP cluster installation that does not connect to internet resources.
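A hedged sketch of the mirror-related input, assuming the agent flow honours the standard install-config fields for registry mirroring; the registry host is a placeholder.

# install-config.yaml fragment (mirror-related fields only)
additionalTrustBundle: |
  -----BEGIN CERTIFICATE-----
  <CA certificate of the mirror registry>
  -----END CERTIFICATE-----
imageContentSources:
  - mirrors:
      - mirror.example.com:5000/ocp4/openshift4
    source: quay.io/openshift-release-dev/ocp-release
  - mirrors:
      - mirror.example.com:5000/ocp4/openshift4
    source: quay.io/openshift-release-dev/ocp-v4.0-art-dev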

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

We won't be shipping the assisted-ui container. At this point it is blocking the disconnected work, since we don't have an OpenShift container for it in the payload, so it's time to remove it.

The CoreOS ISO can be extracted from the release payload using a command like:

oc image extract --file=/coreos/coreos-x86_64.iso quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1dc3c2a644f62049ea4a03fddb9305bc2b929405bf979b7f5e720cfadf327b54

Where the SHA points to the machine-os-images container in the release payload (which can be obtained using oc adm release info --image-for=machine-os-images). Both of these commands require the pull secret for the cluster to be available in your podman config.

We'll need to use equivalent code (hopefully imported from oc or the same library it uses) to fetch the base ISO using the supplied pull secret in the ZTP manifests and store it as an Asset.

Podman creates a pause container on the hosts for the service pod as follows:

$ sudo podman ps

87a02f9ace39  registry.access.redhat.com/ubi8/pause:latest                                                                                                  58 minutes ago  Up 58 minutes ago  0.0.0.0:8080->8080/tcp, 0.0.0.0:8090->8090/tcp, 0.0.0.0:8888->8888/tcp  27f9183bfbd9-infra

 

We should check if this image needs to be mirrored, and figure out if we need to change dev-scripts or add an entry to registries.conf.

When installing in a disconnected environment, once the registries.conf and ca-bundle files have been loaded, these files should be provided to assisted-service as a mount of the mirror/ dir. Assisted-service will update its ignition config from these mounted files.

In order to configure the registry for disconnected installs, the following assets should be created:

RegistriesConfig (read from mirror/registries.conf)

CABundleCertificates (read from mirror/ca-bundle.crt)


Support user input consisting of just InstallConfig and AgentConfig

Epic Goal

  • Allow users to generate an ephemeral agent-based installation ISO from just an InstallConfig and an AgentConfig

Why is this important?

  • While the Zero Touch Provisioning input is very amenable to automation, it is a more complex input for a user manually setting up a cluster.
  • InstallConfig is the canonical starting point for an OpenShift Installer installation
  • Some settings in ZTP are only available in BMH. InstallConfig and AgentConfig will allow customers that do not (or can't) use BMH/BMO to set the same things in their clusters

Scenarios

  1. User writes an InstallConfig with the general cluster config and an AgentConfig with host-specific config, then runs openshift-install agent create cluster-manifests, then openshift-install agent create image. After that, they boot the target systems with the ISO and get a successful first OCP cluster. (A hedged, abridged sketch of the two input files follows below.)
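
For illustration, a hedged and abridged sketch of the two input files (domain, addresses, platform details, and the AgentConfig apiVersion are assumptions):

# install-config.yaml - general cluster configuration
apiVersion: v1
baseDomain: example.com
metadata:
  name: mycluster
controlPlane:
  name: master
  replicas: 3
compute:
- name: worker
  replicas: 2
networking:
  machineNetwork:
  - cidr: 192.168.111.0/24
platform:
  baremetal:
    apiVIPs:
    - 192.168.111.5
    ingressVIPs:
    - 192.168.111.4
pullSecret: '<pull secret>'
sshKey: '<ssh public key>'

# agent-config.yaml - host-specific configuration
apiVersion: v1alpha1
kind: AgentConfig
metadata:
  name: mycluster
rendezvousIP: 192.168.111.80   # the node0 address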

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

  1. Does openshift-install create cluster-manifests need to run explicitly?

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Given an install-config, convert it to the ZTP manifests that are used to directly populate the Ignition.

This document contains a list of fields and how they match up: https://docs.google.com/document/d/1S4OluK1c-CIma9hmEylPay9ugcqKrD64S7DgiYpufqE/edit

If the node0 IP is specified in AgentConfig, it takes precedence over the selection from NMStateConfigs; otherwise, we keep the same heuristic we have now to choose.

Modify the agent-config to accept NMState config for each host.

This could be directly inline, or referenced from a file (either explicitly or by implicitly inferring the filename). This was initially TBD; we decided to go with the `AgentConfig embeds install time node-specific configuration` option: https://docs.google.com/document/d/1vCy0LikVPhbGIHF494NHTYsfu85fOiOicR3oB1vlEWI/edit#

Using the NMState data provided, generate the equivalent NMStateConfig manifests in cluster-manifests.
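
As an illustration, a hedged fragment of agent-config.yaml with inline NMState data for one host (interface name, MAC address, and IPs are placeholders):

hosts:
- hostname: master-0
  interfaces:
  - name: eno1
    macAddress: 00:ef:44:21:e6:a5
  networkConfig:            # inline NMState, to be rendered into an NMStateConfig manifest
    interfaces:
    - name: eno1
      type: ethernet
      state: up
      ipv4:
        enabled: true
        dhcp: false
        address:
        - ip: 192.168.111.80
          prefix-length: 24
    routes:
      config:
      - destination: 0.0.0.0/0
        next-hop-address: 192.168.111.1
        next-hop-interface: eno1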

If we make the ZTP manifest assets depend on the install-config asset, the install config will effectively be required (and the installer will launch into the interactive CLI questionnaire if it is not present).

We want to use the install-config if it is present, and just use the ZTP manifests if those are present instead. (Note: this appears to conflict with what AGENT-135 says, so one of these stories might be wrong.)

The installer team has more details and can probably suggest a design.

Epic Goal

As an OpenShift infrastructure owner, I want to deploy OpenShift clusters with dual-stack IPv4/IPv6

As an OpenShift infrastructure owner, I want to deploy OpenShift clusters with single-stack IPv6

Why is this important?

IPv6 and dual-stack clusters are often requested by customers, especially Telco customers. Dual-stack clusters are a requirement for many, and they are also a transition path to single-stack IPv6 clusters, which for some of our users is the final destination.

Acceptance Criteria

  • Agent-based installer can deploy IPv6 clusters
  • Agent-based installer can deploy dual-stack clusters
  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Previous Work

Karim's work proving how the agent-based installer can deploy IPv6: IPv6 deploy with agent based installer

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

For dual-stack installations the agent-cluster-install.yaml must have both an IPv4 and an IPv6 subnet in networking.MachineNetwork, or assisted-service will throw an error. This field is in InstallConfig but it must be added to agent-cluster-install in its Generate().

For single-stack IPv4 and IPv6 installs, setting the MachineNetwork is not needed, but it also does not cause problems if it is set, so it should be fine to set it at all times.
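
For reference, a hedged install-config networking fragment with both address families, from which the agent-cluster-install MachineNetwork would be populated (the CIDRs are illustrative):

networking:
  networkType: OVNKubernetes
  machineNetwork:
  - cidr: 192.168.111.0/24
  - cidr: fd2e:6f44:5dd8:c956::/120
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  - cidr: fd01::/48
    hostPrefix: 64
  serviceNetwork:
  - 172.30.0.0/16
  - fd02::/112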

Epic Goal

  • As an OpenShift infrastructure owner, I need a way to create my first on-premises cluster.
  • As an OpenShift infrastructure owner using a platform that is not formally supported by Red Hat, I need the ability to install OpenShift that is easier than the fully manual UPI process.

Why is this important?

  • Installing OpenShift has to be as simple as possible with as few requirements as reasonably possible. A bootable, ephemeral image based on the assisted-installer technology developed by the ecosystem team is one way to permit installing OpenShift clusters requiring access only to the hardware dedicated to the new cluster (as opposed to requiring a dedicated provisioning node or even an external service).

Scenarios

  1. The user only has access to the target nodes that will form the cluster and will boot them with the image presented locally via a USB stick. This scenario is common in sites with restricted access, such as government infrastructure, where only users with security clearance can interact with the installation, and where software is allowed to enter the premises (on a USB stick, DVD, SD card, etc.) but never allowed to come back out. Users can't bring in supporting devices such as laptops or phones.
  2. The user has access to the target nodes remotely to their BMCs (e.g. iDrac, iLo) and can map an image as virtual media from their computer. This scenario is common in data centers where the customer provides network access to the BMCs of the target nodes.
  3. We cannot assume that we will have access to a computer to run an installer or installer helper software.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

  • Take the functionality of the fleeting prototype and integrate it into the openshift/installer repo as described in https://github.com/openshift/enhancements/pull/1067

Open questions:

  1. An image generator has been identified as a possible requirement for this flow. If required, should it be part of the installer image and not an artifact on its own? 
  2. What’s the envisioned workflow during the installation when dedicated node images need to be created?
  3. How should we distribute this new installer solution?
  4. ARM Considerations - TBD

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

 

References

Using code from the installer (not code from fleeting), populate the Ignition asset with the data built into the installer binary.

Currently we use a separate embed.FS (inherited from fleeting) to load the data files to go into the ignition. We should get rid of this and use the same method as the rest of the installer. We should also use the installer's code to e.g. do templating and convert to ignition format and throw away the fleeting code.

As a first step for the assets integration, the create image command will need to fetch the required ZTP manifest files from the cluster-manifests folder.

This will allow us to:
1) Get the manifest files from the right location
2) Seamlessly integrate the create image command with the create cluster-manifests one while the tasks related to assets generation are still in progress
3) Keep the create image command fully working until the assets generation is completed (users will still be able to create/edit the assets manually in the cluster-manifests folder)

Create installer Assets corresponding to each ZTP manifest, and move the code for reading them from disk into the respective assets.

From the initial install-config.yaml + agent-config.yaml, generate all the ZTP manifests file required by the create image command.

 

Dependency: install-config

 

Note: we could evaluate splitting this task further into distinct manifest assets

Using git-filter-repo, rewrite the commits in fleeting to place files in their correct locations in the installer. The resulting commits can then be merged into the agent branch of the installer with a pull request.

Data files should be moved to e.g. data/data/agent, appending the suffix .template to any that are templated.

Code files that are needed by the installer should be moved to appropriate directories that have the agent team in the OWNERS.

Keep the git-filter-repo script so that development can continue in parallel on fleeting until we are ready to switch CI over to the installer implementation.

Add a subcommand to create the ephemeral ISO.

Create Agent ISO and Agent Ignition assets in the installer, and use them to generate a customized ISO.

This story is just for implementing the mechanics; filling in the ignition will be left to another story.

User Story:

As a (user persona), I want to be able to:

  • Capability 1
  • Capability 2
  • Capability 3

so that I can achieve

  • Outcome 1
  • Outcome 2
  • Outcome 3

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Currently it's possible to specify the release version to be installed via the ClusterImageSet manifests.

Since we're working from within the OpenShift installer, the accepted version should be the one hard-coded in the installer binary (or overridden by the env var).
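
For context, a hedged example of such a ClusterImageSet manifest (the name and release image are illustrative); per this story, the referenced version would be expected to match the one hard-coded in the installer binary:

apiVersion: hive.openshift.io/v1
kind: ClusterImageSet
metadata:
  name: openshift-4.12
spec:
  releaseImage: quay.io/openshift-release-dev/ocp-release:4.12.0-x86_64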

Epic Goal

As an OpenShift infrastructure owner, I want to deploy a cluster zero with RHACM or MCE and have the required components installed when the installation is completed

Why is this important?

BILLI makes it easier to deploy cluster zero. BILLI users know at installation time what the purpose of their cluster is, when they plan the installation. Day-2 steps are currently necessary to install operators, and users, especially when automating installations, want the installation flow to finish with their required components already installed.

Acceptance Criteria

  • A user can provide MCE manifests and have MCE installed without additional manual steps after the installation is completed
  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As a customer, I want to be able to:

  • Install MCE with the agent-installer

so that I can achieve

  • create an MCE hub with my openshift install

Acceptance Criteria:

Description of criteria:

  • Upstream documentation including examples of the extra manifests needed
  • Unit tests that include MCE extra manifests
  • Ability to install MCE using agent-installer is tested
  • Point 3

(optional) Out of Scope:

We are only allowing the user to provide extra manifests to install MCE at this time. We are not adding an option to "install mce" on the command line (or UI).
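
As a hedged illustration of the kind of extra manifests referred to above (not the documented procedure; the channel and names are assumptions), installing MCE would typically involve OLM resources such as:

apiVersion: v1
kind: Namespace
metadata:
  name: multicluster-engine
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: multicluster-engine
  namespace: multicluster-engine
spec:
  targetNamespaces:
  - multicluster-engine
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: multicluster-engine
  namespace: multicluster-engine
spec:
  channel: stable-2.1           # assumption: use the channel matching the desired MCE version
  name: multicluster-engine
  source: redhat-operators
  sourceNamespace: openshift-marketplace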

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.


Epic Goal

  • As an OpenShift infrastructure owner, I need to be able to integrate the installation of my first on-premises OpenShift cluster with my automation flows and tools.
  • As an OpenShift infrastructure owner, I must be able to provide the CLI tool with manifests that contain the definition of the cluster I want to deploy
  • As an OpenShift Infrastructure owner, I must be able to get the validation errors in a programmatic way
  • As an OpenShift Infrastructure owner, I must be able to get the events and progress of the installation in a programmatic way
  • As an OpenShift Infrastructure owner, I must be able to retrieve the kubeconfig and OpenShift Console URL in a programmatic way

Why is this important?

  • When deploying clusters with a large number of hosts, or when deploying many clusters, it is common to need to automate the installations.
  • Customers and partners usually use third-party tools of their own to orchestrate the installation.
  • For Telco RAN deployments, Telco partners need to repeatably deploy multiple OpenShift clusters in parallel to multiple sites at scale, with no human intervention.

Scenarios

  1. Monitoring flow:
    1. I generate all the manifests for the cluster,
    2. call the CLI tool pointing to the manifests path,
    3. obtain the installation image for the nodes,
    4. use my infrastructure capabilities to boot the image on the target nodes,
    5. use the tool to connect to assisted-service to get validation status and events,
    6. use the tool to retrieve credentials and the URL for the deployed cluster.

Acceptance Criteria

  • Backward compatibility between OCP releases with automation manifests (they can be applied to a newer version of OCP).
  • Installation progress and events can be tracked programmatically
  • Validation errors can be obtained programmatically
  • Kubeconfig and console URL can be obtained programmatically
  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

References

Fix the unwanted API call to set API_VIP in the case of an SNO cluster in start-cluster-installation.service.
 

 

{"code":"400","href":"","id":400,"kind":"Error","reason":"API VIP cannot be set with User Managed Networking"}

 

Create an implementation of AGENT-37 written entirely in Go and place the code in the assisted-service repo. A new binary should be created in the assisted-service image. The binary will be used in the create-cluster-and-infra-env service.

Using podman kube play from a systemd service isn't ideal in terms of process monitoring, and makes it hard to do things like attaching volumes. Split the containers out so that each is started by its own systemd service (they can all still be in the same pod). This will mean decomposing the ConfigMap that passes settings.

A CLI subcommand that waits for the cluster to come up. This should be able to reuse the code from the regular openshift-install wait-for install-complete command largely unchanged, but if the k8s API is not available it may be because we're still running the assisted part of the installation, so it probably needs to fall back to checking for that. I'm not sure what assumptions the existing installer command makes about when it is safe to run it. Ideally we would keep behaviour relatively consistent.

Currently we allow the assisted-service to generate the InfraEnv ID automatically when the InfraEnv is created. The agents then have to fetch the list of InfraEnvs from the service to get the ID. This is suboptimal in a number of ways and won't be possible at all once we have authentication enabled on the assisted-service API.

Instead, modify assisted-service to accept an environment variable that contains a fixed InfraEnv ID. Any new InfraEnv created will use this ID (this has the desirable side effect that there can be only one InfraEnv).

Pre-generate a random ID in the command-line tool and store it in the configuration of both the agent and the assisted-service in the ISO.

A CLI subcommand that:

  • Checks and displays (on stderr) the progress of the installation.
  • Shows when the bootstrap node reboots.
  • Returns 0 if we reach this point.

User Story:

As a deployer, I want to be able to:

  • Get the credentials for the cluster that is going to be deployed

so that I can achieve

  • Checking the installed cluster for installation completion
  • Connect and administer the cluster that gets installed

 

Currently the Assisted Service generates the credentials by running the ignition generation step of the openshift-installer. This is why the credentials are only retrievable from the REST API towards the end of the installation.

In the BILLI usage, which takes down assisted-service before the installation is complete, there is no obvious point at which to alert the user that they should retrieve the credentials. This means that we either need to:

  • Allow the user to pass the admin key that will then get signed by the generated CA and replace the key that is made by openshift-installer (would mean new functionality in AI)
  • Allow the key to be retrieved over SSH with the fleeting command from node0 (after it has been generated). The command should be able to wait until this is possible
  • Have the possibility to POST it somewhere

Acceptance Criteria:

  • The admin key is generated and usable to check for installation completeness

This requires/does not require a design proposal.
This requires/does not require a feature gate.

The start-cluster-installation service fails its ConditionPathExists check even though the path is created.

[core@master-0 ~]$ sudo systemctl status start-cluster-installation.service 
● start-cluster-installation.service - Service that starts cluster installation
Loaded: loaded (/etc/systemd/system/start-cluster-installation.service; enabled; vendor preset: enabled)
Active: inactive (dead)
Condition: start condition failed at Wed 2022-05-11 04:40:43 UTC; 32s ago
└─ ConditionPathExists=/etc/assisted-service/node0 was not met

Also, when the ConditionPath error is fixed, later the service fails with

start-cluster-installation.sh[2533]: jq: error (at <stdin>:0): Cannot index number with string "status"

 

Set the ClusterDeployment CRD to deploy OpenShift in FIPS mode and make sure that after deployment the cluster is set in that mode

In order to install FIPS-compliant clusters, we need to make sure that install-config + agent-config based deployments take into account the FIPS config in the install-config.

This task is about passing the config to AgentClusterInstall so it makes it into the ISO. Once there, AGENT-374 will give it to assisted-service.
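
For reference, a minimal hedged install-config fragment with FIPS enabled (domain and cluster name are placeholders):

apiVersion: v1
baseDomain: example.com
metadata:
  name: fips-cluster
fips: true     # per this task, propagated to AgentClusterInstall so it makes it into the ISO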

Epic Goal

  • As an OpenShift deployer, I want to be able to generate the installation image and boot it on the target machines without needing to pre-populate any node network configuration

Why is this important?

  • Providing the detailed network configuration needed for nmstate is a significant barrier to entry for deploying OpenShift, as NMStateConfig, while accessible, doesn't exactly roll off the tongue

Scenarios

  1. I want to boot the baremetal node that will run the assisted service and the nodes that will be worker nodes all at once without needing to care about their IPs/VLANs, etc
  2. I want to make an "AMI" of the tool generated ISO to create my openshift clusters in my no-name cloud and I don't know which IP I am going to get (This scenario will need other work in other epics)

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • Deployment completes successfully without providing NMStateConfig for any node.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

  1. If we don't know what IP the assisted service is going to get, how do the agents know where to register? Antoni Segura Puimedon: the node0 agent-config must be provided.
  2. If all the ISOs are the same and there's no prior knowledge of the IP configuration for the nodes, how do we decide which one is going to run the assisted service? Antoni Segura Puimedon: the node that finds itself matching the node0 config will set itself to be node0.

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Acceptance criteria:

  • cluster-manifests validation passes if node0 config is provided in agent-config.yaml and is consistent with other net config like the machine CIDRs.

Epic Goal

As an OpenShift infrastructure owner, I need to add host-specific configurations at install time, so that they are applied when the cluster installation is completed.

Why is this important?

Especially, but not only, in on-prem deployments, hosts need specific configurations (beyond the individual host network configuration). Customers automating installs want to avoid day-2 configuration and node reboots, so applying configurations during the installation is a requirement for them. Examples of this are multipath and SCTP on bare metal nodes, where it's not always straightforward to do on day 2 and reboots are required.

Acceptance Criteria

  • An interface exists to pass host-specific configurations and it's documented
  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

There is no harm in supplying the “rd.multipath=default” argument on any host. The effect of this argument is to generate a default /etc/multipath.conf file and to enable the multipathd service. The assisted-service now adds these to its discovery ISOs, and we will do the same with the agent ISO.

  • Have a service that waits for hosts to show up and uses the REST API to set the installation disk from the ID in the inventory available via the REST API (we can reuse the logic in assisted-service that matches root device hints to inventory)
  • Needs to run before the service that triggers installation

Necessary for SCTP

Manifests are placed in <install-config-dir>/openshift and copied to the ISO. (Previously we assumed this would be <install-config-dir>/manifests, but Andrea suggested that openshift would be more consistent.)

A client in the ISO submits the manifests through assisted-service API.

REST

Get the ZTP extra manifests into the image and use the REST API below:

    /v2/clusters/{cluster_id}/manifests

Epic Goal

  • Rebase cluster autoscaler on top of Kubernetes 1.25

Why is this important?

  • Need to pick up latest upstream changes

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story

As a user I would like to see all the events that the autoscaler creates, even duplicates. Having the CAO set this flag will allow me to continue to see these events.

Background

We have carried a patch for the autoscaler that would enable the duplication of events. This patch can now be dropped because the upstream added a flag for this behavior in https://github.com/kubernetes/autoscaler/pull/4921

Steps

  • Add the --record-duplicated-events flag to all autoscaler deployments rendered by the CAO (a hedged fragment of the resulting container args is sketched below)
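
A hedged sketch of the resulting container arguments in the cluster-autoscaler Deployment rendered by the CAO (the surrounding fields and the other argument are illustrative):

spec:
  template:
    spec:
      containers:
      - name: cluster-autoscaler
        args:
        - --logtostderr=true                 # illustrative existing argument
        - --record-duplicated-events=true    # the flag added upstream in the PR above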

Stakeholders

  • openshift eng

Definition of Done

  • autoscaler continues to work as expected and produces events for everything
  • Docs
  • this does not require documentation as it preserves existing behavior and provides no interface for user interaction
  • Testing
  • current tests should continue to pass

Feature Overview

Add GA support for deploying OpenShift to IBM Public Cloud

Goals

Complete the existing gaps to make OpenShift on IBM Cloud VPC (Next Gen2) Generally Available

Requirements

Optional requirements

  • OpenShift can be deployed using Mint mode and STS for cloud provider credentials (future release, tbd)
  • OpenShift can be deployed in disconnected mode (https://issues.redhat.com/browse/SPLAT-737)
  • OpenShift on IBM Cloud supports User Provisioned Infrastructure (UPI) deployment method (future release, 4.14?)

Epic Goal

  • Enable installation of private clusters on IBM Cloud. This epic will track associated work.

Why is this important?

  • This is required MVP functionality to achieve GA.

Scenarios

  1. Install a private cluster on IBM Cloud.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

< High-Level description of the feature ie: Executive Summary >

Goals

Cluster administrators need an in-product experience to discover and install new Red Hat offerings that can add high value to developer workflows.

Requirements

Requirements Notes IS MVP
Discover new offerings in Home Dashboard   Y
Access details outlining value of offerings   Y
Access step-by-step guide to install offering   N
Allow developers to easily find and use newly installed offerings   Y
Support air-gapped clusters   Y
(Optional) Use Cases

< What are we making, for who, and why/what problem are we solving?>

Out of scope

Discovering solutions that are not available for installation on cluster

Dependencies

No known dependencies

Background, and strategic fit

 

Assumptions

None

 

Customer Considerations

 

Documentation Considerations

Quick Starts 

What does success look like?

 

QE Contact

 

Impact

 

Related Architecture/Technical Documents

 

Done Checklist

  • Acceptance criteria are met
  • Non-functional properties of the Feature have been validated (such as performance, resource, UX, security or privacy aspects)
  • User Journey automation is delivered
  • Support and SRE teams are provided with enough skills to support the feature in production environment

Problem:

Developers using Dev Console need to be made aware of the RH developer tooling available to them.

Goal:

Provide awareness to developers using Dev Console of the RH developer tooling that is available to them, including:

Consider enhancing the +Add page and/or the Guided tour

Provide a Quick Start for installing the Cryostat Operator

Why is it important?

To increase usage of our RH portfolio

Acceptance criteria:

  1. Quick Start - Installing Cryostat Operator
  2.  Quick Start - Get started with JBoss EAP using a Helm Chart
  3. Discoverability of the IDE extensions from Create Serverless form
  4. Update Terminal step of the Guided Tour to indicate that odo CLI is accessible (link to https://developers.redhat.com/products/odo/overview)

Dependencies (External/Internal):

Design Artifacts:

Exploration:


In testing dual-stack on vSphere we discovered that kubelet will not allow us to specify two IPs on any platform except baremetal. We have a couple of options to deal with that:

  • Wait for https://github.com/kubernetes/enhancements/pull/3706 to merge and be implemented upstream. This almost certainly means we miss 4.13.
  • Wait for https://github.com/kubernetes/enhancements/pull/3706 to merge and then implement the design downstream. This involves risk of divergence from the eventual upstream design. We would probably only ship this way as tech preview and provide support exceptions for major customers.
  • Remove the setting of node-ip for kubelet. This should get around the limitation on providing dual IPs, but it means we're reliant on the default kubelet IP selection logic, which is...not good. We'd probably only be able to support this on single-NIC network configurations.

User Story

As a developer
I want OpenShift builds to support cgroups v2
So that I can run OpenShift builds on clusters that have cgroups v2 enabled

Acceptance Criteria

  • Builds work if the underlying cluster is running with cgroups v2 enabled

Docs Impact

None - this is an implementation detail which should not impact end-users directly.

Notes

Originally filed in https://bugzilla.redhat.com/show_bug.cgi?id=1949438

Feature Overview (aka. Goal Summary)  

Add support for custom security groups to be attached to control plane and compute nodes at installation time.

Goals (aka. expected user outcomes)

Allow the user to provide existing security groups to be attached to the control plane and compute node instances at installation time.

Requirements (aka. Acceptance Criteria):

The user will be able to provide a list of existing security groups to the install config manifest that will be used as additional custom security groups to be attached to the control plane and compute node instances at installation time.

Out of Scope

The installer won't be responsible for creating any custom security groups; these must be created by the user before the installation starts.

Background

We do have users/customers with specific requirements to add additional network rules to every instance created in AWS. For OpenShift, these additional rules currently need to be added manually on day 2, as the Installer doesn't provide the ability to attach custom security groups to any instance at install time.

MachineSets already support adding a list of existing custom security groups, so this could already be automated at install time by manually editing each MachineSet manifest before starting the installation; but even then, the Installer doesn't allow the user to provide this information so that the list of security groups can be added to the MachineSet manifests.

Documentation Considerations

Documentation will be required to explain how this information needs to be provided to the install config manifest as any other supported field.

Epic Goal

  • Allow the user to provide existing security groups to be attached to the control plane and compute node instances at installation time.

Why is this important?

  • We do have users/customers with specific requirements to add additional network rules to every instance created in AWS. For OpenShift, these additional rules currently need to be added manually on day 2, as the Installer doesn't provide the ability to attach custom security groups to any instance at install time.

    MachineSets already support adding a list of existing custom security groups, so this could already be automated at install time by manually editing each MachineSet manifest before starting the installation; but even then, the Installer doesn't allow the user to provide this information so that the list of security groups can be added to the MachineSet manifests.

Scenarios

  1. The user will be able to provide a list of existing security groups to the install config that will be used as additional custom security groups to be attached to the control plane and compute node instances at installation time.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Previous Work (Optional):

  1. Compute Nodes managed by MAPI already support this feature

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As a (user persona), I want to be able to:

  • Add custom security groups for compute nodes
  • Add custom security groups for control plane nodes

so that I can achieve

  • Control plane and compute nodes can support operation-specific security rules. For instance, specific traffic may be required for compute vs. control plane nodes.

Acceptance Criteria:

Description of criteria:

  • The control plane and compute machine sections of the install config accept user input as additionalSecurityGroupIDs (when using the aws platform).

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

  •  
    additionalSecurityGroupIDs:
      description: AdditionalSecurityGroupIDs contains IDs of
        additional security groups for machines, where each ID
        is presented in the format sg-xxxx.
      items:
        type: string
      type: array 
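
To illustrate the schema above, a hedged install-config fragment supplying the field for both machine pools (the security group IDs and region are placeholders):

controlPlane:
  name: master
  replicas: 3
  platform:
    aws:
      additionalSecurityGroupIDs:
      - sg-0123456789abcdef0
compute:
- name: worker
  replicas: 3
  platform:
    aws:
      additionalSecurityGroupIDs:
      - sg-0fedcba9876543210
platform:
  aws:
    region: us-east-1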

 

This requires/does not require a design proposal.

Feature

As an Infrastructure Administrator, I want to deploy OpenShift on vSphere with supervisor (aka Masters) and worker nodes (from a MachineSet) across multiple vSphere data centers and multiple vSphere clusters using full stack automation (IPI) and user provided infrastructure (UPI).

 

MVP

Install OpenShift on vSphere using IPI / UPI in multiple vSphere data centers (regions) and multiple vSphere clusters in 1 vCenter, all in the same IPv4 subnet (in the same physical location). A hedged install-config sketch of the region/zone mapping follows the list below.

  • Kubernetes Region contains vSphere datacenter and (single) vCenter name
  • Kubernetes Zone contains vSphere cluster, resource pool, datastore, network (port group)
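
A hedged sketch of how this region/zone mapping could look in the install-config failureDomains section (names, inventory paths, and the vCenter host are placeholders):

platform:
  vsphere:
    failureDomains:
    - name: zone-a
      region: us-east                  # maps to a vSphere datacenter
      zone: us-east-1a                 # maps to a vSphere cluster
      server: vcenter.example.com
      topology:
        datacenter: dc-east
        computeCluster: /dc-east/host/cluster-1
        datastore: /dc-east/datastore/datastore-1
        networks:
        - port-group-1
        resourcePool: /dc-east/host/cluster-1/Resources
    - name: zone-b
      region: us-east
      zone: us-east-1b
      server: vcenter.example.com
      topology:
        datacenter: dc-east
        computeCluster: /dc-east/host/cluster-2
        datastore: /dc-east/datastore/datastore-2
        networks:
        - port-group-2
        resourcePool: /dc-east/host/cluster-2/Resources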

Out of scope

  • There is no support for converting a non-zonal configuration (i.e. an existing OpenShift installation without 1+ zones) to a zonal configuration (1+ zones), but zonal UPI installation by the Infrastructure Administrator is permitted.

Scenarios for consideration:

  • OpenShift in vSphere across different zones to avoid single points of failure, whereby each node is in different ESX clusters within the same vSphere datacenter, but in different networks.
  • OpenShift in vSphere across multiple vSphere datacenters, while ensuring workers and masters are spread across 2 different datacenters in different subnets. (RFE-845, RFE-459).

Acceptance criteria:

  • Ensure vSphere IPI can successfully be deployed with ODF across the 3 zones (vSphere clusters) within the same vCenter [like we do with AWS, GCP & Azure].
  • Ensure zonal configuration in vSphere using UPI is documented and tested.

References: 

As an OpenShift engineer, make changes to various OpenShift components so that vSphere zonal installation is considered GA.

As an OpenShift engineer, I need to follow the process to move the API from Tech Preview to GA so it can be used by clusters not installed with TechPreviewNoUpgrade.

more to follow...

As an OpenShift engineer, deprecate existing vSphere platform spec parameters so that they can eventually be removed in favor of zonal.

Feature Overview

Create a GCP cloud-specific spec.resourceTags entry in the Infrastructure CRD. This should create and update tags (or labels in GCP) on any OpenShift cloud resource that we create and manage. The behaviour should also tag existing resources that do not have the tags yet, and once the tags in the Infrastructure CRD are changed, all the resources should be updated accordingly.

Tag deletes continue to be out of scope, as the customer can still have custom tags applied to the resources that we do not want to delete.

Due to the ongoing in-tree/out-of-tree split of the cloud and CSI providers, this should not apply to clusters with in-tree providers (!= "external").

Once confident we have all components updated, we should introduce an end-to-end test that makes sure we never create resources that are untagged.

 
Goals

  • Functionality on GCP Tech Preview
  • inclusion in the cluster backups
  • flexibility of changing tags during cluster lifetime, without recreating the whole cluster

Requirements

  • This Section:* A list of specific needs or objectives that a Feature must deliver to satisfy the Feature.. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non MVP requirement slips, it does not shift the feature.
Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

List any affected packages or components.

  • Installer
  • Cluster Infrastructure
  • Storage
  • Node
  • NetworkEdge
  • Internal Registry
  • CCO

This epic covers the work to apply user-defined labels to GCP resources created for the OpenShift cluster, available as Tech Preview.

The user should be able to define GCP labels to be applied to the resources created during cluster creation by the installer and by the other operators which manage the specific resources. The user will define the required labels in install-config.yaml while preparing the inputs for cluster creation; these will then be made available in the status sub-resource of the Infrastructure custom resource, which cannot be edited but is available for user reference and will be used by the in-cluster operators for labeling when the resources are created.

Updating/deleting of labels added during cluster creation or adding new labels as Day-2 operation is out of scope of this epic.

List any affected packages or components.

  • Installer
  • Cluster Infrastructure
  • Storage
  • Node
  • NetworkEdge
  • Internal Registry
  • CCO

Reference - https://issues.redhat.com/browse/RFE-2017

The installer creates the below list of GCP resources during the create-cluster phase, and these resources should have the user-defined labels applied as well as the default OCP label kubernetes-io-cluster-<cluster_id>:owned

Resources List

Resource Terraform API
VM Instance google_compute_instance
Image google_compute_image
Address google_compute_address(beta)
ForwardingRule google_compute_forwarding_rule(beta)
Zones google_dns_managed_zone
Storage Bucket google_storage_bucket

Acceptance Criteria:

  • Code linting, validation and best practices adhered to
  • The list of GCP resources created by the installer should have the user-defined labels as well as the default OCP label.

The installer generates the Infrastructure CR in the manifests-creation step of the cluster creation process, based on the user input recorded in install-config.yaml. While generating the Infrastructure CR, platformStatus.gcp.resourceLabels should be updated with the user-provided labels (installconfig.platform.gcp.userLabels).
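
For illustration, a hedged sketch of the intended flow (project, region, and label values are placeholders; the exact field shapes are defined by the install-config schema further below in this epic and by the Infrastructure API):

# install-config.yaml fragment
platform:
  gcp:
    projectID: my-project
    region: us-central1
    userLabels:
      team: ocp-dev
      environment: test

# Infrastructure CR fragment expected to be generated by the installer
status:
  platformStatus:
    type: GCP
    gcp:
      resourceLabels:
      - key: team
        value: ocp-dev
      - key: environment
        value: test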

Acceptance Criteria

  • Code linting, validation and best practices adhered to
  • The Infrastructure CR created by the installer should have the GCP user-defined labels, if any, in its status field.

The enhancement proposed for GCP labels and tags support in OCP requires using the latest APIs made available in the Terraform provider for Google, and requires an update to use them.

Acceptance Criteria

  • Code linting, validation and best practices adhered to.

The enhancement proposed for GCP labels support in OCP requires the install-config CRD to be updated to include GCP userLabels for the user to configure, which will be referred to by the installer to apply the list of labels to each resource it creates, and will also be made available in the Infrastructure CR it generates.

Below is the snippet of the change required in the CRD

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata: 
  name: installconfigs.install.openshift.io
spec: 
  versions: 
  - name: v1
    schema: 
      openAPIV3Schema: 
        properties: 
          platform: 
            properties: 
              gcp: 
                properties: 
                  userLabels: 
                    additionalProperties: 
                      type: string
                    description: UserLabels additional keys and values that the installer
                      will add as labels to all resources that it creates. Resources
                      created by the cluster itself may not include these labels.
                    type: object

This change is required for testing the changes of the feature, and should ideally get merged first.

Acceptance Criteria

  • Code linting, validation and best practices adhered to
  • User should be able to configure gcp user defined labels in the install-config.yaml
  • Fields descriptions

Feature Overview

Customers are asking for improvements to the upgrade experience (both over-the-air and disconnected). This feature tracks the epics required to get that work done.

Goals

  1. Have an option to do upgrades in more discrete steps under admin control. Specifically, these steps are: 
    • Control plane upgrade
    • Worker nodes upgrade
    • Workload enabling upgrade (i.e. Router, other components) or infra nodes
  2. Better visibility into any errors during the upgrades, and documentation of what the errors mean and how to recover.
  3. A user experience around an end-to-end back-up and restore after a failed upgrade
  4. OTA-810  - Better Documentation: 
    • Backup procedures before upgrades. 
    • More control over worker upgrades (with tagged pools between user Vs admin)
    • The kinds of pre-upgrade tests that are run, the errors that are flagged and what they mean and how to address them. 
    • Better explanation of each discrete step in upgrades, and what each CVO Operator is doing and potential errors, troubleshooting and mitigating actions.

References


Epic Goal

  • Revamp our Upgrade Documentation to include an appropriate level of detail for admins

Why is this important?

  • Currently, admins have nothing that explains to them how upgrades actually work, and as a result, when things don't go perfectly, they panic
  • We do not sufficiently, or at least within context of Upgrade Docs, explain the differences between Degraded and Available statuses
  • We do not explain order of operations
  • We do not explain the protections built into the platform which protect against total cluster failure, i.e. halting when components do not return to a healthy state within the expected time

Scenarios

  1. Move out channel management to its own chapter
  2. Explain or link to existing documentation which addresses the differences between Degraded=True and Available=False
  3. Explain Upgradeable=False conditions and other aspects of upgrade preflight strategy that Operators should be indicating when its unsafe to upgrade
  4. Explain basics of how the upgrade is applied
    1. CVO fetches release image
    2. CVO updates operators in the following order
    3. Each operator is expected to monitor for success
    4. Provide example ordering of manifests and command to extract release specific manifests and infer the ordering
  5. Explain how operators indicate problems and generic processes for investigating them
  6. Explain the special role of MCO and MCP mechanisms such as pausing pools
  7. Provide some basic guidance for control plane upgrade duration, that is, excluding worker pool rollout duration (90-120 minutes is normal)

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. There was an effort to write up how to use MachineConfig Pools to partition and optimize worker rollout in https://issues.redhat.com/browse/OTA-375

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

The CVO README is currently aimed at CVO devs. But there are way more CVO consumers than there are CVO devs. We should aim the README at "what does the CVO do for my clusters?", and push the dev docs down under docs/dev/.

Feature Overview

Goals

  • Support OpenShift to be deployed from day-0 on AWS Local Zones
  • Support an existing OpenShift cluster to deploy compute Nodes on AWS Local Zones (day-2)

AWS Local Zones support - feature delivered in phases:

  • Phase 0 (OCPPLAN-9630): Document how to create compute nodes on AWS Local Zones in day-0 (SPLAT-635)
  • Phase 1 (OCPBU-2): Create an edge compute pool to generate MachineSets for nodes with NoSchedule taints when installing a cluster in an existing VPC with AWS Local Zone subnets (SPLAT-636); a hedged edge compute pool fragment is sketched after this list
  • Phase 2 (OCPBU-351): Installer automates network resources creation on Local Zone based on the edge compute pool (SPLAT-657)
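
A hedged install-config fragment for the edge compute pool described in Phase 1 (the Local Zone name and replica count are placeholders):

compute:
- name: edge          # edge machine pool; the installer generates MachineSets with NoSchedule taints for it
  replicas: 1
  platform:
    aws:
      zones:
      - us-east-1-nyc-1a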

Requirements

  • This Section:* A list of specific needs or objectives that a Feature must deliver to satisfy the Feature.. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non MVP requirement slips, it does not shift the feature.
Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

 


USER STORY:


As a [type of user], I want [an action] so that [a benefit/a value].

DESCRIPTION:


Required:

...

Nice to have:

...

ACCEPTANCE CRITERIA:


ENGINEERING DETAILS:


Goal

Productize agent-installer-utils container from https://github.com/openshift/agent-installer-utils

Feature Description

In order to ship the network reconfiguration it would be useful to move the agent-tui to its own image instead of sharing the agent-installer-node-agent one.


Currently the `agent create image` command takes care to extract the agent-tui binary (and required libs) from the `assisted-installer-agent` image (shipped in the release as `agent-installer-node-agent`).
Once the agent-tui is available from the `agent-installer-utils` image instead, the installer code will need to be updated accordingly (see https://github.com/openshift/installer/blob/56e85bee78490c18aaf33994e073cbc16181f66d/pkg/asset/agent/image/agentimage.go#L81)

Feature Overview

Allow users to interactively adjust the network configuration for a host after booting the agent ISO.

Goals

Configure network after host boots

The user has Static IPs, VLANs, and/or bonds to configure, but has no idea of the device names of the NICs. They don't enter any network config in agent-config.yaml. Instead they configure each host's network via the text console after it boots into the image.

Epic Goal

  • Allow users to interactively adjust the network configuration for a host after booting the agent ISO, before starting processes that pull container images.

Why is this important?

  • Configuring the network prior to booting a host is difficult and error-prone. Not only is the nmstate syntax fairly arcane, but the advent of 'predictable' interface names means that interfaces retain the same name across reboots but it is nearly impossible to predict what they will be. Applying configuration to the correct hosts requires correct knowledge and input of MAC addresses. All of these present opportunities for things to go wrong, and when they do the user is forced to return to the beginning of the process and generate a new ISO, then boot all of the hosts in the cluster with it again.

Scenarios

  1. The user has Static IPs, VLANs, and/or bonds to configure, but has no idea of the device names of the NICs. They don't enter any network config in agent-config.yaml. Instead they configure each host's network via the text console after it boots into the image.
  2. The user has Static IPs, VLANs, and/or bonds to configure, but makes an error entering the configuration in agent-config.yaml so that (at least) one host will not be able to pull container images from the release payload. They correct the configuration for that host via the text console before proceeding with the installation.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

As a user, I need information about common misconfigurations that may be preventing the automated installation from proceeding.

If we are unable to access the release image from the registry, provide sufficient debugging information to the user to pinpoint the problem. Check for:

  • DNS
  • ping
  • HTTP
  • Registry login
  • Release image

When the UI is active in the console, event messages that are generated will distort the interface and make it difficult for the user to view the configuration and select options. An example is shown in the attached screenshot.

The node zero IP is currently hard-coded inside set-node-zero.sh.template and in the ServiceBaseURL template string.

ServiceBaseURL is also hard-coded inside:

  • apply-host-config.service.template
  • create-cluster-and-infraenv-service.template
  • common.sh.template
  • start-agent.sh.template
  • start-cluster-installation.sh.template
  • assisted-service.env.template

We need to remove this hard-coding and allow a user to set the node zero IP through the TUI and have it reflected by the agent services and scripts.

In the console service from AGENT-453, check whether we are able to pull the release image, and display this information to the user before prompting to run nmtui.

If we can access the image, then exit the service if there is no user input after some timeout, to allow the installation to proceed in the automation flow.

Enhance the openshift-install agent create image command so that the agent-nmtui executable will be embedded in the agent ISO

After the agent ISO has been created, the agent-nmtui executable must be added to the ISO using the following approach:
1. Unpack the agent ISO into a temporary folder
2. Unpack the /images/ignition.img compressed cpio archive into a temporary folder
3. Create a new ignition.img compressed cpio archive by appending the agent-nmtui executable
4. Create a new agent ISO with the updated ignition.img

Implementation note
Portions of code from a PoC located at https://github.com/andfasano/gasoline could be re-used

When running the openshift-install agent create image command, the installer first needs to extract the agent-tui executable from the release payload into a temporary folder.

The openshift-install agent create image command will need to fetch the agent-tui executable so that it can be embedded within the agent ISO. For this reason the agent-tui must be available in the release payload, so that it can be retrieved even when the command is invoked in a disconnected environment.

Create a systemd service that runs at startup prior to the login prompt and takes over the console. This should start after the network-online target, and block the login prompt from appearing until it exits.

This should also block, at least temporarily, any services that require pulling an image from the registry (i.e. agent + assisted-service).

BU Priority Overview

As our customers create more and more clusters, it will become vital for us to help them support their fleet of clusters. Currently, our users have to use a different interface (ACM UI) in order to manage their fleet of clusters. Our goal is to provide our users with a single interface for managing a fleet of clusters and for deep diving into a single cluster. This means going to a single URL – your Hub – to interact with your OCP fleet.

Goals

The goal of this tech preview update is to improve the experience from the last round of tech preview. The following items will be improved:

  1. Improved Cluster Picker: Moved to Masthead for better usability, filter/search
  2. Support for Metrics: Metrics are now visualized from Spoke Clusters
  3. Avoid UI Mismatch: Dynamic Plugins from Spoke Clusters are disabled 
  4. Console URLs Enhanced: Cluster Name Added to URL for Quick Links
  5. Security Improvements: Backend Proxy and Auth updates

Key Objective
Providing our customers with a single simplified User Experience (Hybrid Cloud Console) that is extensible, can run locally or in the cloud, and is capable of everything from managing the fleet to deep diving into a single cluster.
Why customers want this?

  1. Single interface to accomplish their tasks
  2. Consistent UX and patterns
  3. Easily accessible: One URL, one set of credentials

Why we want this?

  • Shared code -  improve the velocity of both teams and most importantly ensure consistency of the experience at the code level
  • Pre-built PF4 components
  • Accessibility & i18n
  • Remove barriers for enabling ACM

Phase 2 Goal: Productization of the united Console 

  1. Enable user to quickly change context from fleet view to single cluster view
    1. Add Cluster selector with “All Cluster” Option. “All Cluster” = ACM
    2. Shared SSO across the fleet
    3. Hub OCP Console can connect to remote clusters API
    4. When ACM Installed the user starts from the fleet overview aka “All Clusters”
  2. Share UX between views
    1. ACM Search —> resource list across fleet -> resource details that are consistent with single cluster details view
    2. Add Cluster List to OCP —> Create Cluster

Description of problem:

There is a possible race condition in the console operator where the managed cluster config gets updated after the console deployment and doesn't trigger a rollout. 

Version-Release number of selected component (if applicable):

4.10

How reproducible:

Rarely

Steps to Reproduce:

1. Enable multicluster tech preview by adding the TechPreviewNoUpgrade featureSet to the FeatureGate config; an example manifest is shown after these steps. (NOTE THIS ACTION IS IRREVERSIBLE AND WILL MAKE THE CLUSTER UNUPGRADEABLE AND UNSUPPORTED)
2. Install ACM 2.5+
3. Import a managed cluster using either the ACM console or the CLI
4. Once that managed cluster is showing in the cluster dropdown, import a second managed cluster 
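For reference, step 1 corresponds to a FeatureGate manifest along these lines (a sketch; the featureSet value is the important part, and applying it is irreversible as noted above):

  apiVersion: config.openshift.io/v1
  kind: FeatureGate
  metadata:
    name: cluster
  spec:
    featureSet: TechPreviewNoUpgrade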

Actual results:

Sometimes the second managed cluster will never show up in the cluster dropdown

Expected results:

The second managed cluster eventually shows up in the cluster dropdown after a page refresh

Additional info:

Migrated from bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2055415

In order for hub cluster console OLM screens to behave as expected in a multicluster environment, we need to gather "copiedCSVsDisabled" flags from managed clusters so that the console backend/frontend can consume this information.

AC:

  • The console operator syncs "copiedCSVsDisabled" flags from managed clusters into the hub cluster managed cluster config.

Feature Overview

Allow configuring compute and control plane nodes across multiple subnets for on-premise IPI deployments. With nodes separated across subnets, also allow using an external load balancer, instead of the built-in one (keepalived/haproxy) that the IPI workflow installs, so that the customer can configure their own load balancer with the ingress and API VIPs pointing to nodes in the separate subnets.

Goals

I want to install OpenShift with IPI on an on-premise platform (high priority for bare metal and vSphere) and I need to distribute my control plane and compute nodes across multiple subnets.

I want to use IPI automation but I will configure an external load balancer for the API and Ingress VIPs, instead of using the built-in keepalived/haproxy-based load balancer that comes with the on-prem platforms.
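As a rough sketch of the intent (the exact schema is owned by the epics listed below, so the loadBalancer field shown here is an assumption rather than a confirmed API), an install-config for vSphere might express the external load balancer choice like this:

  platform:
    vsphere:
      apiVIPs:
        - 192.0.2.5
      ingressVIPs:
        - 192.0.2.6
      loadBalancer:
        type: UserManaged   # assumed knob to disable the built-in keepalived/haproxy services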

Background, and strategic fit

Customers require using multiple logical availability zones to define their architecture and topology for their datacenter. OpenShift clusters are expected to fit in this architecture for the high availability and disaster recovery plans of their datacenters.

Customers want the benefits of IPI and automated installations (and to avoid UPI), and at the same time, when they expect high traffic in their workloads, they design their clusters with external load balancers that host the VIPs of the OpenShift clusters.

Load balancers can distribute incoming traffic across multiple subnets, which is something our built-in load balancers aren't able to do and which represents a big limitation for the topologies customers are designing.

While this is possible with IPI AWS, this isn't available with on-premise platforms installed with IPI (for the control plane nodes specifically), and customers see this as a gap in OpenShift for on-premise platforms.

Functionalities per Epic

 

Epic Control Plane with Multiple Subnets  Compute with Multiple Subnets Doesn't need external LB Built-in LB
NE-1069 (all-platforms)
NE-905 (all-platforms)
NE-1086 (vSphere)
NE-1087 (Bare Metal)
OSASINFRA-2999 (OSP)  
SPLAT-860 (vSphere)
NE-905 (all platforms)
OPNET-133 (vSphere/Bare Metal for AI/ZTP)
OSASINFRA-2087 (OSP)
KNIDEPLOY-4421 (Bare Metal workaround)
SPLAT-409 (vSphere)

Previous Work

Workers on separate subnets with IPI documentation

We can already deploy compute nodes on separate subnets by preventing the built-in LBs from running on the compute nodes. This is documented for bare metal only for the Remote Worker Nodes use case: https://docs.openshift.com/container-platform/4.11/installing/installing_bare_metal_ipi/ipi-install-installation-workflow.html#configure-network-components-to-run-on-the-control-plane_ipi-install-installation-workflow

This procedure works on vSphere too, albeit without QE CI coverage and without documentation.

External load balancer with IPI documentation

  1. Bare Metal: https://docs.openshift.com/container-platform/4.11/installing/installing_bare_metal_ipi/ipi-install-post-installation-configuration.html#nw-osp-configuring-external-load-balancer_ipi-install-post-installation-configuration
  2. vSphere: https://docs.openshift.com/container-platform/4.11/installing/installing_vsphere/installing-vsphere-installer-provisioned.html#nw-osp-configuring-external-load-balancer_installing-vsphere-installer-provisioned

Scenarios

  1. vSphere: I can define 3 or more networks in vSphere and distribute my masters and workers across them. I can configure an external load balancer for the VIPs.
  2. Bare metal: I can configure the IPI installer and the agent-based installer to place my control plane nodes and compute nodes on 3 or more subnets at installation time. I can configure an external load balancer for the VIPs.

Acceptance Criteria

  • Can place compute nodes on multiple subnets with IPI installations
  • Can place control plane nodes on multiple subnets with IPI installations
  • Can configure external load balancers for clusters deployed with IPI with control plane and compute nodes on multiple subnets
  • Can configure VIPs in an external load balancer routed to nodes on separate subnets and VLANs
  • Documentation exists for all the above cases

 

Epic Goal

As an OpenShift infrastructure owner I need to deploy OCP on OpenStack with the installer-provisioned infrastructure workflow and configure my own load balancers

Why is this important?

Customers want to use their own load balancers and IPI comes with built-in LBs based on keepalived and haproxy.

Scenarios

  1. A large deployment routed across multiple failure domains without stretched L2 networks would require dynamically routing the control plane VIP traffic through load balancers capable of living in multiple L2 segments.
  2. Customers who want to use their existing LB appliances for the control plane.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • QE - must be testing a scenario where we disable the internal LB and setup an external LB and OCP deployment is running fine.
  • Documentation - we need to document all the gotchas regarding this type of deployment, even the specifics about the load-balancer itself (routing policy, dynamic routing, etc)
  • For Tech Preview, we won't require Fixed IPs. This is something targeted for 4.14.

Dependencies (internal and external)

  1. For GA, we'll need Fixed IPs, already in progress for vSphere: https://issues.redhat.com/browse/OCPBU-179

Previous Work:

vSphere has done the work already via https://issues.redhat.com/browse/SPLAT-409

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Epic Goal

As an OpenShift installation admin I want to use the Assisted Installer, ZTP and IPI installation workflows to deploy a cluster that has remote worker nodes in subnets different from the local subnet, while keeping my VIPs on the built-in load balancing services (haproxy/keepalived).

While this request is most common with OpenShift on bare metal, any platform using the ingress operator will benefit from this enhancement.

Customers using platform "none" run external load balancers and won't need this; this is specific to platforms deployed via AI, ZTP and IPI.

Why is this important?

Customers and partners want to install remote worker nodes on day 1. Because the built-in network services we provide with the Assisted Installer, ZTP and IPI manage the VIP for ingress, we need to ensure that those services remain on the local subnet where the VIPs are configured.

Previous Work

The bare metal IPI team added a workflow that allows placing the VIPs on the masters. While this isn't an ideal solution, it is the only option documented:

Configuring network components to run on the control plane

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Goal:
As a cluster administrator, I want OpenShift to include a recent HAProxy version, so that I have the latest available performance and security fixes.  

 Description:
We should strive to follow upstream HAProxy releases by bumping the HAProxy version that we ship in OpenShift with every 4.y release, so that OpenShift benefits from upstream performance and security fixes, and so that we avoid large version-number jumps when an urgent fix necessitates bumping to the latest HAProxy release.  This bump should happen as early as possible in the OpenShift release cycle, so as to maximize soak time.   

For OpenShift 4.13, this means bumping to 2.6.  

As a cluster administrator, 

I want OpenShift to include a recent HAProxy version, 

so that I have the latest available performance and security fixes.  

 

We should strive to follow upstream HAProxy releases by bumping the HAProxy version that we ship in OpenShift with every 4.y release, so that OpenShift benefits from upstream performance and security fixes, and so that we avoid large version-number jumps when an urgent fix necessitates bumping to the latest HAProxy release.  This bump should happen as early as possible in the OpenShift release cycle, so as to maximize soak time.   

For OpenShift 4.14, this means bumping to 2.6.  

Bump the HAProxy version in dist-git so that OCP 4.13 ships HAProxy 2.6.13, with this patch added on top: https://git.haproxy.org/?p=haproxy-2.6.git;a=commit;h=2b0aafdc92f691bc4b987300c9001a7cc3fb8d08. The patch fixes the segfault that was being tracked as OCPBUGS-13232.

This patch is in HAProxy 2.6.14, so we can stop carrying the patch once we bump to HAProxy 2.6.14 or newer in a subsequent OCP release.

Epic Goal

  • Update all images that we ship with OpenShift to the latest upstream releases and libraries.
  • Exact content of what needs to be updated will be determined as new images are released upstream, which is not known at the beginning of OCP development work. We don't know what new features will be included and should be tested and documented. Especially new CSI drivers releases may bring new, currently unknown features. We expect that the amount of work will be roughly the same as in the previous releases. Of course, QE or docs can reject an update if it's too close to deadline and/or looks too big.

Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF). We are trying the no-feature-freeze approach in 4.12. We will try to do as much as we can before FF, but we're quite sure something will slip past FF as usual.

Why is this important?

  • We want to ship the latest software that contains new features and bugfixes.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Update all OCP and kubernetes libraries in storage operators to the appropriate version for OCP release.

This includes (but is not limited to):

  • Kubernetes:
    • client-go
    • controller-runtime
  • OCP:
    • library-go
    • openshift/api
    • openshift/client-go
    • operator-sdk

Operators:

  • aws-ebs-csi-driver-operator 
  • aws-efs-csi-driver-operator
  • azure-disk-csi-driver-operator
  • azure-file-csi-driver-operator
  • cinder-csi-driver-operator
  • gcp-pd-csi-driver-operator
  • gcp-filestore-csi-driver-operator
  • manila-csi-driver-operator
  • ovirt-csi-driver-operator
  • vmware-vsphere-csi-driver-operator
  • alibaba-disk-csi-driver-operator
  • ibm-vpc-block-csi-driver-operator
  • csi-driver-shared-resource-operator

 

  • cluster-storage-operator
  • csi-snapshot-controller-operator
  • local-storage-operator
  • vsphere-problem-detector

 

Update all CSI sidecars to the latest upstream release from https://github.com/orgs/kubernetes-csi/repositories

  • external-attacher
  • external-provisioner
  • external-resizer
  • external-snapshotter
  • node-driver-registrar
  • livenessprobe

Corresponding downstream repos have `csi-` prefix, e.g. github.com/openshift/csi-external-attacher.

This includes updating the VolumeSnapshot CRDs in the cluster-csi-snapshot-controller-operator assets and the client API in go.mod, i.e. copy all snapshot CRDs from upstream to the operator assets and run go get -u github.com/kubernetes-csi/external-snapshotter/client/v6 in the operator repo.

Goal

Allow the OpenShift installer to point to an existing OVA image stored in vSphere, replacing the current method that uploads the OVA template every time an OpenShift cluster is installed.

Why is this important?

This is an improvement that makes the installation more efficient by not having to upload an OVA from wherever openshift-install is running every time a cluster is installed, saving time and bandwidth. For example, if an administrator is installing over a VPN, the OVA is uploaded through it to the target environment every time an OpenShift cluster is installed. Having a centralised OVA ready to use makes the administration process more efficient, since new clusters can be installed without uploading the image from where the installer is run.

Epic Goal

  • To allow the use of a pre-existing RHCOS virtual machine or template via the IPI installer.

Why is this important?

  • It is a very common workflow in vSphere to upload an OVA. In the disconnected scenario, the requirement to stand up a local web server, copy an OVA to that web server and then run the installer is a poor experience.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Feature Goal

  • Enable platform=external to support onboarding new partners, e.g. Oracle Cloud Infrastructure and VCSP partners.
  • Create a new platform type, working name "External", that will signify when a cluster is deployed on a partner infrastructure where core cluster components have been replaced by the partner. “External” is different from our current platform types in that it will signal that the infrastructure is specifically not “None” or any of the known providers (eg AWS, GCP, etc). This will allow infrastructure partners to clearly designate when their OpenShift deployments contain components that replace the core Red Hat components.

This work will require updates to the core OpenShift API repository to add the new platform type, and then a distribution of this change to all components that use the platform type information. For components that partners might replace, per-component action will need to be taken, with the project team's guidance, to ensure that the component properly handles the "External" platform. These changes will look slightly different for each component.

To integrate these changes more easily into OpenShift, it is possible to take a multi-phase approach which could be spread over a release boundary (eg phase 1 is done in 4.X, phase 2 is done in 4.X+1).

OCPBU-5: Phase 1

  • Write platform “External” enhancement.
  • Evaluate changes to cluster capability annotations to ensure coverage for all replaceable components.
  • Meet with component teams to plan specific changes that will allow for supplement or replacement under platform "External".
  • Start implementing changes towards Phase 2.

OCPBU-510: Phase 2

  • Update OpenShift API with new platform and ensure all components have updated dependencies.
  • Update capabilities API to include coverage for all replaceable components.
  • Ensure all Red Hat operators tolerate the "External" platform and treat it the same as "None" platform.

OCPBU-329: Phase.Next

  • TBD

Why is this important?

  • As partners begin to supplement OpenShift's core functionality with their own platform specific components, having a way to recognize clusters that are in this state helps Red Hat created components to know when they should expect their functionality to be replaced or supplemented. Adding a new platform type is a significant data point that will allow Red Hat components to understand the cluster configuration and make any specific adjustments to their operation while a partner's component may be performing a similar duty.
  • The new platform type also helps with support to give a clear signal that a cluster has modifications to its core components that might require additional interaction with the partner instead of Red Hat. When combined with the cluster capabilities configuration, the platform "External" can be used to positively identify when a cluster is being supplemented by a partner, and which components are being supplemented or replaced.

Scenarios

  1. A partner wishes to replace the Machine controller with a custom version that they have written for their infrastructure. Setting the platform to "External" and advertising the Machine API capability gives a clear signal to the Red Hat created Machine API components that they should start the infrastructure generic controllers but not start a Machine controller.
  2. A partner wishes to add their own Cloud Controller Manager (CCM) written for their infrastructure. Setting the platform to "External" and advertising the CCM capability gives a clear signal to the Red Hat created CCM operator that the cluster should be configured for an external CCM that will be managed outside the operator. Although the Red Hat operator will not provide this functionality, it will configure the cluster to expect a CCM.

Acceptance Criteria

Phase 1

  • Partners can read "External" platform enhancement and plan for their platform integrations.
  • Teams can view jira cards for component changes and capability updates and plan their work as appropriate.

Phase 2

  • Components running in cluster can detect the “External” platform through the Infrastructure config API
  • Components running in cluster react to “External” platform as if it is “None” platform
  • Partners can disable any of the platform specific components through the capabilities API

Phase 3

  • Components running in cluster react to the “External” platform based on their function.
    • for example, the Machine API Operator needs to run a set of controllers that are platform agnostic when running in platform “External” mode.
    • the specific component reactions are difficult to predict currently, this criteria could change based on the output of phase 1.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. Identifying OpenShift Components for Install Flexibility

Open questions::

  1. Phase 1 requires talking with several component teams, the specific action that will be needed will depend on the needs of the specific component. At the least the components need to treat platform "External" as "None", but there could be more changes depending on the component (eg Machine API Operator running non-platform specific controllers).

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Epic Goal

  • Empower External platform type users to specify when they will run their own CCM

Why is this important?

  • For partners wishing to use components that require zonal awareness provided by the infrastructure (for example CSI drivers), they will need to run their own cloud controller managers. This epic is about adding the proper configuration to OpenShift to allow users of the External platform type to run their own CCMs.

Scenarios

  1. As a Red Hat partner, I would like to deploy OpenShift with my own CSI driver. To do this I need my CCM deployed as well. Having a way to instruct OpenShift to expect an external CCM deployment would allow me to do this.

Acceptance Criteria

  • CI - A new periodic test based on the External platform test would be ideal
  • Release Technical Enablement - Provide necessary release enablement details and documents.
    • Update docs.ci.openshift.org with CCM docs

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. https://github.com/openshift/enhancements/blob/master/enhancements/cloud-integration/infrastructure-external-platform-type.md#api-extensions
  2. https://github.com/openshift/api/pull/1409

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story

As a user I want to use the openshift installer to create clusters of platform type External so that I can use openshift more effectively on a partner provider platform.

Background

To fully support the External platform type for partners and users, it will be useful to be able to have the installer understand when it sees the external platform type in the install-config.yaml, and then to properly populate the resulting infrastructure config object with the external platform type and platform name.

As defined in https://github.com/openshift/api/blob/master/config/v1/types_infrastructure.go#L241 , the external platform type allows the user to specify a name for the platform. This card is about updating the installer so that a user can provide both the external type and a platform name that will be expressed in the infrastructure manifest.

Aside from this information, the installer should continue with a normal platform "None" installation.
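A minimal install-config.yaml sketch of what this card enables, assuming the platformName field from the linked API types (the value "oci" and the other values are only examples):

  apiVersion: v1
  baseDomain: example.com
  metadata:
    name: partner-cluster
  platform:
    external:
      platformName: oci   # example partner-provided platform name

Aside from the platform stanza, the rest of the install-config would look like a platform "None" install, matching the behavior described above.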

Steps

  • update installer to allow platform "External" specified in the install-config.yaml
  • update installer to allow the platform name to be specified as part of the External platform configuration

Stakeholders

  • openshift cloud infra team
  • openshift installer team
  • openshift assisted installer team

Definition of Done

  • user can specify external platform in the install-config.yaml and have a cluster with External platform type and a name for the platform.
  • cluster installs as expected for platform external (similar to none)
  • Docs
  • Testing
  • this feature should allow us to update our external platform tests to make the installation easier; tests should be updated to include this methodology

Feature Overview

  • As an infrastructure owner, I want a repeatable method to quickly deploy the initial OpenShift cluster.
  • As an infrastructure owner, I want to install the first (management, hub, “cluster 0”) cluster to manage other (standalone, hub, spoke, hub of hubs) clusters.

Goals

  • Enable customers and partners to successfully deploy a single “first” cluster in disconnected, on-premises settings

Requirements

4.11 MVP Requirements

  • Customers and partners need to be able to download the installer
  • Enable customers and partners to deploy a single “first” cluster (cluster 0) using single node, compact, or highly available topologies in disconnected, on-premises settings
  • Installer must support advanced network settings such as static IP assignments, VLANs and NIC bonding for on-premises metal use cases, as well as DHCP and PXE provisioning environments.
  • Installer needs to support automation, including integration with third-party deployment tools, as well as user-driven deployments.
  • In the MVP automation has higher priority than interactive, user-driven deployments.
  • For bare metal deployments, we cannot assume that users will provide us the credentials to manage hosts via their BMCs.
  • Installer should prioritize support for platforms None, baremetal, and VMware.
  • The installer will focus on a single version of OpenShift, and a different build artifact will be produced for each different version.
  • The installer must not depend on a connected registry; however, the installer can optionally use a previously mirrored registry within the disconnected environment.

Use Cases

  • As a Telco partner engineer (Site Engineer, Specialist, Field Engineer), I want to deploy an OpenShift cluster in production with limited or no additional hardware and don’t intend to deploy more OpenShift clusters [Isolated edge experience].
  • As an Enterprise infrastructure owner, I want to manage the lifecycle of multiple clusters in 1 or more sites by first installing the first (management, hub, “cluster 0”) cluster to manage other (standalone, hub, spoke, hub of hubs) clusters [Cluster before your cluster].
  • As a Partner, I want to package OpenShift for large scale and/or distributed topology with my own software and/or hardware solution.
  • As a large enterprise customer or Service Provider, I want to install a “HyperShift Tugboat” OpenShift cluster in order to offer a hosted OpenShift control plane at scale to my consumers (DevOps Engineers, tenants) that allows for fleet-level provisioning for low CAPEX and OPEX, much like AKS or GKE [Hypershift].
  • As a new, novice to intermediate user (Enterprise Admin/Consumer, Telco Partner integrator, RH Solution Architect), I want to quickly deploy a small OpenShift cluster for Poc/Demo/Research purposes.

Questions to answer…

  •  

Out of Scope

Out of scope use cases (that are part of the Kubeframe/factory project):

  • As a Partner (OEMs, ISVs), I want to install and pre-configure OpenShift with my hardware/software in my disconnected factory, while allowing further (minimal) reconfiguration of a subset of capabilities later at a different site by different set of users (end customer) [Embedded OpenShift].
  • As an Infrastructure Admin at an Enterprise customer with multiple remote sites, I want to pre-provision OpenShift centrally prior to shipping and activating the clusters in remote sites.

Background, and strategic fit

  • This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

Assumptions

  1. The user only has access to the target nodes that will form the cluster and will boot them with the image presented locally via a USB stick. This scenario is common in sites with restricted access such as government infra where only users with security clearance can interact with the installation, where software is allowed to enter the premises (on a USB, DVD, SD card, etc.) but never allowed to come back out. Users can't bring in supporting devices such as laptops or phones.
  2. The user has access to the target nodes remotely to their BMCs (e.g. iDrac, iLo) and can map an image as virtual media from their computer. This scenario is common in data centers where the customer provides network access to the BMCs of the target nodes.
  3. We cannot assume that we will have access to a computer to run an installer or installer helper software.

Customer Considerations

  • ...

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

 

References

 

 

Epic Goal

Why is this important?

  • The Agent Based Installer is a new install path targeting fully disconnected installs. We should be looking at adding support for ARM in all install paths to ensure our customers can deploy to disconnected environments.
  • We want to start having new projects/products launch with support for ARM by default.

Scenarios
1. …

Acceptance Criteria

  • The Agent Installer launches with aarch64 support
  • The Agent installer has QE completed & CI for aarch64

Dependencies (internal and external)
1. …

Previous Work (Optional):
  1. https://issues.redhat.com/browse/ARMOCP-346

Open questions::
1. …

Done Checklist

  • CI - For new features (non-enablement), existing Multi-Arch CI jobs are not broken by the Epic
  • Release Enablement: <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - If the Epic is adding a new stream, downstream build attached to advisory: <link to errata>
  • QE - Test plans in Test Plan tracking software (e.g. Polarion, RQM, etc.): <link or reference to the Test Plan>
  • QE - Automated tests merged: <link or reference to automated tests>
  • QE - QE to verify documentation when testing
  • DOC - Downstream documentation merged: <link to meaningful PR>
  • All the stories, tasks, sub-tasks and bugs that belong to this epic need to have been completed and indicated by a status of 'Done'.

As an OCP administrator, I would like to deploy OCP on arm64 BM with the agent installer

Acceptance Criteria

Dev:

  • Ensure openshift-installer creates an arm64 agent.iso
  • Ensure openshift-installer creates the correct ignition config and supporting files for assisted-api
  • Ensure assisted-api can install 

Jira Admin

  • Additional Jira tickets created (if needed)

QE

  • Understand if QE is needed for agent installer (as this Epic is currently a TP)

Docs:

  • Understand if ARM documentation needs to be updated (as there is currently no x86 documentation)

Agent Installer

  • Investigate if Heterogeneous clusters are feasible for Agent Installer

Feature Overview (aka. Goal Summary)  

Support OpenShift installation in AWS Shared VPC [1] scenario where AWS infrastructure resources (at least the Private Hosted Zone) belong to an account separate from the cluster installation target account.

Goals (aka. expected user outcomes)

As a user I need to use a Shared VPC [1] when installing OpenShift on AWS into an existing VPC. This will at least require the use of a preexisting Route53 private hosted zone, since as a "participant" user of the shared VPC I am not allowed to automatically create Route53 private zones.

Requirements (aka. Acceptance Criteria):

The Installer is able to successfully deploy OpenShift on AWS with a Shared VPC [1], and the cluster is able to successfully pass osde2e testing. This will include at least the scenario where the private hosted zone belongs to a different account (Account A) than the cluster resources (Account B).

[1] https://docs.aws.amazon.com/vpc/latest/userguide/vpc-sharing.html

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Enable/confirm installation in AWS shared VPC scenario where Private Hosted Zone belongs to an account separate from the cluster installation target account

Why is this important?

  • AWS best practices suggest this setup

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

I want

  • the installer to check for appropriate permissions based on whether the installation is using an existing hosted zone and whether that hosted zone is in another account

so that I can

  • be sure that my credentials have sufficient and minimal permissions before beginning install

Acceptance Criteria:

Description of criteria:

  • When specifying platform.aws.hostedZoneRole, the Route53:CreateHostedZone and Route53:DeleteHostedZone permissions are not required (see the sketch below)
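A hedged sketch of the install-config fragment this story concerns; hostedZoneRole comes from the criteria above, while the hostedZone field and the example values are assumptions for illustration:

  platform:
    aws:
      region: us-east-1
      hostedZone: Z0123456789EXAMPLE   # preexisting private hosted zone owned by Account A (assumed field)
      hostedZoneRole: arn:aws:iam::111111111111:role/shared-vpc-route53-role   # role in Account A the installer assumes for Route53 record management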

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Feature Overview

  • Extend OpenShift on IBM Cloud integration with additional features to pair the capabilities offered for this provider integration to the ones available in other cloud platforms

Goals

  • Extend the existing features while deploying OpenShift on IBM Cloud

Background, and strategic fit

This top level feature is going to be used as a placeholder for the IBM team, who are working on new features for this integration, in an effort to keep their existing internal backlog in sync with the corresponding Features/Epics in Red Hat's Jira.

 

Epic Goal

With this BYON support:

  • shared resources (VPC, subnets) can be placed in the resource group specified by the `networkResourceGroupName` install config parameter.
  • installer provisioned cluster resources will be placed in the resource group specified by the `resourceGroupName` install config parameter.

 

  • `networkResourceGroupName` is a required parameter for the BYON scenario
  • `resourceGroupName` is an optional parameter

Why is this important?

  • This will allow customers (using IBM Cloud VPC BYON support) to organize pre-created / shared resources (VPC, subnets) in a resource group separate from installer provisioned cluster resources.

Scenarios

`networkResourceGroupName` NOT specified ==> non-BYON install scenario

  • if `resourceGroupName` is specified, then ALL installer provisioned resources (VPC, subnets, cluster) will be placed in specified resource group (resource group must exist)
  • if `resourceGroupName` is NOT specified, then ALL installer provisioned resources (VPC, subnets, cluster) will be placed in a resource group created during the install process

`networkResourceGroupName` specified ==> BYON install scenario (required for BYON scenario)

  • `networkResourceGroupName` must contain pre-created/shared resources (VPC, subnets)
  • if `resourceGroupName` is specified, then all installer provisioned cluster resources will be placed in specified resource group (resource group must exist)
  • if `resourceGroupName` is NOT specified, then all installer provisioned cluster resources will be placed in a resource group created during the install process (an illustrative install-config fragment follows)
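An illustrative install-config fragment for the BYON scenario, using the two parameters described above (the region and resource group names are placeholders):

  platform:
    ibmcloud:
      region: us-south
      networkResourceGroupName: shared-network-rg   # must already contain the pre-created VPC and subnets
      resourceGroupName: my-cluster-rg              # optional; installer-provisioned cluster resources land here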

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As a (user persona), I want to be able to:

  • Capability 1
  • Capability 2
  • Capability 3

so that I can achieve

  • Outcome 1
  • Outcome 2
  • Outcome 3

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Key Objective
Providing our customers with a single simplified User Experience (Hybrid Cloud Console) that is extensible, can run locally or in the cloud, and is capable of everything from managing the fleet to deep diving into a single cluster.
Why customers want this?

  1. Single interface to accomplish their tasks
  2. Consistent UX and patterns
  3. Easily accessible: One URL, one set of credentials

Why we want this?

  • Shared code -  improve the velocity of both teams and most importantly ensure consistency of the experience at the code level
  • Pre-built PF4 components
  • Accessibility & i18n
  • Remove barriers for enabling ACM

Phase 1 Goal: Get something to market (OCP 4.8, ACM 2.3)
Phase 1 —> OCP deploys ACM Hub Operator —> ACM Perspective becomes available —> User can switch between ACM multi-cluster view and local OCP Console —> No SSO user has to login in twice

Phase 2 Goal: Productization of the united Console (OCP 4.9, ACM 2.4)

  1. Enable user to quickly change context from fleet view to single cluster view
    1. Add Cluster selector with “All Cluster” Option. “All Cluster” = ACM
    2. Shared SSO across the fleet
    3. Hub OCP Console can connect to remote clusters API
    4. When ACM Installed the user starts from the fleet overview aka “All Clusters”
  2. Share UX between views
    1. ACM Search —> resource list across fleet -> resource details that are consistent with single cluster details view
    2. Add Cluster List to OCP —> Create Cluster

Phase 2  Use Cases:

  1. As a user, I want to be able to quickly switch context from the Fleet view (ACM) to any spoke cluster Console view, all from the same web browser tab.
    1. ACM Hub Operator deployed to OCP —> Cluster picker becomes available, with “All cluster” option = ACM —> Single cluster user will get the perspective picker (Admin, Dev) —> User needs the ability to quickly change context to a single cluster —> All clusters should be linked via shared SSO
  2. As a user, I should be able to drill down into resources in the ACM view and get the OCP resource details page
    1. ACM Hub Operator deployed to OCP —> User searches for pods from the ACM view ("All clusters") —> Single pod is selected —> OCP pod detail page

We need to coordinate with the ACM team so that the masthead looks the same when switching between contexts. This might require us to consume a common masthead component in OCP console.

The ACM team will need to honor our custom branding configuration so that the logo does not change when switching contexts.

Known differences:

  • Branding customization
  • Console link CRDs
  • Global notifications
  • Import button
  • Notification drawer
  • Language preferences
  • Search link (ACM only)
  • Web terminal (ACM only)

Open questions:

  • How do we handle alerts in the notification drawer across cluster contexts?

Pre-Work Objectives

Since some of our requirements from the ACM team will not be available for the 4.12 timeframe, the team should work on anything we can get done in the scope of the console repo so that when the required items are available in 4.13, we can be more nimble in delivering GA content for the Unified Console Epic.

Overall GA Key Objective
Providing our customers with a single simplified User Experience (Hybrid Cloud Console) that is extensible, can run locally or in the cloud, and is capable of everything from managing the fleet to deep diving into a single cluster.
Why customers want this?

  1. Single interface to accomplish their tasks
  2. Consistent UX and patterns
  3. Easily accessible: One URL, one set of credentials

Why we want this?

  • Shared code -  improve the velocity of both teams and most importantly ensure consistency of the experience at the code level
  • Pre-built PF4 components
  • Accessibility & i18n
  • Remove barriers for enabling ACM

Phase 2 Goal: Productization of the united Console 

  1. Enable user to quickly change context from fleet view to single cluster view
    1. Add Cluster selector with “All Cluster” Option. “All Cluster” = ACM
    2. Shared SSO across the fleet
    3. Hub OCP Console can connect to remote clusters API
    4. When ACM Installed the user starts from the fleet overview aka “All Clusters”
  2. Share UX between views
    1. ACM Search —> resource list across fleet -> resource details that are consistent with single cluster details view
    2. Add Cluster List to OCP —> Create Cluster

As a developer I would like to disable clusters like *KS that we can't support for multi-cluster (for instance because we can't authenticate). The ManagedCluster resource has a vendor label that we can use to know if the cluster is supported.

cc Ali Mobrem Sho Weimer Jakub Hadvig 

UPDATE: 9/20/22: we want an allow-list with OpenShift, ROSA, ARO, ROKS, and OpenShiftDedicated

Acceptance criteria:

  • Investigate if the console-operator should pass info about which clusters are supported and unsupported to the frontend
  • Unsupported clusters should not appear in the cluster dropdown
  • Unsupported clusters are determined based on:
    • the defined vendor label
    • non 4.x OCP clusters
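For illustration, the decision could be driven by the vendor label mentioned above on each ManagedCluster resource (a sketch; the cluster name and label value shown are examples):

  apiVersion: cluster.open-cluster-management.io/v1
  kind: ManagedCluster
  metadata:
    name: spoke-1
    labels:
      vendor: OpenShift   # clusters whose vendor is not on the allow-list would be hidden from the dropdown
  spec:
    hubAcceptsClient: true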

This epic contains all the OLM related stories for OCP release-4.14

Epic Goal

  • Track all the stories under a single epic

The console operator should build up a set of the cluster nodes' OS types and supply it to the console, so that the console renders only operators that can be installed on the cluster.

This will be needed when we support different OS types on the cluster.

We need to scan through the compute nodes and build a set of supported OS types from them. Each node on the cluster has a label for its operating system, e.g. kubernetes.io/os=linux.

 

AC:

  1. Implement logic in the console-operator that will scan through all the nodes, build a set of all the OS types that the cluster nodes run on, and pass it to the console-config.yaml (as sketched below). This set of OS types will then be used by the console frontend.
  2. Add unit and e2e test cases in the console-operator repository.
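A rough sketch of the data flow: the operator reads the standard kubernetes.io/os label from each Node and aggregates the values into console-config.yaml. The console-config field name below is hypothetical and is not defined by this card:

  # Input: each Node carries a label such as kubernetes.io/os=linux or kubernetes.io/os=windows
  # Output: hypothetical console-config.yaml fragment
  clusterInfo:
    nodeOperatingSystems:   # hypothetical field name
      - linux
      - windows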

This epic contains all the OLM related stories for OCP release-4.13

Epic Goal

  • Track all the stories under a single epic

Description/Acceptance Criteria:

  • Add RBAC for the console-operator so it can GET/LIST/WATCH the OLMConfig cluster config. The RBAC should be added to the console-operator cluster-role rules.
  • The console operator should watch the spec.features.disableCopiedCSVs property of the OLM cluster config. When this property is true, the console-config's "clusterInfo.copiedCSVsDisabled" field should be updated accordingly (sketched below), and a new version of the console rolled out.
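For reference, the OLM cluster config being watched and the resulting console-config fragment would look roughly like this (a sketch; only disableCopiedCSVs and copiedCSVsDisabled are named by this card):

  apiVersion: operators.coreos.com/v1
  kind: OLMConfig
  metadata:
    name: cluster
  spec:
    features:
      disableCopiedCSVs: true

  # resulting console-config.yaml fragment
  clusterInfo:
    copiedCSVsDisabled: true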

Key Objective
Providing our customers with a single simplified User Experience (Hybrid Cloud Console) that is extensible, can run locally or in the cloud, and is capable of everything from managing the fleet to deep diving into a single cluster.
Why customers want this?

  1. Single interface to accomplish their tasks
  2. Consistent UX and patterns
  3. Easily accessible: One URL, one set of credentials

Why we want this?

  • Shared code -  improve the velocity of both teams and most importantly ensure consistency of the experience at the code level
  • Pre-built PF4 components
  • Accessibility & i18n
  • Remove barriers for enabling ACM

Phase 2 Goal: Productization of the united Console 

  1. Enable user to quickly change context from fleet view to single cluster view
    1. Add Cluster selector with “All Cluster” Option. “All Cluster” = ACM
    2. Shared SSO across the fleet
    3. Hub OCP Console can connect to remote clusters API
    4. When ACM Installed the user starts from the fleet overview aka “All Clusters”
  2. Share UX between views
    1. ACM Search —> resource list across fleet -> resource details that are consistent with single cluster details view
    2. Add Cluster List to OCP —> Create Cluster

We need a way to show metrics for workloads running on spoke clusters. This depends on ACM-876, which lets the console discover the monitoring endpoints.

  • Console operator must discover the external URLs for monitoring
  • Console operator must pass the URLs and CA files as part of the cluster config to the console backend
  • Console backend must set up proxies for each endpoint (as it does for the API server endpoints)
  • Console frontend must include the cluster in metrics requests

Open Issues:

We will depend on ACM to create a route on each spoke cluster for the prometheus tenancy service, which is required for metrics for normal users.

 

The OpenShift console backend should proxy managed cluster monitoring requests through the MCE cluster proxy addon to the Prometheus services on the managed cluster. This depends on https://issues.redhat.com/browse/ACM-1188

 

Feature Overview

Enable sharing ConfigMap and Secret across namespaces

Requirements

Requirement Notes isMvp?
Secrets and ConfigMaps can get shared across namespaces   YES

Questions to answer…

NA

Out of Scope

NA

Background, and strategic fit

Consumption of RHEL entitlements has been a challenge on OCP 4 since it moved to a cluster-based entitlement model compared to the node-based (RHEL subscription manager) entitlement model. In order to provide a sufficiently similar experience to OCP 3, the entitlement certificates that are made available on the cluster (OCPBU-93) should be shared across namespaces in order to prevent the need for cluster admins to copy these entitlements in each namespace, which leads to additional operational challenges for updating and refreshing them.

Documentation Considerations

Questions to be addressed:
  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Allow ConfigMaps and Secrets (resources) to be mounted as volumes in a build

Why is this important?

  • Secrets and ConfigMaps can be added to builds as "source" code that can leak into the resulting container image
  • When using sensitive credentials in a build, accessing secrets as a mounted volume ensures that these credentials are not present in the resulting container image.

Scenarios

  1. Access private artifact repositories (Artifactory, JFrog, Maven)
  2. Download RHEL packages in a build

Acceptance Criteria

  • Builds can mount a Secret or ConfigMap in a build
  • Content in the Secret or ConfigMap is not present in the resulting container image.

Dependencies (internal and external)

  1. Buildah - support mounting of volumes when building with a Dockerfile

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview

  • Enable custom RHCOS image locations for installer (IPI) provisioned OpenShift clusters on Google Cloud and Azure

Goals

  • The Installer should accept custom locations for RHCOS images while deploying OpenShift on Google Cloud and Azure, as we already support for AWS via `platform.aws.amiID` for control plane and compute nodes.
  • As a user, I want to be able to specify a custom RHCOS image location to be used for control plane and compute nodes while deploying OpenShift on Google Cloud and Azure so that I can be compliant with my company's security policies.

Requirements

  • This Section:* A list of specific needs or objectives that a Feature must deliver to satisfy the Feature.. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non MVP requirement slips, it does not shift the feature.
Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

Background, and strategic fit

Many enterprises have strict security policies where all the software must be pulled from a trusted or private source. For these scenarios the RHCOS image used to bootstrap the cluster usually comes from shared public locations that some companies don't accept as a trusted source.

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

 

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Simplify ARO's workflow by allowing Azure marketplace images to be specified in the `install-config.yaml` for all nodes (compute, control plane, and bootstrap).

Why is this important?

  • ARO is a first-party Azure service and has a number of requirements/restrictions. These include the following: it must not request anything from outside of Azure, and it must consume RHCOS VM images from a trusted source (the marketplace).
  • At the same time upstream OCP does the following:
    1. It uses quay.io to get container images.
    2. It uses an arbitrary VHD blob as the RHCOS VM image. This VHD blob is then uploaded by the Installer to an Image Gallery in the user’s Storage Account, where two boot images are created: a Hyper-V gen1 and a Hyper-V gen2 image.
      To meet these requirements the ARO team currently does the following as part of the release process:
    1. Mirror container images from quay.io to Azure Container Registry to avoid leaving Azure boundaries.
    2. Copy the VM image from the blob in another Azure subscription into a blob in the subscription the ARO team manages, and then publish a VM image on Azure Marketplace (publisher: azureopenshift, offer: aro4; see az vm image list --publisher azureopenshift --all). ARO does not bill for these images.
  • ARO has to carry its own changes on top of the Installer code to allow specifying its own images for the cluster deployment.

Scenarios

  1. ...

Acceptance Criteria

  • Custom RHCOS images can be specified in the install-config for compute, controlPlane, and defaultMachinePlatform, and they are used for the installation instead of the default RHCOS VHD (see the sketch below).
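
A minimal sketch of how a marketplace image could be expressed in install-config.yaml, assuming the compute-pool osImage fields are extended to defaultMachinePlatform and controlPlane (the sku and version values are placeholders):

  platform:
    azure:
      defaultMachinePlatform:
        osImage:
          publisher: azureopenshift
          offer: aro4
          sku: aro_413                # placeholder SKU
          version: 413.92.20230101    # placeholder version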

Out of scope

  • A VHD blob will still be uploaded to the user's Storage Account even though it won't be used during installation. That cannot be changed for now.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Description of problem:

ARO needs to copy RHCOS image blobs to its own Azure Marketplace offering since, as a first-party Azure service, it must not request anything from outside of Azure and must consume RHCOS VM images from a trusted source (the marketplace).
To meet these requirements the ARO team currently does the following as part of the release process:

 1. Mirror container images from quay.io to Azure Container Registry to avoid leaving Azure boundaries.
 2. Copy the VM image from the blob in another Azure subscription into a blob in the subscription the ARO team manages, and then publish a VM image on Azure Marketplace (publisher: azureopenshift, offer: aro4; see az vm image list --publisher azureopenshift --all). We do not bill for these images.

The usage of Marketplace images in the installer was already implemented as part of CORS-1823. This single line [1] needs to be refactored to enable ARO from the installer code perspective: on ARO we don't need to set type to AzureImageTypeMarketplaceWithPlan.

However, in OCPPLAN-7556 and related CORS-1823 it was mentioned that using Marketplace images is out of scope for nodes other than compute. For ARO we need to be able to use marketplace images for all nodes.

[1] https://github.com/openshift/installer/blob/f912534f12491721e3874e2bf64f7fa8d44aa7f5/pkg/asset/machines/azure/machines.go#L107

Version-Release number of selected component (if applicable):

4.13

How reproducible:

Always

Steps to Reproduce:

1. Set RHCOS image from Azure Marketplace in the installconfig
2. Deploy a cluster
3.

Actual results:

Only compute nodes use the Marketplace image.

Expected results:

All nodes created by the Installer use RHCOS image coming from Azure Marketplace.

Additional info:

 

 

Epic Goal

  • As a customer, I need to make sure that the RHCOS image I leverage is coming from a trusted source. 

Why is this important?

  • Customers who have very restrictive security policies imposed by their InfoSec teams need to be able to manually specify a custom location for the RHCOS image to use for the cluster nodes.

Scenarios

  1. As a customer, I want to specify a custom location for the RHCOS image to be used for the cluster Nodes

Acceptance Criteria

A user is able to specify a custom location in the Installer manifest for the RHCOS image to be used for bootstrap and cluster nodes. This is similar to the approach we already support for AWS with the compute.platform.aws.amiID option.

Previous Work (Optional):

https://issues.redhat.com/browse/CORS-1103

 

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

 

 

 

 

 

 

User Story:

As a user, I want to be able to:

  • Specify an RHCOS image coming from a custom source in the install config to override the installer's internal choice of boot image

so that I can achieve

  • a custom location in the install config for the RHCOS image to use for the Cluster Nodes

Acceptance Criteria:

A user is able to specify a custom location in the Installer manifest for the RHCOS image to be used for bootstrap and cluster nodes. This is similar to the approach we already support for AWS with the compute.platform.aws.amiID option.

(optional) Out of Scope:

 

Engineering Details:

  •  

User Story:

Some background on the Licenses field:

https://github.com/openshift/installer/pull/3808#issuecomment-663153787

https://github.com/openshift/installer/pull/4696

So we do not want to allow licenses to be specified (it is up to customers to create a custom image with the licenses embedded and supply that to the Installer) when pre-built images are specified (current behaviour). Since we no longer need to specify licenses for RHCOS images, the Licenses field is useless and should be deprecated.

Acceptance Criteria:

Description of criteria:

  • The Licenses field is deprecated
  • Any dev docs mentioning Licenses are updated.

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Epic Goal

  • Improve the default configuration the installer uses when the control-plane is single node

Why is this important?

  • Starting with 4.13 we are going to officially support SNO on AWS (OCPBU-95), so our installer defaults need to make sense

Scenarios

  1. A user performs an AWS IPI installation with the number of control plane node replicas equal to 1. The installer will default the instance type to be bigger than it usually would, to align with the larger single-node OpenShift control plane requirements.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Background

  • Starting with version 4.13 OCP is going to officially support Single
    Node clusters on AWS.
  • The minimum documented OCP requirement for single-node control plane
    nodes is 8-cores and 16GiB of RAM
  • The current default instance type chosen for AWS clusters by the
    installer is `xlarge` which is 4 cores and 16GiB of RAM

Issue

The default instance type the installer currently chooses for Single
Node OpenShift clusters doesn't follow our documented minimum
requirements.

Solution

When the number of replicas of the ControlPlane pool is 1, the installer
will now choose `2xlarge` instead of `xlarge`.

Caveat

`2xlarge` has 32GiB of RAM, which is twice as much as we need, but it's
the best we can do to meet the minimum single-node requirements, because
AWS doesn't offer a 16GiB RAM instance type with 8 cores.
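
A minimal sketch of the install-config.yaml shape that triggers this defaulting (no instance type is set, so the installer chooses it):

  controlPlane:
    name: master
    replicas: 1        # single-node control plane
    platform:
      aws: {}          # no instanceType set; installer now defaults to a 2xlarge size
  compute:
  - name: worker
    replicas: 0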

 

Feature Overview (aka. Goal Summary)  

Goal: Control plane nodes in the cluster can be scaled up or down, lost and recovered, with no more importance or special procedure than that of a data plane node.

Problem: There is a lengthy special procedure to recover from a failed control plane node (or majority of nodes) and to add new control plane nodes.

Why is this important: Increased operational simplicity and scale flexibility of the cluster’s control plane deployment.

Goals (aka. expected user outcomes)

To enable full support for control plane machine sets on GCP

 

Requirements (aka. Acceptance Criteria):

  • Generate CPMS for upgraded clusters
  • Document support for upgraded clusters
  • Ensure E2E testing for GCP clusters

Out of Scope

Any other cloud platforms

Background

Feature created from split of overarching Control Plane Machine Set feature into single release based effort

 

Customer Considerations

n/a

 

Documentation Considerations

Nothing outside documentation that shows the GCP platform is supported as part of Control Plane Machine Sets

 

Interoperability Considerations

n/a

Goal:

Control plane nodes in the cluster can be scaled up or down, lost and recovered, with no more importance or special procedure than that of a data plane node.

Problem:

There is a lengthy special procedure to recover from a failed control plane node (or majority of nodes) and to add new control plane nodes.

Why is this important:

  • Increased operational simplicity and scale flexibility of the cluster’s control plane deployment.

Lifecycle Information:

  • Core

Previous Work:

Dependencies:

  • Etcd operator

Prioritized epics + deliverables (in scope / not in scope):

Estimate (XS, S, M, L, XL, XXL):

 

 

 

User Story:

As a developer, I want to be able to:

  • Create Azure control plane nodes using MachineSets.

so that I can achieve

  • More control over the nodes using the MachineAPI Operator.

Acceptance Criteria:

Description of criteria:

  • New CRD ControlPlaneMachineSet is used and populated.
  • New manifest is created for the ControlPlaneMachineSet.
  • Fields required for the CRD are set.

(optional) Out of Scope:

 

Engineering Details:

This does not require a design proposal.
This does not require a feature gate.

Feature Overview

  • Customers want to create and manage OpenShift clusters using managed identities for Azure resources for authentication.

Goals

  • A customer using ARO wants to spin up an OpenShift cluster with "az aro create" without needing additional input, i.e. without the need for an AD account or service principal credentials, and the identity used is never visible to the customer and cannot appear in the cluster.
  • As an administrator, I want to deploy OpenShift 4 and run Operators on Azure using access controls (IAM roles) with temporary, limited privilege credentials.

Requirements

  • Azure managed identities must work for installation with all install methods including IPI and UPI, work with upgrades, and day-to-day cluster lifecycle operations.
  • Support HyperShift and non-HyperShift clusters.
  • Support use of Operators with Azure managed identities.
  • Support in all Azure regions where Azure managed identity is available. Note: federated credentials are associated with Azure Managed Identity, and federated credentials are not available in all Azure regions.

More details at ARO managed identity scope and impact.

 

This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature.. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non MVP requirement slips, it does not shift the feature.

Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

(Optional) Use Cases

This Section:

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

Questions to answer…

  • ...

Out of Scope

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

Assumptions

  • ...

Customer Considerations

  • ...

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

References

Epic Goal

  • Enable the OpenShift Installer to authenticate using authentication methods supported by both the azure sdk for go and the terraform azure provider
  • Future proofing to enable Terraform support for workload identity authentication when it is enabled upstream

Why is this important?

  • This ties in to the larger OpenShift goal of: as an infrastructure owner, I want to deploy OpenShift on Azure using Azure Managed Identities (vs. using Azure Service Principal) for authentication and authorization.
  • Customers want support for using Azure managed identities in lieu of using an Azure service principal. In the OpenShift documentation, we are directed to use an Azure Service Principal - "Azure offers the ability to create service accounts, which access, manage, or create components within Azure. The service account grants API access to specific services". However, Microsoft and the customer would prefer that we use User Managed Identities to keep from putting the Service Principal and principal password in clear text within the azure.conf file. 
  • See https://docs.microsoft.com/en-us/azure/active-directory/develop/workload-identity-federation for additional information.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. ...

Open questions::

  1. ...

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As a cluster admin I want to be able to:

  • use the managed identity from the installer host VM (running in Azure)

so that I can

  • install a cluster without copying credentials to the installer host

Acceptance Criteria:

Description of criteria:

  • Installer (azure sdk) & terraform authenticate using identity from host VM (not client secret in file ~/.azure/servicePrincipal.json)
  • Cluster credential is handled appropriately (presumably we force Manual credentials mode; a sketch follows below)
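
As one illustration of "handled appropriately", the install config already has a top-level credentialsMode knob; a minimal sketch, assuming Manual is what we force (baseDomain and region are placeholders):

  apiVersion: v1
  baseDomain: example.com        # placeholder
  credentialsMode: Manual        # no cloud credential is copied into the cluster
  platform:
    azure:
      region: eastus             # placeholder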

Engineering Details:

Feature Overview

  • Enables OTA updates from OpenShift 4.12.x to OpenShift 4.13.x.

Goals

  • As a platform administrator, I want to upgrade my OpenShift cluster from a previous supported release to the current release, i.e. 4.12.x to 4.13.x.
  • Ensure upgrades work smoothly without impacting end user workloads (for HA clusters) from the previous release to the latest release for all supported OpenShift environments:
  • Connected and disconnected deployments
  • All supported topologies (SNO, compact cluster, standard HA cluster, RWN)
  • All platforms and providers
  • Cloud and on-premises

Requirements

  • This Section:* A list of specific needs or objectives that a Feature must deliver to satisfy the Feature.. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non MVP requirement slips, it does not shift the feature.
Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

 

Epic Goal

  • Provide a convenient  way to migrate from a homogeneous to a heterogeneous cluster.

Why is this important?

  • So customers with an existing cluster can migrate to a heterogeneous payload rather than doing a fresh install, without needing to use oc adm upgrade --allow-explicit-upgrade --to-image "${PULLSPEC}".  OTA-658 and maybe some oc side tooling, if folks feel oc patch ... is too heavy (although see discussion in OTA-597 about policies for adding new oc subcommands).
  • So components (like which?) can make decisions (like what?) based on the "current" cluster architecture. OTA-659.

Scenarios

  1. Upgrade from a homogeneous release, e.g. 4.11.0-x86_64, to a heterogeneous release, e.g. 4.11.0-multi.
  2. Ensure that the ClusterVersion spec has a new architecture field to denote the desired architecture of the cluster (sketched below).
  3. Ensure ClusterVersionStatus populates a new architecture field denoting the current architecture of the cluster.
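
A minimal sketch of what the proposed field could look like on ClusterVersion; the architecture field and its value reflect the proposal in the scenarios above, not an already-shipped API:

  apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    name: version
  spec:
    channel: stable-4.11
    desiredUpdate:
      version: 4.11.0
      architecture: Multi        # proposed: request migration to the multi-arch (heterogeneous) payload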

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

  1.    Should the migration also be an upgrade or should it be two separate steps? i.e, migrate to hetero release of same version and then upgrade?

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview

Create an Azure cloud-specific spec.resourceTags entry in the Infrastructure CRD. This should create and update tags (or labels in Azure) on any OpenShift cloud resource that we create and manage. The behaviour should also tag existing resources that do not yet have the tags, and once the tags in the Infrastructure CRD are changed, all the resources should be updated accordingly.

Tag deletes continue to be out of scope, as the customer can still have custom tags applied to the resources that we do not want to delete.

Due to the ongoing in-tree/out-of-tree split on the cloud and CSI providers, this should not apply to clusters with in-tree providers (!= "external").

Once confident we have all components updated, we should introduce an end-to-end test that makes sure we never create resources that are untagged.

 
Goals

  • Functionality on Azure Tech Preview
  • inclusion in the cluster backups
  • flexibility of changing tags during cluster lifetime, without recreating the whole cluster

Requirements

  • This Section:* A list of specific needs or objectives that a Feature must deliver to satisfy the Feature.. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non MVP requirement slips, it does not shift the feature.
Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

List any affected packages or components.

  • Installer
  • Cluster Infrastructure
  • Storage
  • Node
  • NetworkEdge
  • Internal Registry
  • CCO

This is a continuation of the CORS-2249 / CFE-671 work, where support for Azure tags was delivered as TechPreview in 4.13; the goal here is to make it GA in 4.14. This involves removing any reference to TechPreview in code and docs and incorporating any feedback received from users.

Remove the code references that mark Azure Tags as TechPreview in the files listed below:

  • installer/data/data/install.openshift.io_installconfigs.yaml (PR#6820)
  • installer/pkg/explain/printer_test.go (PR#6820)
  • installer/pkg/types/azure/platform.go (PR#6820)
  • installer/pkg/types/validation/installconfig.go (PR#6820)

Goal:
Support migration from dual-stack IPv6 to single-stack IPv6.

Why is this important?
We have customers who want to deploy a dual stack cluster and then (eventually) migrate to single stack ipv6 once all of their ipv4 dependencies are eliminated. Currently this isn't possible because we only support ipv4-primary dual stack deployments. However, with the implementation of OPNET-1 we addressed many of the limitations that prevented ipv6-primary, so we need to figure out what remains to make this supported.

At the very least we need to remove the validations in the installer that requires ipv4 to be the primary address. There will also be changes needed in dev-scripts to allow testing (an option to make the v6 subnets and addresses primary, for example).


Runtimecfg assumes ipv4-primary in some places today, and we need to make it aware of whether a cluster is v4- or v6-primary.

The installer currently enforces ipv4-primary for dual-stack deployments. We will need to remove or modify those validations to allow an ipv6-primary configuration (a sketch follows below).
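
A minimal sketch of an ipv6-primary dual-stack networking section in install-config.yaml (all CIDRs are placeholders); this ordering, with the IPv6 entries listed first, is what the installer validation currently rejects:

  networking:
    networkType: OVNKubernetes
    machineNetwork:
    - cidr: fd2e:6f44:5dd8:c956::/120   # IPv6 first = primary
    - cidr: 192.168.111.0/24
    clusterNetwork:
    - cidr: fd01::/48
      hostPrefix: 64
    - cidr: 10.128.0.0/14
      hostPrefix: 23
    serviceNetwork:
    - fd02::/112
    - 172.30.0.0/16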

Feature Overview

Create an Azure cloud-specific spec.resourceTags entry in the Infrastructure CRD. This should create and update tags (or labels in Azure) on any OpenShift cloud resource that we create and manage. The behaviour should also tag existing resources that do not yet have the tags, and once the tags in the Infrastructure CRD are changed, all the resources should be updated accordingly.

Tag deletes continue to be out of scope, as the customer can still have custom tags applied to the resources that we do not want to delete.

Due to the ongoing in-tree/out-of-tree split on the cloud and CSI providers, this should not apply to clusters with in-tree providers (!= "external").

Once confident we have all components updated, we should introduce an end-to-end test that makes sure we never create resources that are untagged.

 
Goals

  • Functionality on Azure Tech Preview
  • inclusion in the cluster backups
  • flexibility of changing tags during cluster lifetime, without recreating the whole cluster

Requirements

  • This Section:* A list of specific needs or objectives that a Feature must deliver to satisfy the Feature.. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non MVP requirement slips, it does not shift the feature.
Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

List any affected packages or components.

  • Installer
  • Cluster Infrastructure
  • Storage
  • Node
  • NetworkEdge
  • Internal Registry
  • CCO

This epic covers the work to apply user-defined tags to Azure resources created for an OpenShift cluster, available as Tech Preview.

The user should be able to define the Azure tags to be applied to the resources created during cluster creation by the installer and by the other operators that manage those resources. The user defines the required tags in install-config.yaml while preparing the inputs for cluster creation. The tags are then made available in the status sub-resource of the Infrastructure custom resource, which cannot be edited but is available for user reference and is used by the in-cluster operators to tag the resources they create.

Updating/deleting of tags added during cluster creation or adding new tags as Day-2 operation is out of scope of this epic.

List any affected packages or components.

  • Installer
  • Cluster Infrastructure
  • Storage
  • Node
  • NetworkEdge
  • Internal Registry
  • CCO

Reference - https://issues.redhat.com/browse/RFE-2017

The Installer creates the list of resources below during the create-cluster phase, and these resources should have the user-defined tags applied as well as the default OCP tag kubernetes.io/cluster/<cluster_name>:owned.

Resources List

Resource Terraform API
Resource group azurerm_resource_group
Image azurerm_image
Load Balancer azurerm_lb
Network Security Group azurerm_network_security_group
Storage Account azurerm_storage_account
Managed Identity azurerm_user_assigned_identity
Virtual network azurerm_virtual_network
Virtual machine azurerm_linux_virtual_machine
Network Interface azurerm_network_interface
Private DNS Zone azurerm_private_dns_zone
DNS Record azurerm_dns_cname_record

Acceptance Criteria:

  • Code linting, validation and best practices adhered to
  • The Azure resources created by the installer should have the user-defined tags as well as the default OCP tag.

Issues found by the QE team during pre-merge tests are reported in the QE Tracker and should be fixed.

Acceptance criteria:

  • Update UTs, if required
  • Update enhancement, if required

The Installer generates the Infrastructure CR in the manifests-creation step of the cluster creation process, based on the user-provided input recorded in install-config.yaml. While generating the Infrastructure CR, platformStatus.azure.resourceTags should be populated with the user-provided tags (installconfig.platform.azure.userTags); see the sketch below.
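
A minimal sketch of that flow, assuming the TechPreview field names (the tag keys and values are placeholders):

  # install-config.yaml (user input)
  platform:
    azure:
      userTags:
        environment: test            # placeholder tag
        cost-center: "1234"          # placeholder tag

  # generated Infrastructure manifest (status, read-only for users)
  status:
    platformStatus:
      type: Azure
      azure:
        resourceTags:
        - key: environment
          value: test
        - key: cost-center
          value: "1234"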

Acceptance Criteria

  • Code linting, validation and best practices adhered to
  • The Infrastructure CR created by the installer should have the Azure user-defined tags, if any, in its status field.

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Cluster Infrastructure owned components should be running on Kubernetes 1.27
  • This includes
    • The cluster autoscaler (+operator)
    • Machine API operator
      • Machine API controllers for:
        • AWS
        • Azure
        • GCP
        • vSphere
        • OpenStack
        • IBM
        • Nutanix
    • Cloud Controller Manager Operator
      • Cloud controller managers for:
        • AWS
        • Azure
        • GCP
        • vSphere
        • OpenStack
        • IBM
        • Nutanix
    • Cluster Machine Approver
    • Cluster API Actuator Package
    • Control Plane Machine Set Operator

Why is this important?

  • ...

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. ...

Open questions::

  1. ...

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository

Epic Goal*

What is our purpose in implementing this?  What new capability will be available to customers?

 
Why is this important? (mandatory)

What are the benefits to the customer or Red Hat?   Does it improve security, performance, supportability, etc?  Why is work a priority?

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1.  

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Epic Goal

  • Update all images that we ship with OpenShift to the latest upstream releases and libraries.
  • Exact content of what needs to be updated will be determined as new images are released upstream, which is not known at the beginning of OCP development work. We don't know what new features will be included and should be tested and documented. Especially new CSI drivers releases may bring new, currently unknown features. We expect that the amount of work will be roughly the same as in the previous releases. Of course, QE or docs can reject an update if it's too close to deadline and/or looks too big.

Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF). Trying no-feature-freeze in 4.12. We will try to do as much as we can before FF, but we're quite sure something will slip past FF as usual.

Why is this important?

  • We want to ship the latest software that contains new features and bugfixes.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Update all OCP and kubernetes libraries in storage operators to the appropriate version for OCP release.

This includes (but is not limited to):

  • Kubernetes:
    • client-go
    • controller-runtime
  • OCP:
    • library-go
    • openshift/api
    • openshift/client-go
    • operator-sdk

Operators:

  • aws-ebs-csi-driver-operator 
  • aws-efs-csi-driver-operator
  • azure-disk-csi-driver-operator
  • azure-file-csi-driver-operator
  • openstack-cinder-csi-driver-operator
  • gcp-pd-csi-driver-operator
  • gcp-filestore-csi-driver-operator
  • csi-driver-manila-operator
  • vmware-vsphere-csi-driver-operator
  • alibaba-disk-csi-driver-operator
  • ibm-vpc-block-csi-driver-operator
  • csi-driver-shared-resource-operator
  • ibm-powervs-block-csi-driver-operator

 

  • cluster-storage-operator
  • cluster-csi-snapshot-controller-operator
  • local-storage-operator
  • vsphere-problem-detector

EOL, do not upgrade:

  • github.com/oVirt/csi-driver-operator

Update all CSI sidecars to the latest upstream release from https://github.com/orgs/kubernetes-csi/repositories

  • external-attacher
  • external-provisioner
  • external-resizer
  • external-snapshotter
  • node-driver-registrar
  • livenessprobe

Corresponding downstream repos have `csi-` prefix, e.g. github.com/openshift/csi-external-attacher.

This includes updating the VolumeSnapshot CRDs in the cluster-csi-snapshot-controller-operator assets and the client API in go.mod, i.e. copy all snapshot CRDs from upstream into the operator assets and run go get -u github.com/kubernetes-csi/external-snapshotter/client/v6 in the operator repo.

Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

Feature Overview (aka. Goal Summary)  

Goal: Control plane nodes in the cluster can be scaled up or down, lost and recovered, with no more importance or special procedure than that of a data plane node.

Problem: There is a lengthy special procedure to recover from a failed control plane node (or majority of nodes) and to add new control plane nodes.

Why is this important: Increased operational simplicity and scale flexibility of the cluster’s control plane deployment.

Goals (aka. expected user outcomes)

To enable full support for control plane machine sets on Azure

 

Requirements (aka. Acceptance Criteria):

  • Generate CPMS for upgraded clusters
  • Document support for upgraded clusters
  • Ensure E2E testing for Azure clusters

Out of Scope

Any other cloud platforms

Background

Feature created from split of overarching Control Plane Machine Set feature into single release based effort

 

Customer Considerations

n/a

 

Documentation Considerations

Nothing outside documentation that shows the Azure platform is supported as part of Control Plane Machine Sets

 

Interoperability Considerations

n/a

Goal:

Control plane nodes in the cluster can be scaled up or down, lost and recovered, with no more importance or special procedure than that of a data plane node.

Problem:

There is a lengthy special procedure to recover from a failed control plane node (or majority of nodes) and to add new control plane nodes.

Why is this important:

  • Increased operational simplicity and scale flexibility of the cluster’s control plane deployment.

Lifecycle Information:

  • Core

Previous Work:

Dependencies:

  • Etcd operator

Prioritized epics + deliverables (in scope / not in scope):

Estimate (XS, S, M, L, XL, XXL):

 

 

 

User Story:

As a developer, I want to be able to:

  • Create Azure control plane nodes using MachineSets.

so that I can achieve

  • More control over the nodes using the MachineAPI Operator.

Acceptance Criteria:

Description of criteria:

  • New CRD ControlPlaneMachineSet is used and populated.
  • New manifest is created for the ControlPlaneMachineSet.
  • Fields required for the CRD are set.

(optional) Out of Scope:

 

Engineering Details:

This does not require a design proposal.
This does not require a feature gate.

Feature Overview (aka. Goal Summary)  

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

 

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Complete during New status.

 

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Problem:

Today we do not provide any samples for developers of serverless functions.

Goal:

Provide Serverless Function samples in the sample catalog. These would utilize the Builder Image capabilities.

Why is it important?

Use cases:

  1. <case>

Acceptance criteria:

  1. <criteria>

Dependencies (External/Internal):

  • Serverless team would need to provide a sample repo for serverless functions
  • Samples operator would need to be updated

Design Artifacts:

Exploration:

Note:

  • Need to define the API and confirm with other stakeholders - need to support a serverless func image stream "tag"
  • Serverless team will need to provide updates to the existing Image Streams, as well as maintain the sample repositories which are referenced in the Image Streams.
  • Need to understand the relationship between ImageStream and Image Stream Tag
  • Should serverless function samples in the catalog have the "builder image" tag, or should it be "serverless function"?

Description

As an operator author, I want to provide additional samples that are tied to an operator version, not an OpenShift release. For that, I want to create a resource to add new samples to the web console.
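
A minimal sketch of what such a ConsoleSample resource might look like; the spec field names here are illustrative assumptions, and the actual schema is defined by the ConsoleSample CRD shipped with console-operator:

  apiVersion: console.openshift.io/v1
  kind: ConsoleSample
  metadata:
    name: my-operator-sample                 # hypothetical sample shipped by an operator
  spec:
    title: My Operator sample app            # illustrative field names; check the CRD for the real schema
    abstract: Sample tied to the operator version, not to an OpenShift release
    source:
      type: GitImport
      gitImport:
        repository:
          url: https://github.com/example/sample-app   # placeholder repository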

Acceptance Criteria

  1. openshift/console-operator is updated so that new clusters have the new ConsoleSample CRD
  2. Add RBAC permissions (roles and rolebinding?) so that all users have access to ConsoleSample resources

Additional Details:

Enable OpenShift to support the Shielded VMs capability on Google Cloud

 

 

 

 

Epic Goal

  • Support OpenShift and the IPI workflow on GCP to use the Shielded VMs feature from Google Cloud

Why is this important?

  • Many Google Cloud customers want to leverage the Shielded VMs feature while deploying OpenShift on GCP

Scenarios

  1. As a user, I want to be able to instruct the OpenShift Installer to use Shielded VMs while deploying the platform on Google Cloud so I can use the Shielded VMs feature from GCP on every node (see the sketch below)
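
A minimal sketch of how this could surface in install-config.yaml if we start with Secure Boot only; the secureBoot field name follows the existing GCP machine-pool style and is an assumption until the API is settled:

  controlPlane:
    platform:
      gcp:
        secureBoot: Enabled      # assumed field; turns on Shielded VM Secure Boot
  compute:
  - platform:
      gcp:
        secureBoot: Enabled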

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  •  

Dependencies (internal and external)

  1. OCPBUGS-4522 coreos fail to boot on GCP when enabling secure boot

Open questions::

  1. Should we add API support for all Shielded VM options (Secure Boot, vTPM, Integrity Monitoring) or just Secure Boot?

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview (aka. Goal Summary)  

As Arm adoption grows, OpenShift on Arm is a key strategic initiative for Red Hat. Key to success is support from all major cloud providers adopting this technology. Google has announced support for Arm in its GCP offering, and we need to support OpenShift in this configuration.

Goals (aka. expected user outcomes)

The ability to have OCP on Arm running in a GCP instance

Requirements (aka. Acceptance Criteria):

OCP on Arm running in a GCP instance

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Description:

Update 4.14 documentation to reflect new GCP support on ARM machines.

Updates: 

  • Add google instance types for ARM
  • Add config parameters 
  • Supported installation platforms 
  • Release note

Acceptance criteria: 

  • Dev and QE ack
  • PR is merged 

 

Description: 

In order to add instance types to the OCP documentation, there needs to be a .md file in the OpenShift installer repo that contains the 64-bit ARM machine types that have been tested and are supported on GCP.

Create a PR in the OpenShift installer repo that adds a new .md file listing the supported instance types.

Acceptance criteria: 

  • Dev and QE ack from ARM side 
  • Dev and QE ack from Installer side
  • Approval from installer product manager 
  • PR is merged and ready to be used for OCP docs referencing 


Feature Overview

Extend the Workload Partitioning feature to support multi-node clusters.

Goals

Customers running RAN workloads on C-RAN Hubs (i.e. multi-node clusters) that want to maximize the cores available to the workloads (DU) should be able to utilize Workload Partitioning (WP) to isolate control plane (CP) processes to reserved cores.

Requirements

A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts.  If a non MVP requirement slips, it does not shift the feature.

requirement Notes isMvp?
     
     
     

 

Describe Use Cases (if needed)

< How will the user interact with this feature? >

< Which users will use this and when will they use it? >

< Is this feature used as part of current user interface? >

Out of Scope

 

Background, and strategic fit

< What does the person writing code, testing, documenting need to know? >

Assumptions

< Are there assumptions being made regarding prerequisites and dependencies?>

< Are there assumptions about hardware, software or people resources?>

Customer Considerations

< Are there specific customer environments that need to be considered (such as working with existing h/w and software)?>

< Are there Upgrade considerations that customers need to account for or that the feature should address on behalf of the customer?>

<Does the Feature introduce data that could be gathered and used for Insights purposes?>

Documentation Considerations

< What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)? >

< What does success look like?>

< Does this feature have doc impact?  Possible values are: New Content, Updates to existing content,  Release Note, or No Doc Impact>

< If unsure and no Technical Writer is available, please contact Content Strategy. If yes, complete the following.>

  • <What concepts do customers need to understand to be successful in [action]?>
  • <How do we expect customers will use the feature? For what purpose(s)?>
  • <What reference material might a customer want/need to complete [action]?>
  • <Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available. >
  • <What is the doc impact (New Content, Updates to existing content, or Release Note)?>

Interoperability Considerations

< Which other products and versions in our portfolio does this feature impact?>

< What interoperability test scenarios should be factored by the layered product(s)?>

Questions

Question Outcome
   

 

 

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Add support to the Installer to bootstrap the cluster with the configurations for CPU Partitioning, based on the Infrastructure flag and the NTO-generated configurations.

We need to call NTO bootstrap render during the bootstrap cycle. This will follow the same pattern that MCO follows and other components that render during bootstrap.

Since this feature must be turned on ONLY at install time, and cannot be turned off, the best place we've found to set the Infrastructure.Status option is through the OpenShift installer. This has a few benefits, the primary one being simplifying how this feature gets used by upstream teams such as Assisted Installer and ZTP. If we expose this option as an install-config field, it becomes trivial for those consumers to support turning on this feature at install time.

We'll need to update the OpenShift installer configuration to support a flag for CPU Partitioning at install time.

We'll need to add a new flag to the InstallConfig

cpuPartitioningMode: None | AllNodes
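
A minimal sketch of how that flag would appear in install-config.yaml (only the relevant lines are shown; baseDomain is a placeholder):

  apiVersion: v1
  baseDomain: example.com            # placeholder
  cpuPartitioningMode: AllNodes      # opt in to CPU partitioning cluster-wide, install time only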

Incomplete Features

When this image was assembled, these features were not yet completed. Therefore, only the Jira Cards included here are part of this release

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

For users who are using OpenShift but have not yet begun to explore multicluster and what we offer them.

I'm investigating where Learning paths are today and what is required.

As a user I'd like to have a learning path for how to get started with multicluster:

  • Install MCE
  • Create multiple clusters
  • Use HyperShift
  • Provide access to cluster creation to devs via templates
  • Scale up to ACM/ACS (OPP?)

Status
https://github.com/patternfly/patternfly-quickstarts/issues/37#issuecomment-1199840223

Goal: Resources provided via the Dynamic Resource Allocation Kubernetes mechanism can be consumed by VMs.

Details: Dynamic Resource Allocation

Goal

Come up with a design of how resources provided by Dynamic Resource Allocation can be consumed by KubeVirt VMs.

Description

The Dynamic Resource Allocation (DRA) feature is an alpha API in Kubernetes 1.26, which is the base for OpenShift 4.13.
This feature provides the ability to create ResourceClaim and ResourceClass objects to request access to resources. This is similar to the dynamic provisioning of PersistentVolumes via PersistentVolumeClaim and StorageClass.
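
A minimal sketch of the alpha objects involved, assuming the Kubernetes 1.26 resource.k8s.io/v1alpha1 shape (the class, driver, and claim names are placeholders):

  apiVersion: resource.k8s.io/v1alpha1
  kind: ResourceClass
  metadata:
    name: vgpu.example.com               # placeholder class published by a vendor DRA driver
  driverName: gpu.dra.example.com        # placeholder driver name
  ---
  apiVersion: resource.k8s.io/v1alpha1
  kind: ResourceClaim
  metadata:
    name: vm-vgpu-claim                  # placeholder claim a VM's virt-launcher pod would reference
  spec:
    resourceClassName: vgpu.example.com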

NVIDIA has been a lead contributor to the KEP and already has an initial implementation of a DRA driver and plugin, with a nice demo recording. NVIDIA expects to have this DRA driver available in CY23 Q3 or Q4, so likely in NVIDIA GPU Operator v23.9, around OpenShift 4.14.

When asked about the availability of MIG-backed vGPU for Kubernetes, NVIDIA said that the timeframe is not decided yet, because it will likely use DRA for the creation of MIG devices and their registration with the vGPU host driver. The MIG-backed vGPU feature for OpenShift Virtualization will then likely require support for DRA to request vGPU resources for the VMs.

Not having MIG-backed vGPU is a risk for OpenShift Virtualization adoption in GPU use cases, such as virtual workstations for rendering with Windows-only software. Customers who want to have a mix of passthrough, time-based vGPU and MIG-backed vGPU will prefer competitors who offer the full range of options. And the certification of NVIDIA solutions like NVIDIA Omniverse will be blocked, despite a great potential to increase OpenShift consumption, as it uses RTX/A40 GPUs for virtual workstations (not certified by NVIDIA on OpenShift Virtualization yet) and A100/H100 for physics simulation, both use cases probably leveraging vGPUs [7]. There are a lot of necessary conditions for that to happen, and MIG-backed vGPU support is one of them.

User Stories

  • GPU consumption optimization
    "As an Admin, I want to let the NVIDIA GPU DRA driver provision vGPUs for OpenShift Virtualization, so that it optimizes the allocation with dynamic provisioning of time-based or MIG-backed vGPUs."
  • GPU mixed types per server
    "As an Admin, I want to be able to mix different types of GPU to collocate different types of workloads on the same host, in order to improve multi-pod/stack performance."

Non-Requirements

  • List of things not included in this epic, to alleviate any doubt raised during the grooming process.

Notes

  • Any additional details or decisions made/needed

References

Done Checklist

Who What Reference
DEV Upstream roadmap issue (or individual upstream PRs) <link to GitHub Issue>
DEV Upstream documentation merged <link to meaningful PR>
DEV gap doc updated <name sheet and cell>
DEV Upgrade consideration <link to upgrade-related test or design doc>
DEV CEE/PX summary presentation label epic with cee-training and add a <link to your support-facing preso>
QE Test plans in Polarion <link or reference to Polarion>
QE Automated tests merged <link or reference to automated tests>
DOC Downstream documentation merged <link to meaningful PR>

Epic Goal

  • Enable Image Registry to use Azure Blob Storage from AzureStackCloud

Why is this important?

  • While certifying Azure Stack Hub as an OCP provider we need to ensure all the required components for UPI/IPI deployments are ready to be used

Scenarios

  1. Create an OCP cluster in Azure Stack Hub and use the Internal Registry with Azure Blob Storage from AzureStackCloud

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Story: As an OpenShift admin I want the internal registry of the cluster to use storage from Azure Stack Hub so that I can run a fully supported OpenShift environment on that infrastructure provider.

The details of this Jira Card are restricted (Only Red Hat employees and contractors)
The details of this Jira Card are restricted (Only Red Hat employees and contractors)

Feature Overview

We drive OpenShift cross-market customer success and new customer adoption with constant improvements and feature additions to the existing capabilities of our OpenShift Core Networking (SDN and Network Edge). This feature captures that natural progression of the product.

Goals

  • Feature enhancements (performance, scale, configuration, UX, ...)
  • Modernization (incorporation and productization of new technologies)

Requirements

  • Core Networking Stability
  • Core Networking Performance and Scale
  • Core Networking Extensibility (Multus CNIs)
  • Core Networking UX (Observability)
  • Core Networking Security and Compliance

In Scope

  • Network Edge (ingress, DNS, LB)
  • SDN (CNI plugins, openshift-sdn, OVN, network policy, egressIP, egress Router, ...)
  • Networking Observability

Out of Scope

There are definitely grey areas, but in general:

  • CNV
  • Service Mesh
  • CNF

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

Feature Overview

Plugin teams need a mechanism to extend the OCP console that is decoupled enough so they can deliver at the cadence of their projects and not be forced into the OCP Console release timelines.

The OCP Console Dynamic Plugin Framework will enable all our plugin teams to do the following:

  • Extend the Console
  • Deliver UI code with their Operator
  • Work in their own git Repo
  • Deliver at their own cadence

Goals

    • Operators can deliver console plugins separate from the console image and update plugins when the operator updates.
    • The dynamic plugin API is similar to the static plugin API to ease migration.
    • Plugins can use shared console components such as list and details page components.
    • Shared components from core will be part of a well-defined plugin API.
    • Plugins can use Patternfly 4 components.
    • Cluster admins control what plugins are enabled.
    • Misbehaving plugins should not break console.
    • Existing static plugins are not affected and will continue to work as expected.

Out of Scope

    • Initially we don't plan to make this a public API. The target use is for Red Hat operators. We might reevaluate later when dynamic plugins are more mature.
    • We can't avoid breaking changes in console dependencies such as Patternfly even if we don't break the console plugin API itself. We'll need a way for plugins to declare compatibility.
    • Plugins won't be sandboxed. They will have full JavaScript access to the DOM and network. Plugins won't be enabled by default, however. A cluster admin will need to enable the plugin.
    • This proposal does not cover allowing plugins to contribute backend console endpoints.

 

Requirements

 

Requirement Notes isMvp?
UI to enable and disable plugins   YES
Dynamic Plugin Framework in place   YES
Testing Infra up and running   YES
Docs and read me for creating and testing Plugins   YES
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

 
Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?  
  • New Content, Updates to existing content,  Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

We need to support localization of dynamic plugins. The current proposal is to have one i18n namespace per dynamic plugin with a fixed name: `${plugin-name}-plugin`. Since console will know the list of plugins on startup, it can add these namespaces to the i18next config.

The console backend will need to implement an endpoint at the i18next load path. The endpoint will see if the namespace matches the known plugin namespaces. If so, it will proxy to the plugin. Otherwise it will serve the static file from the local filesystem.
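For illustration, that flow could look roughly like the following Go sketch; the enabledPlugins registry, the locales path, and the handler shape are assumptions, not the console's actual code.

package consoleserver

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"path/filepath"
	"strings"
)

// enabledPlugins is a hypothetical registry of enabled dynamic plugins,
// mapping plugin name to the base URL of the plugin's service.
var enabledPlugins = map[string]*url.URL{}

// handleI18nNamespace serves an i18next namespace: namespaces of the form
// "<plugin-name>-plugin" for a known plugin are proxied to that plugin's
// service, anything else is served from the console's local locales directory.
func handleI18nNamespace(w http.ResponseWriter, r *http.Request, ns string) {
	if name := strings.TrimSuffix(ns, "-plugin"); name != ns {
		if target, ok := enabledPlugins[name]; ok {
			httputil.NewSingleHostReverseProxy(target).ServeHTTP(w, r)
			return
		}
	}
	http.ServeFile(w, r, filepath.Join("locales", ns+".json"))
}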

The dynamic plugins enhancement describes a `disable-plugins` query parameter for disabling specific console plugins.

  • ?disable-plugins or ?disable-plugins= prevents loading of any dynamic plugins (disable all)
  • ?disable-plugins=foo,bar prevents loading of dynamic plugins named foo or bar (disable selectively)

This has no effect on static plugins, which are built into the Console application.
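For illustration only (the console implements this in its frontend), the parameter semantics described above could be parsed as in the following Go sketch; the function and its signature are hypothetical.

package consoleserver

import (
	"net/url"
	"strings"
)

// disabledPlugins parses the disable-plugins query parameter:
// ?disable-plugins or ?disable-plugins= disables all dynamic plugins,
// ?disable-plugins=foo,bar disables only the named plugins.
func disabledPlugins(query url.Values) (disableAll bool, names []string) {
	if !query.Has("disable-plugins") {
		return false, nil
	}
	raw := query.Get("disable-plugins")
	if raw == "" {
		return true, nil
	}
	return false, strings.Split(raw, ",")
}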

https://github.com/openshift/enhancements/blob/master/enhancements/console/dynamic-plugins.md#error-handling

We need a UI for enabling and disabling dynamic plugins. The plugins will be discovered either through a custom resource or an annotation on the operator CSV. The enabled plugins will be persisted through the operator config (consoles.operator.openshift.io).

This story tracks enabling and disabling the plugin during operator install through Cluster Settings. This is needed in the future if a plugin is installed outside of an OLM operator.

UX design: https://github.com/openshift/openshift-origin-design/pull/536 

Feature Overview

  • This Section:* High-Level description of the feature ie: Executive Summary
  • Note: A Feature is a capability or a well defined set of functionality that delivers business value. Features can include additions or changes to existing functionality. Features can easily span multiple teams, and multiple releases.

 

Goals

  • This Section:* Provide high-level goal statement, providing user context and expected user outcome(s) for this feature

 

Requirements

  • This Section:* A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non MVP requirement slips, it does not shift the feature.

 

Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

 

(Optional) Use Cases

This Section: 

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

 

Questions to answer…

  • ...

 

Out of Scope

 

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

 

Assumptions

  • ...

 

Customer Considerations

  • ...

 

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?  
  • New Content, Updates to existing content,  Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Goal
By default the Cluster Utilization card should not include metrics from `master` nodes in its queries for CPU, Memory, Filesystem, Network, and Pod count.

A new filter option should allow users to toggle back to a combined view, which is what the Cluster Utilization card shows today and which is mostly useful on small clusters where masters are schedulable for user workloads.

Assets

  • Marvel with two scenarios:
    • Windows nodes exist
    • Windows nodes do not exist

Background

As discussed in this thread, the `kube_node_role` metric, available since 4.3, should allow us to filter the card's PromQL queries so they do not include master node metrics.

This filtered view would likely make the card's data more useful for users who aren't running their workloads on masters, like OpenShift Dedicated users.

As noted by some folks during design discussions, this filter isn't perfect, and wouldn't filter out the data from "Infra" nodes that users may have set up using labels/taints. Until we determine a good way to provide more advanced filtering, this basic "Include masters" checkbox is still more flexible than what the card offers today.
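As a rough sketch of the filtering approach (this is not the card's actual query), the PromQL join could look like the following; the per-node metric name is a placeholder, and the sketch assumes kube_node_role exposes node and role labels.

package dashboards

// workerOnlyQuery shows the join pattern only: "some_per_node_metric" is a
// placeholder, and the multiplication with kube_node_role{role="worker"}
// keeps only series whose node label belongs to a worker node.
const workerOnlyQuery = `
sum(
  some_per_node_metric
  * on (node) group_left() max by (node) (kube_node_role{role="worker"})
)`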

Requirements

  • When Windows nodes exist in the cluster:
    • Node type filter will be added to the Cluster Utilization card that lists all node types available
    • It will be pre-filtered to only show Worker nodes
    • The filter will be single select and will display the selected item in the toggle.
  • When Windows nodes do not exist in the cluster:
    • Node type filter will be added to the Cluster Utilization card that lists node types available, plus an "all types" item.
    • It will be pre-filtered to only show Worker nodes
    • The filter will be multi-select
    • The badge in the toggle will update as more items are selected
    • If the "all nodes" is selected, the other items will automatically become deselected, and the badge will update to "All".

As an admin, I want to be able to access the node logs from the nodes detail page in order to troubleshoot what is going on with the node.

We should support getting node journal logs for different units and evaluate the other CLI flags.

We currently have a gap with the CLI:

  •   oc adm node-logs [-l LABELS] [NODE...] [flags]

We need to investigate whether the k8s API supports WebSockets for streaming node logs.

Goal

Currently we are showing system projects within the list view of the Projects page. As stated in https://issues.redhat.com/browse/RFE-185, there are many projects considered system projects that are not important to the user. The value should be remembered across sessions, but it is something we should be able to toggle directly from the list.

Design assets

Design doc

Marvel

Requirements

  • The user should be able to hide/show system projects within the project list page (and namespace list page)
  • The user should be able to hide/show system projects from the project selector
  • The same capability should work from the project list page in the developer perspective

In OpenShift, reserved namespaces are `default`, `openshift`, and those that start with `openshift-`, `kubernetes-`, or `kube-`.
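The reserved-namespace rule above can be sketched as follows; this is an illustrative Go version of the check (the console implements the equivalent logic in its frontend).

package projects

import "strings"

// isSystemProject applies the reserved-namespace rule: "default", "openshift",
// and anything starting with "openshift-", "kubernetes-", or "kube-" counts
// as a system project that can be hidden.
func isSystemProject(name string) bool {
	if name == "default" || name == "openshift" {
		return true
	}
	for _, prefix := range []string{"openshift-", "kubernetes-", "kube-"} {
		if strings.HasPrefix(name, prefix) {
			return true
		}
	}
	return false
}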

Edge case scenarios

  • If the user filters out system projects from the projects or namespaces list view, then applies a filter and there are no results, an empty state will be surfaced with the ability to clear filters. (see design assets)
  • If the user has hidden system projects from the project selector and has favorited or defaulted system projects in the project selector, those favorited or defaulted system projects will NOT appear in the project selector list. (see design assets)
  • If the user has hidden system projects from the project selector, then navigates to some resource page where a system project is selected, the system project name will still appear in the project selector toggle but not within the list of projects in the selector. (see design assets)
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

When OCP is performing a cluster upgrade, the user should be notified about this fact.

There are a few possibilities for how to surface the cluster upgrade to users:

  • Display a console notification throughout OCP web UI saying that the cluster is currently under upgrade.
  • Global notification throughout OCP web UI saying that the cluster is currently under upgrade.
  • Have an alert firing for all the users of OCP stating the cluster is undergoing an upgrade. 

 

AC:

  • Console-operator will create a ConsoleNotification CR when the cluster is being upgraded (a minimal sketch follows below). Once the upgrade is done, console-operator will remove that CR. These are the three statuses based on which we determine whether the cluster is being upgraded.
  • Add unit tests
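A minimal sketch of such a ConsoleNotification, built as an unstructured object and assuming the console.openshift.io/v1 API with text and location fields; the name, text, and location values are illustrative and this is not the console-operator's actual code.

package operator

import "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"

// upgradeNotification builds the notification the operator could create while
// an upgrade is in progress and delete again once the upgrade completes.
func upgradeNotification() *unstructured.Unstructured {
	return &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "console.openshift.io/v1",
		"kind":       "ConsoleNotification",
		"metadata":   map[string]interface{}{"name": "cluster-upgrade"},
		"spec": map[string]interface{}{
			"text":     "This cluster is currently being upgraded.",
			"location": "BannerTop",
		},
	}}
}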

 

Note: We need to decide whether we want to distinguish this particular notification with a different color. cc Ali Mobrem

 

Created from: https://issues.redhat.com/browse/RFE-3024

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

During master node upgrades, when nodes are getting drained, there's currently no protection from two or more operands going down. If your component is required to be available during upgrades or other voluntary disruptions, please consider deploying a PDB to protect your operands.

The effort is tracked in https://issues.redhat.com/browse/WRKLDS-293.

Example:
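A hedged sketch of the kind of PodDisruptionBudget such a controller could create for the console pods; the selector label, the minAvailable value, and the names are illustrative, not the operator's actual manifests.

package operator

import (
	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// consolePDB keeps at least one console pod available during voluntary
// disruptions such as node drains.
func consolePDB() *policyv1.PodDisruptionBudget {
	minAvailable := intstr.FromInt(1)
	return &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "console",
			Namespace: "openshift-console",
		},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MinAvailable: &minAvailable,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "console"},
			},
		},
	}
}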

 

Acceptance Criteria:
1. Create PDB controller in console-operator for both console and downloads pods
2. Add e2e tests for PDB in single node and multi node cluster

 

Note: We should consider backporting this to 4.10.

Epic Goal*

Provide a long term solution to SELinux context labeling in OCP.

 
Why is this important? (mandatory)

As of today, when SELinux is enabled, the PV's files are relabeled when attaching the PV to the pod. This can cause timeouts when the PV contains a lot of files, as well as overload the storage backend.

https://access.redhat.com/solutions/6221251 provides a few workarounds until the proper fix is implemented. Unfortunately these workarounds are not perfect, and we need a long-term, seamless, optimized solution.

This feature tracks the long-term solution where the PV filesystem will be mounted with the right SELinux context, thus avoiding relabeling every file.

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1. Apply new context when there is none
  2. Change context of all files/folders when changing context
  3. RWO & RWX PVs
    1. ReadWriteOncePod PVs first
    2. RWX PV in a second phase

As we are relying on the mount context, there should not be any relabeling (chcon) because all files/folders will inherit the context from the mount context.

More on design & scenarios in the KEP  and related epic STOR-1173

Dependencies (internal and external) (mandatory)

None for the core feature

However, the driver will have to set SELinuxMountSupported to true in the CSIDriverSpec to enable this feature.
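A minimal sketch of how a driver could opt in via its CSIDriver object, assuming the upstream CSIDriver spec field seLinuxMount (referred to later in this epic as CSIDriver.SELinuxMount); the driver name is illustrative.

package operator

import (
	storagev1 "k8s.io/api/storage/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// exampleCSIDriver shows a driver advertising SELinux mount support.
func exampleCSIDriver() *storagev1.CSIDriver {
	seLinuxMount := true
	return &storagev1.CSIDriver{
		ObjectMeta: metav1.ObjectMeta{Name: "example.csi.openshift.io"},
		Spec: storagev1.CSIDriverSpec{
			SELinuxMount: &seLinuxMount,
		},
	}
}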

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - STOR
  • Documentation - STOR
  • QE - STOR
  • PX - 
  • Others -

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

This Epic is to track upstream work in the Storage SIG community

This Epic is to track the SELinux specific work required. fsGroup work is not included here.

Goal: 

Continue contributing to and help move along the upstream efforts to enable recursive permissions functionality.

Finish current SELinuxMountReadWriteOncePod feature upstream:

  • Implement it in all volume plugins (current alpha has just iSCSI and CSI)
  • Add e2e tests + fix all tests that don't work well with SELinux
  • Implement necessary changes in volume reconstruction to also reconstruct the SELinux context.

The feature is probably going to stay alpha upstream.

Problem: 

Recursive permission change takes very long for fsGroup and SELinux. For volumes with many small files, Kubernetes currently does a chown for every file on the volume (due to fsGroup). Similarly, container runtimes (such as CRI-O) perform a chcon of every file on the volume due to the SCC's SELinux context. Data on the volume may already have the correct GID/SELinux context, so Kubernetes needs a way to detect this automatically to avoid the long delay.

Why is this important: 

  • A user wants to bring their pod online quickly and efficiently.  

Dependencies (internal and external):

 

Prioritized epics + deliverables (in scope / not in scope):

Estimate (XS, S, M, L, XL, XXL):

 

Previous Work:

Customers:

Open questions:

  •  

Notes:

As OCP developer (and as OCP user in the future), I want all CSI drivers shipped as part of OCP to support mounting with -o context=XYZ, so I can test with CSIDriver.SELinuxMount: true (or my pods are running without CRI-O recursively relabeling my volume).

 

In detail:

  • For CSI drivers based on block devices, pass the host's /etc/selinux and /sys/fs/ to the CSI driver container on the node as HostPath volumes (sketched below)
  • For CSI drivers based on NFS / CIFS: do the same as for block volumes (it won't harm the driver in any way), but investigate whether these drivers can actually run with CSIDriver.SELinuxMount: true.
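A minimal sketch of those HostPath volumes, assuming they are added to the CSI driver's node DaemonSet; the volume names are illustrative.

package operator

import corev1 "k8s.io/api/core/v1"

// selinuxHostVolumes exposes the host's SELinux configuration to the mount
// utilities running inside the CSI driver container.
func selinuxHostVolumes() []corev1.Volume {
	return []corev1.Volume{
		{
			Name: "etc-selinux",
			VolumeSource: corev1.VolumeSource{
				HostPath: &corev1.HostPathVolumeSource{Path: "/etc/selinux"},
			},
		},
		{
			Name: "sys-fs",
			VolumeSource: corev1.VolumeSource{
				HostPath: &corev1.HostPathVolumeSource{Path: "/sys/fs/"},
			},
		},
	}
}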

Details: https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1710-selinux-relabeling#selinux-support-in-volumes

 

Exit criteria:

  • Verify that CSI drivers shipped by OCP based on block volumes mount volumes with -o context=xyz instead of relabeling the volumes by CRI-O. That should happen when all these conditions are satisfied:
    • SELinuxMountReadWriteOncePod and ReadWriteOncePod feature gates are enabled
    • CSIDriver.SELinuxMount is set to true manually for the CSI driver. OCP will not do it by default in 4.13, because it requires the alpha feature gates from the previous bullet.
    • PVC has AccessMode: [ReadWriteOncePod] 
    • Pod has an SELinux context explicitly assigned, i.e. pod.spec.securityContext (or pod.spec.containers[*].securityContext) has seLinuxOptions set, incl. `level` (based on SCC, OCP might do it automatically)
  • This is an alpha / dev preview feature, so QE might be done when graduating to Beta / tech preview. A sketch of a claim and pod that satisfy these conditions follows below.
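The sketch below constructs such a claim and pod; the names, image, and SELinux level are illustrative, and storage size/class details are omitted.

package tests

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// selinuxMountExample returns a claim with the ReadWriteOncePod access mode
// and a pod with an explicitly assigned SELinux level.
func selinuxMountExample() (*corev1.PersistentVolumeClaim, *corev1.Pod) {
	pvc := &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: "data"},
		Spec: corev1.PersistentVolumeClaimSpec{
			AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOncePod},
			// storage request and storage class omitted for brevity
		},
	}
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "selinux-mount-test"},
		Spec: corev1.PodSpec{
			SecurityContext: &corev1.PodSecurityContext{
				SELinuxOptions: &corev1.SELinuxOptions{Level: "s0:c123,c456"},
			},
			Containers: []corev1.Container{{
				Name:         "app",
				Image:        "registry.example.com/app:latest",
				VolumeMounts: []corev1.VolumeMount{{Name: "data", MountPath: "/data"}},
			}},
			Volumes: []corev1.Volume{{
				Name: "data",
				VolumeSource: corev1.VolumeSource{
					PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{ClaimName: "data"},
				},
			}},
		},
	}
	return pvc, pod
}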

Epic Goal

As an OpenShift on vSphere administrator, I want to specify static IP assignments to my VMs.

As an OpenShift on vSphere administrator, I want to completely avoid using a DHCP server for the VMs of my OpenShift cluster.

Why is this important?

Customers want the convenience of IPI deployments for vSphere without having to use DHCP. As in bare metal, where METAL-1 added this capability, some of the reasons are the security implications of DHCP (customers report that, for example, depending on configuration it allows any device to get into the network). At the same time, IPI deployments only require our OpenShift installation software, while with UPI customers would need automation software that, in secure environments, they would have to certify along with OpenShift.

Acceptance Criteria

  • I can specify static IPs for node VMs at install time with IPI

Previous Work

Bare metal related work:

CoreOS Afterburn:

https://github.com/coreos/afterburn/blob/main/src/providers/vmware/amd64.rs#L28

https://github.com/openshift/installer/blob/master/upi/vsphere/vm/main.tf#L34

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Epic Goal

As an OpenShift on vSphere administrator, I want to specify static IP assignments to my VMs.

As an OpenShift on vSphere administrator, I want to completely avoid using a DHCP server for the VMs of my OpenShift cluster.

Why is this important?

Customers want the convenience of IPI deployments for vSphere without having to use DHCP. As in bare metal, where METAL-1 added this capability, some of the reasons are the security implications of DHCP (customers report that, for example, depending on configuration it allows any device to get into the network). At the same time, IPI deployments only require our OpenShift installation software, while with UPI customers would need automation software that, in secure environments, they would have to certify along with OpenShift.

Acceptance Criteria

  • I can specify static IPs for node VMs at install time with IPI

Previous Work

Bare metal related work:

CoreOS Afterburn:

https://github.com/coreos/afterburn/blob/main/src/providers/vmware/amd64.rs#L28

https://github.com/openshift/installer/blob/master/upi/vsphere/vm/main.tf#L34

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview

Overarching Goal
Move to using the upstream Cluster API (CAPI) in place of the current implementation of the Machine API for standalone OpenShift.

Prerequisite work goals
Complete the design of the Cluster API (CAPI) architecture and build the core operator logic needed for Phase-1, incorporating the assets from different repositories to simplify asset management.

Phase 1 & 2 covers implementing base functionality for CAPI.

Background, and strategic fit

  • Initially CAPI did not meet the requirements for cluster/machine management that OCP had. The project has since moved on, and CAPI is a better fit now and also has better community involvement.
  • CAPI has much better community interaction than MAPI.
  • Other projects are considering using CAPI and it would be cleaner to have one solution
  • Long term it will allow us to add new features more easily in one place vs. doing this in multiple places.

Acceptance Criteria

There must be no negative effect on customers/users of the MAPI; this API must continue to be accessible to them, though how it is implemented "under the covers", and whether that implementation leverages CAPI, is open.

Epic Goal

  • Rework the current flow for the installation of Cluster API components in OpenShift by addressing some of the criticalities of the current implementation

Why is this important?

  • We need to reduce complexity of the CAPI install system architecture
  • We need to improve the development, stability and maintainability of Standalone Cluster API on OpenShift
  • We need to make Cluster 

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  •  

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1.  

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story

As an OpenShift engineer I want to be able to install the new manifest generation tool as a standalone tool in my CAPI Infra Provider repo to generate the CAPI Provider transport ConfigMap(s)

Background

Renaming of the CAPI Asset/Manifest generator from assets (generator) to manifest-gen, as it won't need to generate go embeddable assets anymore, but only manifests that will be referenced and applied by CVO

Steps

  • Removal of the `/assets` folder - we are moving away from embedded assets in favour of transport ConfigMaps
  • Renaming of the CAPI Asset/Manifest generator from assets (generator) to manifest-gen, as it won't need to generate go embeddable assets anymore, but only manifests that will be referenced and applied by CVO
  • Removal of the cluster-api-operator specific code from the assets generator - we are moving away from using the cluster-api-operator
  • Remove the assets generator specific references from the Makefiles/hack scripts - they won't be needed anymore as the tool will be referenced only from other repositories 
  • Adapting the new generator tool to be a standalone go module that can be installed as a tool in other repositories to generate manifests
  • Make sure to add CRDs and Conversion,Validation (also Mutation?) Webhooks to the generated transport ConfigMaps

Stakeholders

  • Cluster Infrastructure Team
  • ShiftStack Team (CAPO)

Definition of Done

  • Working and standalone installable generation tool

User Story

As an OpenShift engineer I want the CAPI Providers repositories to use the new generator tool so that they can independently generate CAPI Provider transport ConfigMaps

Background

Once the new CAPI manifests generator tool is ready, we want to make use of that directly from the CAPI Providers repositories so we can avoid storing the generated configuration centrally and independently apply that based on the running platform.

Steps

  • Install new CAPI manifest generator as a go `tool` to all the CAPI provider repositories
  • Setup a make target under the `/openshift/Makefile` to invoke the generator. Make it output the manifests under `/openshift/manifests`
  • Make sure `/openshift/manifests` is mapped to `/manifests` in the openshift/Dockerfile, so that the files are later picked up by CVO
  • Make sure the manifest generation works by triggering a manual generation
  • Check in the newly generated transport ConfigMap + Credential Requests (to let them be applied by CVO)

Stakeholders

  • <Who is interested in this/where did they request this>

Definition of Done

  • CAPI manifest generator tool is installed 
  • Docs
  • <Add docs requirements for this card>
  • Testing
  • <Explain testing that will be added>

Feature Overview (aka. Goal Summary)  

The Agent Based installer is a clean and simple way to install new instances of OpenShift in disconnected environments, guiding the user through the questions and information needed to successfully install an OpenShift cluster. We need to bring this highly useful feature to the IBM Power and IBM zSystem architectures

 

Goals (aka. expected user outcomes)

Agent based installer on Power and zSystems should reflect what is available for x86 today.

 

Requirements (aka. Acceptance Criteria):

Able to use the agent based installer to create OpenShift clusters on Power and zSystem architectures in disconnected environments

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Epic Goal

  • The goal of this Epic is to enable Agent Based Installer for P/Z

Why is this important?

  • The Agent Based installer is a research Spike item for the Multi-Arch team during the 4.12 release and later

Scenarios
1. …

Acceptance Criteria

  • See "Definition of Done" below

Dependencies (internal and external)
1. …

Previous Work (Optional):
1. …

Open questions::
1. …

Done Checklist

  • CI - For new features (non-enablement), existing Multi-Arch CI jobs are not broken by the Epic
  • Release Enablement: <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - If the Epic is adding a new stream, downstream build attached to advisory: <link to errata>
  • QE - Test plans in Test Plan tracking software (e.g. Polarion, RQM, etc.): <link or reference to the Test Plan>
  • QE - Automated tests merged: <link or reference to automated tests>
  • QE - QE to verify documentation when testing
  • DOC - Downstream documentation merged: <link to meaningful PR>
  • All the stories, tasks, sub-tasks and bugs that belong to this epic need to have been completed and indicated by a status of 'Done'.

User Story

As a managed application services developer, I want to install addons, use syncsets, scale nodes and query ingresses, so that I offer Red Hat OpenShift Streams on Azure.

Acceptance Criteria

  • Create/Delete ARO clusters through api.openshift.com
  • Install OCM addons on ARO clusters through api.openshift.com
  • Create/Update/Delete SyncSets on ARO clusters through api.openshift.com
  • Scale compute nodes on ARO clusters through api.openshift.com
  • Query the cluster DNS through api.openshift.com

Default Done Criteria

  • All existing/affected SOPs have been updated.
  • New SOPs have been written.
  • Internal training has been developed and delivered.
  • The feature has both unit and end to end tests passing in all test
    pipelines and through upgrades.
  • If the feature requires QE involvement, QE has signed off.
  • The feature exposes metrics necessary to manage it (VALET/RED).
  • The feature has had a security review.
  • Contract impact assessment.
  • Service Definition is updated if needed.
  • Documentation is complete.
  • Product Manager signed off on staging/beta implementation.

Dates

Integration Testing:
Beta:
GA:

Current Status

GREEN | YELLOW | RED
GREEN = On track, minimal risk to target date.
YELLOW = Moderate risk to target date.
RED = High risk to target date, or blocked and need to highlight potential
risk to stakeholders.

References

Links to Gdocs, github, and any other relevant information about this epic.

User Story:

As an ARO customer, I want to be able to:

  • use first-party service principals to authenticate

so that I can

  • use first party resource providers for provisioning

Acceptance Criteria:

Description of criteria:

  • Installer SDKs can auth with the first-party service principal
  • Terraform can auth with the first-party service principal
  • "local" testing of this functionality (we need to setup the ability to try this out)

(optional) Out of Scope:

The installer will not accept a separate service principal to pass to the cluster as described in HIVE-1794. Instead Hive will write the separate cred into the manifests.

Engineering Details:

Feature Overview (aka. Goal Summary)  

Due to low customer interest in using OpenShift on Alibaba Cloud, we have decided to deprecate and then remove IPI support for Alibaba Cloud.

https://docs.google.com/document/d/1Kp-GrdSHqsymzezLCm0bKrCI71alup00S48QeWFa0q8/edit#heading=h.v75efohim75y 

Goals (aka. expected user outcomes)

4.14

Announcement 

  1. Update cloud.redhat.com with deprecation information 
  2. Update IPI installer code with warning
  3. Update release note with deprecation information
  4. Update Openshift Doc with deprecation information

4.15

Archive code 

 

Add a deprecation warning in the installer code for anyone trying to install Alibaba via IPI.

USER STORY:

As a user of the installer binary, I want to be warned that Alibaba support will be deprecated in 4.15, so that I'm prevented from creating clusters that will soon be unsupported.

DESCRIPTION:

Alibaba support will be decommissioned from both IPI and UPI starting in 4.15. We want to warn users of the 4.14 installer binary who pick 'alibabacloud' in the list of providers.

ACCEPTANCE CRITERIA:

Warning message is displayed after choosing 'alibabacloud'.
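A hedged sketch of such a warning, not the installer's actual survey code; the function and message are hypothetical, though logging via logrus matches what the installer uses.

package alibabacloud

import "github.com/sirupsen/logrus"

// warnDeprecated prints a deprecation warning once the user picks the
// Alibaba Cloud platform.
func warnDeprecated(platform string) {
	if platform == "alibabacloud" {
		logrus.Warn("Alibaba Cloud support is deprecated and will be removed in a future release.")
	}
}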

ENGINEERING DETAILS:

https://docs.google.com/document/d/1Kp-GrdSHqsymzezLCm0bKrCI71alup00S48QeWFa0q8/edit?usp=sharing_eip_m&ts=647df877

 

Feature Overview (aka. Goal Summary)  

The storage operators need to be automatically restarted after the certificates are renewed.

From OCP doc "The service CA certificate, which issues the service certificates, is valid for 26 months and is automatically rotated when there is less than 13 months validity left."

Since OCP now offers an 18-month lifecycle per release, the storage operator pods need to be automatically restarted after the certificates are renewed.

Goals (aka. expected user outcomes)

The storage operators will be transparently restarted. This should be transparent to the customer; it avoids manual restarts of the storage operators.

 

Requirements (aka. Acceptance Criteria):

The administrator should not need to restart the storage operators when certificates are renewed.

This should apply to all relevant operators with a consistent experience.

 

Use Cases (Optional):

As an administrator I want the storage operators to be automatically restarted when certificates are renewed.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

This feature request is triggered by the new extended OCP lifecycle. We are moving from 12 to 18 months support per release.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

No doc is required

 

Interoperability Considerations

This feature only covers storage, but the same behavior should be applied to every relevant component.

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The pod `openstack-manila-csi-controllerplugin` mounts the secret:

$ cat assets/controller.yaml
...
      containers:
        - name: provisioner-kube-rbac-proxy

          volumeMounts:
          - mountPath: /etc/tls/private
            name: metrics-serving-cert

      volumes:
        - name: metrics-serving-cert
          secret:
            secretName: manila-csi-driver-controller-metrics-serving-cert

Hence, if the secret is updated (e.g. as a result of a CA cert update), the Pod must be restarted.
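One common pattern for this, sketched below, is to hash the secret's data into a pod template annotation so a renewed certificate changes the Deployment spec and triggers a rolling restart; the annotation key and function are hypothetical, not necessarily what the operator implements.

package operator

import (
	"crypto/sha256"
	"encoding/hex"

	corev1 "k8s.io/api/core/v1"
)

// servingCertHash hashes the serving-cert secret data into a stable value
// that can be set as a pod template annotation.
func servingCertHash(secret *corev1.Secret) string {
	h := sha256.New()
	for _, key := range []string{"tls.crt", "tls.key"} {
		h.Write(secret.Data[key])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// Usage (hypothetical annotation key):
//   deploy.Spec.Template.Annotations["operator.openshift.io/serving-cert-hash"] = servingCertHash(secret)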

Epic Goal

As an OpenShift infrastructure owner I need to deploy OCP on OpenStack with the installer-provisioned infrastructure workflow and configure my own load balancers

Why is this important?

Customers want to use their own load balancers, and IPI comes with built-in LBs based on keepalived and haproxy.

Scenarios

  1. A large deployment routed across multiple failure domains without stretched L2 networks would require dynamically routing the control plane VIP traffic through load balancers capable of living in multiple L2 networks.
  2. Customers who want to use their existing LB appliances for the control plane.

Acceptance Criteria

  • Should we require the support of migration from internal to external LB?
  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • QE - must be testing a scenario where we disable the internal LB and setup an external LB and OCP deployment is running fine.
  • Documentation - we need to document all the gotchas regarding this type of deployment, even the specifics about the load-balancer itself (routing policy, dynamic routing, etc)

Dependencies (internal and external)

  1. Fixed IPs would be very interesting to support, already WIP by vsphere (need to Spike on this): https://issues.redhat.com/browse/OCPBU-179
  2. Confirm with customers that they are ok with external LB or they prefer a new internal LB that supports BGP

Previous Work:

vsphere has done the work already via https://issues.redhat.com/browse/SPLAT-409

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Epic Goal

As an OpenShift infrastructure owner I need to deploy OCP on OpenStack with the installer-provisioned infrastructure workflow and configure my own load balancers

Why is this important?

Customers want to use their own load balancers, and IPI comes with built-in LBs based on keepalived and haproxy.

Scenarios

  1. A large deployment routed across multiple failure domains without stretched L2 networks would require dynamically routing the control plane VIP traffic through load balancers capable of living in multiple L2 networks.
  2. Customers who want to use their existing LB appliances for the control plane.

Acceptance Criteria

  • Should we require the support of migration from internal to external LB?
  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • QE - must be testing a scenario where we disable the internal LB and setup an external LB and OCP deployment is running fine.
  • Documentation - we need to document all the gotchas regarding this type of deployment, even the specifics about the load-balancer itself (routing policy, dynamic routing, etc)

Dependencies (internal and external)

  1. Fixed IPs would be very interesting to support, already WIP by vsphere (need to Spike on this): https://issues.redhat.com/browse/OCPBU-179
  2. Confirm with customers that they are ok with external LB or they prefer a new internal LB that supports BGP

Previous Work:

vsphere has done the work already via https://issues.redhat.com/browse/SPLAT-409

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

 

Notes: https://github.com/EmilienM/ansible-role-routed-lb is an example of an LB that will be used for CI and can be used by QE and customers.

Epic Goal

As an OpenShift infrastructure owner I need to deploy OCP on OpenStack with the installer-provisioned infrastructure workflow and configure my own load balancers

Why is this important?

Customers want to use their own load balancers, and IPI comes with built-in LBs based on keepalived and haproxy.

Scenarios

  1. A large deployment routed across multiple failure domains without stretched L2 networks would require dynamically routing the control plane VIP traffic through load balancers capable of living in multiple L2 networks.
  2. Customers who want to use their existing LB appliances for the control plane.

Acceptance Criteria

  • Should we require the support of migration from internal to external LB?
  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • QE - must be testing a scenario where we disable the internal LB and setup an external LB and OCP deployment is running fine.
  • Documentation - we need to document all the gotchas regarding this type of deployment, even the specifics about the load-balancer itself (routing policy, dynamic routing, etc)

Dependencies (internal and external)

  1. Fixed IPs would be very interesting to support, already WIP by vsphere (need to Spike on this): https://issues.redhat.com/browse/OCPBU-179
  2. Confirm with customers that they are ok with external LB or they prefer a new internal LB that supports BGP

Previous Work:

vsphere has done the work already via https://issues.redhat.com/browse/SPLAT-409

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview

Console enhancements based on customer RFEs that improve customer user experience.

 

Goals

  • This Section:* Provide high-level goal statement, providing user context and expected user outcome(s) for this feature

 

Requirements

  • This Section:* A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non MVP requirement slips, it does not shift the feature.

 

Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

 

(Optional) Use Cases

This Section: 

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

 

Questions to answer…

  • ...

 

Out of Scope

 

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

 

Assumptions

  • ...

 

Customer Considerations

  • ...

 

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?  
  • New Content, Updates to existing content,  Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?