Back to index

4.7.0-0.ci-2023-09-17-014956

Jump to: Complete Features | Incomplete Features | Complete Epics | Incomplete Epics | Other Complete | Other Incomplete |

Changes from 4.6.62

Note: this page shows the Feature-Based Change Log for a release

Complete Features

These features were completed when this image was assembled

Currently the Get started with on-premise host inventory quickstart gets delivered in the Core console. If we are going to keep it here we need to add the MCE or ACM operator as a prerequisite, otherwise it's very confusing.

Goal:

As an administrator, I would like to deploy OpenShift 4 clusters to AWS C2S region

Problem:

Customers were able to deploy to AWS C2S region in OCP 3.11, but our global configuration in OCP 4.1 doesn't support this.
  

Why is this important:

  • Many of our public sector customers would like to move off 3.11 and on to 4.1, but missing support for AWS C2S region will prevent them from being able to migrate their environments..

Lifecycle Information:

  • Core

Previous Work:**

Here are the relevant PRs from OCP 3.11.  You can see that these endpoints are not part of the standard SDK (they use an entirely separate SDK).  To support these regions the endpoints had to be configured explicitly.

Seth Jennings has put together a highly customized POC.

Dependencies:

  • Custom API endpoint support w/ CA
    • Cloud / Machine API
    • Image Registry
    • Ingress
    • Kube Controller Manager
    • Cloud Credential Operator
    • others?
  • Require access to local/private/hidden AWS environment

 

Prioritized epics + deliverables (in scope / not in scope):

  • Allow AWS C2S region to be specified for OpenShift cluster deployment
  • Enable customers to use their own managed internal/cluster DNS solutions due to provider and operational restrictions
  • Document deploying OpenShift to AWS C2S region
  • Enable CI for the AWS C2S region

Related : https://jira.coreos.com/browse/CORS-1271

Estimate (XS, S, M, L, XL, XXL): L

 

Customers: North America Public Sector and Government Agencies

Open Questions:

 

 

 

Epic Goal

  • Support running console in single-node OpenShift configurations for production use in edge computing use cases.
  • Support disabling the console entirely in some of these configurations to reduce overhead in constrained environments.

Why is this important?

  • Some bare metal edge customers, especially in the telco market, want to use kubernetes at physically remote sites with minimal hardware.

Scenarios

  1. As a user, I want to deploy a fully supported instance of OpenShift on a single node.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • console can be deployed with a single replica

Dependencies (internal and external)

  1. CORS-1589

Previous Work (Optional):

  1. https://github.com/openshift/enhancements/pull/504
  2. https://github.com/openshift/enhancements/pull/560

Open questions::

  1. Should the console configuration API have a separate option for this setting, or should it use the API created from CORS-1589?

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

https://github.com/openshift/enhancements/pull/555
https://github.com/openshift/api/pull/827

The console operator will need to support single-node clusters.

We have a console deployment and downloads deployment. Each will to be updated so that there's only a single replica when high availability mode is disabled in the Infrastructure config. We should also remove the anti-affinity rule in the console deployment that tries to spread console pods across nodes.

The downloads deployment is currently a static manifest. That likely needs to be created by the console operator instead going forward.

Acceptance Criteria:

  • Console operator deploys console with 1 replica and no anti-affinity rules when not in high availability mode
  • Console operator deploys the downloads deployment with 1 replica when not in high availability mode
  • The console and downloads deployments do not change when in high availability mode
  • The feature is well-covered by tests

OCP/Telco Definition of Done
Feature Template descriptions and documentation.

Feature Overview.

Early customer feedback is that they see SNO as a great solution covering smaller footprint deployment, but are wondering what is the evolution story OpenShift is going to provide where more capacity or high availability are needed in the future.

While migration tooling (moving workload/config to new cluster) could be a mid-term solution, customer desire is not to include extra hardware to be involved in this process.

 For Telecommunications Providers, at the Far Edge they intend to start small and then grow. Many of these operators will start with a SNO-based DU deployment as an initial investment, but as DUs evolve, different segments of the radio spectrum are added, various radio hardware is provisioned and features delivered to the Far Edge, the Telecommunication Providers desire the ability for their Far Edge deployments to scale up from 1 node to 2 nodes to n nodes. On the opposite side of the spectrum from SNO is MMIMO where there is a robust cluster and workloads use HPA.

Goals

  • Provide the capability to expand a single replica control plane topology to host more workloads capacity - add worker
  • Provide the capability to expand a single replica control plane to be a highly available control plane
  • To satisfy MMIMO Telecommunications providers will want the ability to scale a SNO to a multi-node cluster that can support HPA.
  • Telecommunications providers do not want workload (DU specifically) downtime when migrating from SNO to a multi-node cluster.
  • Telecommunications providers wish to be able to scale from one to two or more nodes to support a variety of radio hardware.
  • Support CP scaling (CP HA) for 2 node cluster, 3 node cluster and n node cluster. As the number of nodes in the cluster increases so does the failure domain of the cluster. The cluster is now supporting more cell sectors and therefore has more of a need for HA and resiliency including the cluster CP.

Requirements

  • TBD
Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

(Optional) Use Cases

This Section:

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

Questions to answer…

  • ...

Out of Scope

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

Assumptions

  • ...

Customer Considerations

  • ...

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

Epic Goal

  • Documented and supported flow for adding 1, 2, 3 or more workers to a Single Node OpenShift (SNO) deployment without requiring cluster downtime and the understanding that this action will not make the cluster itself highly available.

Why is this important?

  • Telecommunications and Edge scenarios where HA is handled via failover to another site but single site capacity may vary or need to be expanded over time.
  • Similar scenarios exist for some ISV vendors where OpenShift is an implementation detail of how they deliver their solution on top of another platform (e.g. VMware).

Scenarios

  1. Adding a worker to a single node openshift cluster.
  2. Adding a second worker to a single node openshift cluster.
  3. Adding a third worker to a single node openshift cluster.
  4. Removing a worker node from a single node openshift cluster that has had 1 or more workers added.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • Customer facing documentation of the add worker flow for SNO.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

  1. Presumably there is a scale limit on how many workers could be added to an SNO control plane, and it is lower than the limit for a "normal" 3 node control plane. It is not anticipated that this limit will be established in this epic. Intent is to focus on small scale sites where adding 1-3 worker nodes would be beneficial.

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Epic Goal

  • Make it possible to disable the console operator at install time, while still having a supported+upgradeable cluster.

Why is this important?

  • It's possible to disable console itself using spec.managementState in the console operator config. There is no way to remove the console operator, though. For clusters where an admin wants to completely remove console, we should give the option to disable the console operator as well.

Scenarios

  1. I'm an administrator who wants to minimize my OpenShift cluster footprint and who does not want the console installed on my cluster

Acceptance Criteria

  • It is possible at install time to opt-out of having the console operator installed. Once the cluster comes up, the console operator is not running.

Dependencies (internal and external)

  1. Composable cluster installation

Previous Work (Optional):

  1. https://docs.google.com/document/d/1srswUYYHIbKT5PAC5ZuVos9T2rBnf7k0F1WV2zKUTrA/edit#heading=h.mduog8qznwz
  2. https://docs.google.com/presentation/d/1U2zYAyrNGBooGBuyQME8Xn905RvOPbVv3XFw3stddZw/edit#slide=id.g10555cc0639_0_7

Open questions::

  1. The console operator manages the downloads deployment as well. Do we disable the downloads deployment? Long term we want to move to CLI manager: https://github.com/openshift/enhancements/blob/6ae78842d4a87593c63274e02ac7a33cc7f296c3/enhancements/oc/cli-manager.md

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

In the console-operator repo we need to add `capability.openshift.io/console` annotation to all the manifests that the operator either contains creates on the fly.

 

Manifests are currently present in /bindata and /manifest directories.

 

Here is example of the insights-operator change.

Here is the overall enhancement doc.

 

We need to continue to maintain specific areas within storage, this is to capture that effort and track it across releases.

Goals

  • To allow OCP users and cluster admins to detect problems early and with as little interaction with Red Hat as possible.
  • When Red Hat is involved, make sure we have all the information we need from the customer, i.e. in metrics / telemetry / must-gather.
  • Reduce storage test flakiness so we can spot real bugs in our CI.

Requirements

Requirement Notes isMvp?
Telemetry   No
Certification   No
API metrics   No
     

Out of Scope

n/a

Background, and strategic fit
With the expected scale of our customer base, we want to keep load of customer tickets / BZs low

Assumptions

Customer Considerations

Documentation Considerations

  • Target audience: internal
  • Updated content: none at this time.

Notes

In progress:

  • CI flakes:
    • Configurable timeouts for e2e tests
      • Azure is slow and times out often
      • Cinder times out formatting volumes
      • AWS resize test times out

 

High prio:

  • Env. check tool for VMware - users often mis-configure permissions there and blame OpenShift. If we had a tool they could run, it might report better errors.
    • Should it be part of the installer?
    • Spike exists
  • Add / use cloud API call metrics
    • Helps customers to understand why things are slow
    • Helps build cop to understand a flake
      • With a post-install step that filters data from Prometheus that’s still running in the CI job.
    • Ideas:
      • Cloud is throttling X% of API calls longer than Y seconds
      • Attach / detach / provisioning / deletion / mount / unmount / resize takes longer than X seconds?
    • Capture metrics of operations that are stuck and won’t finish.
      • Sweep operation map from executioner???
      • Report operation metric into the highest bucket after the bucket threshold (i.e. if 10minutes is the last bucket, report an operation into this bucket after 10 minutes and don’t wait for its completion)?
      • Ask the monitoring team?
    • Include in CSI drivers too.
      • With alerts too

Unsorted

  • As the number of storage operators grows, it would be grafana board for storage operators
    • CSI driver metrics (from CSI sidecars + the driver itself  + its operator?)
    • CSI migration?
  • Get aggregated logs in cluster
    • They're rotated too soon
    • No logs from dead / restarted pods
    • No tools to combine logs from multiple pods (e.g. 3 controller managers)
  • What storage issues customers have? it was 22% of all issues.
    • Insufficient docs?
    • Probably garbage
  • Document basic storage troubleshooting for our supports
    • What logs are useful when, what log level to use
    • This has been discussed during the GSS weekly team meeting; however, it would be beneficial to have this documented.
  • Common vSphere errors, their debugging and fixing. 
  • Document sig-storage flake handling - not all failed [sig-storage] tests are ours

Epic Goal

  • Update all images that we ship with OpenShift to the latest upstream releases and libraries.
  • Exact content of what needs to be updated will be determined as new images are released upstream, which is not known at the beginning of OCP development work. We don't know what new features will be included and should be tested and documented. Especially new CSI drivers releases may bring new, currently unknown features. We expect that the amount of work will be roughly the same as in the previous releases. Of course, QE or docs can reject an update if it's too close to deadline and/or looks too big.

Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF). Trying no-feature-freeze in 4.12. We will try to do as much as we can before FF, but we're quite sure something will slip past FF as usual.

Why is this important?

  • We want to ship the latest software that contains new features and bugfixes.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Update all OCP and kubernetes libraries in storage operators to the appropriate version for OCP release.

This includes (but is not limited to):

  • Kubernetes:
    • client-go
    • controller-runtime
  • OCP:
    • library-go
    • openshift/api
    • openshift/client-go
    • operator-sdk

Operators:

  • aws-ebs-csi-driver-operator 
  • aws-efs-csi-driver-operator
  • azure-disk-csi-driver-operator
  • azure-file-csi-driver-operator
  • openstack-cinder-csi-driver-operator
  • gcp-pd-csi-driver-operator
  • gcp-filestore-csi-driver-operator
  • manila-csi-driver-operator
  • ovirt-csi-driver-operator
  • vmware-vsphere-csi-driver-operator
  • alibaba-disk-csi-driver-operator
  • ibm-vpc-block-csi-driver-operator
  • csi-driver-shared-resource-operator

 

  • cluster-storage-operator
  • csi-snapshot-controller-operator
  • local-storage-operator
  • vsphere-problem-detector

Update all CSI sidecars to the latest upstream release.

  • external-attacher
  • external-provisioner
  • external-resizer
  • external-snapshotter
  • node-driver-registrar
  • livenessprobe

This includes update of VolumeSnapshot CRDs in https://github.com/openshift/cluster-csi-snapshot-controller-operator/tree/master/assets

Epic Goal

  • Enable the migration from a storage intree driver to a CSI based driver with minimal impact to the end user, applications and cluster
  • These migrations would include, but are not limited to:
    • CSI driver for AWS EBS
    • CSI driver for GCP
    • CSI driver for Azure (file and disk)
    • CSI driver for VMware vSphere

Why is this important?

  • OpenShift needs to maintain it's ability to enable PVCs and PVs of the main storage types
  • CSI Migration is getting close to GA, we need to have the feature fully tested and enabled in OpenShift
  • Upstream intree drivers are being deprecated to make way for the CSI drivers prior to intree driver removal

Scenarios

  1. User initiated move to from intree to CSI driver
  2. Upgrade initiated move from intree to CSI driver
  3. Upgrade from EUS to EUS

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

On new installations, we should make the StorageClass created by the CSI operator the default one. 

However, we shouldn't do that on an upgrade scenario. The main reason is that users might have set  a different quota on the CSI driver Storage Class.

Exit criteria:

  • New clusters get the CSI Storage Class as the default one.
  • Existing clusters don't get their default Storage Classes changed.

Feature Overview

OpenShift console supports new features and elevated experience for Operator Lifecycle Manager (OLM) Operators and Cluster Operators.

Goal:

OCP Console improves the controls and visibility for managing vendor-provided software in customers’ infrastructure and making these solutions available for customers' internal users.

 

To achieve this, 

  • Operator Lifecycle Manager (OLM) teams have been introducing new features aiming towards simplification and ease of use for both developers and cluster admins.
  • On the Cluster Operators side, the console iteratively improves the visibilities to the resources being associated with the Operators to improve the overall managing experience.

We want to make sure OLM’s and Cluster Operators' new features are exposed in the console so admin console users can benefit from them.

Benefits:

  • Cluster admin/Operator consumers:
    • Able to see, learn, and interact with OLM managed and/or Cluster Operators associated resources in openShift console.

Requirements

Requirement Notes isMvp?
OCP console supports the latest OLM APIs and features This is a requirement for ALL features. YES
OCP console improves visibility to Cluster Operators related resources and features. This is a requirement for ALL features. YES
     

 


(Optional) Use Cases
<--- Remove this text when creating a Feature in Jira, only for reference --->
* Main success scenarios - high-level user stories
* Alternate flow/scenarios - high-level user stories
* ...

Questions to answer...
How will the user interact with this feature?
Which users will use this and when will they use it?
Is this feature used as part of the current user interface?

Out of Scope
<--- Remove this text when creating a Feature in Jira, only for reference --->
# List of non-requirements or things not included in this feature
# ...

Background, and strategic fit
<--- Remove this text when creating a Feature in Jira, only for reference --->
What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

Assumptions
<--- Remove this text when creating a Feature in Jira, only for reference --->
* Are there assumptions being made regarding prerequisites and dependencies?
* Are there assumptions about hardware, software or people resources?
* ...

Customer Considerations
<--- Remove this text when creating a Feature in Jira, only for reference --->
* Are there specific customer environments that need to be considered (such as working with existing h/w and software)?
...

Documentation Considerations
<--- Remove this text when creating a Feature in Jira, only for reference --->
Questions to be addressed:
* What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
* Does this feature have doc impact?
* New Content, Updates to existing content, Release Note, or No Doc Impact
* If unsure and no Technical Writer is available, please contact Content Strategy.
* What concepts do customers need to understand to be successful in [action]?
* How do we expect customers will use the feature? For what purpose(s)?
* What reference material might a customer want/need to complete [action]?
* Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
* What is the doc impact (New Content, Updates to existing content, or Release Note)?

Background:

OpenShift console allows users (cluster admins) to change the state of the “default hub sources” for OperatorHub on the cluster from “enabled” to “disabled” and vice versa through “Global Configuration → OperatorHub” view from the “Cluster Settings” view.

Starting from OpenShift 4.4, the console and OLM provides richer configurations for the ‘CatalogSource’ objects that enable users to create their curated sources for OperatorHub with custom “Display Name”, “URL of Image Registry”, and the “Polling Interval” for updating the custom OperatorHub source.

This epic is about reflecting/exposing the newer capabilities on the ‘OperatorHub’ (Cluster Config view) and the ‘CatalogSource’ list and details views.

Goals:

1. As an admin user of console, I'd like to:
easily disable/enable the predefined Operator sources for the OperatorHub

so that I can:
control the sources of the Operators my cluster users see on the OperatorHub view.

2. As an admin user of console, I'd like to:
easily understand how to add/edit/view/remove my custom Operator catalogSource for the OperatorHub

so that I can
easier managing (add/edit/view/remove) my custom sources for the OperatorHub

3. As an admin user of console, I'd like to:
easily see the configurations and status of my custom Operator catalogSource for the OperatorHub

so that I can
easier managing (review/edit) the configurations of my custom sources for the OperatorHub

Acceptance Criteria:

  1. Improve cluster config: ‘OperatorHub’ Detail view
    • Add toggles for “disabled/enabled” predefined Operators
      • "Red Hat Operators"
      • "Certified Operators"
      • "Community Operators"
      • "Marketplace Operators"
    • Add “help text” on details view to guide users:
      Change the state of the default hub sources for OperatorHub on the cluster from enabled to disabled and vice versa. Add and manage your curated sources for OperatorHub on the Sources tab with custom Display Name, URL of Image Registry, and the Polling Interval for updating your custom OperatorHub source.
      
    • Embedded a link to the "Sources" tab in the help text above:
      URL/k8s/cluster/config.openshift.io~v1~OperatorHub/cluster/sources
      
      • Easier access to create/manage custom ‘CatalogSource
  2. Improve ‘CatalogSource’ list view (on the “Source” tab) and details view
    • On ‘CatalogSourcelist view:
      • Show “Catalog Polling Interval” (if `spec.sourceType: grpc`)
      • Expose “Status” (status.connectionState.lastObservedState)
    • On ‘CatalogSourcedetails view:
      • Show & Edit “Catalog Polling Interval” (if `spec.sourceType: grpc`)
        --> A dropdown with options in ‘15m’, ‘30m’, ‘45m’, ‘60m’.
      • (Sync up with the list view) Expose newly introduced fields on the ‘CatalogSource’ object:
        • Expose “Status” (status.connectionState.lastObservedState)
        • Display Name (spec.displayName)
        • Publisher (spec.publisher)
        • Availability
        • Endpoint (spec.image)
        • Polling Interval
        • # of Operators
      • Add an “Operators” tab - show a list of “PackageManifests” (Operators) consists of this ‘CatalogSource

Current UI in OpenShift console:

See current console screenshots in the attachments for reference:

  1. Cluster config: ‘OperatorHub’ Detail view
  2. CatalogSource’ list view (on the “Source” tab)
  3. CatalogSource‘ Details view
  4. mocked screenshot - "Operators" tab on the ‘CatalogSource‘ Details view

The Sources tab list view now includes 2 new columns for the Catalog Sources:

  • Status and Registry Poll Interval

Action menu changes:

  • For default sources: Only the existing Disable and Edit CatalogSource actions are available since any other delete or edit will immediately be reverted by the cluster operator
  • For custom sources: The menus match the Actions menu in the Catalog Source

Improve ‘CatalogSource’ details view (on the “Details” tab)

  • On ‘CatalogSource’ details view:
    • Show & Edit “Catalog Polling Interval” (if `spec.sourceType: grpc`)
      --> dropdown with options in ‘15m’, ‘30m’, ‘45m’, ‘60m’.
    • (Sync up with the list view) Expose newly introduced fields on the ‘CatalogSource’ object:
      • Expose “Status” (status.connectionState.lastObservedState)
      • Display Name (spec.displayName)
      • Publisher (spec.publisher)
      • Availability
      • Endpoint (spec.image)
      • Polling Interval
      • # of Operators
    • Add an “Operators” tab - show a list of “PackageManifests” (Operators) consists of this ‘CatalogSource

Feature Overview
PF4 has introduced a bevy of new components to really enhance the user experience, specifically around list views. These new components make it easier than ever for customers to find the data they care about and to take action.

Goals

  • Migrate to PF4 Components
  • Improve Search Page
    • Saved Search
  • Improve List View
    • Bulk Actions
      • delete
      • add label
      • add annotation
    • Column Management
      • Show\Hide Columns
    • Favoriting
      • Bubble favorites to the top of the list view
    • Toolbar
      • Advance Filter

Requirements

Requirement Notes isMvp?
Improvements must be applied to all resource list view pages including the search page This is a requirement for ALL features. YES
Search page will be the only page that needs "Saved Searches" This is a requirement for ALL features. NO
All components updated must be PF4 supported This is a requirement for ALL features. YES

Questions to answer...
How will the user interact with this feature?
Which users will use this and when will they use it?
Is this feature used as part of the current user interface?

Out of Scope
<--- Remove this text when creating a Feature in Jira, only for reference --->
# List of non-requirements or things not included in this feature
# ...

Background, and strategic fit
<--- Remove this text when creating a Feature in Jira, only for reference --->
What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

Assumptions
<--- Remove this text when creating a Feature in Jira, only for reference --->
* Are there assumptions being made regarding prerequisites and dependencies?
* Are there assumptions about hardware, software or people resources?
* ...

Customer Considerations
<--- Remove this text when creating a Feature in Jira, only for reference --->
* Are there specific customer environments that need to be considered (such as working with existing h/w and software)?
...

Documentation Considerations
<--- Remove this text when creating a Feature in Jira, only for reference --->
Questions to be addressed:
* What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
* Does this feature have doc impact?
* New Content, Updates to existing content, Release Note, or No Doc Impact
* If unsure and no Technical Writer is available, please contact Content Strategy.
* What concepts do customers need to understand to be successful in [action]?
* How do we expect customers will use the feature? For what purpose(s)?
* What reference material might a customer want/need to complete [action]?
* Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
* What is the doc impact (New Content, Updates to existing content, or Release Note)?

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The inventory card needs a couple of visual tweaks:

  • the individual links should be 14px (not 16px) to be consistent with other cards
  • the order of the icon/# should be switched so that the number precedes the icon

Attaching the desired design from Michael

 Feature Overview

This will be phase 1 of Internationalization of the OpenShift Console.

 Phase 1 will include the following:

  1. UI based language Selector instead of using browser detection
  2. Externalize all hard coded strings in the client code including all OpenShift static plugins
    1. Admin Console
    2. Dev Console
    3. Serverless
    4. Pipelines
    5. CNV
    6. OCS
    7. CSO
  3. Localized Date\Time
  4. Setup all processes, infrastructure, and testing required
  5. We will start with support for Chinese and Japanese lang

Phase 1 will not include:

  1. Dynamically generated UI (Operator, OpenAPIV3Schema)
    1. Operators that surface informational messages may not have translations available
  2. Strings from non client code
    1. This may include items such as events surfaced from Kuberenetes, alerts, and error messages displayed to the user or in logs
  3. Localization of logging messages at any level is not in scope
  4. Any CLI
  5. Language support for left to right languages ie Arabic

Initial List of Languages to Support

---------- 4.7* ----------

  1. Japanese - Code: ja 
  2. Chinese - Code: zh_CN, zh_TW 
  3. Korean - Code: ko

*This will be based on the ability to get all the strings externalized, there is a good chance this gets pushed to 4.8.

---------- Post 4.7 ----------

  1. Spanish: - Code: es_419, es 
  2. German: - Code: de
  3. French - Code: fr
  4. Italian - Code: it
  5. Portuguese - Code: pt_BR
  6. Korean - Code: ko
  7. Hindi - Code: hi

POC

 Initial POC PR

Goals

Internationalization has become table stakes. OpenShift Console needs to support different languages in each of the major markets. This is key functionality that will help unlock sales in different regions.

 

Requirements

 

Requirement Notes isMvp?
Language Selector   YES
Localized Date. + Time   YES
Externalization and translation of all client side strings   YES
Translation for Chinese and Japanese   YES
Process, infra, and testing capabilities put into place   YES
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
     

  

Out of Scope

  1. Dynamically generated UI (Operator, OpenAPIV3Schema)
    1. Operators that surface informational messages may not have translations available
  2. Strings from non client code
    1. This may include items such as events surfaced from Kuberenetes, alerts, and error messages displayed to the user or in logs
  3. Localization of logging messages at any level is not in scope
  4. Any CLI support
  5. Language support for left to right languages ie. Arabic

 

Assumptions

  • Each static plugin team will be responsible for externalizing all their client code strings.
  • Quick Starts will need to be translated.

Customer Considerations

We are rolling this feature in phases, based on customer feedback, there may be no phase 2.

Documentation Considerations

I believe documentation already supports a large language set.

Goal

  • Localize OpenShift Admin Console
    • Externalize hardcoded String on the client side
    • Update Capitalization for Strings
    • Add a language selector under User Menu
    • Localize date + time
    • Setup Build system
    • Setup CI system 

Why is this important?

  • Our goal should be to make OCP Console accessible to everyone. By providing different language support we open up OCP to many people that previously couldn't use the Console because of language barriers.

  

Acceptance Criteria

  • Release Technical Enablement

 

Externalize strings in the pages under the console Workloads nav section.

Externalize strings in the Compute nav section (Nodes, Machines, Machine Sets, Machine Autoscalers, Machine Health Checks, Machine Configs, Machine Config Pools).

We need to translate following common components used in the list and details pages:

  • Common ListPage component
  • Column management modal
  • Common list view kebab
  • Common filter toolbar
  • Details page breadcrumbs
  • ResourceSummary/DetailsItem component
  • Details page Actions dropdown
  • Common details page tabs
  • Managed by operator link
  • Conditions table
  • Common dropdown component

Update i18n files/scripts to support Korean in advance of translation work from Terry.

Specifically:

Epic Goal

  • This is the continuation of the Internationalization work... the following items remain:
    • All existing QuickStarts get Translated
    • Automation Completed
    • Any remaining items cleaned up

Why is this important?

  • Automating as much as possible with the detecting duplicate strings, building, translation drops will ensure we will be successful for all future releases
  • Quick Start are important part of the product that enable our users to maximize usage of the Console
  • Best to clean up anything left over to reduce future Tech Debt

Acceptance Criteria

  • Quick Starts are translated 
  • Everything is automated for building, and pushing translation drops to the globalization team
  • Source code should be up to quality standards

Previous Work (Optional):

  1. https://issues.redhat.com/browse/CONSOLE-2325

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

We have too many namespaces if we're loading them upfront. We should consolidate some of the files.

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Rebase OpenShift components to k8s v1.24

Why is this important?

  • Rebasing ensures components work with the upcoming release of Kubernetes
  • Address tech debt related to upstream deprecations and removals.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. k8s 1.24 release

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

 

Why?

  • Decouple control and data plane. 
    • Customers do not pay Red Hat more to run HyperShift control planes and supporting infrastructure than Standalone control planes and supporting infrastructure.
  • Improve security
    • Shift credentials out of cluster that support the operation of core platform vs workload
  • Improve cost
    • Allow a user to toggle what they don’t need.
    • Ensure a smooth path to scale to 0 workers and upgrade with 0 workers.

 

Assumption

  • A customer will be able to associate a cluster as “Infrastructure only”
  • E.g. one option: management cluster has role=master, and role=infra nodes only, control planes are packed on role=infra nodes
  • OR the entire cluster is labeled infrastructure , and node roles are ignored.
  • Anything that runs on a master node by default in Standalone that is present in HyperShift MUST be hosted and not run on a customer worker node.

 

 

Doc: https://docs.google.com/document/d/1sXCaRt3PE0iFmq7ei0Yb1svqzY9bygR5IprjgioRkjc/edit 

Overview 

Customers do not pay Red Hat more to run HyperShift control planes and supporting infrastructure than Standalone control planes and supporting infrastructure.

Assumption

  • A customer will be able to associate a cluster as “Infrastructure only”
  • E.g. one option: management cluster has role=master, and role=infra nodes only, control planes are packed on role=infra nodes
  • OR the entire cluster is labeled infrastructure, and node roles are ignored.
  • Anything that runs on a master node by default in Standalone that is present in HyperShift MUST be hosted and not run on a customer worker node.

DoD 

Run cluster-storage-operator (CSO) + AWS EBS CSI driver operator + AWS EBS CSI driver control-plane Pods in the management cluster, run the driver DaemonSet in the hosted cluster.

More information here: https://docs.google.com/document/d/1sXCaRt3PE0iFmq7ei0Yb1svqzY9bygR5IprjgioRkjc/edit 

 

As HyperShift Cluster Instance Admin, I want to run AWS EBS CSI driver operator + control plane of the CSI driver in the management cluster, so the guest cluster runs just my applications.

  • Add a new cmdline option for the guest cluster kubeconfig file location
  • Parse both kubeconfigs:
    • One from projected service account, which leads to the management cluster.
    • Second from the new cmdline option introduced above. This one leads to the guest cluster.
  • Only on HyperShift:
    • When interacting with Kubernetes API, carefully choose the right kubeconfig to watch / create / update objects in the right cluster.
    • Replace namespaces in all Deployments and other objects that are created in the management cluster. They must be created in the same namespace as the operator.
  •  
  •  
    • Pass only the guest kubeconfig to the operand (control-plane Deployment of the CSI driver).

Exit criteria:

  • Control plane Deployment of AWS EBS CSI driver runs in the management cluster in HyperShift.
  • Storage works in the guest cluster.
  • No regressions in standalone OCP.

< High-Level description of the feature ie: Executive Summary >

Goals

Cluster administrators need an in-product experience to discover and install new Red Hat offerings that can add high value to developer workflows.

Requirements

Requirements Notes IS MVP
Discover new offerings in Home Dashboard   Y
Access details outlining value of offerings   Y
Access step-by-step guide to install offering   N
Allow developers to easily find and use newly installed offerings   Y
Support air-gapped clusters   Y
    • (Optional) Use Cases

< What are we making, for who, and why/what problem are we solving?>

Out of scope

Discovering solutions that are not available for installation on cluster

Dependencies

No known dependencies

Background, and strategic fit

 

Assumptions

None

 

Customer Considerations

 

Documentation Considerations

Quick Starts 

What does success look like?

 

QE Contact

 

Impact

 

Related Architecture/Technical Documents

 

Done Checklist

  • Acceptance criteria are met
  • Non-functional properties of the Feature have been validated (such as performance, resource, UX, security or privacy aspects)
  • User Journey automation is delivered
  • Support and SRE teams are provided with enough skills to support the feature in production environment

Problem:

Developers using Dev Console need to be made aware of the RH developer tooling available to them.

Goal:

Provide awareness to developers using Dev Console of the RH developer tooling that is available to them, including:

Consider enhancing the +Add page and/or the Guided tour

Provide a Quick Start for installing the Cryostat Operator

Why is it important?

To increase usage of our RH portfolio

Acceptance criteria:

  1. Quick Start - Installing Cryostat Operator
  2.  Quick Start - Get started with JBoss EAP using a Helm Chart
  3. Discoverability of the IDE extensions from Create Serverless form
  4. Update Terminal step of the Guided Tour to indicate that odo CLI is accessible (link to https://developers.redhat.com/products/odo/overview)

Dependencies (External/Internal):

Design Artifacts:

Exploration:

Note:

Feature Goal*

What is our purpose in implementing this?  What new capability will be available to customers?

The goal of this feature is to provide a consistent, predictable and deterministic approach on how the default storage class(es) is managed.

 
Why is this important? (mandatory)

The current default storage class implementation has corner cases which can result in PVs staying in pending because there is either no default storage class OR multiple storage classes are defined

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

 

No default storage class

In some cases there is no default SC defined, this can happen during OCP deployment where components such as the registry request a PV whereas the SC are not been defined yet. This can also happen during a change in default SC, there won't be any between the admin unset the current one and set the new on.

 

  1. The admin marks the current default SC1 as non-default.

Another user creates PVC requesting a default SC, by leaving pvc.spec.storageClassName=nil. The default SC does not exist at this point, therefore the admission plugin leaves the PVC untouched with pvc.spec.storageClassName=nil.
The admin marks SC2 as default.
PV controller, when reconciling the PVC, updates pvc.spec.storageClassName=nil to the new SC2.
PV controller uses the new SC2 when binding / provisioning the PVC.

  1. The installer creates PVC for the image registry first, requesting the default storage class by leaving pvc.spec.storageClassName=nil.

The installer creates a default SC.
PV controller, when reconciling the PVC, updates pvc.spec.storageClassName=nil to the new default SC.
PV controller uses the new default SC when binding / provisioning the PVC.

Multiple Storage Classes

In some cases there are multiple default SC, this can be an admin mistake (forget to unset the old one) or during the period where a new default SC is created but the old one is still present.

New behavior:

  1. Create a default storage class A
  2. Create a default storage class B
  3. Create PVC with pvc.spec.storageCLassName = nil

-> the PVC will get the default storage class with the newest CreationTimestamp (i.e. B) and no error should show.

-> admin will get an alert that there are multiple default storage classes and they should do something about it.

 

CSI that are shipped as part of OCP

The CSI drivers we ship as part of OCP are deployed and managed by RH operators. These operators automatically create a default storage class. Some customers don't like this approach and prefer to:

 

  1. Create their own default storage class
  2. Have no default storage class in order to disable dynamic provisioning

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

No external dependencies.

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - STOR
  • Documentation - STOR
  • QE - STOR
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Can bring confusion to customer as there is a change in the default behavior customer are used to. This needs to be carefully documented.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing -  Basic e2e automationTests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Feature Overview

Create a GCP cloud specific spec.resourceTags entry in the infrastructure CRD. This should create and update tags (or labels in GCP) on any openshift cloud resource that we create and manage. The behaviour should also tag existing resources that do not have the tags yet and once the tags in the infrastructure CRD are changed all the resources should be updated accordingly.

Tag deletes continue to be out of scope, as the customer can still have custom tags applied to the resources that we do not want to delete.

Due to the ongoing intree/out of tree split on the cloud and CSI providers, this should not apply to clusters with intree providers (!= "external").

Once confident we have all components updated, we should introduce an end2end test that makes sure we never create resources that are untagged.

 
Goals

  • Functionality on GCP Tech Preview
  • inclusion in the cluster backups
  • flexibility of changing tags during cluster lifetime, without recreating the whole cluster

Requirements

  • This Section:* A list of specific needs or objectives that a Feature must deliver to satisfy the Feature.. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non MVP requirement slips, it does not shift the feature.
Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

List any affected packages or components.

  • Installer
  • Cluster Infrastructure
  • Storage
  • Node
  • NetworkEdge
  • Internal Registry
  • CCO

This epic covers the work to apply user defined labels GCP resources created for openshift cluster available as tech preview.

The user should be able to define GCP labels to be applied on the resources created during cluster creation by the installer and other operators which manages the specific resources. The user will be able to define the required tags/labels in the install-config.yaml while preparing with the user inputs for cluster creation, which will then be made available in the status sub-resource of Infrastructure custom resource which cannot be edited but will be available for user reference and will be used by the in-cluster operators for labeling when the resources are created.

Updating/deleting of labels added during cluster creation or adding new labels as Day-2 operation is out of scope of this epic.

List any affected packages or components.

  • Installer
  • Cluster Infrastructure
  • Storage
  • Node
  • NetworkEdge
  • Internal Registry
  • CCO

Reference - https://issues.redhat.com/browse/RFE-2017

cluster-config-operator makes Infrastructure CRD available for installer, which is included in it's container image from the openshift/api package and requires the package to be updated to have the latest CRD.

BU Priority Overview

As our customers create more and more clusters, it will become vital for us to help them support their fleet of clusters. Currently, our users have to use a different interface(ACM UI) in order to manage their fleet of clusters. Our goal is to provide our users with a single interface for managing a fleet of clusters to deep diving into a single cluster.  This means going to a single URL – your Hub – to interact with your OCP fleet.

Goals

The goal of this tech preview update is to improve the experience from the last round of tech preview. The following items will be improved:

  1. Improved Cluster Picker: Moved to Masthead for better usability, filter/search
  2. Support for Metrics: Metrics are now visualized from Spoke Clusters
  3. Avoid UI Mismatch: Dynamic Plugins from Spoke Clusters are disabled 
  4. Console URLs Enhanced: Cluster Name Add to URL for Quick Links
  5. Security Improvements: Backend Proxy and Auth updates

Key Objective
Providing our customers with a single simplified User Experience(Hybrid Cloud Console)that is extensible, can run locally or in the cloud, and is capable of managing the fleet to deep diving into a single cluster. 
Why customers want this?

  1. Single interface to accomplish their tasks
  2. Consistent UX and patterns
  3. Easily accessible: One URL, one set of credentials

Why we want this?

  • Shared code -  improve the velocity of both teams and most importantly ensure consistency of the experience at the code level
  • Pre-built PF4 components
  • Accessibility & i18n
  • Remove barriers for enabling ACM

Phase 2 Goal: Productization of the united Console 

  1. Enable user to quickly change context from fleet view to single cluster view
    1. Add Cluster selector with “All Cluster” Option. “All Cluster” = ACM
    2. Shared SSO across the fleet
    3. Hub OCP Console can connect to remote clusters API
    4. When ACM Installed the user starts from the fleet overview aka “All Clusters”
  2. Share UX between views
    1. ACM Search —> resource list across fleet -> resource details that are consistent with single cluster details view
    2. Add Cluster List to OCP —> Create Cluster

Description of problem:

There is a possible race condition in the console operator where the managed cluster config gets updated after the console deployment and doesn't trigger a rollout. 

Version-Release number of selected component (if applicable):

4.10

How reproducible:

Rarely

Steps to Reproduce:

1. Enable multicluster tech preview by adding TechPreviewNoUpgrade featureSet to FeatureGate config. (NOTE THIS ACTION IS IRREVERSIBLE AND WILL MAKE THE CLUSTER UNUPGRADEABLE AND UNSUPPORTED) 
2. Install ACM 2.5+
3. Import a managed cluster using either the ACM console or the CLI
4. Once that managed cluster is showing in the cluster dropdown, import a second managed cluster 

Actual results:

Sometimes the second managed cluster will never show up in the cluster dropdown

Expected results:

The second managed cluster eventually shows up in the cluster dropdown after a page refresh

Additional info:

Migrated from bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2055415

In order for hub cluster console OLM screens to behave as expected in a multicluster environment, we need to gather "copiedCSVsDisabled" flags from managed clusters so that the console backend/frontend can consume this information.

AC:

  • The console operator syncs "copiedCSVsDisabled" flags from managed clusters into the hub cluster managed cluster config.

Feature Overview

Allow to configure compute and control plane nodes on across multiple subnets for on-premise IPI deployments. With separating nodes in subnets, also allow using an external load balancer, instead of the built-in (keepalived/haproxy) that the IPI workflow installs, so that the customer can configure their own load balancer with the ingress and API VIPs pointing to nodes in the separate subnets.

Goals

I want to install OpenShift with IPI on an on-premise platform (high priority for bare metal and vSphere) and I need to distribute my control plane and compute nodes across multiple subnets.

I want to use IPI automation but I will configure an external load balancer for the API and Ingress VIPs, instead of using the built-in keepalived/haproxy-based load balancer that come with the on-prem platforms.

Background, and strategic fit

Customers require using multiple logical availability zones to define their architecture and topology for their datacenter. OpenShift clusters are expected to fit in this architecture for the high availability and disaster recovery plans of their datacenters.

Customers want the benefits of IPI and automated installations (and avoid UPI) and at the same time when they expect high traffic in their workloads they will design their clusters with external load balancers that will have the VIPs of the OpenShift clusters.

Load balancers can distribute incoming traffic across multiple subnets, which is something our built-in load balancers aren't able to do and which represents a big limitation for the topologies customers are designing.

While this is possible with IPI AWS, this isn't available with on-premise platforms installed with IPI (for the control plane nodes specifically), and customers see this as a gap in OpenShift for on-premise platforms.

Functionalities per Epic

 

Epic Control Plane with Multiple Subnets  Compute with Multiple Subnets Doesn't need external LB Built-in LB
NE-1069 (all-platforms)
NE-905 (all-platforms)
NE-1086 (vSphere)
NE-1087 (Bare Metal)
OSASINFRA-2999 (OSP)  
SPLAT-860 (vSphere)
NE-905 (all platforms)
OPNET-133 (vSphere/Bare Metal for AI/ZTP)
OSASINFRA-2087 (OSP)
KNIDEPLOY-4421 (Bare Metal workaround)
SPLAT-409 (vSphere)

Previous Work

Workers on separate subnets with IPI documentation

We can already deploy compute nodes on separate subnets by preventing the built-in LBs from running on the compute nodes. This is documented for bare metal only for the Remote Worker Nodes use case: https://docs.openshift.com/container-platform/4.11/installing/installing_bare_metal_ipi/ipi-install-installation-workflow.html#configure-network-components-to-run-on-the-control-plane_ipi-install-installation-workflow

This procedure works on vSphere too, albeit no QE CI and not documented.

External load balancer with IPI documentation

  1. Bare Metal: https://docs.openshift.com/container-platform/4.11/installing/installing_bare_metal_ipi/ipi-install-post-installation-configuration.html#nw-osp-configuring-external-load-balancer_ipi-install-post-installation-configuration
  2. vSphere: https://docs.openshift.com/container-platform/4.11/installing/installing_vsphere/installing-vsphere-installer-provisioned.html#nw-osp-configuring-external-load-balancer_installing-vsphere-installer-provisioned

Scenarios

  1. vSphere: I can define 3 or more networks in vSphere and distribute my masters and workers across them. I can configure an external load balancer for the VIPs.
  2. Bare metal: I can configure the IPI installer and the agent-based installer to place my control plane nodes and compute nodes on 3 or more subnets at installation time. I can configure an external load balancer for the VIPs.

Acceptance Criteria

  • Can place compute nodes on multiple subnets with IPI installations
  • Can place control plane nodes on multiple subnets with IPI installations
  • Can configure external load balancers for clusters deployed with IPI with control plane and compute nodes on multiple subnets
  • Can configure VIPs to in external load balancer routed to nodes on separate subnets and VLANs
  • Documentation exists for all the above cases

 

Epic Goal

As an OpenShift infrastructure owner I need to deploy OCP on OpenStack with the installer-provisioned infrastructure workflow and configure my own load balancers

Why is this important?

Customers want to use their own load balancers and IPI comes with built-in LBs based in keepalived and haproxy. 

Scenarios

  1. A large deployment routed across multiple failure domains without stretched L2 networks, would require to dynamically route the control plane VIP traffic through load-balancers capable of living in multiple L2.
  2. Customers who want to use their existing LB appliances for the control plane.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • QE - must be testing a scenario where we disable the internal LB and setup an external LB and OCP deployment is running fine.
  • Documentation - we need to document all the gotchas regarding this type of deployment, even the specifics about the load-balancer itself (routing policy, dynamic routing, etc)
  • For Tech Preview, we won't require Fixed IPs. This is something targeted for 4.14.

Dependencies (internal and external)

  1. For GA, we'll need Fixed IPs, already WIP by vsphere: https://issues.redhat.com/browse/OCPBU-179

Previous Work:

vsphere has done the work already via https://issues.redhat.com/browse/SPLAT-409

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Epic Goal

  • Update all images that we ship with OpenShift to the latest upstream releases and libraries.
  • Exact content of what needs to be updated will be determined as new images are released upstream, which is not known at the beginning of OCP development work. We don't know what new features will be included and should be tested and documented. Especially new CSI drivers releases may bring new, currently unknown features. We expect that the amount of work will be roughly the same as in the previous releases. Of course, QE or docs can reject an update if it's too close to deadline and/or looks too big.

Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF). Trying no-feature-freeze in 4.12. We will try to do as much as we can before FF, but we're quite sure something will slip past FF as usual.

Why is this important?

  • We want to ship the latest software that contains new features and bugfixes.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Update all CSI sidecars to the latest upstream release from https://github.com/orgs/kubernetes-csi/repositories

  • external-attacher
  • external-provisioner
  • external-resizer
  • external-snapshotter
  • node-driver-registrar
  • livenessprobe

Corresponding downstream repos have `csi-` prefix, e.g. github.com/openshift/csi-external-attacher.

This includes update of VolumeSnapshot CRDs in cluster-csi-snapshot-controller- operator assets and client API in  go.mod. I.e. copy all snapshot CRDs from upstream to the operator assets + go get -u github.com/kubernetes-csi/external-snapshotter/client/v6 in the operator repo.

Update all OCP and kubernetes libraries in storage operators to the appropriate version for OCP release.

This includes (but is not limited to):

  • Kubernetes:
    • client-go
    • controller-runtime
  • OCP:
    • library-go
    • openshift/api
    • openshift/client-go
    • operator-sdk

Operators:

  • aws-ebs-csi-driver-operator 
  • aws-efs-csi-driver-operator
  • azure-disk-csi-driver-operator
  • azure-file-csi-driver-operator
  • cinder-csi-driver-operator
  • gcp-pd-csi-driver-operator
  • gcp-filestore-csi-driver-operator
  • manila-csi-driver-operator
  • ovirt-csi-driver-operator
  • vmware-vsphere-csi-driver-operator
  • alibaba-disk-csi-driver-operator
  • ibm-vpc-block-csi-driver-operator
  • csi-driver-shared-resource-operator

 

  • cluster-storage-operator
  • csi-snapshot-controller-operator
  • local-storage-operator
  • vsphere-problem-detector

 

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Feature Goal

  • Enable platform=external to support onboarding new partners, e.g. Oracle Cloud Infrastructure and VCSP partners.
  • Create a new platform type, working name "External", that will signify when a cluster is deployed on a partner infrastructure where core cluster components have been replaced by the partner. “External” is different from our current platform types in that it will signal that the infrastructure is specifically not “None” or any of the known providers (eg AWS, GCP, etc). This will allow infrastructure partners to clearly designate when their OpenShift deployments contain components that replace the core Red Hat components.

This work will require updates to the core OpenShift API repository to add the new platform type, and then a distribution of this change to all components that use the platform type information. For components that partners might replace, per-component action will need to be taken, with the project team's guidance, to ensure that the component properly handles the "External" platform. These changes will look slightly different for each component.

To integrate these changes more easily into OpenShift, it is possible to take a multi-phase approach which could be spread over a release boundary (eg phase 1 is done in 4.X, phase 2 is done in 4.X+1).

OCPBU-5: Phase 1

  • Write platform “External” enhancement.
  • Evaluate changes to cluster capability annotations to ensure coverage for all replaceable components.
  • Meet with component teams to plan specific changes that will allow for supplement or replacement under platform "External".
  • Start implementing changes towards Phase 2.

OCPBU-510: Phase 2

  • Update OpenShift API with new platform and ensure all components have updated dependencies.
  • Update capabilities API to include coverage for all replaceable components.
  • Ensure all Red Hat operators tolerate the "External" platform and treat it the same as "None" platform.

OCPBU-329: Phase.Next

  • TBD

Why is this important?

  • As partners begin to supplement OpenShift's core functionality with their own platform specific components, having a way to recognize clusters that are in this state helps Red Hat created components to know when they should expect their functionality to be replaced or supplemented. Adding a new platform type is a significant data point that will allow Red Hat components to understand the cluster configuration and make any specific adjustments to their operation while a partner's component may be performing a similar duty.
  • The new platform type also helps with support to give a clear signal that a cluster has modifications to its core components that might require additional interaction with the partner instead of Red Hat. When combined with the cluster capabilities configuration, the platform "External" can be used to positively identify when a cluster is being supplemented by a partner, and which components are being supplemented or replaced.

Scenarios

  1. A partner wishes to replace the Machine controller with a custom version that they have written for their infrastructure. Setting the platform to "External" and advertising the Machine API capability gives a clear signal to the Red Hat created Machine API components that they should start the infrastructure generic controllers but not start a Machine controller.
  2. A partner wishes to add their own Cloud Controller Manager (CCM) written for their infrastructure. Setting the platform to "External" and advertising the CCM capability gives a clear to the Red Hat created CCM operator that the cluster should be configured for an external CCM that will be managed outside the operator. Although the Red Hat operator will not provide this functionality, it will configure the cluster to expect a CCM.

Acceptance Criteria

Phase 1

  • Partners can read "External" platform enhancement and plan for their platform integrations.
  • Teams can view jira cards for component changes and capability updates and plan their work as appropriate.

Phase 2

  • Components running in cluster can detect the “External” platform through the Infrastructure config API
  • Components running in cluster react to “External” platform as if it is “None” platform
  • Partners can disable any of the platform specific components through the capabilities API

Phase 3

  • Components running in cluster react to the “External” platform based on their function.
    • for example, the Machine API Operator needs to run a set of controllers that are platform agnostic when running in platform “External” mode.
    • the specific component reactions are difficult to predict currently, this criteria could change based on the output of phase 1.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. Identifying OpenShift Components for Install Flexibility

Open questions::

  1. Phase 1 requires talking with several component teams, the specific action that will be needed will depend on the needs of the specific component. At the least the components need to treat platform "External" as "None", but there could be more changes depending on the component (eg Machine API Operator running non-platform specific controllers).

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Epic Goal

  • Empower External platform type user to specify when they will run their own CCM

Why is this important?

  • For partners wishing to use components that require zonal awareness provided by the infrastructure (for example CSI drivers), they will need to exercise their own cloud controller managers. This epic is about adding the proper configuration to OpenShift to allow users of External platform types to run their own CCMs.

Scenarios

  1. As a Red Hat partner, I would like to deploy OpenShift with my own CSI driver. To do this I need my CCM deployed as well. Having a way to instruct OpenShift to expect an external CCM deployment would allow me to do this.

Acceptance Criteria

  • CI - A new periodic test based on the External platform test would be ideal
  • Release Technical Enablement - Provide necessary release enablement details and documents.
    • Update docs.ci.openshift.org with CCM docs

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. https://github.com/openshift/enhancements/blob/master/enhancements/cloud-integration/infrastructure-external-platform-type.md#api-extensions
  2. https://github.com/openshift/api/pull/1409

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story

As a Red Hat Partner installing OpenShift using the External platform type, I would like to install my own Cloud Controller Manager(CCM). Having a field in the Infrastructure configuration object to signal that I will install my own CCM and that Kubernetes should be configured to expect an external CCM will allow me to run my own CCM on new OpenShift deployments.

Background

This work has been defined in the External platform enhancement , and had previously been part of openshift/api . The CCM API pieces were removed for the 4.13 release of OpenShift to ensure that we did not ship unused portions of the API.

In addition to the API changes, library-go will need to have an update to the  IsCloudProviderExternal function to detect the if the External platform is selected and if the CCM should be enabled for external mode.

We will also need to check the ObserveCloudVolumePlugin function to ensure that it is not affected by the external changes and that it continues to use the external volume plugin.

After updating openshift/library-go, it will need to be re-vendored into the MCO  , KCMO , and CCCMO  (although this is not as critical as the other 2).

Steps

  • update openshift/api with new CCM fields (re-revert #1409)
  • revendor api to library-go
  • update IsCloudProviderExternal in library-go to observe the new API fields
  • investigate ObserveCloudVolumePlugin to see if it requires changes
  • revendor library-go to MCO, KCMO, and CCCMO
  • update enhancement doc to reflect state

Stakeholders

  • openshift eng
  • oracle cloud install effort

Definition of Done

  • openshift can be installed with External platform type with kubelet, and related components, using the external cloud provider flags.
  • Docs
  • this will need to be documented in the API and as part of OCPCLOUD-1581
  • Testing
  • this will need validation through unit test, integration testing may be difficult as we will need a new e2e built off the external platform with a ccm
  1. Proposed title of this feature request:

Update ETCD datastore encryption to use AES-GCM instead of AES-CBC

2. What is the nature and description of the request?

The current ETCD datastore encryption solution uses the aes-cbc cipher. This cipher is now considered "weak" and is susceptible to padding oracle attack.  Upstream recommends using the AES-GCM cipher. AES-GCM will require automation to rotate secrets for every 200k writes.

The cipher used is hard coded. 

3. Why is this needed? (List the business requirements here).

Security conscious customers will not accept the presence and use of weak ciphers in an OpenShift cluster. Continuing to use the AES-CBC cipher will create friction in sales and, for existing customers, may result in OpenShift being blocked from being deployed in production. 

4. List any affected packages or components.

Epic Goal*

What is our purpose in implementing this?  What new capability will be available to customers?

The Kube APIserver is used to set the encryption of data stored in etcd. See https://docs.openshift.com/container-platform/4.11/security/encrypting-etcd.html

 

Today with OpenShift 4.11 or earlier, only aescbc is allowed as the encryption field type. 

 

RFE-3095 is asking that aesgcm (which is an updated and more recent type) be supported. Furthermore RFE-3338 is asking for more customizability which brings us to how we have implemented cipher customzation with tlsSecurityProfile. See https://docs.openshift.com/container-platform/4.11/security/tls-security-profiles.html

 

 
Why is this important? (mandatory)

AES-CBC is considered as a weak cipher

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1.  

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing -  Basic e2e automationTests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

This epic contains all the OLM related stories for OCP release-4.14

Epic Goal

  • Track all the stories under a single epic

Console operator should be building up a set of cluster nodes OS types, which he should supply to console, so it renders only operators that could be installed on the cluster.

This will be needed when we will support different OS types on the cluster.

We need to scan through the compute nodes and build a set of supported OS from those. Each node on the cluster has a label for its operating system: e.g. kubernetes.io/os=linux,

 

AC:

  1. Implement logic in the console-operator that will scan though all the nodes and build a set of all the OS types that the cluster nodes run on and pass it to the console-config.yaml . This set of OS types will be then used by console frontend.
  2. Add unit and e2e test cases in the console-operator repository.

Pre-Work Objectives

Since some of our requirements from the ACM team will not be available for the 4.12 timeframe, the team should work on anything we can get done in the scope of the console repo so that when the required items are available in 4.13, we can be more nimble in delivering GA content for the Unified Console Epic.

Overall GA Key Objective
Providing our customers with a single simplified User Experience(Hybrid Cloud Console)that is extensible, can run locally or in the cloud, and is capable of managing the fleet to deep diving into a single cluster. 
Why customers want this?

  1. Single interface to accomplish their tasks
  2. Consistent UX and patterns
  3. Easily accessible: One URL, one set of credentials

Why we want this?

  • Shared code -  improve the velocity of both teams and most importantly ensure consistency of the experience at the code level
  • Pre-built PF4 components
  • Accessibility & i18n
  • Remove barriers for enabling ACM

Phase 2 Goal: Productization of the united Console 

  1. Enable user to quickly change context from fleet view to single cluster view
    1. Add Cluster selector with “All Cluster” Option. “All Cluster” = ACM
    2. Shared SSO across the fleet
    3. Hub OCP Console can connect to remote clusters API
    4. When ACM Installed the user starts from the fleet overview aka “All Clusters”
  2. Share UX between views
    1. ACM Search —> resource list across fleet -> resource details that are consistent with single cluster details view
    2. Add Cluster List to OCP —> Create Cluster

As a developer I would like to disable clusters like *KS that we can't support for multi-cluster (for instance because we can't authenticate). The ManagedCluster resource has a vendor label that we can use to know if the cluster is supported.

cc Ali Mobrem Sho Weimer Jakub Hadvig 

UPDATE: 9/20/22 : we want an allow-list with OpenShift, ROSA, ARO, ROKS, and  OpenShiftDedicated

Acceptance criteria:

  • Investigate if console-operator should pass info about which cluster are supported and unsupported to the frontend
  • Unsupported clusters should not appear in the cluster dropdown
  • Unsupported clusters based off
    • defined vendor label
    • non 4.x ocp clusters

Key Objective
Providing our customers with a single simplified User Experience(Hybrid Cloud Console)that is extensible, can run locally or in the cloud, and is capable of managing the fleet to deep diving into a single cluster. 
Why customers want this?

  1. Single interface to accomplish their tasks
  2. Consistent UX and patterns
  3. Easily accessible: One URL, one set of credentials

Why we want this?

  • Shared code -  improve the velocity of both teams and most importantly ensure consistency of the experience at the code level
  • Pre-built PF4 components
  • Accessibility & i18n
  • Remove barriers for enabling ACM

Phase 2 Goal: Productization of the united Console 

  1. Enable user to quickly change context from fleet view to single cluster view
    1. Add Cluster selector with “All Cluster” Option. “All Cluster” = ACM
    2. Shared SSO across the fleet
    3. Hub OCP Console can connect to remote clusters API
    4. When ACM Installed the user starts from the fleet overview aka “All Clusters”
  2. Share UX between views
    1. ACM Search —> resource list across fleet -> resource details that are consistent with single cluster details view
    2. Add Cluster List to OCP —> Create Cluster

We need a way to show metrics for workloads running on spoke clusters. This depends on ACM-876, which lets the console discover the monitoring endpoints.

  • Console operator must discover the external URLs for monitoring
  • Console operator must pass the URLs and CA files as part of the cluster config to the console backend
  • Console backend must set up proxies for each endpoint (as it does for the API server endpoints)
  • Console frontend must include the cluster in metrics requests

Open Issues:

We will depend on ACM to create a route on each spoke cluster for the prometheus tenancy service, which is required for metrics for normal users.

 

Openshift console backend should proxy managed cluster monitoring requests through the MCE cluster proxy addon to prometheus services on the managed cluster. This depends on https://issues.redhat.com/browse/ACM-1188

 

This epic contains all the OLM related stories for OCP release-4.13

Epic Goal

  • Track all the stories under a single epic

Description/Acceptance Criteria:

  • Add RBAC for the console-operator so it can GET/LIST/WATCH OLMConfig  cluster config. The RBAC should be added to console-operator cluster-role rules 
  • The console operator should watch the spec.features.disableCopiedCSVs property of the OLM cluster config. When this property is true, the console-config should be updated "clusterInfo.copiedCSVsDisabled" field accordingly, and rollout a new version of console.

1. Proposed title of this feature request
BYOK encrypts root vols AND default storageclass

2. What is the nature and description of the request?
User story
As a customer spinning up managed OpenShift clusters, if I pass a custom AWS KMS key to the installer, I expect it (installer and cluster-storage-operator) to not only encrypt the root volumes for the nodes in the cluster, but also be applied to encrypt the first/default (gp2 in current case) StorageClass, so that my assumptions around passing a custom key are met.
In current state, if I pass a KMS key to the installer, only root volumes are encrypted with it, and the default AWS managed key is used for the default StorageClass.
Perhaps this could be offered as a flag to set in the installer to further pass the key to the storage class, or not.

3. Why does the customer need this? (List the business requirements here)
To satisfy that customers wish to encrypt their owned volumes with their selected key instead of the AWS default account key, by accident.

4. List any affected packages or components.

  • uncertain.

Note: this implementation should take effect on AWS, GCP and Azure (any cloud provider) equally.

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

User Story:

As a cluster admin, I want OCP to provision new volumes with my custom encryption key that I specified during cluster installation in install-config.yaml so all OCP assets (PVs, VMs & their root disks) use the same encryption key.

Acceptance Criteria:

Description of criteria:

  • Check that dynamically provisioned PVs use the key specified in install-config.yaml
  • Check that the key can be changed in TBD API and all volumes newly provisioned after the key change use the new key. (Exact API is not defined yet, probably a new field in `Infrastructure`, calling it TBD API now).

(optional) Out of Scope:

Re-encryption of existing PVs with a new key. Only newly provisioned PVs will use the new key.

Engineering Details:

Enhancement (incl. TBD API with encryption key reference) will be provided as part of https://issues.redhat.com/browse/CORS-2080.

"Raw meat" of this story is translation of the key reference in TBD API to StorageClass.Parameters. AWS EBS CSi driver operator should update both the StorageClass it manages (managed-csi) with:

Parameters:
    encrypted: "true"

    kmsKeyId:  "arn:aws:kms:us-east-1:012345678910:key/abcd1234-a123-456a-a12b-a123b4cd56ef"

Upstream docs: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/docs/parameters.md 

Feature Overview

Enable sharing ConfigMap and Secret across namespaces

Requirements

Requirement Notes isMvp?
Secrets and ConfigMaps can get shared across namespaces   YES

Questions to answer…

NA

Out of Scope

NA

Background, and strategic fit

Consumption of RHEL entitlements has been a challenge on OCP 4 since it moved to a cluster-based entitlement model compared to the node-based (RHEL subscription manager) entitlement mode. In order to provide a sufficiently similar experience to OCP 3, the entitlement certificates that are made available on the cluster (OCPBU-93) should be shared across namespaces in order to prevent the need for cluster admin to copy these entitlements in each namespace which leads to additional operational challenges for updating and refreshing them. 

Documentation Considerations

Questions to be addressed:
 * What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
 * Does this feature have doc impact?
 * New Content, Updates to existing content, Release Note, or No Doc Impact
 * If unsure and no Technical Writer is available, please contact Content Strategy.
 * What concepts do customers need to understand to be successful in [action]?
 * How do we expect customers will use the feature? For what purpose(s)?
 * What reference material might a customer want/need to complete [action]?
 * Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
 * What is the doc impact (New Content, Updates to existing content, or Release Note)?

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Allow CSI volumes to be mounted into a build

Why is this important?

  • CSI volumes allow data to be mounted into containers via ephemeral CSI Volumes
  • Ephemeral CSI volumes are provided by CSI drivers that support this feature. Such drivers include:
  • When using sensitive credentials in a build, accessing secrets as a mounted volume ensure that these credentials are not present in the resulting container image.

Scenarios

  1. Access private artifact repositories (Artifactory, jFrog, Mavein)
  2. Download RHEL packages in a build

Acceptance Criteria

  • Builds can mount a CSI volume in a build
  • Content in the CSI volume is not present in the resulting container image.
  • If SCCs do not support fine controls over CSI volumes, provide this feature on a TechPreview basis with a feature gate.

Dependencies (internal and external)

  1. Buildah - support mounting of volumes when building with a Dockerfile
  2. (optional) Auth - use SCCs to control which CSI drivers are allowed to be used with ephemeral CSI volumes.

Previous Work (Optional):

  1. BUILD-257 - Build Resource Volume Mounts

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story

As an OpenShift cluster admin
I want to use the TechPreview feature set to try out CSI volumes in builds
So that I can test using CSI volumes in builds through tech preview features

Minimum Acceptance Criteria

  • Openshift-controller-manager-operator reads the cluster feature gate (BuildCSIVolumes).
  • If the feature gate for Build CSI volumes is enabled, the operator enables configuration in openshift-controller-manager to turn on CSI volumes for the build controller.
  • CI testing verifies that we pass the appropriate feature gate to the build controller when tech preview is enabled.

Docs Impact

Product documentation will not be required until BUILD-275 is complete. Documentation for CSI volumes in builds will need to note that the TechPreviewNoUpgrade feature set needs to be enabled on the cluster.

PX Impact

Additional training enablement materials may not be needed - product docs should be sufficient.

QE Imact

Full e2e testing may not be feasible until BUILD-275 is completed.
CI testing should verify that the appropriate configuration values were passed to the build controller.

We will likely need a new CI job that installs the cluster with tech preview enabled before we verify that the BuildCSIVolumes feature gate has been enabled.

Open Questions

  • Should ocm-o mark itself Upgradeable=false if we detect the BuildCSIVolumes feature gate has been enabled?

Notes

OpenShift already has feature gates baked into the core platform via the FeatureGate API object. For this feature, we need to declare a feature gate that is added to the TechPreviewNoUpgrade feature set, which openshift-controller-manager-operator then reads and applies to the build controller.

Feature gate needs to be proposed to openshift/api (add to the TechPreviewNoUpgrade feature set).
An example PR on how to do this: https://github.com/openshift/api/pull/982.
Once approved, the updated tech preview feature set needs to be vendored into openshift/library-go.
Openshift-controller-manager-operator needs to read the feature gate, pass it on to the build controller.
The build controller has its own configuration "API" - this was a relic of the 3.x master configuration that is not exposed to admins in OCP 4.x: https://github.com/openshift/api/blob/master/openshiftcontrolplane/v1/types.go#L198-L207

A separate operator looks checks if a *NoUpgrade feature set is enabled, and if so marks the cluster as unable to be upgraded to the next minor OCP
release.

To test this in CI, we need a suite that runs with the TechPreviewNoUpgrade feature set enabled. The step registry has primitives which bring up a cluster with tech preview features enabled. We will need to update ocm-o's CI configuration to run our operator tests with tech preview enabled. Testing for this specific feature will need to have separate logic that verifies we are sending the right configuration to the build controller under normal and TechPreview mode.

 

Existing techpreview CI step registry setups (note the per cloud elements, which make sense, since the existing CSI drivers are per cloud):

/ci-operator/step-registry/ipi/aws/pre/techpreview
./ci-operator/step-registry/ipi/azure/pre/techpreview
./ci-operator/step-registry/ipi/conf/aws/techpreview
./ci-operator/step-registry/ipi/conf/azure/techpreview
./ci-operator/step-registry/ipi/conf/techpreview
./ci-operator/step-registry/ipi/conf/openstack/techpreview
./ci-operator/step-registry/ipi/openstack/pre/techpreview
./ci-operator/step-registry/openshift/e2e/aws/techpreview
./ci-operator/step-registry/openshift/e2e/gcp/techpreview
./ci-operator/step-registry/openshift/e2e/azure/techpreview
./ci-operator/step-registry/openshift/e2e/openstack/techpreview
./ci-operator/step-registry/openshift/e2e/vsphere/techpreview

 

Given shared resources span all clouds, etc. does that mean we touch each of these, or create a new one, or both?

Feature Overview

  • Customers want to create and manage OpenShift clusters using managed identities for Azure resources for authentication.

Goals

  • A customer using ARO wants to spin up an OpenShift cluster with "az aro create" without needing additional input, i.e. without the need for an AD account or service principal credentials, and the identity used is never visible to the customer and cannot appear in the cluster.
  • As an administrator, I want to deploy OpenShift 4 and run Operators on Azure using access controls (IAM roles) with temporary, limited privilege credentials.

Requirements

  • Azure managed identities must work for installation with all install methods including IPI and UPI, work with upgrades, and day-to-day cluster lifecycle operations.
  • Support HyperShift and non-HyperShift clusters.
  • Support use of Operators with Azure managed identities.
  • Support in all Azure regions where Azure managed identity is available. Note: Federated credentials is associated with Azure Managed Identity, and federated credentials is not available in all Azure regions.

More details at ARO managed identity scope and impact.

 

This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature.. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non MVP requirement slips, it does not shift the feature.

Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

(Optional) Use Cases

This Section:

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

Questions to answer…

  • ...

Out of Scope

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

Assumptions

  • ...

Customer Considerations

  • ...

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

References

Epic Overview

  • Enable customers to create and manage OpenShift clusters using managed identities for Azure resources for authentication.
  • A customer using ARO wants to spin up an OpenShift cluster with "az aro create" without needing additional input, i.e. without the need for an AD account or service principal credentials, and the identity used is never visible to the customer and cannot appear in the cluster.

Epic Goal

  • A customer creates an OpenShift cluster ("az aro create") using Azure managed identity.
  • Azure managed identities must work for installation with all install methods including IPI and UPI, work with upgrades, and day-to-day cluster lifecycle operations.
  • After Azure failed to implement workable golang API changes after deprecation of their old API, we have removed mint mode and work entirely in passthrough mode. Azure has plans to implement pod/workload identity similar to how they have been implemented in AWS and GCP, and when this feature is available, we should implement permissions similar to AWS/GCP
  • This work cannot start until Azure have implemented this feature - as such, this Epic is a placeholder to track the effort when available.

Why is this important?

  • Microsoft and the customer would prefer that we use Managed Identities vs. Service Principal (which requires putting the Service Principal and principal password in clear text within the azure.conf file).

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

 

 

This document describes the expectations for promoting a feature that is behind a feature gate.

https://docs.google.com/document/d/1zOL38_KDKwAvsx-LMHfyDdX4-jq3HQU1YWBjfuHaM0Q/edit#heading=h.2se1quoqg6jr 

The criteria includes:

  • Automated testing
    • CCO-234: Azure workload identity e2e testing
    • CCO-408: Add periodic e2e test for e2e-azure-manual-oidc
    • CCO-379: E2E Automation
    • CCO-380: CI Integration-Azure Managed Identity (Workload Identity) Support
  • QE sign off that the feature is complete
  • Staff engineer approval

Feature Overview

Create a Azure cloud specific spec.resourceTags entry in the infrastructure CRD. This should create and update tags (or labels in Azure) on any openshift cloud resource that we create and manage. The behaviour should also tag existing resources that do not have the tags yet and once the tags in the infrastructure CRD are changed all the resources should be updated accordingly.

Tag deletes continue to be out of scope, as the customer can still have custom tags applied to the resources that we do not want to delete.

Due to the ongoing intree/out of tree split on the cloud and CSI providers, this should not apply to clusters with intree providers (!= "external").

Once confident we have all components updated, we should introduce an end2end test that makes sure we never create resources that are untagged.

 
Goals

  • Functionality on Azure Tech Preview
  • inclusion in the cluster backups
  • flexibility of changing tags during cluster lifetime, without recreating the whole cluster

Requirements

  • This Section:* A list of specific needs or objectives that a Feature must deliver to satisfy the Feature.. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non MVP requirement slips, it does not shift the feature.
Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

List any affected packages or components.

  • Installer
  • Cluster Infrastructure
  • Storage
  • Node
  • NetworkEdge
  • Internal Registry
  • CCO

This epic covers the work to apply user defined tags to Azure created for openshift cluster available as tech preview.

The user should be able to define the azure tags to be applied on the resources created during cluster creation by the installer and other operators which manages the specific resources. The user will be able to define the required tags in the install-config.yaml while preparing with the user inputs for cluster creation, which will then be made available in the status sub-resource of Infrastructure custom resource which cannot be edited but will be available for user reference and will be used by the in-cluster operators for tagging when the resources are created.

Updating/deleting of tags added during cluster creation or adding new tags as Day-2 operation is out of scope of this epic.

List any affected packages or components.

  • Installer
  • Cluster Infrastructure
  • Storage
  • Node
  • NetworkEdge
  • Internal Registry
  • CCO

Reference - https://issues.redhat.com/browse/RFE-2017

cluster-config-operator makes Infrastructure CRD available for installer, which is included in it's container image from the openshift/api package and requires the package to be updated to have the latest CRD.

Epic Goal

  • Update all images that we ship with OpenShift to the latest upstream releases and libraries.
  • Exact content of what needs to be updated will be determined as new images are released upstream, which is not known at the beginning of OCP development work. We don't know what new features will be included and should be tested and documented. Especially new CSI drivers releases may bring new, currently unknown features. We expect that the amount of work will be roughly the same as in the previous releases. Of course, QE or docs can reject an update if it's too close to deadline and/or looks too big.

Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF). Trying no-feature-freeze in 4.12. We will try to do as much as we can before FF, but we're quite sure something will slip past FF as usual.

Why is this important?

  • We want to ship the latest software that contains new features and bugfixes.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Update all OCP and kubernetes libraries in storage operators to the appropriate version for OCP release.

This includes (but is not limited to):

  • Kubernetes:
    • client-go
    • controller-runtime
  • OCP:
    • library-go
    • openshift/api
    • openshift/client-go
    • operator-sdk

Operators:

  • aws-ebs-csi-driver-operator 
  • aws-efs-csi-driver-operator
  • azure-disk-csi-driver-operator
  • azure-file-csi-driver-operator
  • openstack-cinder-csi-driver-operator
  • gcp-pd-csi-driver-operator
  • gcp-filestore-csi-driver-operator
  • csi-driver-manila-operator
  • vmware-vsphere-csi-driver-operator
  • alibaba-disk-csi-driver-operator
  • ibm-vpc-block-csi-driver-operator
  • csi-driver-shared-resource-operator
  • ibm-powervs-block-csi-driver-operator

 

  • cluster-storage-operator
  • cluster-csi-snapshot-controller-operator
  • local-storage-operator
  • vsphere-problem-detector

EOL, do not upgrade:

  • github.com/oVirt/csi-driver-operator

Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

Update all CSI sidecars to the latest upstream release from https://github.com/orgs/kubernetes-csi/repositories

  • external-attacher
  • external-provisioner
  • external-resizer
  • external-snapshotter
  • node-driver-registrar
  • livenessprobe

Corresponding downstream repos have `csi-` prefix, e.g. github.com/openshift/csi-external-attacher.

This includes update of VolumeSnapshot CRDs in cluster-csi-snapshot-controller- operator assets and client API in  go.mod. I.e. copy all snapshot CRDs from upstream to the operator assets + go get -u github.com/kubernetes-csi/external-snapshotter/client/v6 in the operator repo.

Epic Goal*

What is our purpose in implementing this?  What new capability will be available to customers?

 
Why is this important? (mandatory)

What are the benefits to the customer or Red Hat?   Does it improve security, performance, supportability, etc?  Why is work a priority?

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1.  

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing -  Basic e2e automationTests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Feature Overview (aka. Goal Summary)  

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

 

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Complete during New status.

 

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Problem:

As a developer of serverless functions, we don't provide any samples.

Goal:

Provide Serverless Function samples in the sample catalog.  These would be utilizing the Builder Image capabilities.

Why is it important?

Use cases:

  1. <case>

Acceptance criteria:

  1. <criteria>

Dependencies (External/Internal):

  • Serverless team would need to provide sample repo for serverless function
  • Samples operator would need to be update

Design Artifacts:

Exploration:

Note:

  • Need to define the API and confirm with other stakeholders - need to support a serverless func image stream "tag"
  • Serverless team will need to provide updates to the existing Image Streams, as well as maintain the sample repositories which are referenced in the Image Streams.
  • Need to understand the relationship between ImageStream and Image Stream Tag
  • Should serverless function samples in the catalog have "builder image" tag?  or should it be "serverless function"

Description

As an operator author, I want to provide additional samples that are tied to an operator version, not an OpenShift release. For that, I want to create a resource to add new samples to the web console.

Acceptance Criteria

  1. openshift/console-operator update so that new clusters have the new ConsoleSample CRD
  2. Add RBAC permissions (roles and rolebinding?) so that all users have access to ConsoleSample resources

Additional Details:

Note: Replace text in red with details of your feature request.

Feature Overview

Extend the Workload Partitioning feature to support multi-node clusters.

Goals

Customers running RAN workloads on C-RAN Hubs (i.e. multi-node clusters) that want to maximize the cores available to the workloads (DU) should be able to utilize WP to isolate CP processes to reserved cores.

Requirements

A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts.  If a non MVP requirement slips, it does not shift the feature.

requirement Notes isMvp?
     
     
     

 

Describe Use Cases (if needed)

< How will the user interact with this feature? >

< Which users will use this and when will they use it? >

< Is this feature used as part of current user interface? >

Out of Scope

 

Background, and strategic fit

< What does the person writing code, testing, documenting need to know? >

Assumptions

< Are there assumptions being made regarding prerequisites and dependencies?>

< Are there assumptions about hardware, software or people resources?>

Customer Considerations

< Are there specific customer environments that need to be considered (such as working with existing h/w and software)?>

< Are there Upgrade considerations that customers need to account for or that the feature should address on behalf of the customer?>

<Does the Feature introduce data that could be gathered and used for Insights purposes?>

Documentation Considerations

< What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)? >

< What does success look like?>

< Does this feature have doc impact?  Possible values are: New Content, Updates to existing content,  Release Note, or No Doc Impact>

< If unsure and no Technical Writer is available, please contact Content Strategy. If yes, complete the following.>

  • <What concepts do customers need to understand to be successful in [action]?>
  • <How do we expect customers will use the feature? For what purpose(s)?>
  • <What reference material might a customer want/need to complete [action]?>
  • <Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available. >
  • <What is the doc impact (New Content, Updates to existing content, or Release Note)?>

Interoperability Considerations

< Which other products and versions in our portfolio does this feature impact?>

< What interoperability test scenarios should be factored by the layered product(s)?>

Questions

Question Outcome
   

 

 

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Incomplete Features

When this image was assembled, these features were not yet completed. Therefore, only the Jira Cards included here are part of this release

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

For users who are using OpenShift but have not yet begun to explore multicluster and we we offer them.

I'm investigating where Learning paths are today and what is required.

As a user I'd like to have learning path for how to get started with Multicluster.
Install MCE
Create multiple clusters
Use HyperShift
Provide access to cluster creation to devs via templates
Scale up to ACM/ACS (OPP?)

Status
https://github.com/patternfly/patternfly-quickstarts/issues/37#issuecomment-1199840223

Goal: Resources provided via the Dynamic Resource Allocation Kubernetes mechanism can be consumed by VMs.

Details: Dynamic Resource Allocation

Goal

Come up with a design of how resources provided by Dynamic Resource Allocation can be consumed by KubeVirt VMs.

Description

The Dynamic Resource Allocation (DRA) feature is an alpha API in Kubernetes 1.26, which is the base for OpenShift 4.13.
This feature provides the ability to create ResourceClaim and ResourceClasse to request access to Resources. This is similar to the dynamic provisioning of PersistentVolume via PersistentVolumeClaim and StorageClasse.

NVIDIA has been a lead contributor to the KEP and has already an initial implementation of a DRA driver and plugin, with a nice demo recording. NVIDIA is expecting to have this DRA driver available in CY23 Q3 or Q4, so likely in NVIDIA GPU Operator v23.9, around OpenShift 4.14.

When asked about the availability of MIG-backed vGPU for Kubernetes, NVIDIA said that the timeframe is not decided yet, because it will likely use DRA for the MIG devices creation and their registration with the vGPU host driver. The MIG-base vGPU feature for OpenShift Virtualization will then likely require support of DRA to request vGPU resources for the VMs.

Not having MIG-backed vGPU is a risk for OpenShift Virtualization adoption in GPU use cases, such as virtual workstations for rendering with Windows-only softwares. Customers who want to have a mix of passthrough, time-based vGPU and MIG-backed vGPU will prefer competitors who offer the full range of options. And the certification of NVIDIA solutions like NVIDIA Omniverse will be blocked, despite a great potential to increase the OpenShift consumption, as it uses RTX/A40 GPU for virtual workstations (not certified by NVIDIA on OpenShift Virtualization yet) and A100/H100 for physics simulation, both use cases probably leveraring vGPUs [7]. There's a lot of necessary conditions for that to happen and MIG-backed vGPU support is one of them.

User Stories

  • GPU consumption optimization
    "As an Admin, I want to let NVIDIA GPU DRA driver provision vGPUs for OpenShift Virtualization, so that it optimizes the allocation with dynamic provisioning of time or MIG backed vGPUs"
  • GPU mixed types per server
    "As an Admin, I want to be able to mix different types of GPU to collocate different types of workloads on the same host, in order to improve multi-pod/stack performance.

Non-Requirements

  • List of things not included in this epic, to alleviate any doubt raised during the grooming process.

Notes

  • Any additional details or decisions made/needed

References

Done Checklist

Who What Reference
DEV Upstream roadmap issue (or individual upstream PRs) <link to GitHub Issue>
DEV Upstream documentation merged <link to meaningful PR>
DEV gap doc updated <name sheet and cell>
DEV Upgrade consideration <link to upgrade-related test or design doc>
DEV CEE/PX summary presentation label epic with cee-training and add a <link to your support-facing preso>
QE Test plans in Polarion <link or reference to Polarion>
QE Automated tests merged <link or reference to automated tests>
DOC Downstream documentation merged <link to meaningful PR>

Feature Overview

Plugin teams need a mechanism to extend the OCP console that is decoupled enough so they can deliver at the cadence of their projects and not be forced in to the OCP Console release timelines.

The OCP Console Dynamic Plugin Framework will enable all our plugin teams to do the following:

  • Extend the Console
  • Deliver UI code with their Operator
  • Work in their own git Repo
  • Deliver at their own cadence

Goals

    • Operators can deliver console plugins separate from the console image and update plugins when the operator updates.
    • The dynamic plugin API is similar to the static plugin API to ease migration.
    • Plugins can use shared console components such as list and details page components.
    • Shared components from core will be part of a well-defined plugin API.
    • Plugins can use Patternfly 4 components.
    • Cluster admins control what plugins are enabled.
    • Misbehaving plugins should not break console.
    • Existing static plugins are not affected and will continue to work as expected.

Out of Scope

    • Initially we don't plan to make this a public API. The target use is for Red Hat operators. We might reevaluate later when dynamic plugins are more mature.
    • We can't avoid breaking changes in console dependencies such as Patternfly even if we don't break the console plugin API itself. We'll need a way for plugins to declare compatibility.
    • Plugins won't be sandboxed. They will have full JavaScript access to the DOM and network. Plugins won't be enabled by default, however. A cluster admin will need to enable the plugin.
    • This proposal does not cover allowing plugins to contribute backend console endpoints.

 

Requirements

 

Requirement Notes isMvp?
 UI to enable and disable plugins    YES 
 Dynamic Plugin Framework in place    YES 
Testing Infra up and running   YES 
 Docs and read me for creating and testing Plugins    YES 
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

 
 Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?  
  • New Content, Updates to existing content,  Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Console static plugins, maintained as part of the frontend monorepo, currently use static extension types (packages/console-plugin-sdk/src/typings) which directly reference various kinds of objects, including React components and arbitrary functions.

To ease the long-term transition from static to dynamic plugins, we should support a use case where an existing static plugin goes through the following stages:

  1. use static extensions only (we are here)
  2. use both static and dynamic extensions
  3. use dynamic extensions only (ideal target for 4.7)

Once a static plugin reaches the "use dynamic extensions only" stage, its maintainers can move it out of the Console monorepo - the plugin becomes dynamic, shipped via the corresponding operator and loaded by Console app at runtime.

Dynamic plugins will need changes to the console operator config to be enabled and disabled. We'll need either a new CRD or an annotation on CSVs for console to discover when plugins are available.

This story tracks any API updates needed to openshift/api and any operator updates needed to wire through dynamic plugins as console config.

See https://github.com/openshift/enhancements/pull/441 for design details.

Feature Overview

  • This Section:* High-Level description of the feature ie: Executive Summary
  • Note: A Feature is a capability or a well defined set of functionality that delivers business value. Features can include additions or changes to existing functionality. Features can easily span multiple teams, and multiple releases.

 

Goals

  • This Section:* Provide high-level goal statement, providing user context and expected user outcome(s) for this feature

 

Requirements

  • This Section:* A list of specific needs or objectives that a Feature must deliver to satisfy the Feature.. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non MVP requirement slips, it does not shift the feature.

 

Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

 

(Optional) Use Cases

This Section: 

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

 

Questions to answer…

  • ...

 

Out of Scope

 

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

 

Assumptions

  • ...

 

Customer Considerations

  • ...

 

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?  
  • New Content, Updates to existing content,  Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The work on this story is dependent on following changes:

 

The console already supports custom routes on the operator config. With the new proposed CustomDomains API introduces a unified way how to stock install custom domains for routes, which both names and serving cert/keys, customers want to customise. From the console perspective those are:

  • openshift-console / console
  • openshift-console / downloads (CLI downloads)

 

The setup should be done on the Ingress config. There two new fields are introduced:

  • ComponentRouteSpec - contains configuration of the for the custom domain(name, namespace, custom hostname, TLS secret reference)
  • ComponentRouteStatus - contains status of the custom domain(condition, previous hostname, rbac needed to read the TLS secret, ...)

 

Console-operator will be only consuming the API and check for any changes. If a custom domain is set for either `console` or `downloads` route in the `openshift-console` namespace, console-operator will read the setup set a custom route accordingly. When a custom route will be setup for any of console's route, the default route wont be deleted, but instead it will updated so it redirects to the custom one. This is done because of two reasons:

  1. we want to prevent somebody from stealing the default hostname of both routes (console, downloads)
  2. we want to prevent users from having unusable bookmarks that are pointing to the default hostname

 

Console-operator will still need to support the CustomDomain API that is available on it's config.

Acceptance criteria:

  • Console supports the new CustomDomains API for configuring a custom domain for both `console` and `downloads` routes
  • Console falls back to the deprecated API in the console operator config if present
  • Console supports the original default domains and redirects to the new ones

 

Questions:

  • Which CustomDomain API takes precedens? Ingress config vs. Console-operator config. Can upgrade cause any issues?
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

During master nodes upgrade when nodes are getting drained there's currently no protection from two or more operands going down. If your component is required to be available during upgrade or other voluntary disruptions, please consider deploying PDB to protect your operands.

The effort is tracked in https://issues.redhat.com/browse/WRKLDS-293.

Example:

 

Acceptance Criteria:
1. Create PDB controller in console-operator for both console and downloads pods
2. Add e2e tests for PDB in single node and multi node cluster

 

Note: We should consider to backport this to 4.10

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

When OCP is performing cluster upgrade user should be notified about this fact.

There are two possibilities how to surface the cluster upgrade to the users:

  • Display a console notification throughout OCP web UI saying that the cluster is currently under upgrade.
  • Global notification throughout OCP web UI saying that the cluster is currently under upgrade.
  • Have an alert firing for all the users of OCP stating the cluster is undergoing an upgrade. 

 

AC:

  • Console-operator will create a ConsoleNotification CR when the cluster is being upgraded. Once the upgrade is done console-operator will remote that CR. These are the three statuses based on which we are determining if the cluster is being upgraded.
  • Add unit tests

 

Note: We need to decide if we want to distinguish this particular notification by a different color? ccing Ali Mobrem 

 

Created from: https://issues.redhat.com/browse/RFE-3024

 

Feature Overview

Quick Starts are key tool for helping our customer to discover and understand how to take advantage of services that run on top of the OpenShift Platform. This feature will focus on making the Quick Starts extensible. With this Console extension our customer and partners will be able to add in their own Quick Starts to help drive a great user experience.

Enhancement PR: https://github.com/openshift/enhancements/pull/360 

Goals

  • Provide a supported API that our internal teams, external partners and customers can use to create Quick Starts for the OCP Console.( QUICKSTART CRD)
  • Provide proper documentation and templates to make it as easy as possible to create Quick Starts
  • Update the Console Operator to generate Quick Starts for enabling key services on the OCP Clusters:
    • Serverless
    • Pipelines
    • Virtualization
    • OCS
    • ServiceMesh
  • Process to validate Quick Starts for each release (Internal Teams Only)
  • Support Internationalization

Requirements

 

Requirement Notes isMvp?
Define QuickStart CRD   YES
Console Operator: Out of the box support for installing Quick Starts for enabling Operators   YES
Process( Design\Review), Documentation + Template for providing out of the box quick starts   YES
Process, Docs, for enabling Operators to add Quick Starts   YES
Migrate existing UI to work with CRD    
Move Existing Quick Starts to new CRD   YES
Support Internationalization   NO
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

 

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?  
  • New Content, Updates to existing content,  Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

Goal

Provide a dynamic extensible mechanism to add Guided Tours to the OCP Console.

  • Format of Guided Tours (Almost like Interactive Documentation)
    • Markup language
    • Made up of Long Description, Steps, Links(Internal, External), Images
    • Steps should be links to the page. User navigates to the next page by clicking a button
    • A tour may offer a link to a secondary tour
    • A tour may offer an external link to documentation, etc
    • Progress indicator, indicating the number of steps in the tour
    • Perspective
  • Tour information to be displayed on the Cards in the landing page
    • Title
    • Summary (aka Short Desc)
    • Image
    • Approx time
    • Perspective
    • Indication if already completed (local storage, maybe this is punted til User Pref)

User Stories/Scenarios
As a product owner, I need a mechanism to add guide tours that will help guide my users to enable my service.
As a product owner, I need a mechanism to add guide tours that will help guide my users to consume my service.
As an Operator, I need a mechanism to add guide tours that will help guide my users to consume my service.

+Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

As user I would like to see a YAMLSample for the new QuickStart CRD when I go and create my own QuickStarts.

 

Epic Goal

  • ...

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Epic Goal*

Provide a long term solution to SELinux context labeling in OCP.

 
Why is this important? (mandatory)

As of today when selinux is enabled, the PV's files are relabeled when attaching the PV to the pod, this can cause timeout when the PVs contains lot of files as well as overloading the storage backend.

https://access.redhat.com/solutions/6221251 provides few workarounds until the proper fix is implemented. Unfortunately these workaround are not perfect and we need a long term seamless optimised solution.

This feature tracks the long term solution where the PV FS will be mounted with the right selinux context thus avoiding to relabel every file.

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1. Apply new context when there is none
  2. Change context of all files/folders when changing context
  3. RWO & RWX PVs
    1. ReadWriteOncePod PVs first
    2. RWX PV in a second phase

As we are relying on mount context there should not be any relabeling (chcon) because all files / folders will inherit the context from the mount context

More on design & scenarios in the KEP  and related epic STOR-1173

Dependencies (internal and external) (mandatory)

None for the core feature

However the driver will have to set SELinuxMountSupported to true in the CSIDriverSpec to enable this feature. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - STOR
  • Documentation - STOR
  • QE - STOR
  • PX - 
  • Others -

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing -  Basic e2e automationTests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Epic Goal

Support upstream feature "SELinux relabeling using mount options (CSIDriver API change)"" in OCP as Beta, i.e. test it and have docs for it (unless it's Alpha upstream).

Summary: If Pod has defined SELinux context (e.g. it uses "resticted" SCC) and it uses ReadWriteOncePod PVC and CSI driver responsible for the volume supports this feature, kubelet + the CSI driver will mount the volume directly with the correct SELinux labels. Therefore CRI-O does not need to recursive relabel the volume and pod startup can be significantly faster. We will need a thorough documentation for this.

This upstream epic actually will be implemented by us!

Why is this important?

  • We get this upstream feature through Kubernetes rebase. We should ensure it works well in OCP and we have docs for it.

Upstream links

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. External: the feature is currently scheduled for Beta in Kubernetes 1.27, i.e. OCP 4.14, but it may change before Kubernetes 1.27 GA.

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

As a cluster user, I want to use mounting with SELinux context without any configuration.

This means OCP ships CSIDriver objects with "SELinuxMount: true" for CSI drivers that support mounting with "-o context". I.e. all CSI drivers that are based on block volumes and use ext4/xfs should have this enabled.

This Epic is to track upstream work in the Storage SIG community

This Epic is to track the SELinux specific work required. fsGroup work is not included here.

Goal: 

Continue contributing to and help move along the upstream efforts to enable recursive permissions functionality.

Finish current SELinuxMountReadWriteOncePod feature upstream:

  • Implement it in all volume plugins (current alpha has just iSCSI and CSI
  • Add e2e test + fixing all tests that don't work well with SELinux
  • Implement necessary changes in volume reconstruction to reconstruct also SELinux context.

The feature is probably going to stay alpha upstream.

Problem: 

Recursive permission change takes very long for fsGroup and SELinux. For volumes with many small files Kubernetes currently does a chown for every file on the volume (due to fsGroup). Similarly for container runtimes (such as CRI-O) a chcon of every file on the volume is performed due to SCC's SELinux context. Data on the volume may already have the correct GID/SELinux context so Kubernetes needs way to detect this automatically to avoid the long delay.

Why is this important: 

  • A user wants to bring their pod online quickly and efficiently.  

Dependencies (internal and external):

 

Prioritized epics + deliverables (in scope / not in scope):

Estimate (XS, S, M, L, XL, XXL):

 

Previous Work:

Customers:

Open questions:

  •  

Notes:

As OCP developer (and as OCP user in the future), I want all CSI drivers shipped as part of OCP to support mounting with -o context=XYZ, so I can test with CSIDriver.SELinuxMount: true (or my pods are running without CRI-O recursively relabeling my volume).

 

In detail:

  • For CSI drivers based on block devices, pass host's /etc/selinux and /sys/fs/ to the CSI drvier container on the node as HostPath volumes
  • For CSI drivers based on NFS / CIFS: do the same as for block volumes (it won't harm the driver in any way), but investigate if these drivers can actually run with CSIDriver.SELinuxMount: true.

Details: https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1710-selinux-relabeling#selinux-support-in-volumes

 

Exit criteria:

  • Verify that CSI drivers shipped by OCP based on block volumes mount volumes with -o context=xyz instead of relabeling the volumes by CRI-O. That should happen when all these conditions are satisfied:
    • SELinuxMountReadWriteOncePod and ReadWriteOncePod feature gates are enabled
    • CSIDriver.SELinuxMount is set to true manually for the CSI driver. OCP will not do it by default in 4.13, because it requires the alpha feature gates from the previous bullet.
    • PVC has AccessMode: [ReadWriteOncePod] 
    • Pod has SELinux context explicitly assigned, i.e. pod.spec.securityContext (or pod.spec.containers[*].securityContext) has seLinuxOptions set, incl. {{level }}(based on SCC, OCP might do it automatically)
  • This is alpha / dev preview feature, so QE might done when graduating to Beta / tech preview.

Feature Overview (aka. Goal Summary)  

The storage operators need to be automatically restarted after the certificates are renewed.

From OCP doc "The service CA certificate, which issues the service certificates, is valid for 26 months and is automatically rotated when there is less than 13 months validity left."

Since OCP is now offering an 18 months lifecycle per release, the storage operator pods need to be automatically restarted after the certificates are renewed.

Goals (aka. expected user outcomes)

The storage operators will be transparently restarted. The customer benefit should be transparent, it avoids manually restart of the storage operators.

 

Requirements (aka. Acceptance Criteria):

The administrator should not need to restart the storage operator when certificates are renew.

This should apply to all relevant operators with a consistent experience.

 

Use Cases (Optional):

As an administrator I want the storage operators to be automatically restarted when certificates are renewed.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

This feature request is triggered by the new extended OCP lifecycle. We are moving from 12 to 18 months support per release.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

No doc is required

 

Interoperability Considerations

This feature only cover storage but the same behavior should be applied to every relevant  components. 

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The pod `aws-ebs-csi-driver-controller` mounts the secret:

$ oc get po -n openshift-cluster-csi-drivers aws-ebs-csi-driver-controller-559f74d7cd-5tk4p -o yaml
...
    name: driver-kube-rbac-proxy
    name: provisioner-kube-rbac-proxy
	name: attacher-kube-rbac-proxy
	name: resizer-kube-rbac-proxy
	name: snapshotter-kube-rbac-proxy

    volumeMounts:
    - mountPath: /etc/tls/private
      name: metrics-serving-cert

  volumes:
  - name: metrics-serving-cert
    secret:
      defaultMode: 420
      secretName: aws-ebs-csi-driver-controller-metrics-serving-cert

Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted

Feature Overview

Console enhancements based on customer RFEs that improve customer user experience.

 

Goals

  • This Section:* Provide high-level goal statement, providing user context and expected user outcome(s) for this feature

 

Requirements

  • This Section:* A list of specific needs or objectives that a Feature must deliver to satisfy the Feature.. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non MVP requirement slips, it does not shift the feature.

 

Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

 

(Optional) Use Cases

This Section: 

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

 

Questions to answer…

  • ...

 

Out of Scope

 

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

 

Assumptions

  • ...

 

Customer Considerations

  • ...

 

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?  
  • New Content, Updates to existing content,  Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Based on https://issues.redhat.com/browse/RFE-3775 we should be extending our proxy package timeout to match the browser's timeout, which is 5 minutes.

AC: Bump the 30second timeout in the proxy pkg to 5 minutes

Feature Overview (aka. Goal Summary)  

Description of problem:

Even though in 4.11 we introduced LegacyServiceAccountTokenNoAutoGeneration to be compatible with upstream K8s to not generate secrets with tokens when service accounts are created, today OpenShift still creates secrets and tokens that are used for legacy usage of openshift-controller as well as the image-pull secrets. 

 

Customer issues:

Customers see auto-generated secrets for service accounts which is flagged as a security risk. 

 

This Feature is to track the implementation for removing legacy usage and image-pull secret generation as well so that NO secrets are auto-generated when a Service Account is created on OpenShift cluster. 

 

Goals (aka. expected user outcomes)

NO Secrets to be auto-generated when creating service accounts 

Requirements (aka. Acceptance Criteria):

Following *secrets need to NOT be generated automatically with every Serivce account creation:*  

  1. ImagePullSecrets : This is needed for Kubelet to fetch registry credentials directly. Implementation needed for the following upstream feature.
    https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2133-kubelet-credential-providers/README.md
  2. Dockerconfig secrets: The openshift-controller-manager relies on the old token secrets and it creates them so that it's able to generate registry credentials for the SAs. There is a PR that was created to remove this https://github.com/openshift/openshift-controller-manager/pull/223.

 

 

 Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

Concerns/Risks: Replacing functionality of one of the openshift-controller used for controllers that's been in the code for a long time may impact behaviors that w

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Existing documentation needs to be clear on where we are today and why we are providing the above 2 credentials. Related Tracker: https://issues.redhat.com/browse/OCPBUGS-13226 

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Feature Overview

As an Infrastructure Administrator, I want to deploy OpenShift on Nutanix distributing the control plane and compute nodes across multiple regions and zones, forming different failure domains.

As an Infrastructure Administrator, I want to configure an existing OpenShift cluster to distribute the nodes across regions and zones, forming different failure domains.

Goals

Install OpenShift on Nutanix using IPI / UPI in multiple regions and zones.

Requirements (aka. Acceptance Criteria):

  • Ensure Nutanix IPI can successfully be deployed with ODF across multiple zones (like we do with vSphere, AWS, GCP & Azure)
  • Ensure zonal configuration in Nutanix using UPI is documented and tested

vSphere Implementation

This implementation would follow the same idea that has been done for vSphere. The following are the main PRs for vSphere:

https://github.com/openshift/enhancements/blob/master/enhancements/installer/vsphere-ipi-zonal.md 

 

Existing vSphere documentation

https://docs.openshift.com/container-platform/4.13/installing/installing_vsphere/installing-vsphere-installer-provisioned-customizations.html#configuring-vsphere-regions-zones_installing-vsphere-installer-provisioned-customizations

https://docs.openshift.com/container-platform/4.13/post_installation_configuration/post-install-vsphere-zones-regions-configuration.html

Feature Overview (aka. Goal Summary)  

This feature will track upstream work from the OpenShift Control Plane teams - API, Auth, etcd, Workloads, and Storage.

Goals (aka. expected user outcomes)

To continue and develop meaningful contributions to the upstream community including feature delivery, bug fixes, and leadership contributions.

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

From https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/#topologyspreadconstraints-field:

Note: The matchLabelKeys field is a beta-level field and enabled by default in 1.27. You can disable it by disabling the MatchLabelKeysInPodTopologySpread [feature gate](https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/).

Removing from the TP as the feature is enabled by default.

Just a clean up work.

Feature Overview

Console enhancements based on customer RFEs that improve customer user experience.

 

Goals

  • This Section:* Provide high-level goal statement, providing user context and expected user outcome(s) for this feature

 

Requirements

  • This Section:* A list of specific needs or objectives that a Feature must deliver to satisfy the Feature.. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non MVP requirement slips, it does not shift the feature.

 

Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

 

(Optional) Use Cases

This Section: 

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

 

Questions to answer…

  • ...

 

Out of Scope

 

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

 

Assumptions

  • ...

 

Customer Considerations

  • ...

 

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?  
  • New Content, Updates to existing content,  Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

According to security best practice, it's recommended to set readOnlyRootFilesystem: true for all containers running on kubernetes. Given that openshift-console does not set that explicitly, it's requested that this is being evaluated and if possible set to readOnlyRootFilesystem: true or otherwise to readOnlyRootFilesystem: false with a potential explanation why the file-system needs to be write-able.

3. Why does the customer need this? (List the business requirements here)
Extensive security audits are run on OpenShift Container Platform 4 and are highlighting that many vendor specific container is missing to set readOnlyRootFilesystem: true or else justify why readOnlyRootFilesystem: false is set.

 

AC: Set up readOnlyRootFilesystem field on both console and console-operator deployment's spec. Part of the work is to determine the value. True if the pod if not doing any writing to its filesystem, otherwise false.

Feature Overview (aka. Goal Summary)  

Unify and update hosted control planes storage operators so that they have similar code patterns and can run properly in both standalone OCP and HyperShift's control plane.

Goals (aka. expected user outcomes)

  • Simplify the operators with a unified code pattern
  • Expose metrics from control-plane components
  • Use proper RBACs in the guest cluster
  • Scale the pods according to HostedControlPlane's AvailabilityPolicy
  • Add proper node selector and pod affinity for mgmt cluster pods

Requirements (aka. Acceptance Criteria):

  • OCP regression tests work in both standalone OCP and HyperShift
  • Code in the operators looks the same
  • Metrics from control-plane components are exposed
  • Proper RBACs are used in the guest cluster
  • Pods scale according to HostedControlPlane's AvailabilityPolicy
  • Proper node selector and pod affinity is added for mgmt cluster pods

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Epic Goal*

Our current design of EBS driver operator to support Hypershift does not scale well to other drivers. Existing design will lead to more code duplication between driver operators and possibility of errors.
 
Why is this important? (mandatory)

An improved design will allow more storage drivers and their operators to be added to hypershift without requiring significant changes in the code internals.
 
Scenarios (mandatory) 

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing -  Basic e2e automationTests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

This feature is the place holder for all epics related to technical debt associated with Console team 

Outcome Overview

Once all Features and/or Initiatives in this Outcome are complete, what tangible, incremental, and (ideally) measurable movement will be made toward the company's Strategic Goal(s)?

 

Success Criteria

What is the success criteria for this strategic outcome?  Avoid listing Features or Initiatives and instead describe "what must be true" for the outcome to be considered delivered.

 

 

Expected Results (what, how, when)

What incremental impact do you expect to create toward the company's Strategic Goals by delivering this outcome?  (possible examples:  unblocking sales, shifts in product metrics, etc. + provide links to metrics that will be used post-completion for review & pivot decisions). {}For each expected result, list what you will measure and when you will measure it (ex. provide links to existing information or metrics that will be used post-completion for review and specify when you will review the measurement such as 60 days after the work is complete)

 

 

Post Completion Review – Actual Results

After completing the work (as determined by the "when" in Expected Results above), list the actual results observed / measured during Post Completion review(s).

 

Feature Overview (aka. Goal Summary)  

Upstream Kuberenetes is following other SIGs by moving it's intree cloud providers to an out of tree plugin format, Cloud Controller Manager, at some point in a future Kubernetes release. OpenShift needs to be ready to action this change  

Goals (aka. expected user outcomes)

GA of the cloud controller manager for the GCP platform

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Epic Goal

  • We need to GA the GCP Cloud Controller Manager 

Why is this important?

  • Upstream is moving to out of tree cloud providers and we need to be ready

Scenarios

  1. ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Background

To make the CCM GA, we need to update the switch case in library go to make sure the GCP CCM is always considered external.

We then need to update the vendor in KCMO, CCMO, KASO and MCO.

Steps

  • Create a PR for updating library go
  • Create PRs for updating the vendor in dependent repos
  • Leverage an engineer with merge right (eg David Eads) to merge the library go, KCMO and CCMO changes simultaneously
  • Merge KASO and MCO changes

Stakeholders

  • Cluster Infra
  •  

Definition of Done

  • GCP CCM is enabled by default
  • Docs
  • N/A
  • Testing
  • <Explain testing that will be added>

Feature Overview (aka. Goal Summary)  

Add ControlPlaneMachineSet for vSphere

 

The OpenShift API needs to be updated to define VSphereFailureDomain.  A draft PR is here: https://github.com/openshift/api/pull/1539

Also, ensure that the client-go and openshift-cluster-config-operator projects are bumped once the API changes merge.

Epic Goal*

What is our purpose in implementing this?  What new capability will be available to customers?

 
Why is this important? (mandatory)

What are the benefits to the customer or Red Hat?   Does it improve security, performance, supportability, etc?  Why is work a priority?

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1.  

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing -  Basic e2e automationTests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Epic Goal

  • Update all images that we ship with OpenShift to the latest upstream releases and libraries.
  • Exact content of what needs to be updated will be determined as new images are released upstream, which is not known at the beginning of OCP development work. We don't know what new features will be included and should be tested and documented. Especially new CSI drivers releases may bring new, currently unknown features. We expect that the amount of work will be roughly the same as in the previous releases. Of course, QE or docs can reject an update if it's too close to deadline and/or looks too big.

Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF).

Why is this important?

  • We want to ship the latest software that contains new features and bugfixes.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Update all CSI sidecars to the latest upstream release from https://github.com/orgs/kubernetes-csi/repositories

  • external-attacher
  • external-provisioner
  • external-resizer
  • external-snapshotter
  • node-driver-registrar
  • livenessprobe

Corresponding downstream repos have `csi-` prefix, e.g. github.com/openshift/csi-external-attacher.

This includes update of VolumeSnapshot CRDs in cluster-csi-snapshot-controller- operator assets and client API in  go.mod. I.e. copy all snapshot CRDs from upstream to the operator assets + go get -u github.com/kubernetes-csi/external-snapshotter/client/v6 in the operator repo.

< High-Level description of the feature ie: Executive Summary >

Goals

Cluster administrators need an in-product experience to discover and install new Red Hat offerings that can add high value to developer workflows.

Requirements

Requirements Notes IS MVP
Discover new offerings in Home Dashboard   Y
Access details outlining value of offerings   Y
Access step-by-step guide to install offering   N
Allow developers to easily find and use newly installed offerings   Y
Support air-gapped clusters   Y
    • (Optional) Use Cases

< What are we making, for who, and why/what problem are we solving?>

Out of scope

Discovering solutions that are not available for installation on cluster

Dependencies

No known dependencies

Background, and strategic fit

 

Assumptions

None

 

Customer Considerations

 

Documentation Considerations

Quick Starts 

What does success look like?

 

QE Contact

 

Impact

 

Related Architecture/Technical Documents

 

Done Checklist

  • Acceptance criteria are met
  • Non-functional properties of the Feature have been validated (such as performance, resource, UX, security or privacy aspects)
  • User Journey automation is delivered
  • Support and SRE teams are provided with enough skills to support the feature in production environment

Problem:

Cluster admins need to be guided to install RHDH on the cluster.

Goal:

Enable admins to discover RHDH, be guided to installing it on the cluster, and verifying its configuration.

Why is it important?

RHDH is a key multi-cluster offering for developers. This will enable customers to self-discover and install RHDH.

Acceptance criteria:

  1. Show RHDH card in Admin->Dashboard view
  2. Enable link to RHDH documentation from the card
  3. Quick start to install RHDH operator
  4. Guided flow to installation and configuration of operator from Quick Start
  5. RHDH UI link in top menu
  6. Successful log in to RHDH

Dependencies (External/Internal):

RHDH operator

Design Artifacts:

Exploration:

Note:

Description

As a cluster admin, I want to see and learn how to install Janus IDP / Red Hat Developer Hub (RHDH)

Acceptance Criteria

  1. Create a Quickstart that is part of the console operator that explains how to install Janus IDP / Red Hat Developer Hub (RHDH)

Additional Details:

Feature Overview (aka. Goal Summary)  

The MCO should properly report its state in a way that's consistent and able to be understood by customers, troubleshooters, and maintainers alike. 

Some customer cases have revealed scenarios where the MCO state reporting is misleading and therefore could be unreliable to base decisions and automation on.

In addition to correcting some incorrect states, the MCO will be enhanced for a more granular view of update rollouts across machines.

The MCO should properly report its state in a way that's consistent and able to be understood by customers, troubleshooters, and maintainers alike. 

For this epic, "state" means "what is the MCO doing?" – so the goal here is to try to make sure that it's always known what the MCO is doing. 

This includes: 

  • Conditions
  • Some Logging 
  • Possibly Some Events 

While this probably crosses a little bit into the "status" portion of certain MCO objects, as some state is definitely recorded there, this probably shouldn't turn into a "better status reporting" epic.  I'm interpreting "status" to mean "how is it going" so status is maybe a "detail attached to a state". 

 

Exploration here: https://docs.google.com/document/d/1j6Qea98aVP12kzmPbR_3Y-3-meJQBf0_K6HxZOkzbNk/edit?usp=sharing

 

https://docs.google.com/document/d/17qYml7CETIaDmcEO-6OGQGNO0d7HtfyU7W4OMA6kTeM/edit?usp=sharing

 

Ensure that the pod exists but the functionality behind the pod is not exposed by default in the release version this work ships in.

This can be done by creating a new featuregate in openshift/api, vendoring that into the cluster config operator, and then checking for this featuregate in the state controller code of the MCO.

Feature Overview (aka. Goal Summary)

When the internal oauth-server and oauth-apiserver are removed and replaced with an external OIDC issuer (like azure AD), the console must work for human users of the external OIDC issuer.

Goals (aka. expected user outcomes)

An end user can use the openshift console without a notable difference in experience.  This must eventually work on both hypershift and standalone, but hypershift is the first priority if it impacts delivery

Requirements (aka. Acceptance Criteria):

  1. User can log in and use the console
  2. User can get a kubeconfig that functions on the CLI with matching oc
  3. Both of those work on hypershift
  4. both of those work on standalone.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context is needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • When installed with external OIDC, the clientID and clientSecret need to be configurable to match the external (and unmanaged) OIDC server

Why is this important?

  • Without a configurable clientID and secret, I don't think the console can identify the user.
  • There must be a mechanism to do this on both hypershift and openshift, though the API may be very similar.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • When the oauthclient API is not present, the operator must stop creating the oauthclient

Why is this important?

  • This is preventing the operator from creating its deployment
  •  

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Problem statement

DPDK applications require dedicated CPUs, and isolated any preemption (other processes, kernel threads, interrupts), and this can be achieved with the “static” policy of the CPU manager: the container resources need to include an integer number of CPUs of equal value in “limits” and “request”. For instance, to get six exclusive CPUs:

spec:

  containers:

  - name: CNF

    image: myCNF

    resources:

      limits:

        cpu: "6"

      requests:

        cpu: "6"

 

The six CPUs are dedicated to that container, however non trivial, meaning real DPDK applications do not use all of those CPUs as there is always at least one of the CPU running a slow-path, processing configuration, printing logs (among DPDK coding rules: no syscall in PMD threads, or you are in trouble). Even the DPDK PMD drivers and core libraries include pthreads which are intended to sleep, they are infrastructure pthreads processing link change interrupts for instance.

Can we envision going with two processes, one with isolated cores, one with the slow-path ones, so we can have two containers? Unfortunately no: going in a multi-process design, where only dedicated pthreads would run on a process is not an option as DPDK multi-process is going deprecated upstream and has never picked up as it never properly worked. Fixing it and changing DPDK architecture to systematically have two processes is absolutely not possible within a year, and would require all DPDK applications to be re-written. Knowing that the first and current multi-process implementation is a failure, nothing guarantees that a second one would be successful.

The slow-path CPUs are only consuming a fraction of a real CPU and can safely be run on the “shared” CPU pool of the CPU Manager, however containers specifications do not accept to request two kinds of CPUs, for instance:

 

spec:

  containers:

  - name: CNF

    image: myCNF

    resources:

      limits:

        cpu_dedicated: "4"

        cpu_shared: "20m"

      requests:

        cpu_dedicated: "4"

        cpu_shared: "20m"

Why do we care about allocating one extra CPU per container?

  • Allocating one extra CPU means allocating an additional physical core, as the CPUs running DPDK application should run on a dedicated physical core, in order to get maximum and deterministic performances, as caches and CPU units are shared between the two hyperthreads.
  • CNFs are built with a minimum of CPUs per container. This is still between 10 and 20, sometime more, today, but the intent is to decrease this number of CPU and increase the number of containers as this is the “cloud native” way to waste resources by having too large containers to schedule, like in the VNF days (tetris effect)

Let’s take a realistic example, based on a real RAN CNF: running 6 containers with dedicated CPUs on a worker node, with a slow Path requiring 0.1 CPUs means that we waste 5 CPUs, meaning 3 physical cores. With real life numbers:

  • For a single datacenter composed of 100 nodes, we waste 300 physical cores
  • For a single datacenter composed of 500 nodes, we waste 1500 physical cores
  • For a single node OpenShift deployed on 1 Millions of nodes, we waste 3 Millions of physical cores

Intel public CPU price per core is around 150 US$, not even taking into account the ecological aspect of the waste of (rare) materials and the electricity and cooling…

 

Goals

Requirements

  • This Section:* A list of specific needs or objectives that a Feature must deliver to satisfy the Feature.. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non MVP requirement slips, it does not shift the feature.
Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

Questions to answer…

  • Would an implementation based on annotations be possible rather than an implementation requiring a container (so pod) definition change, like the CPU pooler does?

Out of Scope

Background, and strategic fit

This issue has been addressed lately by OpenStack.

Assumptions

  • ...

Customer Considerations

  • ...

Documentation Considerations

  • The feature needs documentation on how to configure OCP, create pods, and troubleshoot

Epic Goal

  • An NRI plugin that invoked by CRI-O right before the container creation, and updates the container's cpuset and quota to match the mixed-cpus request. 
  • The cpu pinning reconciliation operation must also execute the NRI API call on every update (so we can intercept kubelet and it does not destroy our changes)
  • Dev Preview for 4.15

Why is this important?

  • This would unblock lots of options including mixed cpu workloads where some CPUs could be shared among containers / pods CNF-3706
  • This would also allow further research on dynamic (simulated) hyper threading CNF-3743

Scenarios

  1. ...

Acceptance Criteria

  • Have an NRI plugin which called by the runtime and updates the container with mutual cpus.
  • The plugin must be able to override CPU manager conciliation loop and immune to future CPU manager changes.
  • The plugin must be robust and handle node reboot/kubelet/crio restart scenarios
  • upstream CI - MUST be running successfully with tests automated.
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • OCP adoption in relevant OCP version 
  • NTO shall be able to deploy the new plugin

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. https://issues.redhat.com/browse/CNF-3706 : Spike - mix of shared and pinned/dedicated cpus within a container
  2. https://issues.redhat.com/browse/CNF-3743 : Spike: Dynamic offlining of cpu siblings to simulate no-smt
  3. upstream Node Resource Interface project - https://github.com/containerd/nri 
  4. https://issues.redhat.com/browse/CNF-6082: [SPIKE] Cpus assigned hook point in CRI-O
  5. https://issues.redhat.com/browse/CNF-7603 

Open questions::

  N/A

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview

Telecommunications providers continue to deploy OpenShift at the Far Edge. The acceleration of this adoption and the nature of existing Telecommunication infrastructure and processes drive the need to improve OpenShift provisioning speed at the Far Edge site and the simplicity of preparation and deployment of Far Edge clusters, at scale.

Goals

  • Simplicity The folks preparing and installing OpenShift clusters (typically SNO) at the Far Edge range in technical expertise from technician to barista. The preparation and installation phases need to be reduced to a human-readable script that can be utilized by a variety of non-technical operators. There should be as few steps as possible in both the preparation and installation phases.
  • Minimize Deployment Time A telecommunications provider technician or brick-and-mortar employee who is installing an OpenShift cluster, at the Far Edge site, needs to be able to do it quickly. The technician has to wait for the node to become in-service (CaaS and CNF provisioned and running) before they can move on to installing another cluster at a different site. The brick-and-mortar employee has other job functions to fulfill and can't stare at the server for 2 hours. The install time at the far edge site should be in the order of minutes, ideally less than 20m.
  • Utilize Telco Facilities Telecommunication providers have existing Service Depots where they currently prepare SW/HW prior to shipping servers to Far Edge sites. They have asked RH to provide a simple method to pre-install OCP onto servers in these facilities. They want to do parallelized batch installation to a set of servers so that they can put these servers into a pool from which any server can be shipped to any site. They also would like to validate and update servers in these pre-installed server pools, as needed.
  • Validation before Shipment Telecommunications Providers incur a large cost if forced to manage software failures at the Far Edge due to the scale and physical disparate nature of the use case. They want to be able to validate the OCP and CNF software before taking the server to the Far Edge site as a last minute sanity check before shipping the platform to the Far Edge site.
  • IPSec Support at Cluster Boot Some far edge deployments occur on an insecure network and for that reason access to the host’s BMC is not allowed, additionally an IPSec tunnel must be established before any traffic leaves the cluster once its at the Far Edge site. It is not possible to enable IPSec on the BMC NIC and therefore even OpenShift has booted the BMC is still not accessible.

Requirements

  • Factory Depot: Install OCP with minimal steps
    • Telecommunications Providers don't want an installation experience, just pick a version and hit enter to install
    • Configuration w/ DU Profile (PTP, SR-IOV, see telco engineering for details) as well as customer-specific addons (Ignition Overrides, MachineConfig, and other operators: ODF, FEC SR-IOV, for example)
    • The installation cannot increase in-service OCP compute budget (don't install anything other that what is needed for DU)
    • Provide ability to validate previously installed OCP nodes
    • Provide ability to update previously installed OCP nodes
    • 100 parallel installations at Service Depot
  • Far Edge: Deploy OCP with minimal steps
    • Provide site specific information via usb/file mount or simple interface
    • Minimize time spent at far edge site by technician/barista/installer
    • Register with desired RHACM Hub cluster for ongoing LCM
  • Minimal ongoing maintenance of solution
    • Some, but not all telco operators, do not want to install and maintain an OCP / ACM cluster at Service Depot
  • The current IPSec solution requires a libreswan container to run on the host so that all N/S OCP traffic is encrypted. With the current IPSec solution this feature would need to support provisioning host-based containers.

 

A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts.  If a non MVP requirement slips, it does not shift the feature.

requirement Notes isMvp?
     
     
     

 

Describe Use Cases (if needed)

Telecommunications Service Provider Technicians will be rolling out OCP w/ a vDU configuration to new Far Edge sites, at scale. They will be working from a service depot where they will pre-install/pre-image a set of Far Edge servers to be deployed at a later date. When ready for deployment, a technician will take one of these generic-OCP servers to a Far Edge site, enter the site specific information, wait for confirmation that the vDU is in-service/online, and then move on to deploy another server to a different Far Edge site.

 

Retail employees in brick-and-mortar stores will install SNO servers and it needs to be as simple as possible. The servers will likely be shipped to the retail store, cabled and powered by a retail employee and the site-specific information needs to be provided to the system in the simplest way possible, ideally without any action from the retail employee.

 

Out of Scope

Q: how challenging will it be to support multi-node clusters with this feature?

Background, and strategic fit

< What does the person writing code, testing, documenting need to know? >

Assumptions

< Are there assumptions being made regarding prerequisites and dependencies?>

< Are there assumptions about hardware, software or people resources?>

Customer Considerations

< Are there specific customer environments that need to be considered (such as working with existing h/w and software)?>

< Are there Upgrade considerations that customers need to account for or that the feature should address on behalf of the customer?>

<Does the Feature introduce data that could be gathered and used for Insights purposes?>

Documentation Considerations

< What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)? >

< What does success look like?>

< Does this feature have doc impact?  Possible values are: New Content, Updates to existing content,  Release Note, or No Doc Impact>

< If unsure and no Technical Writer is available, please contact Content Strategy. If yes, complete the following.>

  • <What concepts do customers need to understand to be successful in [action]?>
  • <How do we expect customers will use the feature? For what purpose(s)?>
  • <What reference material might a customer want/need to complete [action]?>
  • <Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available. >
  • <What is the doc impact (New Content, Updates to existing content, or Release Note)?>

Interoperability Considerations

< Which other products and versions in our portfolio does this feature impact?>

< What interoperability test scenarios should be factored by the layered product(s)?>

Questions

Question Outcome
   

 

 

Epic Goal

  • Install SNO within 10 minutes

Why is this important?

  • SNO installation takes around 40+ minutes.
  • This makes SNO less appealing when compared to k3s/microshift.
  • We should analyze the  SNO installation, figure our why it takes so long and come up with ways to optimize it

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

  1. https://docs.google.com/document/d/1ULmKBzfT7MibbTS6Sy3cNtjqDX1o7Q0Rek3tAe1LSGA/edit?usp=sharing

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>
  • I'm not sure this is a CVO issue, but I think CVO is the one creating the namespace, CVO also renders some manifests during bootkube so it seems like the right component.

Description of problem:

The bootkube scripts spend ~1 minute failing to apply manifests while waiting fot eh openshift-config namespace to get created

Version-Release number of selected component (if applicable):

4.12

How reproducible:

100%

Steps to Reproduce:

1.Run the POC using the makefile here https://github.com/eranco74/bootstrap-in-place-poc
2. Observe the bootkube logs (pre-reboot) 

Actual results:

Jan 12 17:37:09 master1 cluster-bootstrap[5156]: Failed to create "0000_00_cluster-version-operator_01_adminack_configmap.yaml" configmaps.v1./admin-acks -n openshift-config: namespaces "openshift-config" not found
....
Jan 12 17:38:27 master1 cluster-bootstrap[5156]: "secret-initial-kube-controller-manager-service-account-private-key.yaml": failed to create secrets.v1./initial-service-account-private-key -n openshift-config: namespaces "openshift-config" not found

Here are the logs from another installation showing that it's not 1 or 2 manifests that require this namespace to get created earlier:

Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-ca-bundle-configmap.yaml": failed to create configmaps.v1./etcd-ca-bundle -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-client-secret.yaml": failed to create secrets.v1./etcd-client -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-metric-client-secret.yaml": failed to create secrets.v1./etcd-metric-client -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-metric-serving-ca-configmap.yaml": failed to create configmaps.v1./etcd-metric-serving-ca -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-metric-signer-secret.yaml": failed to create secrets.v1./etcd-metric-signer -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-serving-ca-configmap.yaml": failed to create configmaps.v1./etcd-serving-ca -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-signer-secret.yaml": failed to create secrets.v1./etcd-signer -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "kube-apiserver-serving-ca-configmap.yaml": failed to create configmaps.v1./initial-kube-apiserver-server-ca -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "openshift-config-secret-pull-secret.yaml": failed to create secrets.v1./pull-secret -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "openshift-install-manifests.yaml": failed to create configmaps.v1./openshift-install-manifests -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "secret-initial-kube-controller-manager-service-account-private-key.yaml": failed to create secrets.v1./initial-service-account-private-key -n openshift-config: namespaces "openshift-config" not found

Expected results:

expected resources to get created successfully without having to wait for the namespace to get created.

Additional info:

 

Description of problem:

while trying to figure out why it takes so long to install Single node OpenShift I noticed that the kube-controller-manager cluster operator is degraded for ~5 minutes due to:
GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp 172.30.119.108:9091: connect: connection refused
I don't understand how the prometheusClient is successfully initialized, but we get a connection refused once we try to query the rules.
Note that if the client initialization fails the kube-controller-manger won't set the  GarbageCollectorDegraded to true.

Version-Release number of selected component (if applicable):

4.12

How reproducible:

100%

Steps to Reproduce:

1. install SNO with bootstrap in place (https://github.com/eranco74/bootstrap-in-place-poc)

2. monitor the cluster operators staus 

Actual results:

GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp 172.30.119.108:9091: connect: connection refused 

Expected results:

Expected the GarbageCollectorDegraded status to be false

Additional info:

It seems that for PrometheusClient to be successfully initialised it needs to successfully create a connection but we get connection refused once we make the query.
Note that installing SNO with this patch (https://github.com/eranco74/cluster-kube-controller-manager-operator/commit/26e644503a8f04aa6d116ace6b9eb7b9b9f2f23f) reduces the installation time by 3 minutes


openshift- service-ca service-ca pod takes a few minutes to start when installing SNO

kubectl get events -n openshift-service-ca --sort-by='.metadata.creationTimestamp' -o custom-columns=FirstSeen:.firstTimestamp,LastSeen:.lastTimestamp,Count:.count,From:.source.component,Type:.type,Reason:.reason,Message:.message                      
FirstSeen              LastSeen               Count   From                                                                                              Type      Reason                 Message
2023-01-22T12:25:58Z   2023-01-22T12:25:58Z   1       deployment-controller                                                                             Normal    ScalingReplicaSet      Scaled up replica set service-ca-6dc5c758d to 1
2023-01-22T12:26:12Z   2023-01-22T12:27:53Z   9       replicaset-controller                                                                             Warning   FailedCreate           Error creating: pods "service-ca-6dc5c758d-" is forbidden: error fetching namespace "openshift-service-ca": unable to find annotation openshift.io/sa.scc.uid-range
2023-01-22T12:27:58Z   2023-01-22T12:27:58Z   1       replicaset-controller                                                                             Normal    SuccessfulCreate       Created pod: service-ca-6dc5c758d-k7bsd
2023-01-22T12:27:58Z   2023-01-22T12:27:58Z   1       default-scheduler                                                                                 Normal    Scheduled              Successfully assigned openshift-service-ca/service-ca-6dc5c758d-k7bsd to master1
 

Seems that creating the serivce-ca namespace early allows it to get
openshift.io/sa.scc.uid-range annotation and start running earlier, the
service-ca pod is required for other pods (CVO and all the control plane pods) to start since it's creating the serving-cert 

Feature Overview

Reduce the OpenShift platform and associated RH provided components to a single physical core on Intel Sapphire Rapids platform for vDU deployments on SingleNode OpenShift.

Goals

  • Reduce CaaS platform compute needs so that it can fit within a single physical core with Hyperthreading enabled. (i.e. 2 CPUs)
  • Ensure existing DU Profile components fit within reduced compute budget.
  • Ensure existing ZTP, TALM, Observability and ACM functionality is not affected.
  • Ensure largest partner vDU can run on Single Core OCP.

Requirements

Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES
 
Provide a mechanism to tune the platform to use only one physical core. 
Users need to be able to tune different platforms.  YES 
Allow for full zero touch provisioning of a node with the minimal core budget configuration.   Node provisioned with SNO Far Edge provisioning method - i.e. ZTP via RHACM, using DU Profile. YES 
Platform meets all MVP KPIs   YES

(Optional) Use Cases

  • Main success scenario: A telecommunications provider uses ZTP to provision a vDU workload on Single Node OpenShift instance running on an Intel Sapphire Rapids platform. The SNO is managed by an ACM instance and it's lifecycle is managed by TALM.

Questions to answer…

  • N/A

Out of Scope

  • Core budget reduction on the Remote Worker Node deployment model.

Background, and strategic fit

Assumptions

  • The more compute power available for RAN workloads directly translates to the volume of cell coverage that a Far Edge node can support.
  • Telecommunications providers want to maximize the cell coverage on Far Edge nodes.
  • To provide as much compute power as possible the OpenShift platform must use as little compute power as possible.
  • As newer generations of servers are deployed at the Far Edge and the core count increases, no additional cores will be given to the platform for basic operation, all resources will be given to the workloads.

Customer Considerations

  • ...

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
    • Administrators must know how to tune their Far Edge nodes to make them as computationally efficient as possible.
  • Does this feature have doc impact?
    • Possibly, there should be documentation describing how to tune the Far Edge node such that the platform uses as little compute power as possible.
  • New Content, Updates to existing content, Release Note, or No Doc Impact
    • Probably updates to existing content
  • If unsure and no Technical Writer is available, please contact Content Strategy. What concepts do customers need to understand to be successful in [action]?
    • Performance Addon Operator, tuned, MCO, Performance Profile Creator
  • How do we expect customers will use the feature? For what purpose(s)?
    • Customers will use the Performance Profile Creator to tune their Far Edge nodes. They will use RHACM (ZTP) to provision a Far Edge Single-Node OpenShift deployment with the appropriate Performance Profile.
  • What reference material might a customer want/need to complete [action]?
    • Performance Addon Operator, Performance Profile Creator
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
    • N/A
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?
    • Likely updates to existing content / unsure
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Complete Epics

This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were completed when this image was assembled

 

Monitoring needs to be reliable and is the very useful when trying to debug clusters in an already degraded state. We want to ensure that metrics scraping can always work if the scraper can reach the target, even if the kube-apiserver is unavailable or unreachable. To do this, we will combine a local authorizer (already merged in many binaries and the rbac-proxy) and client-cert based authentication to have a fully local authentication and authorization path for scraper targets.

If networking (or part of networking) is down and a scraper target cannot reach the kube-apiserver to verify a token and a subjectaccessreview, then the metrics scraper can be rejected. The subjectaccessreview (authorization) is already largely addressed, but service account tokens are still used for scraping targets. Tokens require an external network call that we can avoid by using client certificates. Gathering metrics, especially client metrics, from partially functionally clusters helps narrow the search area between kube-apiserver, etcd, kubelet, and SDN considerably.

In addition, this will significantly reduce the load on the kube-apiserver. We have observed in the CI cluster that token and subject access reviews are a significant percentage of all kube-apiserver traffic.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User story:

As cluster-policy-controller I automatically approve cert signing requests issued by monitoring.

DoD:

  • cert signing requests issued by the cluster-monitoring-operator service account are approved automatically.

Implementation hints: leverage approving logic implemented in https://github.com/openshift/library-go/pull/1083.

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

User Story

As an cluster administrator of a disconnected OCP cluster,
I want a list of possible sample images to mirror
So that I can configure my image mirror prior to installing OCP in a disconnected environment.

Acceptance Criteria

  • Publish the list of the sample images to mirror as a ConfigMap in the samples operator namespace.
  • Provide instructions on how to obtain the current image SHAs from the list above (via podman or skopeo).
  • Reference the ConfigMap name in our "import failing" alert.
  • [optional] Reference the ConfigMap name in our "Removed" condition message.

Notes

it is too onerous to find a connected cluster in order to obtain the list of possible samples images to mirror using the current documented procedures.
I need a list make available to me in my disconnected cluster that I can reference after initial install.

User Story

Sample operator use of the reason field in its config object to track imagestream import completion has resulted in that singleton being a bottleneck and source of update conflicts (we are talking 60 or 70 imagestreams potentially updating that once field concurrently).

Acceptance Criteria

  • Reduction in reconciliation errors when imagestream imports complete (success or failure).
  • ConfigMaps containing failing imagstream imports are added to must-gather
  • After all imagestreams successfully import, there are no error-recording ConfigMaps in the samples-operator namespace.

Notes

See this hackday PR

for an alternative approach which uses a configmap per imagestream

Goal:

Notify users that the lastImageChangeTriggeredID field in the BuildConfig spec is deprecated, and will be ignored in a future release.

Problem:

BuildConfigs use the lastImageChangeTriggeredID field to record the SHA of the last image used to trigger a build. This information should only be managed by the Build controllers and be placed in the BuildConfig status. By keeping the data in spec, cluster admins and developers can easily alter or remove this information.

Why is this important?

Moving this information to status is a prerequisite for using GitOps to manage BuildConfigs with ImageChange triggers.

Dependencies

None

Stories and Deliverables

  • BUILD-187: Fire event if build was triggered by clearing the LastImageChangeTriggeredID

Estimate: (XS, S, M, L, XL, XXL) S

Previous Work:

https://bugzilla.redhat.com/show_bug.cgi?id=1876500

User Stories:

As an OpenShift engineer
I want to image change triggers to record their information BuildConfig status
So that eventually changes to ImageChange triggers in the BuildConfig spec do not cause builds to launch

As a developer using OpenShift to build images
I want to trigger a build when the base image of my application changes
So that bug fixes and security patches are applied to my application.

As an OpenShift cluster admin
I want to be alerted if something cleared the LastImageTriggeredID in a BuildConfig
So that I am aware that cluster users are relying on deprecated behavior.

Success Criteria:

1. Image change triggers record lastImageTriggeredID in the BuildConfig's status
2. Customers are notified that lastImageTriggeredID is deprecated in BuildConfig's spec
3. Customers are alerted if their users are relying on deprecated behavior

Open Questions:


See https://docs.google.com/document/d/1thcVR31TElJBXBGmyaYXZ9jRejp5W4-ppNYn4g3_mgk/edit# for a fuller explanation on how to use this template.

User Story

As an OpenShift cluster admin
I want to be alerted if something cleared the LastImageTriggeredID in a BuildConfig
So that I am aware that cluster users are relying on deprecated behavior.

Acceptance Criteria

  • Cluster administrators can use system information to identify if a build was triggered by having the LastImagechangeTriggerID cleared
  • Deprecation notice regarding this behavior is visible in the web console.

Launch Checklist

Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated

Notes

After review with the monitoring team, we agreed to use Warning events rather than alerts. To properly create an alert for this behavior, we would have needed to introduce a metric with unbound cardinality. Such metrics can lead to performance degradation in the cluster monitoring system.

When the LastImageChangeTriggerID is cleared in a BuildConfig object, a warning event is created which should appear in relevant contexts within the Web Console.


Guiding Questions

User Story

  • Is this intended for an administrator, application developer, or other type of OpenShift user?
  • What experience level is this intended for? New, experienced, etc.?
  • Why is this story important? What problems does this solve? What benefit(s) will the customer experience?
  • Is this part of a larger epic or initiative? If so, ensure that the story is linked to the appropriate epic and/or initiative.

Acceptance Criteria

  • How should a customer use and/or configure the feature?
  • Are there any prerequisites for using/enabling the feature?

Notes

  • Is this a new feature, or an enhancement of an existing feature? If the latter, list the feature and docs reference.
  • Are there any new terms, abbreviations, or commands introduced with this story? Ex: a new command line argument, a new custom resource.
  • Are there any recommended best practices when using this feature?
  • On feature completion, are there any known issues that customers should be aware of?

Goal:

Record the lastImageChangeTriggeredID field in the BuildConfig status, instead of spec.

Problem:

BuildConfigs use the lastImageChangeTriggeredID field to record the SHA of the last image used to trigger a build. This information should only be managed by the Build controllers and be placed in the BuildConfig status. By keeping the data in spec, cluster admins and developers can easily alter or remove this information.

Why is this important?

Moving this information to status is a prerequisite for using GitOps to manage BuildConfigs with ImageChange triggers.

Dependencies

None

Stories and Deliverables

  • BUILD-186: Record LastImageChangeTriggeredID in status

Estimate: (XS, S, M, L, XL, XXL) S

Previous Work:

https://bugzilla.redhat.com/show_bug.cgi?id=1876500

User Stories:

As an OpenShift engineer
I want to image change triggers to record their information BuildConfig status
So that eventually changes to ImageChange triggers in the BuildConfig spec do not cause builds to launch

As a developer using OpenShift to build images
I want to trigger a build when the base image of my application changes
So that bug fixes and security patches are applied to my application.

As an OpenShift cluster admin
I want to be alerted if something cleared the LastImageTriggeredID in a BuildConfig
So that I am aware that cluster users are relying on deprecated behavior.

Success Criteria:

1. Image change triggers record lastImageTriggeredID in the BuildConfig's status
2. Customers are notified that lastImageTriggeredID is deprecated in BuildConfig's spec
3. Customers are alerted if their users are relying on deprecated behavior

Open Questions:


See https://docs.google.com/document/d/1thcVR31TElJBXBGmyaYXZ9jRejp5W4-ppNYn4g3_mgk/edit# for a fuller explanation on how to use this template.

User Story

As an OpenShift engineer
I want to image change triggers to record their information BuildConfig status
So that eventually changes to ImageChange triggers in the BuildConfig spec do not cause builds to launch

As a developer using OpenShift to build images
I want to trigger a build when the base image of my application changes
So that bug fixes and security patches are applied to my application.

Acceptance Criteria

Docs Impact

Release note should inform customers that the lastImageChangeTriggeredID field in the BuildConfig spec is deprecated, and will be ignored in OCP 4.9. Users relying on this information should update their scripts and jobs to read the triggered image ID from BuildConfig status. 

Please see https://issues.redhat.com/browse/RHDEVDOCS-2738 for more details.

Launch Checklist

Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated

Notes

If lastImageTriggeredID is set to the empty string or nil, today this will trigger a build. By moving the data to status, we can ignore the lastImageTriggeredID field in spec in a future release. To give users proper notice, we are not going to remove the current behavior until a later release.

A deprecation notice will need to be added to the release notes upon completion.

Changes will be needed in the following:

1. API
2. openshift-apiserver
3. openshift-controller-manager (build controller)

See https://bugzilla.redhat.com/show_bug.cgi?id=1876500.


Guiding Questions

User Story

  • Is this intended for an administrator, application developer, or other type of OpenShift user?
  • What experience level is this intended for? New, experienced, etc.?
  • Why is this story important? What problems does this solve? What benefit(s) will the customer experience?
  • Is this part of a larger epic or initiative? If so, ensure that the story is linked to the appropriate epic and/or initiative.

Acceptance Criteria

  • How should a customer use and/or configure the feature?
  • Are there any prerequisites for using/enabling the feature?

Notes

  • Is this a new feature, or an enhancement of an existing feature? If the latter, list the feature and docs reference.
  • Are there any new terms, abbreviations, or commands introduced with this story? Ex: a new command line argument, a new custom resource.
  • Are there any recommended best practices when using this feature?
  • On feature completion, are there any known issues that customers should be aware of?

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Rebase OpenShift components to k8s v1.22
  • Rebase Jenkins and plugins to latest long term support versions

Why is this important?

  • Rebasing ensures components work with the upcoming release of Kubernetes
  • Address tech debt related to upstream deprecations and removals.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. k8s 1.22 release - expected August 4th 2021

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story

Rebase samples operator to k8s 1.22

Acceptance Criteria

  • Samples operator deploys with k8s 1.22 libraries
  • Core components continue to function (CI tests pass, including build suite).

Docs Impact

None

Notes

Epic Goal

  • Update OpenShift components that are owned by the Builds + Jenkins Team to use Kubernetes 1.25

Why is this important?

  • Our components need to be updated to ensure that they are using the latest bug/CVE fixes, features, and that they are API compatible with other OpenShift components.

Acceptance Criteria

  • Existing CI/CD tests must be passing

Epic Goal

  • Update OpenShift components that are owned by the Builds + Jenkins Team to use Kubernetes 1.27

Why is this important?

  • Our components need to be updated to ensure that they are using the latest bug/CVE fixes, features, and that they are API compatible with other OpenShift components.

Acceptance Criteria

  • Existing CI/CD tests must be passing

We should expand the set of Cypress e2e tests we started in 4.6. This can either be new tests, or migrating some of our more flaky protractor tests.

An epic we can duplicate for each release to ensure we have a place to catch things we ought to be doing regularly but can tend to fall by the wayside.

It has been a few releases since we've gone through and updated the various frontend dependencies for console. We should update major dependencies such as TypeScript, React, and webpack.

We should consider updating our TypeScript typings as well, which have gotten out of date.

An epic we can duplicate for each release to ensure we have a place to catch things we ought to be doing regularly but can tend to fall by the wayside.

Console operator should swap from using monis.app to openshift/operator-boilerplate-legacy. This will allow switching to klog/v2, which the shared libs (api,client-go,library-go) have already done.

This epic contains all the Dynamic Plugins related stories for OCP release-4.11 

Epic Goal

  • Track all the stories under a single epic

Acceptance Criteria

  •  

In the 4.11 release, a console.openshift.io/default-i18next-namespace annotation is being introduced. The annotation indicates whether the ConsolePlugin contains localization resources. If the annotation is set to "true", the localization resources from the i18n namespace named after the dynamic plugin (e.g. plugin__kubevirt), are loaded. If the annotation is set to any other value or is missing on the ConsolePlugin resource, localization resources are not loaded. 

 

In case these resources are not present in the dynamic plugin, the initial console load will be slowed down. For more info check BZ#2015654

 

AC:

  • console-operator should be checking for the new console.openshift.io/use-i18n annotation, update the console-config.yaml accordingly and redeploy the console server
  • console server should pick up the changes in the console-config.yaml and only load the i18n namespace that are available

 

Follow up of https://issues.redhat.com/browse/CONSOLE-3159

 

 

This epic contains all the Dynamic Plugins related stories for OCP release-4.12

Epic Goal

  • Track all the stories under a single epic

Acceptance Criteria

when defining two proxy endpoints, 
apiVersion: console.openshift.io/v1alpha1
kind: ConsolePlugin
metadata:
...
name: forklift-console-plugin
spec:
displayName: Console Plugin Template
proxy:

  • alias: forklift-inventory
    authorize: true
    service:
    name: forklift-inventory
    namespace: konveyor-forklift
    port: 8443
    type: Service
  • alias: forklift-must-gather-api
    authorize: true
    service:
    name: forklift-must-gather-api
    namespace: konveyor-forklift
    port: 8443
    type: Service

service:
basePath: /
I get two proxy endpoints
/api/proxy/plugin/forklift-console-plugin/forklift-inventory
and
/api/proxy/plugin/forklift-console-plugin/forklift-must-gather-api

but both proxy to the `forklift-must-gather-api` service

e.g.
curl to:
[server url]/api/proxy/plugin/forklift-console-plugin/forklift-inventory
will point to the `forklift-must-gather-api` service, instead of the `forklift-inventory` service

Currently the ConsolePlugins API version is v1alpha1. Since we are going GA with dynamic plugins we should be creating a v1 version.

This would require updates in following repositories:

  1. openshift/api (add the v1 version and generate a new CRD)
  2. openshift/client-go (picku the changes in the openshift/api repo and generate clients & informers for the new v1 version)
  3. openshift/console-operator repository will using both the new v1 version and v1alpha1 in code and manifests folder.

AC:

  • both v1 and v1alpha1 ConsolePlugins should be passed to the console-config.yaml when the plugins are enabled and present on the cluster.

 

NOTE: This story does not include the conversion webhook change which will be created as a follow on story

This epic contains all the OLM related stories for OCP release-4.12

Epic Goal

  • Track all the stories under a single epic

This enhancement Introduces support for provisioning and upgrading heterogenous architecture clusters in phases.

 

We need to scan through the compute nodes and build a set of supported architectures from those. Each node on the cluster has a label for architecture: e.g. kubernetes.io/arch=arm64, kubernetes.io/arch=amd64 etc. Based on the set of supported architectures console will need to surface only those operators in the Operator Hub, which are supported on our Nodes.

 

AC: 

  1. Implement logic in the console-operator that will scan though all the nodes and build a set of all the architecture types that the cluster nodes run on and pass it to the console-config.yaml
  2. Add unit and e2e test cases in the console-operator repository.

 

@jpoulin is good to ask about heterogeneous clusters.

Goal: Support OCI images.

Problem: Buildah and podman use OCI format by default and OpenShift Image Registry and ImageStream API doesn't understand it.

Why is this important: OCI images are supposed to replace Docker schema 2 images, OpenShift should be ready when OCI images become widely adopted.

Dependencies (internal and external):

Prioritized epics + deliverables (in scope / not in scope):

Estimate (XS, S, M, L, XL, XXL): XL

Previous Work:

Customers:

Open Questions:

User Story

As a user of OpenShift
I want to push OCI images to the registry
So that I can use buildah and podman with their defaults to push images

Acceptance Criteria

  • An OCI image can be pushed to the registry by buildah or podman
  • An imported OCI image can be pulled from the registry
  • The registry should be able to pull-through OCI images from other registries

Launch Checklist

Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated

Notes

Add pertinent notes here:

  • Enhancement proposal link
  • Previous product docs
  • Best practices
  • Known issues

Guiding Questions

User Story

  • Is this intended for an administrator, application developer, or other type of OpenShift user?
  • What experience level is this intended for? New, experienced, etc.?
  • Why is this story important? What problems does this solve? What benefit(s) will the customer experience?
  • Is this part of a larger epic or initiative? If so, ensure that the story is linked to the appropriate epic and/or initiative.

Acceptance Criteria

  • How should a customer use and/or configure the feature?
  • Are there any prerequisites for using/enabling the feature?

Notes

  • Is this a new feature, or an enhancement of an existing feature? If the latter, list the feature and docs reference.
  • Are there any new terms, abbreviations, or commands introduced with this story? Ex: a new command line argument, a new custom resource.
  • Are there any recommended best practices when using this feature?
  • On feature completion, are there any known issues that customers should be aware of?

User Story

As a user of OpenShift
I want to import OCI images
So that I can use images that are built by new tools

Acceptance Criteria

  • oc import-image works with OCI images

Launch Checklist

Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated

Notes

Add pertinent notes here:


Guiding Questions

User Story

  • Is this intended for an administrator, application developer, or other type of OpenShift user?
  • What experience level is this intended for? New, experienced, etc.?
  • Why is this story important? What problems does this solve? What benefit(s) will the customer experience?
  • Is this part of a larger epic or initiative? If so, ensure that the story is linked to the appropriate epic and/or initiative.

Acceptance Criteria

  • How should a customer use and/or configure the feature?
  • Are there any prerequisites for using/enabling the feature?

Notes

  • Is this a new feature, or an enhancement of an existing feature? If the latter, list the feature and docs reference.
  • Are there any new terms, abbreviations, or commands introduced with this story? Ex: a new command line argument, a new custom resource.
  • Are there any recommended best practices when using this feature?
  • On feature completion, are there any known issues that customers should be aware of?

User Story

As a user of OpenShift
I want the image pruner to be aware of OCI images
So that it doesn't delete their layers/configs

Acceptance Criteria

    • When
      • an OCI image is pushed/mirrored to the registry
      • an schema 2 image is pushes/mirrored to the registry and share its layers/config with the OCI image
      • the schema 2 image is eligible to be pruned
      • the shared layers/config are not shared with other images
    • the pruner
      • will delete the schema 2 image
      • will NOT delete the OCI image and its layers/config

Launch Checklist

Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated

Notes

Add pertinent notes here:

  • Enhancement proposal link
  • Previous product docs
  • Best practices
  • Known issues

Guiding Questions

User Story

  • Is this intended for an administrator, application developer, or other type of OpenShift user?
  • What experience level is this intended for? New, experienced, etc.?
  • Why is this story important? What problems does this solve? What benefit(s) will the customer experience?
  • Is this part of a larger epic or initiative? If so, ensure that the story is linked to the appropriate epic and/or initiative.

Acceptance Criteria

  • How should a customer use and/or configure the feature?
  • Are there any prerequisites for using/enabling the feature?

Notes

  • Is this a new feature, or an enhancement of an existing feature? If the latter, list the feature and docs reference.
  • Are there any new terms, abbreviations, or commands introduced with this story? Ex: a new command line argument, a new custom resource.
  • Are there any recommended best practices when using this feature?
  • On feature completion, are there any known issues that customers should be aware of?

Epic Goal

  • Improve CI testing of the image registry components.

Why is this important?

  • The image registry, image API and the image pruner had a lot of tests removed during transition 4.0. This may make the platform less stable and/or slow down the team.

Scenarios

  1. ...

Acceptance Criteria

  • CI - tests should be more stable and have broader coverage

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.

An the registry developer
I want e2e-upgrade jobs to monitor availability of the registry during upgrades
So that I can be sure that clients can use the registry without disruptions.

Acceptance Criteria

  • A new test in openshift/origin repo.

Notes

https://github.com/openshift/origin/blob/e6b3d1ece61d7c3ab5a23151c9875e1f9ad36838/test/extended/util/disruption/controlplane/controlplane.go#L69

https://bugzilla.redhat.com/show_bug.cgi?id=1884380

https://github.com/openshift/origin/pull/25475/files marked our tests for ISI as Disruptive.

Tests should wait until operators become stable, otherwise other tests will be run on an unstable cluster and it'll cause flakes.

Acceptance Criteria

  • The tests wait until operators stable after image.config changes.
  • The tests is no longer [Disruptive].
  • If tests are slow (it depends on other operator, but MCO tends to be slow), they should be [Slow].

The integration tests for the image registry expect that OpenShift and tests are run on the same machine (i.e. OpenShift can connect to sockets that the tests listen). This is not the case with e2e tests.

Acceptance Criteria

  • every integration test is converted into an e2e test or a techdebt story
  • image-registry tests are green

Goal: Rebase registry to Docker Distribution 

Problem: The registry is currently based on an outdated version of the upstream docker/distribution project. The base does not even have a version associated with it - DevEx last rebased on an untagged commit.

Why is this important: Update the registry with improvements and bug fixes from the upstream community.

Dependencies (internal and external):

Prioritized epics + deliverables (in scope / not in scope):

Estimate (XS, S, M, L, XL, XXL): M

Previous Work:

Customers:

 

Open questions:

User Story

As a user of OpenShift
I want the image registry to be rebased on the latest docker/distribution release (v2.7.1)
So that the image registry has the latest upstream bugfixes and enhancements

Acceptance Criteria

  • Image registry is based on docker/distribution v2.7.1

Launch Checklist

Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated

Notes

Add pertinent notes here:

  • Enhancement proposal link
  • Previous product docs
  • Best practices
  • Known issues

Guiding Questions

User Story

  • Is this intended for an administrator, application developer, or other type of OpenShift user?
  • What experience level is this intended for? New, experienced, etc.?
  • Why is this story important? What problems does this solve? What benefit(s) will the customer experience?
  • Is this part of a larger epic or initiative? If so, ensure that the story is linked to the appropriate epic and/or initiative.

Acceptance Criteria

  • How should a customer use and/or configure the feature?
  • Are there any prerequisites for using/enabling the feature?

Notes

  • Is this a new feature, or an enhancement of an existing feature? If the latter, list the feature and docs reference.
  • Are there any new terms, abbreviations, or commands introduced with this story? Ex: a new command line argument, a new custom resource.
  • Are there any recommended best practices when using this feature?
  • On feature completion, are there any known issues that customers should be aware of?

Epic Goal

  • We need the installer to accept a LB type from user and then we could set type of LB in the following object.
    oc get ingress.config.openshift.io/cluster -o yaml
    Then we can fetch info from this object and reconcile the operator to have the NLB changes reflected.

 

This is an API change and we will consider this as a feature request.

Why is this important?

https://issues.redhat.com/browse/NE-799 Please check this for more details

 

Scenarios

https://issues.redhat.com/browse/NE-799 Please check this for more details

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. installer
  2. ingress operator

Previous Work (Optional):

 No

Open questions::

N/A

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

We need tests for the ovirt-csi-driver and the cluster-api-provider-ovirt. These tests help us to

  • minimize bugs,
  • reproduce and fix them faster and
  • pin down current behavior of the driver

Also, having dedicated tests on lower levels with a smaller scope (unit, integration, ...) has the following benefits:

  • fast feedback cycle (local test execution)
  • developer in-code documentation
  • easier onboarding for new contributers
  • lower resource consumption

Overview

Today in the Dev Perspective navigation, we have pre-pinned resources in the "third section" of the vertical navigation. Customers have asked if there is a way for admins to to provide defaults for those pre-pinned resources for all users.

Acceptance Criteria

  1. Add a section to the Cluster/Console menu for Developer pre-pinned resources
  2. Admin can define which nav items exist by default for all users & is able to re-order them
  3. These defaults should be used as the default pre-pinned items for all new users
  4. Users who's nav items have not been customized will inherit these defaults the next time they log in. 

Exploration Results

Miro Board

Slack Channel

tbd

Description

As an admin, I want to define the pre-pinned resources on the developer perspective navigation

Based on the https://issues.redhat.com/browse/ODC-7181 enhancement proposal, it is required to extend the console configuration CRD to enable the cluster admins to configure this data in the console resource

Acceptance Criteria

  1. Extend the "customization" spec type definition for the CRD in the openshift/api project

Additional Details:

Previous customization work:

  1. https://issues.redhat.com/browse/ODC-5416
  2. https://issues.redhat.com/browse/ODC-5020
  3. https://issues.redhat.com/browse/ODC-5447

Goal

Provide a form driven experience to allow cluster admins to manage the perspectives to meet the ACs below.

Problem:

We have heard the following requests from customers and developer advocates:

  • Some admins do not want to provide access to the Developer Perspective from the console
  • Some admins do not want to provide non-priv users access to the Admin Perspective from the console

Acceptance criteria:

  1. Cluster administrator is able to "hide" the admin perspective for non-priv users
  2. Cluster administrator is able to "hide" the developer perspective for all users
  3. Be user that User Preferences for individual users behaves appropriately. If only one perspective is available, the perspective switcher is not needed.

Dependencies (External/Internal):

Design Artifacts:

Exploration:

Note:

Description

As an admin, I want to hide the admin perspective for non-privileged users or hide the developer perspective for all users

Based on the https://issues.redhat.com/browse/ODC-6730 enhancement proposal, it is required to extend the console configuration CRD to enable the cluster admins to configure this data in the console resource

Acceptance Criteria

  1. Extend the "customization" spec type definition for the CRD in the openshift/api project

Additional Details:

Previous customization work:

  1. https://issues.redhat.com/browse/ODC-5416
  2. https://issues.redhat.com/browse/ODC-5020
  3. https://issues.redhat.com/browse/ODC-5447

Problem:

Currently we are only able to get limited telemetry from the Dev Sandbox, but not from any of our managed clusters or on prem clusters.

Goals:

  1. Enable gathering segment telemetry whenever cluster telemetry is enabled on OSD clusters
  2. Have our OSD clusters opt into telemetry by default
  3. Work with PM & UX to identify additional metrics to capture in addition to what we have enabled currently on Sandbox.
  4. Ability to get a single report from woopra across all of our Sandbox and OSD clusters.
  5. Be able to generate a report including metrics of a single cluster or all clusters of a certain type ( sandbox, or OSD)

Why is it important?

In order to improve properly analyze usage and the user experience, we need to be able to gather as much data as possible.

Acceptance Criteria

  1. Extend console backend (bridge) to provide configuration as SERVER_FLAGS
    // JS type
    telemetry?: Record<string, string>
    
    1. Read the annotation of the cluster ConfigMap for telemetry data and pass them into the internal serverconfig.
    2. Pass through this internal serverconfig and export it as SERVER_FLAGS.
    3. Add a new --telemetry CLI option so that the telemetry options could be tested in a dev environment:
      ./bin/bridge --telemetry SEGMENT_API_KEY=a-key-123-xzy
      ./bin/bridge --telemetry CONSOLE_LOG=debug
      
  2. TBD: In best case the new annotation could be read from the cluster ConfigMap...
    1. Otherwise update the console-operator to pass the annotation from the console cluster configuration to the console ConfigMap.

Additional Details:

  1. More information about the integration with the backend could be found in the Telemetry on OSD clusters Google Doc

Problem:

Customers don't want their users to have access to some/all of the items which are available in the Developer Catalog.  The request is to change access for the cluster, not per user or persona.

Goal:

Provide a form driven experience to allow cluster admins easily disable the Developer Catalog, or one or more of the sub catalogs in the Developer Catalog.

Why is it important?

Multiple customer requests.

Acceptance criteria:

  1. As a cluster admin, I can hide/disable access to the developer catalog for all users across all namespaces.
  2. As a cluster admin, I can hide/disable access to a specific sub-catalog in the developer catalog for all users across all namespaces.
    1. Builder Images
    2. Templates
    3. Helm Charts
    4. Devfiles
    5. Operator Backed

Notes

We need to consider how this will work with subcatalogs which are installed by operators: VMs, Event Sources, Event Catalogs, Managed Services, Cloud based services

Dependencies (External/Internal):

Design Artifacts:

Exploration:

Note:

Description

As an admin, I want to hide/disable access to specific sub-catalogs in the developer catalog or the complete dev catalog for all users across all namespaces.

Based on the https://issues.redhat.com/browse/ODC-6732 enhancement proposal, it is required to extend the console configuration CRD to enable the cluster admins to configure this data in the console resource

Acceptance Criteria

Extend the "customization" spec type definition for the CRD in the openshift/api project

Additional Details:

Previous customization work:

  1. https://issues.redhat.com/browse/ODC-5416
  2. https://issues.redhat.com/browse/ODC-5020
  3. https://issues.redhat.com/browse/ODC-5447

Goal

This epic has 3 main goals

  1. Improve segment implementation so that we can easily enable additional telemetry pieces (hotjar, etc) for particular cluster types (starting with sandbox, maybe expanding to RHPDS). This will help us better understand where errors and drop off occurs in our trial and workshop clusters, thus being able to (1) help conversion and (2) proactively detect issues before they are "reported" by customers.
  2. Improve telemetry so we can START capturing console usage across the fleet
  3. Additional improvements to segment, to enable proper gathering of user telemetry and analysis

Problem

Currently we have no accurate telemetry of usage of the OpenShift Console across all clusters in the fleet. We should be able to utilize the auth and console telemetry to glean details which will allow us to get a picture of console usage by our customers.

Acceptance criteria

Let's do a spike to validate, and possibly have to update this list after the spike:

Need to verify HOW do we define a cluster Admin -> Listing all namespaces in a cluster? Install operators? Make sure that we consider OSD cluster admins as well (this should be aligned with how we send people to dev perspective in my mind)

Capture additional information via console plugin ( and possibly the auth operator )

  1. Average number of users per cluster
  2. Average number of cluster admin users per cluster
  3. Average number of dev users per cluster
  4. Average # of page views across the fleet
  5. Average # of page views per perspective across the fleet
  6. # of cluster which have disabled the admin perspective for any users
  7. # of cluster which have disabled the dev perspective for any users
  8. # of cluster which have disabled the “any” perspective for any users
  9. # of clusters which have plugin “x” installed
  10. Total number of unique users across the fleet
  11. Total number of cluster admin users across the fleet
  12. Total number of developer users across the fleet

Dependencies (External/Internal):

Understanding how to capture telemetry via the console operator

Exploration:

Note:

We have removed the following ACs for this release:

  1. (p2) Average total active time spent per User in console (per cluster for all users)
    1. per Cluster Admins
    2. per non-Cluster Admins
  2. (p2) Average active time spent in Dev Perspective [implies we can calculate this for admin perspective]
    1. per Cluster Admins
    2. per non-Cluster Admins-
  3. (p3) Average # of times they change the perspective (per cluster for all users)

As Red Hat, we want to understand the usage of the (dev) console, for that, we want to add new Prometheus metrics (how many users have a cluster, etc.) and collect them later (as telemetry data) via cluster-monitoring-operator.

Eigther the console-operator or cluster-monitoring-operator needs to apply a PrometheusRule to collect the right data and make these later available in Superset DataHat or Tableau.

Incomplete Epics

This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were not completed when this image was assembled

An epic we can duplicate for each release to ensure we have a place to catch things we ought to be doing regularly but can tend to fall by the wayside.

Console-operator should switch from using bindata to using assets, similar to what cluster-kube-apiserver-operator and other operators are doing so we dont need to regenerate the bindata when yaml files are changes. 

There is also an issue with generating bindata on ARM and other arch., where switching to assets, will  make it obsolete.

 

https://github.com/openshift/cluster-kube-apiserver-operator/blob/005a95607cf9f8db490e962b549811d8bc0c5eaf/bindata/assets.go

Cluster-version operator (CVO) manifests declaring a runlevel are supposed to use 0000_<runlevel>_<dash-separated-component>_<manifest_filename>, but since api#598 added 0000_10-helm-chart-repository.crd.yaml and api#1084 added 0000_10-project-helm-chart-repository.crd.yaml, those have diverged from that pattern, so the cluster-version operator will fail to parse their runlevel. They're still getting sorted into a runlevel around 10 by this code, but unless there are reasons that the CRD needs to be pushed early in an update, it gives the CVO the ability to parallelize the reconciliation with more sibling resources if you leave the runlevel prefix off in the API repository (after which these COPY lines would need a matching bump).

This epic contains all the Dynamic Plugins related stories for OCP release-4.14 and implementing Core SDK utils.

Epic Goal

  • Track all the stories under a single epic

Acceptance Criteria

We need to enable the storage for the v1 version of our ConsolePlugin CRD in the API repository. ConsolePlugin v1 CRD was added in CONSOLE-3077.

 

AC: Enable the storage for the v1 version of ConsolePlugin CRD and disable the storage for v1alpha1 version

Epic Goal

Remove code that was added thought the ACM integration into all of the console's codebase repositories

Why is this important?

Since there was decision made stop with the ACM integration, we as a team decided that it would be better to remove the unused code in order avoid any confusion or regressions.

Acceptance Criteria

  • Identify all the places from which we need to remove the code that was added during the ACM integration.
  • Come up with a plan how to remove the code from our repositories and CI
  • Remove the code from console-operator repoy
  • Start with code removal from the console repository

Remove all multicluster-related code from the console operator repo.

AC:

  • All multicluster code is removed
    • controllers
    • helpers
    • tests
    • bindata
    • api
    • types
    • manifests
  • No regressions are introduced
  • Remove ACM related conditions in the starter

Epic Goal

Why is this important?

  • For autoscaling OpenShift needs an resource metrics implementation. Currently this depends on CMO's Prometheus stack
  • Some users would like to opt-out of running a fully fledged Prometheus stack, see https://issues.redhat.com/browse/MON-3152

Scenarios

  1. A cluster admin decides to not deploy a full Monitoring stack, autoscaling must still work.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. https://github.com/jan--f/cluster-monitoring-operator/tree/metrics-server 

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Following points to consider to implement for switching to metrics-server:

  • Add this feature in Tech-Preview mode
  • Config option to switch metrics-server: https://issues.redhat.com/browse/MON-3214
    • enable-metrics-server
    • disable-metrics-server (deploy prometheus adapter back and update APIService to point to prometheus adapter)
  • Flow of creating/deleting objects:
    • Deploy metrics-server
    • Verify metrics-server resource metrics api is working
    • Update APIService object to point to metrics-server
    • Delete prometheus-adapter resources
  • Deploy in HA mode (2 replicas which is the case for all components we deploy)

 

Acceptance Criteria:

  • we have a PR in the CMO repo
  • we have a POC
  • people can start deploying it in their cluster

Vendor openshift/api to openshift/cluster-config-operator to bring featuregate changes. 

 

https://github.com/openshift/api/pull/1615 is merged but this change won't be available unless we vendor change in openshift/cluster-config-operator 

We need to Bump the version of K8 and to run a library sync for OCP4.13 .Two stories will be created for each activity

Owner: Architect:

Story (Required)

As a Sample Operator Developer, I will like to run the library sync process, so the new libraries can be pushed to OCP 4.15

Background (Required)

This is a runbook we need to execute on every release of OpenShift

Glossary

NA

Out of scope

NA

In Scope

NA

Approach(Required)

Follow instructions here: https://source.redhat.com/groups/public/appservices/wiki/cluster_samples_operator_release_activities

Dependencies

Library Repo

Edge Case

Acceptance Criteria

 Library sync PR is merged in master

INVEST Checklist

 Dependencies identified
 Blockers noted and expected delivery timelines set
 Design is implementable
 Acceptance criteria agreed upon
 Story estimated

Legend

 Unknown
 Verified
 Unsatisfied

 

Other Complete

This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were completed when this image was assembled

Description of problem:

ROSA is being branded via custom branding; as a result, the favicon disappears since we do not want any Red Hat/Openshift-specific branding to appear when custom branding is in use.  Since ROSA is a Red Hat product, it should get a branding option added to the console so all the correct branding including favicon appears.

Version-Release number of selected component (if applicable):

4.14.0, 4.13.z, 4.12.z, 4.11.z

How reproducible:

Always

Steps to Reproduce:

1.  View a ROSA cluster
2.  Note the absence of the OpenShift logo favicon

Please review the following PR: https://github.com/openshift/console-operator/pull/737

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Refer to the CIS RedHat OpenShift Container Platform Benchmark PDF: https://drive.google.com/file/d/12o6O-M2lqz__BgmtBrfeJu1GA2SJ352c/view
1.1.7 Ensure that the etcd pod specification file permissions are set to 600 or more restrictive (Manual)
======================================================================================================
As per CIS v1.3 PDF permissions should be 600 with the following statement:
"The pod specification file is created on control plane nodes at /etc/kubernetes/manifests/etcd-member.yaml with permissions 644. Verify that the permissions are 600 or more restrictive."
But when I ran the following command it was showing 644 permissions

for i in $(oc get pods -n openshift-etcd -l app=etcd -o name | grep etcd )
do
echo "check pod $i"
oc rsh -n openshift-etcd $i \
stat -c %a /etc/kubernetes/manifests/etcd-pod.yaml
done

Description of problem:

https://github.com/openshift/api/pull/1186 - https://issues.redhat.com/browse/CONSOLE-3069 promoted ConsolePlugin CRD to v1.

The PR introduces also a conversion webhook from v1alpha1 to v1.

In new CRD version I18n ConsolePluginI18n is marked as optional.
The conversion webhook will not set a default valid ("Lazy"/"Preload") value writing the v1 object and a v1 object completely omitting spec.i18n will be accepted we no valid default value as well.

On the other side, at garbage collection time the object will be stuck forever due to the lack of a valid value for spec.i18n.loadType

Example,
create a v1 ConsolePlugin object:

cat <<EOF | oc apply -f -
apiVersion: console.openshift.io/v1
kind: ConsolePlugin
metadata:
  name: test472
spec:
  backend:
    service:
      basePath: /
      name: test472-service
      namespace: kubevirt-hyperconverged
      port: 9443
    type: Service
  displayName: Test 472 Plugin
EOF

Delete it in foreground mode:
stirabos@t14s:~$ oc delete consoleplugin test472 --timeout=30s --cascade='foreground' -v 7
I1011 18:20:03.255605   31610 loader.go:372] Config loaded from file:  /home/stirabos/.kube/config
I1011 18:20:03.266567   31610 round_trippers.go:463] DELETE https://api.ci-ln-krdzphb-72292.gcp-2.ci.openshift.org:6443/apis/console.openshift.io/v1/consoleplugins/test472
I1011 18:20:03.266581   31610 round_trippers.go:469] Request Headers:
I1011 18:20:03.266588   31610 round_trippers.go:473]     Accept: application/json
I1011 18:20:03.266594   31610 round_trippers.go:473]     Content-Type: application/json
I1011 18:20:03.266600   31610 round_trippers.go:473]     User-Agent: oc/4.11.0 (linux/amd64) kubernetes/fcf512e
I1011 18:20:03.266606   31610 round_trippers.go:473]     Authorization: Bearer <masked>
I1011 18:20:03.688569   31610 round_trippers.go:574] Response Status: 200 OK in 421 milliseconds
consoleplugin.console.openshift.io "test472" deleted
I1011 18:20:03.688911   31610 round_trippers.go:463] GET https://api.ci-ln-krdzphb-72292.gcp-2.ci.openshift.org:6443/apis/console.openshift.io/v1/consoleplugins?fieldSelector=metadata.name%3Dtest472
I1011 18:20:03.688919   31610 round_trippers.go:469] Request Headers:
I1011 18:20:03.688928   31610 round_trippers.go:473]     Authorization: Bearer <masked>
I1011 18:20:03.688935   31610 round_trippers.go:473]     Accept: application/json
I1011 18:20:03.688941   31610 round_trippers.go:473]     User-Agent: oc/4.11.0 (linux/amd64) kubernetes/fcf512e
I1011 18:20:03.840103   31610 round_trippers.go:574] Response Status: 200 OK in 151 milliseconds
I1011 18:20:03.840825   31610 round_trippers.go:463] GET https://api.ci-ln-krdzphb-72292.gcp-2.ci.openshift.org:6443/apis/console.openshift.io/v1/consoleplugins?fieldSelector=metadata.name%3Dtest472&resourceVersion=175205&watch=true
I1011 18:20:03.840848   31610 round_trippers.go:469] Request Headers:
I1011 18:20:03.840884   31610 round_trippers.go:473]     Accept: application/json
I1011 18:20:03.840907   31610 round_trippers.go:473]     User-Agent: oc/4.11.0 (linux/amd64) kubernetes/fcf512e
I1011 18:20:03.840928   31610 round_trippers.go:473]     Authorization: Bearer <masked>
I1011 18:20:03.972219   31610 round_trippers.go:574] Response Status: 200 OK in 131 milliseconds
error: timed out waiting for the condition on consoleplugins/test472

and in kube-controller-manager logs we see:

2022-10-11T16:25:32.192864016Z I1011 16:25:32.192788       1 garbagecollector.go:501] "Processing object" object="test472" objectUID=0cc46a01-113b-4bbe-9c7a-829a97d6867c kind="ConsolePlugin" virtual=false
2022-10-11T16:25:32.282303274Z I1011 16:25:32.282161       1 garbagecollector.go:623] remove DeleteDependents finalizer for item [console.openshift.io/v1/ConsolePlugin, namespace: , name: test472, uid: 0cc46a01-113b-4bbe-9c7a-829a97d6867c]
2022-10-11T16:25:32.304835330Z E1011 16:25:32.304730       1 garbagecollector.go:379] error syncing item &garbagecollector.node{identity:garbagecollector.objectReference{OwnerReference:v1.OwnerReference{APIVersion:"console.openshift.io/v1", Kind:"ConsolePlugin", Name:"test472", UID:"0cc46a01-113b-4bbe-9c7a-829a97d6867c", Controller:(*bool)(nil), BlockOwnerDeletion:(*bool)(nil)}, Namespace:""}, dependentsLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:1, readerWait:0}, dependents:map[*garbagecollector.node]struct {}{}, deletingDependents:true, deletingDependentsLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, beingDeleted:true, beingDeletedLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, virtual:false, virtualLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, owners:[]v1.OwnerReference(nil)}: ConsolePlugin.console.openshift.io "test472" is invalid: spec.i18n.loadType: Unsupported value: "": supported values: "Preload", "Lazy"

Version-Release number of selected component (if applicable):

OCP 4.12.0 ec4

How reproducible:

100% 

Steps to Reproduce:

1. cat <<EOF | oc apply -f -
apiVersion: console.openshift.io/v1
kind: ConsolePlugin
metadata:
  name: test472
spec:
  backend:
    service:
      basePath: /
      name: test472-service
      namespace: kubevirt-hyperconverged
      port: 9443
    type: Service
  displayName: Test 472 Plugin
EOF
2. oc delete consoleplugin test472 --timeout=30s --cascade='foreground' -v 7

Actual results:

2022-10-11T16:25:32.192864016Z I1011 16:25:32.192788       1 garbagecollector.go:501] "Processing object" object="test472" objectUID=0cc46a01-113b-4bbe-9c7a-829a97d6867c kind="ConsolePlugin" virtual=false
2022-10-11T16:25:32.282303274Z I1011 16:25:32.282161       1 garbagecollector.go:623] remove DeleteDependents finalizer for item [console.openshift.io/v1/ConsolePlugin, namespace: , name: test472, uid: 0cc46a01-113b-4bbe-9c7a-829a97d6867c]
2022-10-11T16:25:32.304835330Z E1011 16:25:32.304730       1 garbagecollector.go:379] error syncing item &garbagecollector.node{identity:garbagecollector.objectReference{OwnerReference:v1.OwnerReference{APIVersion:"console.openshift.io/v1", Kind:"ConsolePlugin", Name:"test472", UID:"0cc46a01-113b-4bbe-9c7a-829a97d6867c", Controller:(*bool)(nil), BlockOwnerDeletion:(*bool)(nil)}, Namespace:""}, dependentsLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:1, readerWait:0}, dependents:map[*garbagecollector.node]struct {}{}, deletingDependents:true, deletingDependentsLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, beingDeleted:true, beingDeletedLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, virtual:false, virtualLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, owners:[]v1.OwnerReference(nil)}: ConsolePlugin.console.openshift.io "test472" is invalid: spec.i18n.loadType: Unsupported value: "": supported values: "Preload", "Lazy"

Expected results:

Object correctly deleted

Additional info:

The issue doesn't happen with --cascade='background' which is the default on the CLI client