Back to index

4.7.0-0.ci-2024-03-26-073721

Jump to: Complete Features | Incomplete Features | Complete Epics | Incomplete Epics | Other Complete | Other Incomplete |

Changes from 4.6.62

Note: this page shows the Feature-Based Change Log for a release

Complete Features

These features were completed when this image was assembled

Currently the "Get started with on-premise host inventory" quickstart is delivered in the core console. If we are going to keep it there, we need to add the MCE or ACM operator as a prerequisite; otherwise it is very confusing.

The details of this Jira Card are restricted (Only Red Hat employees and contractors)

Epic Goal

  • Support running console in single-node OpenShift configurations for production use in edge computing use cases.
  • Support disabling the console entirely in some of these configurations to reduce overhead in constrained environments.

Why is this important?

  • Some bare metal edge customers, especially in the telco market, want to use Kubernetes at physically remote sites with minimal hardware.

Scenarios

  1. As a user, I want to deploy a fully supported instance of OpenShift on a single node.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • console can be deployed with a single replica

Dependencies (internal and external)

  1. CORS-1589

Previous Work (Optional):

  1. https://github.com/openshift/enhancements/pull/504
  2. https://github.com/openshift/enhancements/pull/560

Open questions:

  1. Should the console configuration API have a separate option for this setting, or should it use the API created from CORS-1589?

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

https://github.com/openshift/enhancements/pull/555
https://github.com/openshift/api/pull/827

The console operator will need to support single-node clusters.

We have a console deployment and a downloads deployment. Each will need to be updated so that there's only a single replica when high availability mode is disabled in the Infrastructure config. We should also remove the anti-affinity rule in the console deployment that tries to spread console pods across nodes.

The downloads deployment is currently a static manifest. Going forward, it likely needs to be created by the console operator instead.
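
As a rough sketch of how this could fit together (field names follow the Infrastructure config API changes referenced by the enhancement and API PRs above; treat the values as illustrative, not the final shape):

  # Infrastructure config the console operator would consult
  apiVersion: config.openshift.io/v1
  kind: Infrastructure
  metadata:
    name: cluster
  status:
    controlPlaneTopology: SingleReplica      # single-node control plane
    infrastructureTopology: SingleReplica    # infrastructure workloads are not HA
  # When the topology reports SingleReplica, the operator would render the console
  # and downloads Deployments with spec.replicas: 1 and no pod anti-affinity rules.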

Acceptance Criteria:

  • Console operator deploys console with 1 replica and no anti-affinity rules when not in high availability mode
  • Console operator deploys the downloads deployment with 1 replica when not in high availability mode
  • The console and downloads deployments do not change when in high availability mode
  • The feature is well-covered by tests

OCP/Telco Definition of Done
Feature Template descriptions and documentation.

Feature Overview

Early customer feedback is that they see SNO as a great solution covering smaller-footprint deployments, but they are wondering what evolution story OpenShift is going to provide when more capacity or high availability is needed in the future.

While migration tooling (moving workload/config to a new cluster) could be a mid-term solution, customers do not want extra hardware to be involved in this process.

For Telecommunications Providers at the Far Edge, the intent is to start small and then grow. Many of these operators will start with a SNO-based DU deployment as an initial investment. But as DUs evolve, different segments of the radio spectrum are added, various radio hardware is provisioned, and features are delivered to the Far Edge, the Telecommunications Providers want the ability for their Far Edge deployments to scale up from 1 node to 2 nodes to n nodes. On the opposite side of the spectrum from SNO is MMIMO, where there is a robust cluster and workloads use HPA.

Goals

  • Provide the capability to expand a single replica control plane topology to host more workload capacity by adding workers
  • Provide the capability to expand a single replica control plane to be a highly available control plane
  • To satisfy MMIMO, Telecommunications providers will want the ability to scale a SNO to a multi-node cluster that can support HPA.
  • Telecommunications providers do not want workload (DU specifically) downtime when migrating from SNO to a multi-node cluster.
  • Telecommunications providers wish to be able to scale from one to two or more nodes to support a variety of radio hardware.
  • Support control plane (CP) scaling (CP HA) for 2-node, 3-node, and n-node clusters. As the number of nodes in the cluster increases, so does the failure domain of the cluster. The cluster is now supporting more cell sectors and therefore has more of a need for HA and resiliency, including the cluster CP.

Requirements

  • TBD
Requirement | Notes | isMvp?
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES
Release Technical Enablement | Provide necessary release enablement details and documents. | YES

(Optional) Use Cases

This Section:

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

Questions to answer…

  • ...

Out of Scope

Background and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

Assumptions

  • ...

Customer Considerations

  • ...

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

Epic Goal

  • Documented and supported flow for adding 1, 2, 3 or more workers to a Single Node OpenShift (SNO) deployment without requiring cluster downtime, with the understanding that this action will not make the cluster itself highly available.

Why is this important?

  • Telecommunications and Edge scenarios where HA is handled via failover to another site but single site capacity may vary or need to be expanded over time.
  • Similar scenarios exist for some ISV vendors where OpenShift is an implementation detail of how they deliver their solution on top of another platform (e.g. VMware).

Scenarios

  1. Adding a worker to a single-node OpenShift cluster.
  2. Adding a second worker to a single-node OpenShift cluster.
  3. Adding a third worker to a single-node OpenShift cluster.
  4. Removing a worker node from a single-node OpenShift cluster that has had 1 or more workers added.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • Customer facing documentation of the add worker flow for SNO.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

  1. Presumably there is a scale limit on how many workers can be added to an SNO control plane, and it is lower than the limit for a "normal" 3-node control plane. It is not anticipated that this limit will be established in this epic. The intent is to focus on small-scale sites where adding 1-3 worker nodes would be beneficial.

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Epic Goal

  • Make it possible to disable the console operator at install time, while still having a supported+upgradeable cluster.

Why is this important?

  • It's possible to disable console itself using spec.managementState in the console operator config. There is no way to remove the console operator, though. For clusters where an admin wants to completely remove console, we should give the option to disable the console operator as well.

Scenarios

  1. I'm an administrator who wants to minimize my OpenShift cluster footprint and who does not want the console installed on my cluster

Acceptance Criteria

  • It is possible at install time to opt-out of having the console operator installed. Once the cluster comes up, the console operator is not running.

Dependencies (internal and external)

  1. Composable cluster installation

Previous Work (Optional):

  1. https://docs.google.com/document/d/1srswUYYHIbKT5PAC5ZuVos9T2rBnf7k0F1WV2zKUTrA/edit#heading=h.mduog8qznwz
  2. https://docs.google.com/presentation/d/1U2zYAyrNGBooGBuyQME8Xn905RvOPbVv3XFw3stddZw/edit#slide=id.g10555cc0639_0_7

Open questions:

  1. The console operator manages the downloads deployment as well. Do we disable the downloads deployment? Long term we want to move to CLI manager: https://github.com/openshift/enhancements/blob/6ae78842d4a87593c63274e02ac7a33cc7f296c3/enhancements/oc/cli-manager.md

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

In the console-operator repo we need to add the `capability.openshift.io/console` annotation to all the manifests that the operator either contains or creates on the fly.

 

Manifests are currently present in /bindata and /manifest directories.
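
As a hedged illustration (the annotation key is taken from this card and the value is hypothetical; confirm both against the enhancement doc referenced below), an annotated manifest could look like:

  apiVersion: v1
  kind: Service
  metadata:
    name: console
    namespace: openshift-console
    annotations:
      capability.openshift.io/console: "true"   # key from this card; value illustrative
  spec:
    selector:
      app: console
    ports:
    - port: 443
      targetPort: 8443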

 

Here is an example of the insights-operator change.

Here is the overall enhancement doc.

 

We need to continue to maintain specific areas within storage; this epic captures that effort and tracks it across releases.

Goals

  • To allow OCP users and cluster admins to detect problems early and with as little interaction with Red Hat as possible.
  • When Red Hat is involved, make sure we have all the information we need from the customer, i.e. in metrics / telemetry / must-gather.
  • Reduce storage test flakiness so we can spot real bugs in our CI.

Requirements

Requirement | Notes | isMvp?
Telemetry | | No
Certification | | No
API metrics | | No

Out of Scope

n/a

Background and strategic fit
With the expected scale of our customer base, we want to keep the load of customer tickets / BZs low.

Assumptions

Customer Considerations

Documentation Considerations

  • Target audience: internal
  • Updated content: none at this time.

Notes

In progress:

  • CI flakes:
    • Configurable timeouts for e2e tests
      • Azure is slow and times out often
      • Cinder times out formatting volumes
      • AWS resize test times out

 

High prio:

  • Env. check tool for VMware - users often mis-configure permissions there and blame OpenShift. If we had a tool they could run, it might report better errors.
    • Should it be part of the installer?
    • Spike exists
  • Add / use cloud API call metrics (a rough alert sketch follows this list)
    • Helps customers to understand why things are slow
    • Helps build cop to understand a flake
      • With a post-install step that filters data from Prometheus that’s still running in the CI job.
    • Ideas:
      • Cloud is throttling X% of API calls longer than Y seconds
      • Attach / detach / provisioning / deletion / mount / unmount / resize takes longer than X seconds?
    • Capture metrics of operations that are stuck and won’t finish.
      • Sweep operation map from executioner???
      • Report operation metric into the highest bucket after the bucket threshold (i.e. if 10 minutes is the last bucket, report an operation into this bucket after 10 minutes and don’t wait for its completion)?
      • Ask the monitoring team?
    • Include in CSI drivers too.
      • With alerts too
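
A rough sketch of the alerting idea above (the metric name cloud_api_request_duration_seconds_bucket and the thresholds are hypothetical placeholders for whatever cloud API call instrumentation gets added):

  apiVersion: monitoring.coreos.com/v1
  kind: PrometheusRule
  metadata:
    name: storage-cloud-api-latency          # illustrative name
    namespace: openshift-cluster-csi-drivers
  spec:
    groups:
    - name: storage-cloud-api.rules
      rules:
      - alert: CloudAPICallsSlow
        # hypothetical metric; substitute the real cloud API call duration histogram
        expr: histogram_quantile(0.99, sum(rate(cloud_api_request_duration_seconds_bucket[5m])) by (le, request)) > 30
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "99th percentile latency of cloud API call {{ $labels.request }} is above 30s"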

Unsorted

  • As the number of storage operators grows, it would be useful to have a Grafana board for storage operators
    • CSI driver metrics (from CSI sidecars + the driver itself  + its operator?)
    • CSI migration?
  • Get aggregated logs in cluster
    • They're rotated too soon
    • No logs from dead / restarted pods
    • No tools to combine logs from multiple pods (e.g. 3 controller managers)
  • What storage issues do customers have? It was 22% of all issues.
    • Insufficient docs?
    • Probably garbage
  • Document basic storage troubleshooting for our support teams
    • What logs are useful when, what log level to use
    • This has been discussed during the GSS weekly team meeting; however, it would be beneficial to have this documented.
  • Common vSphere errors, their debugging and fixing. 
  • Document sig-storage flake handling - not all failed [sig-storage] tests are ours

Epic Goal

  • Update all images that we ship with OpenShift to the latest upstream releases and libraries.
  • Exact content of what needs to be updated will be determined as new images are released upstream, which is not known at the beginning of OCP development work. We don't know what new features will be included and should be tested and documented. Especially new CSI drivers releases may bring new, currently unknown features. We expect that the amount of work will be roughly the same as in the previous releases. Of course, QE or docs can reject an update if it's too close to deadline and/or looks too big.

Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF). We are trying no-feature-freeze in 4.12. We will try to do as much as we can before FF, but we're quite sure something will slip past FF as usual.

Why is this important?

  • We want to ship the latest software that contains new features and bugfixes.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Update all OCP and Kubernetes libraries in storage operators to the appropriate version for the OCP release.

This includes (but is not limited to):

  • Kubernetes:
    • client-go
    • controller-runtime
  • OCP:
    • library-go
    • openshift/api
    • openshift/client-go
    • operator-sdk

Operators:

  • aws-ebs-csi-driver-operator 
  • aws-efs-csi-driver-operator
  • azure-disk-csi-driver-operator
  • azure-file-csi-driver-operator
  • openstack-cinder-csi-driver-operator
  • gcp-pd-csi-driver-operator
  • gcp-filestore-csi-driver-operator
  • manila-csi-driver-operator
  • ovirt-csi-driver-operator
  • vmware-vsphere-csi-driver-operator
  • alibaba-disk-csi-driver-operator
  • ibm-vpc-block-csi-driver-operator
  • csi-driver-shared-resource-operator

 

  • cluster-storage-operator
  • csi-snapshot-controller-operator
  • local-storage-operator
  • vsphere-problem-detector

Update all CSI sidecars to the latest upstream release.

  • external-attacher
  • external-provisioner
  • external-resizer
  • external-snapshotter
  • node-driver-registrar
  • livenessprobe

This includes update of VolumeSnapshot CRDs in https://github.com/openshift/cluster-csi-snapshot-controller-operator/tree/master/assets

Epic Goal

  • Enable the migration from a storage in-tree driver to a CSI-based driver with minimal impact to the end user, applications and cluster
  • These migrations would include, but are not limited to:
    • CSI driver for AWS EBS
    • CSI driver for GCP
    • CSI driver for Azure (file and disk)
    • CSI driver for VMware vSphere

Why is this important?

  • OpenShift needs to maintain its ability to enable PVCs and PVs of the main storage types
  • CSI Migration is getting close to GA; we need to have the feature fully tested and enabled in OpenShift
  • Upstream in-tree drivers are being deprecated to make way for the CSI drivers prior to in-tree driver removal

Scenarios

  1. User-initiated move from in-tree to CSI driver
  2. Upgrade-initiated move from in-tree to CSI driver
  3. Upgrade from EUS to EUS

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

On new installations, we should make the StorageClass created by the CSI operator the default one. 

However, we shouldn't do that in an upgrade scenario. The main reason is that users might have set a different quota on the CSI driver Storage Class.
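
For reference, the default class is controlled by the standard Kubernetes annotation, so "new clusters get the CSI Storage Class as the default" amounts to the operator creating something like the following only on fresh installs (class name and provisioner are just examples):

  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: gp3-csi                                            # example name
    annotations:
      storageclass.kubernetes.io/is-default-class: "true"    # set only on new installations
  provisioner: ebs.csi.aws.com                               # example CSI provisioner
  volumeBindingMode: WaitForFirstConsumer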

Exit criteria:

  • New clusters get the CSI Storage Class as the default one.
  • Existing clusters don't get their default Storage Classes changed.

Feature Overview

OpenShift console supports new features and elevated experience for Operator Lifecycle Manager (OLM) Operators and Cluster Operators.

Goal:

OCP Console improves the controls and visibility for managing vendor-provided software in customers’ infrastructure and making these solutions available for customers' internal users.

 

To achieve this, 

  • Operator Lifecycle Manager (OLM) teams have been introducing new features aiming towards simplification and ease of use for both developers and cluster admins.
  • On the Cluster Operators side, the console iteratively improves the visibility of the resources associated with the Operators to improve the overall management experience.

We want to make sure OLM’s and Cluster Operators' new features are exposed in the console so admin console users can benefit from them.

Benefits:

  • Cluster admin/Operator consumers:
    • Able to see, learn about, and interact with OLM-managed and/or Cluster Operator-associated resources in the OpenShift console.

Requirements

Requirement | Notes | isMvp?
OCP console supports the latest OLM APIs and features | This is a requirement for ALL features. | YES
OCP console improves visibility to Cluster Operators related resources and features. | This is a requirement for ALL features. | YES

 


(Optional) Use Cases
* Main success scenarios - high-level user stories
* Alternate flow/scenarios - high-level user stories
* ...

Questions to answer...
How will the user interact with this feature?
Which users will use this and when will they use it?
Is this feature used as part of the current user interface?

Out of Scope
1. List of non-requirements or things not included in this feature
2. ...

Background, and strategic fit
What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

Assumptions
* Are there assumptions being made regarding prerequisites and dependencies?
* Are there assumptions about hardware, software or people resources?
* ...

Customer Considerations
* Are there specific customer environments that need to be considered (such as working with existing h/w and software)?
...

Documentation Considerations
Questions to be addressed:
* What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
* Does this feature have doc impact?
* New Content, Updates to existing content, Release Note, or No Doc Impact
* If unsure and no Technical Writer is available, please contact Content Strategy.
* What concepts do customers need to understand to be successful in [action]?
* How do we expect customers will use the feature? For what purpose(s)?
* What reference material might a customer want/need to complete [action]?
* Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
* What is the doc impact (New Content, Updates to existing content, or Release Note)?

Background:

OpenShift console allows users (cluster admins) to change the state of the “default hub sources” for OperatorHub on the cluster from “enabled” to “disabled” and vice versa through “Global Configuration → OperatorHub” view from the “Cluster Settings” view.

Starting from OpenShift 4.4, the console and OLM provide richer configurations for the ‘CatalogSource’ objects that enable users to create their curated sources for OperatorHub with custom “Display Name”, “URL of Image Registry”, and the “Polling Interval” for updating the custom OperatorHub source.

This epic is about reflecting/exposing the newer capabilities on the ‘OperatorHub’ (Cluster Config view) and the ‘CatalogSource’ list and details views.
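
For orientation, a sketch of a custom ‘CatalogSource’ showing the fields the console would surface (names and values are illustrative):

  apiVersion: operators.coreos.com/v1alpha1
  kind: CatalogSource
  metadata:
    name: my-custom-catalog                               # illustrative name
    namespace: openshift-marketplace
  spec:
    sourceType: grpc
    displayName: My Custom Operators                      # "Display Name"
    publisher: Example Corp                               # "Publisher"
    image: registry.example.com/catalog/index:latest      # "URL of Image Registry" / "Endpoint"
    updateStrategy:
      registryPoll:
        interval: 30m                                     # "Polling Interval"
  status:
    connectionState:
      lastObservedState: READY                            # surfaced as "Status"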

Goals:

1. As an admin user of console, I'd like to:
easily disable/enable the predefined Operator sources for the OperatorHub

so that I can:
control the sources of the Operators my cluster users see on the OperatorHub view.

2. As an admin user of console, I'd like to:
easily understand how to add/edit/view/remove my custom Operator catalogSource for the OperatorHub

so that I can
more easily manage (add/edit/view/remove) my custom sources for the OperatorHub

3. As an admin user of console, I'd like to:
easily see the configurations and status of my custom Operator catalogSource for the OperatorHub

so that I can
more easily review and edit the configurations of my custom sources for the OperatorHub

Acceptance Criteria:

  1. Improve cluster config: ‘OperatorHub’ Detail view
    • Add toggles for “disabled/enabled” predefined Operators
      • "Red Hat Operators"
      • "Certified Operators"
      • "Community Operators"
      • "Marketplace Operators"
    • Add “help text” on details view to guide users:
      Change the state of the default hub sources for OperatorHub on the cluster from enabled to disabled and vice versa. Add and manage your curated sources for OperatorHub on the Sources tab with custom Display Name, URL of Image Registry, and the Polling Interval for updating your custom OperatorHub source.
      
    • Embed a link to the "Sources" tab in the help text above:
      URL/k8s/cluster/config.openshift.io~v1~OperatorHub/cluster/sources
      
      • Easier access to create/manage custom ‘CatalogSource’ objects
  2. Improve ‘CatalogSource’ list view (on the “Source” tab) and details view
    • On ‘CatalogSource’ list view:
      • Show “Catalog Polling Interval” (if `spec.sourceType: grpc`)
      • Expose “Status” (status.connectionState.lastObservedState)
    • On ‘CatalogSource’ details view:
      • Show & Edit “Catalog Polling Interval” (if `spec.sourceType: grpc`)
        --> A dropdown with options in ‘15m’, ‘30m’, ‘45m’, ‘60m’.
      • (Sync up with the list view) Expose newly introduced fields on the ‘CatalogSource’ object:
        • Expose “Status” (status.connectionState.lastObservedState)
        • Display Name (spec.displayName)
        • Publisher (spec.publisher)
        • Availability
        • Endpoint (spec.image)
        • Polling Interval
        • # of Operators
      • Add an “Operators” tab - show a list of the “PackageManifests” (Operators) that make up this ‘CatalogSource’

Current UI in OpenShift console:

See current console screenshots in the attachments for reference:

  1. Cluster config: ‘OperatorHub’ Detail view
  2. ‘CatalogSource’ list view (on the “Source” tab)
  3. ‘CatalogSource’ details view
  4. Mocked screenshot - “Operators” tab on the ‘CatalogSource’ details view

The Sources tab list view now includes 2 new columns for the Catalog Sources:

  • Status and Registry Poll Interval

Action menu changes:

  • For default sources: Only the existing Disable and Edit CatalogSource actions are available since any other delete or edit will immediately be reverted by the cluster operator
  • For custom sources: The menus match the Actions menu in the Catalog Source

Improve ‘CatalogSource’ details view (on the “Details” tab)

  • On ‘CatalogSource’ details view:
    • Show & Edit “Catalog Polling Interval” (if `spec.sourceType: grpc`)
      --> dropdown with options in ‘15m’, ‘30m’, ‘45m’, ‘60m’.
    • (Sync up with the list view) Expose newly introduced fields on the ‘CatalogSource’ object:
      • Expose “Status” (status.connectionState.lastObservedState)
      • Display Name (spec.displayName)
      • Publisher (spec.publisher)
      • Availability
      • Endpoint (spec.image)
      • Polling Interval
      • # of Operators
    • Add an “Operators” tab - show a list of the “PackageManifests” (Operators) that make up this ‘CatalogSource’

Feature Overview
PF4 has introduced a bevy of new components to really enhance the user experience, specifically around list views. These new components make it easier than ever for customers to find the data they care about and to take action.

Goals

  • Migrate to PF4 Components
  • Improve Search Page
    • Saved Search
  • Improve List View
    • Bulk Actions
      • delete
      • add label
      • add annotation
    • Column Management
      • Show/Hide Columns
    • Favoriting
      • Bubble favorites to the top of the list view
    • Toolbar
      • Advance Filter

Requirements

Requirement | Notes | isMvp?
Improvements must be applied to all resource list view pages including the search page | This is a requirement for ALL features. | YES
Search page will be the only page that needs "Saved Searches" | This is a requirement for ALL features. | NO
All components updated must be PF4 supported | This is a requirement for ALL features. | YES

Questions to answer...
How will the user interact with this feature?
Which users will use this and when will they use it?
Is this feature used as part of the current user interface?

Out of Scope
1. List of non-requirements or things not included in this feature
2. ...

Background, and strategic fit
What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

Assumptions
* Are there assumptions being made regarding prerequisites and dependencies?
* Are there assumptions about hardware, software or people resources?
* ...

Customer Considerations
* Are there specific customer environments that need to be considered (such as working with existing h/w and software)?
...

Documentation Considerations
Questions to be addressed:
* What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
* Does this feature have doc impact?
* New Content, Updates to existing content, Release Note, or No Doc Impact
* If unsure and no Technical Writer is available, please contact Content Strategy.
* What concepts do customers need to understand to be successful in [action]?
* How do we expect customers will use the feature? For what purpose(s)?
* What reference material might a customer want/need to complete [action]?
* Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
* What is the doc impact (New Content, Updates to existing content, or Release Note)?

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The inventory card needs a couple of visual tweaks:

  • the individual links should be 14px (not 16px) to be consistent with other cards
  • the order of the icon/# should be switched so that the number precedes the icon

Attaching the desired design from Michael

 Feature Overview

This will be phase 1 of Internationalization of the OpenShift Console.

 Phase 1 will include the following:

  1. UI based language Selector instead of using browser detection
  2. Externalize all hard coded strings in the client code including all OpenShift static plugins
    1. Admin Console
    2. Dev Console
    3. Serverless
    4. Pipelines
    5. CNV
    6. OCS
    7. CSO
  3. Localized Date/Time
  4. Setup all processes, infrastructure, and testing required
  5. We will start with support for the Chinese and Japanese languages

Phase 1 will not include:

  1. Dynamically generated UI (Operator, OpenAPIV3Schema)
    1. Operators that surface informational messages may not have translations available
  2. Strings from non client code
    1. This may include items such as events surfaced from Kubernetes, alerts, and error messages displayed to the user or in logs
  3. Localization of logging messages at any level is not in scope
  4. Any CLI
  5. Language support for right-to-left languages, e.g. Arabic

Initial List of Languages to Support

---------- 4.7* ----------

  1. Japanese - Code: ja 
  2. Chinese - Code: zh_CN, zh_TW 
  3. Korean - Code: ko

*This will be based on the ability to get all the strings externalized; there is a good chance this gets pushed to 4.8.

---------- Post 4.7 ----------

  1. Spanish: - Code: es_419, es 
  2. German: - Code: de
  3. French - Code: fr
  4. Italian - Code: it
  5. Portuguese - Code: pt_BR
  6. Korean - Code: ko
  7. Hindi - Code: hi

POC

 Initial POC PR

Goals

Internationalization has become table stakes. OpenShift Console needs to support different languages in each of the major markets. This is key functionality that will help unlock sales in different regions.

 

Requirements

 

Requirement | Notes | isMvp?
Language Selector | | YES
Localized Date + Time | | YES
Externalization and translation of all client side strings | | YES
Translation for Chinese and Japanese | | YES
Process, infra, and testing capabilities put into place | | YES
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES

  

Out of Scope

  1. Dynamically generated UI (Operator, OpenAPIV3Schema)
    1. Operators that surface informational messages may not have translations available
  2. Strings from non client code
    1. This may include items such as events surfaced from Kubernetes, alerts, and error messages displayed to the user or in logs
  3. Localization of logging messages at any level is not in scope
  4. Any CLI support
  5. Language support for right-to-left languages, e.g. Arabic

 

Assumptions

  • Each static plugin team will be responsible for externalizing all their client code strings.
  • Quick Starts will need to be translated.

Customer Considerations

We are rolling this feature out in phases; based on customer feedback, there may be no phase 2.

Documentation Considerations

I believe documentation already supports a large language set.

Goal

  • Localize OpenShift Admin Console
    • Externalize hardcoded String on the client side
    • Update Capitalization for Strings
    • Add a language selector under User Menu
    • Localize date + time
    • Setup Build system
    • Setup CI system 

Why is this important?

  • Our goal should be to make OCP Console accessible to everyone. By providing different language support we open up OCP to many people that previously couldn't use the Console because of language barriers.

  

Acceptance Criteria

  • Release Technical Enablement

 

Update i18n files/scripts to support Korean in advance of translation work from Terry.

Specifically:

Externalize strings in the User Management nav section (Users, Groups, Service Accounts, Roles, Role Bindings).

By default, we should use the user's browser preference for language, but we should give users a language selector to override. Here is a proposed design:

https://docs.google.com/document/d/17iIBDlEneu0DNhWi2TkQShRSobQZubWOhS0OQr_T3hE/edit#

We need to translate the following common components used in the list and details pages:

  • Common ListPage component
  • Column management modal
  • Common list view kebab
  • Common filter toolbar
  • Details page breadcrumbs
  • ResourceSummary/DetailsItem component
  • Details page Actions dropdown
  • Common details page tabs
  • Managed by operator link
  • Conditions table
  • Common dropdown component

Externalize strings in the pages under the console Workloads nav section.

Externalize strings in the Compute nav section (Nodes, Machines, Machine Sets, Machine Autoscalers, Machine Health Checks, Machine Configs, Machine Config Pools).

Epic Goal

  • This is the continuation of the Internationalization work... the following items remain:
    • All existing QuickStarts get Translated
    • Automation Completed
    • Any remaining items cleaned up

Why is this important?

  • Automating as much as possible (detecting duplicate strings, building, and pushing translation drops) will ensure we are successful for all future releases
  • Quick Starts are an important part of the product that enables our users to maximize usage of the Console
  • It is best to clean up anything left over to reduce future tech debt

Acceptance Criteria

  • Quick Starts are translated 
  • Everything is automated for building, and pushing translation drops to the globalization team
  • Source code should be up to quality standards

Previous Work (Optional):

  1. https://issues.redhat.com/browse/CONSOLE-2325

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

We have too many namespaces if we're loading them upfront. We should consolidate some of the files.

OCP/Telco Definition of Done
Epic Template descriptions and documentation.


Epic Goal

  • Rebase OpenShift components to k8s v1.24

Why is this important?

  • Rebasing ensures components work with the upcoming release of Kubernetes
  • Address tech debt related to upstream deprecations and removals.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. k8s 1.24 release

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

 

Why?

  • Decouple control and data plane. 
    • Customers do not pay Red Hat more to run HyperShift control planes and supporting infrastructure than Standalone control planes and supporting infrastructure.
  • Improve security
    • Shift credentials out of cluster that support the operation of core platform vs workload
  • Improve cost
    • Allow a user to toggle what they don’t need.
    • Ensure a smooth path to scale to 0 workers and upgrade with 0 workers.

 

Assumption

  • A customer will be able to associate a cluster as “Infrastructure only”
  • E.g. one option: management cluster has role=master, and role=infra nodes only, control planes are packed on role=infra nodes
  • OR the entire cluster is labeled infrastructure, and node roles are ignored.
  • Anything that runs on a master node by default in Standalone that is present in HyperShift MUST be hosted and not run on a customer worker node.

 

 

Doc: https://docs.google.com/document/d/1sXCaRt3PE0iFmq7ei0Yb1svqzY9bygR5IprjgioRkjc/edit 

Overview 

Customers do not pay Red Hat more to run HyperShift control planes and supporting infrastructure than Standalone control planes and supporting infrastructure.

Assumption

  • A customer will be able to associate a cluster as “Infrastructure only”
  • E.g. one option: management cluster has role=master, and role=infra nodes only, control planes are packed on role=infra nodes
  • OR the entire cluster is labeled infrastructure, and node roles are ignored.
  • Anything that runs on a master node by default in Standalone that is present in HyperShift MUST be hosted and not run on a customer worker node.

DoD 

Run cluster-storage-operator (CSO) + AWS EBS CSI driver operator + AWS EBS CSI driver control-plane Pods in the management cluster, run the driver DaemonSet in the hosted cluster.

More information here: https://docs.google.com/document/d/1sXCaRt3PE0iFmq7ei0Yb1svqzY9bygR5IprjgioRkjc/edit 

 

As HyperShift Cluster Instance Admin, I want to run AWS EBS CSI driver operator + control plane of the CSI driver in the management cluster, so the guest cluster runs just my applications.

  • Add a new cmdline option for the guest cluster kubeconfig file location
  • Parse both kubeconfigs:
    • One from projected service account, which leads to the management cluster.
    • Second from the new cmdline option introduced above. This one leads to the guest cluster.
  • Only on HyperShift:
    • When interacting with Kubernetes API, carefully choose the right kubeconfig to watch / create / update objects in the right cluster.
    • Replace namespaces in all Deployments and other objects that are created in the management cluster. They must be created in the same namespace as the operator.
    • Pass only the guest kubeconfig to the operand (control-plane Deployment of the CSI driver).

Exit criteria:

  • Control plane Deployment of AWS EBS CSI driver runs in the management cluster in HyperShift.
  • Storage works in the guest cluster.
  • No regressions in standalone OCP.

< High-Level description of the feature ie: Executive Summary >

Goals

Cluster administrators need an in-product experience to discover and install new Red Hat offerings that can add high value to developer workflows.

Requirements

Requirements | Notes | IS MVP
Discover new offerings in Home Dashboard | | Y
Access details outlining value of offerings | | Y
Access step-by-step guide to install offering | | N
Allow developers to easily find and use newly installed offerings | | Y
Support air-gapped clusters | | Y

(Optional) Use Cases

< What are we making, for who, and why/what problem are we solving?>

Out of scope

Discovering solutions that are not available for installation on cluster

Dependencies

No known dependencies

Background, and strategic fit

 

Assumptions

None

 

Customer Considerations

 

Documentation Considerations

Quick Starts 

What does success look like?

 

QE Contact

 

Impact

 

Related Architecture/Technical Documents

 

Done Checklist

  • Acceptance criteria are met
  • Non-functional properties of the Feature have been validated (such as performance, resource, UX, security or privacy aspects)
  • User Journey automation is delivered
  • Support and SRE teams are provided with enough skills to support the feature in production environment

Problem:

Developers using Dev Console need to be made aware of the RH developer tooling available to them.

Goal:

Provide awareness to developers using Dev Console of the RH developer tooling that is available to them, including:

Consider enhancing the +Add page and/or the Guided tour

Provide a Quick Start for installing the Cryostat Operator
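
A hedged sketch of what such a quick start resource could look like (the ConsoleQuickStart content below is abbreviated and illustrative, not the final text):

  apiVersion: console.openshift.io/v1
  kind: ConsoleQuickStart
  metadata:
    name: install-cryostat-operator                 # illustrative name
  spec:
    displayName: Install the Cryostat Operator
    durationMinutes: 10
    description: Install the Cryostat Operator from OperatorHub.
    introduction: This quick start walks through installing the Cryostat Operator.
    tasks:
    - title: Install the Operator
      description: Navigate to OperatorHub, search for Cryostat, and click Install.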

Why is it important?

To increase usage of our RH portfolio

Acceptance criteria:

  1. Quick Start - Installing Cryostat Operator
  2.  Quick Start - Get started with JBoss EAP using a Helm Chart
  3. Discoverability of the IDE extensions from Create Serverless form
  4. Update Terminal step of the Guided Tour to indicate that odo CLI is accessible (link to https://developers.redhat.com/products/odo/overview)

Dependencies (External/Internal):

Design Artifacts:

Exploration:

Note:

Epic Goal*

Provide a long term solution to SELinux context labeling in OCP.

 
Why is this important? (mandatory)

As of today, when SELinux is enabled, the PV's files are relabeled when attaching the PV to the pod. This can cause timeouts when the PV contains a lot of files, as well as overloading the storage backend.

https://access.redhat.com/solutions/6221251 provides a few workarounds until the proper fix is implemented. Unfortunately these workarounds are not perfect and we need a long-term, seamless, optimized solution.

This feature tracks the long-term solution where the PV filesystem will be mounted with the right SELinux context, thus avoiding relabeling every file.

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1. Apply new context when there is none
  2. Change context of all files/folders when changing context
  3. RWO & RWX PVs
    1. ReadWriteOncePod PVs first
    2. RWX PV in a second phase

As we are relying on the mount context, there should not be any relabeling (chcon) because all files / folders will inherit the context from the mount context.

More on design & scenarios in the KEP  and related epic STOR-1173

Dependencies (internal and external) (mandatory)

None for the core feature

However the driver will have to set SELinuxMountSupported to true in the CSIDriverSpec to enable this feature. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - STOR
  • Documentation - STOR
  • QE - STOR
  • PX - 
  • Others -

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Epic Goal

Support the upstream feature "SELinux relabeling using mount options (CSIDriver API change)" in OCP as Beta, i.e. test it and have docs for it (unless it's Alpha upstream).

Summary: If a Pod has a defined SELinux context (e.g. it uses the "restricted" SCC), it uses a ReadWriteOncePod PVC, and the CSI driver responsible for the volume supports this feature, kubelet + the CSI driver will mount the volume directly with the correct SELinux labels. Therefore CRI-O does not need to recursively relabel the volume and pod startup can be significantly faster. We will need thorough documentation for this.

This upstream epic actually will be implemented by us!

Why is this important?

  • We get this upstream feature through Kubernetes rebase. We should ensure it works well in OCP and we have docs for it.

Upstream links

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. External: the feature is currently scheduled for Beta in Kubernetes 1.27, i.e. OCP 4.14, but it may change before Kubernetes 1.27 GA.

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

As a cluster user, I want to use mounting with SELinux context without any configuration.

This means OCP ships CSIDriver objects with "SELinuxMount: true" for CSI drivers that support mounting with "-o context". I.e. all CSI drivers that are based on block volumes and use ext4/xfs should have this enabled.
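
A minimal sketch of such a shipped CSIDriver object (the YAML field is spelled seLinuxMount; the driver name is just an example of a block-volume based driver):

  apiVersion: storage.k8s.io/v1
  kind: CSIDriver
  metadata:
    name: ebs.csi.aws.com          # example block-volume based CSI driver
  spec:
    attachRequired: true
    podInfoOnMount: false
    seLinuxMount: true             # mount with -o context instead of recursive relabeling
    volumeLifecycleModes:
    - Persistent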

This Epic is to track upstream work in the Storage SIG community

This Epic is to track the SELinux specific work required. fsGroup work is not included here.

Goal: 

Continue contributing to and help move along the upstream efforts to enable recursive permissions functionality.

Finish current SELinuxMountReadWriteOncePod feature upstream:

  • Implement it in all volume plugins (the current alpha has just iSCSI and CSI)
  • Add e2e tests + fix all tests that don't work well with SELinux
  • Implement necessary changes in volume reconstruction to reconstruct also the SELinux context.

The feature is probably going to stay alpha upstream.

Problem: 

Recursive permission change takes very long for fsGroup and SELinux. For volumes with many small files, Kubernetes currently does a chown for every file on the volume (due to fsGroup). Similarly, for container runtimes (such as CRI-O) a chcon of every file on the volume is performed due to the SCC's SELinux context. Data on the volume may already have the correct GID/SELinux context, so Kubernetes needs a way to detect this automatically to avoid the long delay.

Why is this important: 

  • A user wants to bring their pod online quickly and efficiently.  

Dependencies (internal and external):

 

Prioritized epics + deliverables (in scope / not in scope):

Estimate (XS, S, M, L, XL, XXL):

 

Previous Work:

Customers:

Open questions:

  •  

Notes:

As an OCP developer (and as an OCP user in the future), I want all CSI drivers shipped as part of OCP to support mounting with -o context=XYZ, so I can test with CSIDriver.SELinuxMount: true (or my pods are running without CRI-O recursively relabeling my volume).

 

In detail:

  • For CSI drivers based on block devices, pass the host's /etc/selinux and /sys/fs/ to the CSI driver container on the node as HostPath volumes
  • For CSI drivers based on NFS / CIFS: do the same as for block volumes (it won't harm the driver in any way), but investigate if these drivers can actually run with CSIDriver.SELinuxMount: true.

Details: https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1710-selinux-relabeling#selinux-support-in-volumes

 

Exit criteria:

  • Verify that CSI drivers shipped by OCP based on block volumes mount volumes with -o context=xyz instead of having CRI-O relabel the volumes. That should happen when all these conditions are satisfied:
    • SELinuxMountReadWriteOncePod and ReadWriteOncePod feature gates are enabled
    • CSIDriver.SELinuxMount is set to true manually for the CSI driver. OCP will not do it by default in 4.13, because it requires the alpha feature gates from the previous bullet.
    • PVC has AccessMode: [ReadWriteOncePod] 
    • Pod has SELinux context explicitly assigned, i.e. pod.spec.securityContext (or pod.spec.containers[*].securityContext) has seLinuxOptions set, incl. level (based on SCC, OCP might do it automatically)
  • This is an alpha / dev preview feature, so QE might be done when graduating to Beta / tech preview.

Feature Goal*

What is our purpose in implementing this?  What new capability will be available to customers?

The goal of this feature is to provide a consistent, predictable and deterministic approach on how the default storage class(es) is managed.

 
Why is this important? (mandatory)

The current default storage class implementation has corner cases which can result in PVCs staying in Pending because there is either no default storage class OR multiple default storage classes are defined.

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

 

No default storage class

In some cases there is no default SC defined. This can happen during OCP deployment, where components such as the registry request a PV before the SCs have been defined. It can also happen during a change of the default SC: there won't be any default between the moment the admin unsets the current one and sets the new one.

 

  1. The admin marks the current default SC1 as non-default.
  2. Another user creates a PVC requesting a default SC by leaving pvc.spec.storageClassName=nil. The default SC does not exist at this point, therefore the admission plugin leaves the PVC untouched with pvc.spec.storageClassName=nil.
  3. The admin marks SC2 as default.
  4. The PV controller, when reconciling the PVC, updates pvc.spec.storageClassName=nil to the new SC2.
  5. The PV controller uses the new SC2 when binding / provisioning the PVC.

  1. The installer creates the PVC for the image registry first, requesting the default storage class by leaving pvc.spec.storageClassName=nil.
  2. The installer creates a default SC.
  3. The PV controller, when reconciling the PVC, updates pvc.spec.storageClassName=nil to the new default SC.
  4. The PV controller uses the new default SC when binding / provisioning the PVC.

Multiple Storage Classes

In some cases there are multiple default SCs. This can be an admin mistake (forgetting to unset the old one) or occur during the period where a new default SC is created but the old one is still present.

New behavior:

  1. Create a default storage class A
  2. Create a default storage class B
  3. Create a PVC with pvc.spec.storageClassName = nil

-> the PVC will get the default storage class with the newest CreationTimestamp (i.e. B) and no error should show.

-> admin will get an alert that there are multiple default storage classes and they should do something about it.
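
To make the new behavior concrete, a sketch with two default storage classes (names and provisioners are illustrative); a PVC created afterwards with pvc.spec.storageClassName=nil would get class-b, the default with the newest CreationTimestamp:

  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: class-a                                            # older default
    annotations:
      storageclass.kubernetes.io/is-default-class: "true"
  provisioner: example.vendor-a.csi.com
  ---
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: class-b                                            # newer default; wins for new PVCs
    annotations:
      storageclass.kubernetes.io/is-default-class: "true"
  provisioner: example.vendor-b.csi.com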

 

CSI that are shipped as part of OCP

The CSI drivers we ship as part of OCP are deployed and managed by RH operators. These operators automatically create a default storage class. Some customers don't like this approach and prefer to:

 

  1. Create their own default storage class
  2. Have no default storage class in order to disable dynamic provisioning

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

No external dependencies.

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - STOR
  • Documentation - STOR
  • QE - STOR
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Can bring confusion to customers, as there is a change in the default behavior customers are used to. This needs to be carefully documented.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Feature Overview

Create a GCP cloud specific spec.resourceTags entry in the infrastructure CRD. This should create and update tags (or labels in GCP) on any OpenShift cloud resource that we create and manage. The behavior should also tag existing resources that do not have the tags yet, and once the tags in the infrastructure CRD are changed, all the resources should be updated accordingly.

Tag deletes continue to be out of scope, as the customer can still have custom tags applied to the resources that we do not want to delete.

Due to the ongoing in-tree/out-of-tree split of the cloud and CSI providers, this should not apply to clusters with in-tree providers (!= "external").

Once confident we have all components updated, we should introduce an end-to-end test that makes sure we never create resources that are untagged.

 
Goals

  • Functionality on GCP Tech Preview
  • inclusion in the cluster backups
  • flexibility of changing tags during cluster lifetime, without recreating the whole cluster

Requirements

  • This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.
Requirement | Notes | isMvp?
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES
Release Technical Enablement | Provide necessary release enablement details and documents. | YES

List any affected packages or components.

  • Installer
  • Cluster Infrastructure
  • Storage
  • Node
  • NetworkEdge
  • Internal Registry
  • CCO

This epic covers the work to apply user-defined labels to GCP resources created for an OpenShift cluster, available as Tech Preview.

The user should be able to define GCP labels to be applied to the resources created during cluster creation by the installer and other operators which manage the specific resources. The user will be able to define the required tags/labels in install-config.yaml while preparing the user inputs for cluster creation; these will then be made available in the status sub-resource of the Infrastructure custom resource, which cannot be edited but is available for user reference and is used by the in-cluster operators for labeling when the resources are created.
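
As a rough illustration of that flow (the field names below are indicative only; the actual shape is defined by the installer and openshift/api changes for this epic):

  # install-config.yaml fragment (user input at cluster creation)
  platform:
    gcp:
      projectID: my-project
      region: us-central1
      userLabels:                  # indicative field name
      - key: team
        value: storage
  ---
  # Infrastructure status populated from the install-config (read-only for users,
  # consumed by in-cluster operators when they create GCP resources)
  apiVersion: config.openshift.io/v1
  kind: Infrastructure
  metadata:
    name: cluster
  status:
    platformStatus:
      type: GCP
      gcp:
        resourceLabels:            # indicative field name
        - key: team
          value: storage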

Updating/deleting of labels added during cluster creation or adding new labels as Day-2 operation is out of scope of this epic.

List any affected packages or components.

  • Installer
  • Cluster Infrastructure
  • Storage
  • Node
  • NetworkEdge
  • Internal Registry
  • CCO

Reference - https://issues.redhat.com/browse/RFE-2017

cluster-config-operator makes the Infrastructure CRD available for the installer. The CRD is included in its container image from the openshift/api package, which must be updated to pick up the latest CRD.

Feature Overview (aka. Goal Summary)  

The storage operators need to be automatically restarted after the certificates are renewed.

From OCP doc "The service CA certificate, which issues the service certificates, is valid for 26 months and is automatically rotated when there is less than 13 months validity left."

Since OCP now offers an 18-month lifecycle per release, the storage operator pods need to be automatically restarted after the certificates are renewed.

Goals (aka. expected user outcomes)

The storage operators will be transparently restarted. The customer benefit is transparency: it avoids having to manually restart the storage operators.

 

Requirements (aka. Acceptance Criteria):

The administrator should not need to restart the storage operators when certificates are renewed.

This should apply to all relevant operators with a consistent experience.

 

Use Cases (Optional):

As an administrator I want the storage operators to be automatically restarted when certificates are renewed.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

This feature request is triggered by the new extended OCP lifecycle. We are moving from 12 to 18 months support per release.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

No doc is required

 

Interoperability Considerations

This feature only covers storage, but the same behavior should be applied to every relevant component.

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The pod `aws-ebs-csi-driver-controller` mounts the secret:

$ oc get po -n openshift-cluster-csi-drivers aws-ebs-csi-driver-controller-559f74d7cd-5tk4p -o yaml
...
    name: driver-kube-rbac-proxy
    name: provisioner-kube-rbac-proxy
    name: attacher-kube-rbac-proxy
    name: resizer-kube-rbac-proxy
    name: snapshotter-kube-rbac-proxy

    volumeMounts:
    - mountPath: /etc/tls/private
      name: metrics-serving-cert

  volumes:
  - name: metrics-serving-cert
    secret:
      defaultMode: 420
      secretName: aws-ebs-csi-driver-controller-metrics-serving-cert

Hence, if the secret is updated (e.g. as a result of a CA certificate update), the pod must be restarted.
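One common pattern for achieving this, sketched below, is to stamp a hash of the serving-cert secret onto the Deployment's pod template so that any change to the secret forces a rollout; the annotation name and value are assumptions, not necessarily what the operator will implement.

# Illustrative only: the annotation name and hash value are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aws-ebs-csi-driver-controller
  namespace: openshift-cluster-csi-drivers
spec:
  template:
    metadata:
      annotations:
        # Recomputed by the operator whenever the metrics-serving-cert secret changes;
        # a new value changes the pod template and triggers a rolling restart.
        operator.openshift.io/serving-cert-hash: "sha256:<hash-of-secret-data>"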

Feature Overview (aka. Goal Summary)  

Description of problem:

Even though in 4.11 we introduced LegacyServiceAccountTokenNoAutoGeneration to be compatible with upstream Kubernetes and stop generating token secrets when service accounts are created, today OpenShift still creates secrets and tokens for legacy usage by the openshift-controller-manager as well as for the image-pull secrets.

 

Customer issues:

Customers see auto-generated secrets for service accounts, which is flagged as a security risk.

 

This feature tracks the implementation for removing the legacy usage and the image-pull secret generation as well, so that NO secrets are auto-generated when a Service Account is created on an OpenShift cluster.

 

Goals (aka. expected user outcomes)

NO Secrets to be auto-generated when creating service accounts 

Requirements (aka. Acceptance Criteria):

The following secrets need to NOT be generated automatically with every Service Account creation:

  1. ImagePullSecrets: These are needed for the kubelet to fetch registry credentials directly. Implementation is needed for the following upstream feature (a configuration sketch follows this list).
    https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2133-kubelet-credential-providers/README.md
  2. Dockerconfig secrets: The openshift-controller-manager relies on the old token secrets and creates them so that it is able to generate registry credentials for the SAs. There is a PR that was created to remove this: https://github.com/openshift/openshift-controller-manager/pull/223.
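For reference, the upstream kubelet credential provider mechanism referenced in item 1 is configured roughly as below; the provider name, binary, and image patterns are placeholders, not a committed OpenShift configuration.

# Sketch of an upstream CredentialProviderConfig (KEP-2133); all values below are placeholders.
apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
  - name: example-registry-credential-provider   # name of the provider binary on the node
    matchImages:
      - "*.example.registry.io"                  # image patterns whose pulls use this provider
    defaultCacheDuration: "12h"                  # how long the kubelet caches returned credentials
    apiVersion: credentialprovider.kubelet.k8s.io/v1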

 

 

 Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

Concerns/Risks: Replacing functionality of the openshift-controller-manager controllers that have been in the code for a long time may impact behaviors in ways that are hard to predict.

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Existing documentation needs to be clear on where we are today and why we are providing the above 2 credentials. Related Tracker: https://issues.redhat.com/browse/OCPBUGS-13226 

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Feature Overview

As an Infrastructure Administrator, I want to deploy OpenShift on Nutanix distributing the control plane and compute nodes across multiple regions and zones, forming different failure domains.

As an Infrastructure Administrator, I want to configure an existing OpenShift cluster to distribute the nodes across regions and zones, forming different failure domains.

Goals

Install OpenShift on Nutanix using IPI / UPI in multiple regions and zones.

Requirements (aka. Acceptance Criteria):

  • Ensure Nutanix IPI can successfully be deployed with ODF across multiple zones (like we do with vSphere, AWS, GCP & Azure)
  • Ensure zonal configuration in Nutanix using UPI is documented and tested

vSphere Implementation

This implementation would follow the same idea that has been done for vSphere. The following are the main PRs for vSphere:

https://github.com/openshift/enhancements/blob/master/enhancements/installer/vsphere-ipi-zonal.md 

 

Existing vSphere documentation

https://docs.openshift.com/container-platform/4.13/installing/installing_vsphere/installing-vsphere-installer-provisioned-customizations.html#configuring-vsphere-regions-zones_installing-vsphere-installer-provisioned-customizations

https://docs.openshift.com/container-platform/4.13/post_installation_configuration/post-install-vsphere-zones-regions-configuration.html

BU Priority Overview

As our customers create more and more clusters, it will become vital for us to help them support their fleet of clusters. Currently, our users have to use a different interface (the ACM UI) to manage their fleet of clusters. Our goal is to provide our users with a single interface that spans managing a fleet of clusters to deep diving into a single cluster. This means going to a single URL – your Hub – to interact with your OCP fleet.

Goals

The goal of this tech preview update is to improve the experience from the last round of tech preview. The following items will be improved:

  1. Improved Cluster Picker: Moved to Masthead for better usability, filter/search
  2. Support for Metrics: Metrics are now visualized from Spoke Clusters
  3. Avoid UI Mismatch: Dynamic Plugins from Spoke Clusters are disabled 
  4. Console URLs Enhanced: Cluster Name Add to URL for Quick Links
  5. Security Improvements: Backend Proxy and Auth updates

Key Objective
Providing our customers with a single simplified User Experience (Hybrid Cloud Console) that is extensible, can run locally or in the cloud, and is capable of everything from managing the fleet to deep diving into a single cluster.
Why customers want this?

  1. Single interface to accomplish their tasks
  2. Consistent UX and patterns
  3. Easily accessible: One URL, one set of credentials

Why we want this?

  • Shared code -  improve the velocity of both teams and most importantly ensure consistency of the experience at the code level
  • Pre-built PF4 components
  • Accessibility & i18n
  • Remove barriers for enabling ACM

Phase 2 Goal: Productization of the united Console 

  1. Enable user to quickly change context from fleet view to single cluster view
    1. Add Cluster selector with “All Cluster” Option. “All Cluster” = ACM
    2. Shared SSO across the fleet
    3. Hub OCP Console can connect to remote clusters API
    4. When ACM Installed the user starts from the fleet overview aka “All Clusters”
  2. Share UX between views
    1. ACM Search —> resource list across fleet -> resource details that are consistent with single cluster details view
    2. Add Cluster List to OCP —> Create Cluster

Description of problem:

There is a possible race condition in the console operator where the managed cluster config gets updated after the console deployment and doesn't trigger a rollout. 

Version-Release number of selected component (if applicable):

4.10

How reproducible:

Rarely

Steps to Reproduce:

1. Enable multicluster tech preview by adding TechPreviewNoUpgrade featureSet to FeatureGate config. (NOTE THIS ACTION IS IRREVERSIBLE AND WILL MAKE THE CLUSTER UNUPGRADEABLE AND UNSUPPORTED) 
2. Install ACM 2.5+
3. Import a managed cluster using either the ACM console or the CLI
4. Once that managed cluster is showing in the cluster dropdown, import a second managed cluster 

Actual results:

Sometimes the second managed cluster will never show up in the cluster dropdown

Expected results:

The second managed cluster eventually shows up in the cluster dropdown after a page refresh

Additional info:

Migrated from bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2055415

In order for hub cluster console OLM screens to behave as expected in a multicluster environment, we need to gather "copiedCSVsDisabled" flags from managed clusters so that the console backend/frontend can consume this information.

AC:

  • The console operator syncs "copiedCSVsDisabled" flags from managed clusters into the hub cluster managed cluster config.

Feature Overview

Allow compute and control plane nodes to be configured across multiple subnets for on-premise IPI deployments. When separating nodes into subnets, also allow using an external load balancer, instead of the built-in one (keepalived/haproxy) that the IPI workflow installs, so that the customer can configure their own load balancer with the ingress and API VIPs pointing to nodes in the separate subnets.

Goals

I want to install OpenShift with IPI on an on-premise platform (high priority for bare metal and vSphere) and I need to distribute my control plane and compute nodes across multiple subnets.

I want to use IPI automation but I will configure an external load balancer for the API and Ingress VIPs, instead of using the built-in keepalived/haproxy-based load balancer that comes with the on-prem platforms.
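As a rough sketch of the desired user experience (the loadBalancer field below is an assumption about how opting out of the built-in LB could be expressed, not a committed API):

# Illustrative install-config.yaml excerpt for an on-prem IPI install with an external LB.
platform:
  baremetal:
    apiVIPs:
      - 192.0.2.10        # API VIP configured on the customer's external load balancer
    ingressVIPs:
      - 192.0.2.11        # Ingress VIP configured on the customer's external load balancer
    loadBalancer:
      type: UserManaged   # assumption: opt out of the built-in keepalived/haproxy LB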

Background, and strategic fit

Customers require using multiple logical availability zones to define their architecture and topology for their datacenter. OpenShift clusters are expected to fit in this architecture for the high availability and disaster recovery plans of their datacenters.

Customers want the benefits of IPI and automated installations (and avoid UPI) and at the same time when they expect high traffic in their workloads they will design their clusters with external load balancers that will have the VIPs of the OpenShift clusters.

Load balancers can distribute incoming traffic across multiple subnets, which is something our built-in load balancers aren't able to do and which represents a big limitation for the topologies customers are designing.

While this is possible with IPI AWS, this isn't available with on-premise platforms installed with IPI (for the control plane nodes specifically), and customers see this as a gap in OpenShift for on-premise platforms.

Functionalities per Epic

 

Epic | Control Plane with Multiple Subnets | Compute with Multiple Subnets | Doesn't need external LB | Built-in LB
NE-1069 (all-platforms)
NE-905 (all-platforms)
NE-1086 (vSphere)
NE-1087 (Bare Metal)
OSASINFRA-2999 (OSP)  
SPLAT-860 (vSphere)
NE-905 (all platforms)
OPNET-133 (vSphere/Bare Metal for AI/ZTP)
OSASINFRA-2087 (OSP)
KNIDEPLOY-4421 (Bare Metal workaround)
SPLAT-409 (vSphere)

Previous Work

Workers on separate subnets with IPI documentation

We can already deploy compute nodes on separate subnets by preventing the built-in LBs from running on the compute nodes. This is documented for bare metal only for the Remote Worker Nodes use case: https://docs.openshift.com/container-platform/4.11/installing/installing_bare_metal_ipi/ipi-install-installation-workflow.html#configure-network-components-to-run-on-the-control-plane_ipi-install-installation-workflow

This procedure also works on vSphere, albeit without QE/CI coverage and without documentation.

External load balancer with IPI documentation

  1. Bare Metal: https://docs.openshift.com/container-platform/4.11/installing/installing_bare_metal_ipi/ipi-install-post-installation-configuration.html#nw-osp-configuring-external-load-balancer_ipi-install-post-installation-configuration
  2. vSphere: https://docs.openshift.com/container-platform/4.11/installing/installing_vsphere/installing-vsphere-installer-provisioned.html#nw-osp-configuring-external-load-balancer_installing-vsphere-installer-provisioned

Scenarios

  1. vSphere: I can define 3 or more networks in vSphere and distribute my masters and workers across them. I can configure an external load balancer for the VIPs.
  2. Bare metal: I can configure the IPI installer and the agent-based installer to place my control plane nodes and compute nodes on 3 or more subnets at installation time. I can configure an external load balancer for the VIPs.

Acceptance Criteria

  • Can place compute nodes on multiple subnets with IPI installations
  • Can place control plane nodes on multiple subnets with IPI installations
  • Can configure external load balancers for clusters deployed with IPI with control plane and compute nodes on multiple subnets
  • Can configure VIPs in an external load balancer routed to nodes on separate subnets and VLANs
  • Documentation exists for all the above cases

 

Epic Goal

As an OpenShift infrastructure owner I need to deploy OCP on OpenStack with the installer-provisioned infrastructure workflow and configure my own load balancers

Why is this important?

Customers want to use their own load balancers, while IPI comes with built-in LBs based on keepalived and haproxy.

Scenarios

  1. A large deployment routed across multiple failure domains without stretched L2 networks would require dynamically routing the control plane VIP traffic through load balancers capable of living in multiple L2 segments.
  2. Customers who want to use their existing LB appliances for the control plane.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • QE - must test a scenario where we disable the internal LB, set up an external LB, and verify the OCP deployment runs fine.
  • Documentation - we need to document all the gotchas regarding this type of deployment, even the specifics about the load-balancer itself (routing policy, dynamic routing, etc)
  • For Tech Preview, we won't require Fixed IPs. This is something targeted for 4.14.

Dependencies (internal and external)

  1. For GA, we'll need Fixed IPs, already WIP by vsphere: https://issues.redhat.com/browse/OCPBU-179

Previous Work:

vsphere has done the work already via https://issues.redhat.com/browse/SPLAT-409

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Epic Goal

  • Update all images that we ship with OpenShift to the latest upstream releases and libraries.
  • Exact content of what needs to be updated will be determined as new images are released upstream, which is not known at the beginning of OCP development work. We don't know what new features will be included and should be tested and documented. Especially new CSI drivers releases may bring new, currently unknown features. We expect that the amount of work will be roughly the same as in the previous releases. Of course, QE or docs can reject an update if it's too close to deadline and/or looks too big.

Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF). We are trying a no-feature-freeze approach in 4.12. We will try to do as much as we can before FF, but we're quite sure something will slip past FF as usual.

Why is this important?

  • We want to ship the latest software that contains new features and bugfixes.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Update all CSI sidecars to the latest upstream release from https://github.com/orgs/kubernetes-csi/repositories

  • external-attacher
  • external-provisioner
  • external-resizer
  • external-snapshotter
  • node-driver-registrar
  • livenessprobe

Corresponding downstream repos have `csi-` prefix, e.g. github.com/openshift/csi-external-attacher.

This includes updating the VolumeSnapshot CRDs in the cluster-csi-snapshot-controller-operator assets and the client API in go.mod, i.e. copy all snapshot CRDs from upstream to the operator assets and run `go get -u github.com/kubernetes-csi/external-snapshotter/client/v6` in the operator repo.

Update all OCP and kubernetes libraries in storage operators to the appropriate version for OCP release.

This includes (but is not limited to):

  • Kubernetes:
    • client-go
    • controller-runtime
  • OCP:
    • library-go
    • openshift/api
    • openshift/client-go
    • operator-sdk

Operators:

  • aws-ebs-csi-driver-operator 
  • aws-efs-csi-driver-operator
  • azure-disk-csi-driver-operator
  • azure-file-csi-driver-operator
  • cinder-csi-driver-operator
  • gcp-pd-csi-driver-operator
  • gcp-filestore-csi-driver-operator
  • manila-csi-driver-operator
  • ovirt-csi-driver-operator
  • vmware-vsphere-csi-driver-operator
  • alibaba-disk-csi-driver-operator
  • ibm-vpc-block-csi-driver-operator
  • csi-driver-shared-resource-operator

 

  • cluster-storage-operator
  • csi-snapshot-controller-operator
  • local-storage-operator
  • vsphere-problem-detector

 

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Feature Goal

  • Enable platform=external to support onboarding new partners, e.g. Oracle Cloud Infrastructure and VCSP partners.
  • Create a new platform type, working name "External", that will signify when a cluster is deployed on a partner infrastructure where core cluster components have been replaced by the partner. “External” is different from our current platform types in that it will signal that the infrastructure is specifically not “None” or any of the known providers (eg AWS, GCP, etc). This will allow infrastructure partners to clearly designate when their OpenShift deployments contain components that replace the core Red Hat components.

This work will require updates to the core OpenShift API repository to add the new platform type, and then a distribution of this change to all components that use the platform type information. For components that partners might replace, per-component action will need to be taken, with the project team's guidance, to ensure that the component properly handles the "External" platform. These changes will look slightly different for each component.

To integrate these changes more easily into OpenShift, it is possible to take a multi-phase approach which could be spread over a release boundary (eg phase 1 is done in 4.X, phase 2 is done in 4.X+1).

OCPBU-5: Phase 1

  • Write platform “External” enhancement.
  • Evaluate changes to cluster capability annotations to ensure coverage for all replaceable components.
  • Meet with component teams to plan specific changes that will allow for supplement or replacement under platform "External".
  • Start implementing changes towards Phase 2.

OCPBU-510: Phase 2

  • Update OpenShift API with new platform and ensure all components have updated dependencies.
  • Update capabilities API to include coverage for all replaceable components.
  • Ensure all Red Hat operators tolerate the "External" platform and treat it the same as "None" platform.

OCPBU-329: Phase.Next

  • TBD

Why is this important?

  • As partners begin to supplement OpenShift's core functionality with their own platform specific components, having a way to recognize clusters that are in this state helps Red Hat created components to know when they should expect their functionality to be replaced or supplemented. Adding a new platform type is a significant data point that will allow Red Hat components to understand the cluster configuration and make any specific adjustments to their operation while a partner's component may be performing a similar duty.
  • The new platform type also helps with support to give a clear signal that a cluster has modifications to its core components that might require additional interaction with the partner instead of Red Hat. When combined with the cluster capabilities configuration, the platform "External" can be used to positively identify when a cluster is being supplemented by a partner, and which components are being supplemented or replaced.

Scenarios

  1. A partner wishes to replace the Machine controller with a custom version that they have written for their infrastructure. Setting the platform to "External" and advertising the Machine API capability gives a clear signal to the Red Hat created Machine API components that they should start the infrastructure generic controllers but not start a Machine controller.
  2. A partner wishes to add their own Cloud Controller Manager (CCM) written for their infrastructure. Setting the platform to "External" and advertising the CCM capability gives a clear signal to the Red Hat created CCM operator that the cluster should be configured for an external CCM that will be managed outside the operator. Although the Red Hat operator will not provide this functionality, it will configure the cluster to expect a CCM.

Acceptance Criteria

Phase 1

  • Partners can read "External" platform enhancement and plan for their platform integrations.
  • Teams can view jira cards for component changes and capability updates and plan their work as appropriate.

Phase 2

  • Components running in cluster can detect the “External” platform through the Infrastructure config API
  • Components running in cluster react to “External” platform as if it is “None” platform
  • Partners can disable any of the platform specific components through the capabilities API

Phase 3

  • Components running in cluster react to the “External” platform based on their function.
    • for example, the Machine API Operator needs to run a set of controllers that are platform agnostic when running in platform “External” mode.
    • the specific component reactions are difficult to predict currently, this criteria could change based on the output of phase 1.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. Identifying OpenShift Components for Install Flexibility

Open questions::

  1. Phase 1 requires talking with several component teams, the specific action that will be needed will depend on the needs of the specific component. At the least the components need to treat platform "External" as "None", but there could be more changes depending on the component (eg Machine API Operator running non-platform specific controllers).

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Epic Goal

  • Empower External platform type user to specify when they will run their own CCM

Why is this important?

  • For partners wishing to use components that require zonal awareness provided by the infrastructure (for example CSI drivers), they will need to run their own cloud controller managers. This epic is about adding the proper configuration to OpenShift to allow users of the External platform type to run their own CCMs.

Scenarios

  1. As a Red Hat partner, I would like to deploy OpenShift with my own CSI driver. To do this I need my CCM deployed as well. Having a way to instruct OpenShift to expect an external CCM deployment would allow me to do this.

Acceptance Criteria

  • CI - A new periodic test based on the External platform test would be ideal
  • Release Technical Enablement - Provide necessary release enablement details and documents.
    • Update docs.ci.openshift.org with CCM docs

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. https://github.com/openshift/enhancements/blob/master/enhancements/cloud-integration/infrastructure-external-platform-type.md#api-extensions
  2. https://github.com/openshift/api/pull/1409

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story

As a Red Hat Partner installing OpenShift using the External platform type, I would like to install my own Cloud Controller Manager (CCM). Having a field in the Infrastructure configuration object to signal that I will install my own CCM and that Kubernetes should be configured to expect an external CCM will allow me to run my own CCM on new OpenShift deployments.

Background

This work has been defined in the External platform enhancement, and had previously been part of openshift/api. The CCM API pieces were removed for the 4.13 release of OpenShift to ensure that we did not ship unused portions of the API.

In addition to the API changes, library-go will need an update to the IsCloudProviderExternal function to detect whether the External platform is selected and whether the CCM should be enabled for external mode.

We will also need to check the ObserveCloudVolumePlugin function to ensure that it is not affected by the external changes and that it continues to use the external volume plugin.

After updating openshift/library-go, it will need to be re-vendored into the MCO, KCMO, and CCCMO (although the last is not as critical as the other two).
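A hedged sketch of what the re-introduced fields could look like on the Infrastructure resource; the exact field paths and values are assumptions pending the re-revert of openshift/api#1409.

# Assumed shape only; openshift/api#1409 is the authoritative definition.
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  name: cluster
spec:
  platformSpec:
    type: External
status:
  platformStatus:
    type: External
    external:
      cloudControllerManager:
        state: External   # signals that the partner will run their own CCM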

Steps

  • update openshift/api with new CCM fields (re-revert #1409)
  • revendor api to library-go
  • update IsCloudProviderExternal in library-go to observe the new API fields
  • investigate ObserveCloudVolumePlugin to see if it requires changes
  • revendor library-go to MCO, KCMO, and CCCMO
  • update enhancement doc to reflect state

Stakeholders

  • openshift eng
  • oracle cloud install effort

Definition of Done

  • OpenShift can be installed with the External platform type, with the kubelet and related components using the external cloud provider flags.
  • Docs
  • this will need to be documented in the API and as part of OCPCLOUD-1581
  • Testing
  • this will need validation through unit tests; integration testing may be difficult as we will need a new e2e built off the External platform with a CCM
  1. Proposed title of this feature request:

Update ETCD datastore encryption to use AES-GCM instead of AES-CBC

2. What is the nature and description of the request?

The current ETCD datastore encryption solution uses the aes-cbc cipher. This cipher is now considered "weak" and is susceptible to padding oracle attacks. Upstream recommends using the AES-GCM cipher. AES-GCM will require automation to rotate the secrets every 200k writes.

The cipher used is hard-coded.

3. Why is this needed? (List the business requirements here).

Security conscious customers will not accept the presence and use of weak ciphers in an OpenShift cluster. Continuing to use the AES-CBC cipher will create friction in sales and, for existing customers, may result in OpenShift being blocked from being deployed in production. 

4. List any affected packages or components.

Epic Goal*

What is our purpose in implementing this?  What new capability will be available to customers?

The Kube APIserver is used to set the encryption of data stored in etcd. See https://docs.openshift.com/container-platform/4.11/security/encrypting-etcd.html

 

Today with OpenShift 4.11 or earlier, only aescbc is allowed as the encryption field type. 

 

RFE-3095 asks that aesgcm (an updated and more recent type) be supported. Furthermore, RFE-3338 asks for more customizability, which brings us to how we have implemented cipher customization with tlsSecurityProfile. See https://docs.openshift.com/container-platform/4.11/security/tls-security-profiles.html
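For illustration, the requested cipher would presumably be selected the same way aescbc is today, via the cluster APIServer config; treating aesgcm as an accepted value is exactly what the RFE asks for, so this is a sketch rather than existing behavior.

# Sketch only: aesgcm as a supported value is the subject of RFE-3095.
apiVersion: config.openshift.io/v1
kind: APIServer
metadata:
  name: cluster
spec:
  encryption:
    type: aesgcm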

 

 
Why is this important? (mandatory)

AES-CBC is considered a weak cipher.

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1.  

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Key Objective
Providing our customers with a single simplified User Experience (Hybrid Cloud Console) that is extensible, can run locally or in the cloud, and is capable of everything from managing the fleet to deep diving into a single cluster.
Why customers want this?

  1. Single interface to accomplish their tasks
  2. Consistent UX and patterns
  3. Easily accessible: One URL, one set of credentials

Why we want this?

  • Shared code -  improve the velocity of both teams and most importantly ensure consistency of the experience at the code level
  • Pre-built PF4 components
  • Accessibility & i18n
  • Remove barriers for enabling ACM

Phase 2 Goal: Productization of the united Console 

  1. Enable user to quickly change context from fleet view to single cluster view
    1. Add Cluster selector with “All Cluster” Option. “All Cluster” = ACM
    2. Shared SSO across the fleet
    3. Hub OCP Console can connect to remote clusters API
    4. When ACM Installed the user starts from the fleet overview aka “All Clusters”
  2. Share UX between views
    1. ACM Search —> resource list across fleet -> resource details that are consistent with single cluster details view
    2. Add Cluster List to OCP —> Create Cluster

We need a way to show metrics for workloads running on spoke clusters. This depends on ACM-876, which lets the console discover the monitoring endpoints.

  • Console operator must discover the external URLs for monitoring
  • Console operator must pass the URLs and CA files as part of the cluster config to the console backend
  • Console backend must set up proxies for each endpoint (as it does for the API server endpoints)
  • Console frontend must include the cluster in metrics requests

Open Issues:

We will depend on ACM to create a route on each spoke cluster for the prometheus tenancy service, which is required for metrics for normal users.

 

The OpenShift console backend should proxy managed cluster monitoring requests through the MCE cluster proxy addon to the Prometheus services on the managed cluster. This depends on https://issues.redhat.com/browse/ACM-1188

 

This epic contains all the OLM related stories for OCP release-4.14

Epic Goal

  • Track all the stories under a single epic

The console operator should build up a set of the cluster nodes' OS types and supply it to the console, so that the console renders only operators that can be installed on the cluster.

This will be needed when we support different OS types on the cluster.

We need to scan through the compute nodes and build a set of supported OSes from them. Each node on the cluster has a label for its operating system, e.g. kubernetes.io/os=linux.
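For example, the operator would read the well-known OS label on each node and aggregate the distinct values; how the result is surfaced in console-config.yaml is an assumption here (the nodeOperatingSystems key is illustrative).

# Node label the operator reads (standard Kubernetes label on every node):
#   kubernetes.io/os: linux
#
# Hypothetical console-config.yaml excerpt produced by the operator;
# the clusterInfo.nodeOperatingSystems key is an assumption.
clusterInfo:
  nodeOperatingSystems:
    - linux
    - windows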

 

AC:

  1. Implement logic in the console-operator that will scan through all the nodes, build a set of all the OS types that the cluster nodes run on, and pass it to console-config.yaml. This set of OS types will then be used by the console frontend.
  2. Add unit and e2e test cases in the console-operator repository.

This epic contains all the OLM related stories for OCP release-4.13

Epic Goal

  • Track all the stories under a single epic

Description/Acceptance Criteria:

  • Add RBAC for the console-operator so it can GET/LIST/WATCH the OLMConfig cluster config. The RBAC should be added to the console-operator cluster-role rules (see the sketch after this list).
  • The console operator should watch the spec.features.disableCopiedCSVs property of the OLM cluster config. When this property is true, the console-config's "clusterInfo.copiedCSVsDisabled" field should be updated accordingly and a new version of the console rolled out.
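A minimal sketch of the added cluster-role rule, assuming OLMConfig is served from the operators.coreos.com API group:

# Sketch of the console-operator cluster-role addition; the API group is an assumption.
rules:
  - apiGroups: ["operators.coreos.com"]
    resources: ["olmconfigs"]
    verbs: ["get", "list", "watch"]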

Pre-Work Objectives

Since some of our requirements from the ACM team will not be available for the 4.12 timeframe, the team should work on anything we can get done in the scope of the console repo so that when the required items are available in 4.13, we can be more nimble in delivering GA content for the Unified Console Epic.

Overall GA Key Objective
Providing our customers with a single simplified User Experience (Hybrid Cloud Console) that is extensible, can run locally or in the cloud, and is capable of everything from managing the fleet to deep diving into a single cluster.
Why customers want this?

  1. Single interface to accomplish their tasks
  2. Consistent UX and patterns
  3. Easily accessible: One URL, one set of credentials

Why we want this?

  • Shared code -  improve the velocity of both teams and most importantly ensure consistency of the experience at the code level
  • Pre-built PF4 components
  • Accessibility & i18n
  • Remove barriers for enabling ACM

Phase 2 Goal: Productization of the united Console 

  1. Enable user to quickly change context from fleet view to single cluster view
    1. Add Cluster selector with “All Cluster” Option. “All Cluster” = ACM
    2. Shared SSO across the fleet
    3. Hub OCP Console can connect to remote clusters API
    4. When ACM Installed the user starts from the fleet overview aka “All Clusters”
  2. Share UX between views
    1. ACM Search —> resource list across fleet -> resource details that are consistent with single cluster details view
    2. Add Cluster List to OCP —> Create Cluster

As a developer I would like to disable clusters like *KS that we can't support for multi-cluster (for instance because we can't authenticate). The ManagedCluster resource has a vendor label that we can use to know if the cluster is supported.

cc Ali Mobrem Sho Weimer Jakub Hadvig 

UPDATE 9/20/22: we want an allow-list with OpenShift, ROSA, ARO, ROKS, and OpenShiftDedicated

Acceptance criteria:

  • Investigate if console-operator should pass info about which cluster are supported and unsupported to the frontend
  • Unsupported clusters should not appear in the cluster dropdown
  • Unsupported clusters are determined based on:
    • the defined vendor label
    • non-4.x OCP clusters

1. Proposed title of this feature request
BYOK encrypts root vols AND default storageclass

2. What is the nature and description of the request?
User story
As a customer spinning up managed OpenShift clusters, if I pass a custom AWS KMS key to the installer, I expect it (installer and cluster-storage-operator) to not only encrypt the root volumes for the nodes in the cluster, but also to apply the key to the first/default (currently gp2) StorageClass, so that my assumptions around passing a custom key are met.
In current state, if I pass a KMS key to the installer, only root volumes are encrypted with it, and the default AWS managed key is used for the default StorageClass.
Perhaps this could be offered as an installer flag controlling whether the key is also passed to the storage class.
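For context, passing the key to the installer today looks roughly like the following machine-pool setting (a sketch; the field placement is based on the current install-config schema and should be verified, and the ARN is a placeholder):

# Illustrative install-config.yaml excerpt; the KMS key ARN is a placeholder.
controlPlane:
  platform:
    aws:
      rootVolume:
        kmsKeyARN: arn:aws:kms:us-east-1:111122223333:key/abcd1234-a123-456a-a12b-a123b4cd56ef
compute:
  - name: worker
    platform:
      aws:
        rootVolume:
          kmsKeyARN: arn:aws:kms:us-east-1:111122223333:key/abcd1234-a123-456a-a12b-a123b4cd56ef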

3. Why does the customer need this? (List the business requirements here)
To satisfy customers who wish to encrypt the volumes they own with their selected key, rather than having the AWS default account key applied by accident.

4. List any affected packages or components.

  • uncertain.

Note: this implementation should take effect on AWS, GCP and Azure (any cloud provider) equally.

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

User Story:

As a cluster admin, I want OCP to provision new volumes with my custom encryption key that I specified during cluster installation in install-config.yaml so all OCP assets (PVs, VMs & their root disks) use the same encryption key.

Acceptance Criteria:

Description of criteria:

  • Check that dynamically provisioned PVs use the key specified in install-config.yaml
  • Check that the key can be changed in TBD API and all volumes newly provisioned after the key change use the new key. (Exact API is not defined yet, probably a new field in `Infrastructure`, calling it TBD API now).

(optional) Out of Scope:

Re-encryption of existing PVs with a new key. Only newly provisioned PVs will use the new key.

Engineering Details:

Enhancement (incl. TBD API with encryption key reference) will be provided as part of https://issues.redhat.com/browse/CORS-2080.

"Raw meat" of this story is translation of the key reference in TBD API to StorageClass.Parameters. AWS EBS CSi driver operator should update both the StorageClass it manages (managed-csi) with:

parameters:
  encrypted: "true"
  kmsKeyId: "arn:aws:kms:us-east-1:012345678910:key/abcd1234-a123-456a-a12b-a123b4cd56ef"

Upstream docs: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/docs/parameters.md 
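Putting it together, the updated managed StorageClass would look roughly like this sketch (the key ARN is the example value above):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-csi
provisioner: ebs.csi.aws.com
parameters:
  encrypted: "true"                   # encrypt every dynamically provisioned volume
  kmsKeyId: "arn:aws:kms:us-east-1:012345678910:key/abcd1234-a123-456a-a12b-a123b4cd56ef"
volumeBindingMode: WaitForFirstConsumer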

Feature Overview

  • Customers want to create and manage OpenShift clusters using managed identities for Azure resources for authentication.

Goals

  • A customer using ARO wants to spin up an OpenShift cluster with "az aro create" without needing additional input, i.e. without the need for an AD account or service principal credentials, and the identity used is never visible to the customer and cannot appear in the cluster.
  • As an administrator, I want to deploy OpenShift 4 and run Operators on Azure using access controls (IAM roles) with temporary, limited privilege credentials.

Requirements

  • Azure managed identities must work for installation with all install methods including IPI and UPI, work with upgrades, and day-to-day cluster lifecycle operations.
  • Support HyperShift and non-HyperShift clusters.
  • Support use of Operators with Azure managed identities.
  • Support in all Azure regions where Azure managed identity is available. Note: federated credentials are associated with Azure Managed Identity, and federated credentials are not available in all Azure regions.

More details at ARO managed identity scope and impact.

 

This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP requirement gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.

Requirement | Notes | isMvp?
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES
Release Technical Enablement | Provide necessary release enablement details and documents. | YES

(Optional) Use Cases

This Section:

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

Questions to answer…

  • ...

Out of Scope

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

Assumptions

  • ...

Customer Considerations

  • ...

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

References

Epic Overview

  • Enable customers to create and manage OpenShift clusters using managed identities for Azure resources for authentication.
  • A customer using ARO wants to spin up an OpenShift cluster with "az aro create" without needing additional input, i.e. without the need for an AD account or service principal credentials, and the identity used is never visible to the customer and cannot appear in the cluster.

Epic Goal

  • A customer creates an OpenShift cluster ("az aro create") using Azure managed identity.
  • Azure managed identities must work for installation with all install methods including IPI and UPI, work with upgrades, and day-to-day cluster lifecycle operations.
  • After Azure failed to implement workable golang API changes after deprecation of their old API, we have removed mint mode and work entirely in passthrough mode. Azure has plans to implement pod/workload identity similar to how they have been implemented in AWS and GCP, and when this feature is available, we should implement permissions similar to AWS/GCP
  • This work cannot start until Azure have implemented this feature - as such, this Epic is a placeholder to track the effort when available.

Why is this important?

  • Microsoft and the customer would prefer that we use Managed Identities vs. Service Principal (which requires putting the Service Principal and principal password in clear text within the azure.conf file).

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

 

 

This document describes the expectations for promoting a feature that is behind a feature gate.

https://docs.google.com/document/d/1zOL38_KDKwAvsx-LMHfyDdX4-jq3HQU1YWBjfuHaM0Q/edit#heading=h.2se1quoqg6jr 

The criteria includes:

  • Automated testing
    • CCO-234: Azure workload identity e2e testing
    • CCO-408: Add periodic e2e test for e2e-azure-manual-oidc
    • CCO-379: E2E Automation
    • CCO-380: CI Integration-Azure Managed Identity (Workload Identity) Support
  • QE sign off that the feature is complete
  • Staff engineer approval

Feature Overview

Console enhancements based on customer RFEs that improve customer user experience.

 

Goals

  • This Section: Provide a high-level goal statement, providing user context and expected user outcome(s) for this feature

 

Requirements

  • This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP requirement gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.

 

Requirement | Notes | isMvp?
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES
Release Technical Enablement | Provide necessary release enablement details and documents. | YES

 

(Optional) Use Cases

This Section: 

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

 

Questions to answer…

  • ...

 

Out of Scope

 

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

 

Assumptions

  • ...

 

Customer Considerations

  • ...

 

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?  
  • New Content, Updates to existing content,  Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

According to security best practice, it's recommended to set readOnlyRootFilesystem: true for all containers running on Kubernetes. Given that openshift-console does not set this explicitly, it's requested that this be evaluated and, if possible, set to readOnlyRootFilesystem: true, or otherwise to readOnlyRootFilesystem: false with an explanation of why the filesystem needs to be writable.

3. Why does the customer need this? (List the business requirements here)
Extensive security audits are run on OpenShift Container Platform 4 and are highlighting that many vendor-specific containers fail to set readOnlyRootFilesystem: true or to justify why readOnlyRootFilesystem: false is set.

 

AC: Set the readOnlyRootFilesystem field on both the console and console-operator deployments' spec. Part of the work is to determine the value: true if the pod is not writing to its filesystem, otherwise false.
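A minimal sketch of the change on the console Deployment's container spec, assuming the analysis concludes the container does not write to its filesystem (if it does, the value would be false with a documented justification):

# Sketch only; whether "true" is viable depends on the write analysis above.
spec:
  template:
    spec:
      containers:
        - name: console
          securityContext:
            readOnlyRootFilesystem: true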

Feature Overview (aka. Goal Summary)  

Unify and update hosted control planes storage operators so that they have similar code patterns and can run properly in both standalone OCP and HyperShift's control plane.

Goals (aka. expected user outcomes)

  • Simplify the operators with a unified code pattern
  • Expose metrics from control-plane components
  • Use proper RBACs in the guest cluster
  • Scale the pods according to HostedControlPlane's AvailabilityPolicy
  • Add proper node selector and pod affinity for mgmt cluster pods

Requirements (aka. Acceptance Criteria):

  • OCP regression tests work in both standalone OCP and HyperShift
  • Code in the operators looks the same
  • Metrics from control-plane components are exposed
  • Proper RBACs are used in the guest cluster
  • Pods scale according to HostedControlPlane's AvailabilityPolicy
  • Proper node selector and pod affinity is added for mgmt cluster pods

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Epic Goal*

Our current design of the EBS driver operator to support HyperShift does not scale well to other drivers. The existing design will lead to more code duplication between driver operators and a greater possibility of errors.
 
Why is this important? (mandatory)

An improved design will allow more storage drivers and their operators to be added to hypershift without requiring significant changes in the code internals.
 
Scenarios (mandatory) 

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

This feature is the placeholder for all epics related to technical debt associated with the Console team.

Outcome Overview

Once all Features and/or Initiatives in this Outcome are complete, what tangible, incremental, and (ideally) measurable movement will be made toward the company's Strategic Goal(s)?

 

Success Criteria

What is the success criteria for this strategic outcome?  Avoid listing Features or Initiatives and instead describe "what must be true" for the outcome to be considered delivered.

 

 

Expected Results (what, how, when)

What incremental impact do you expect to create toward the company's Strategic Goals by delivering this outcome? (possible examples: unblocking sales, shifts in product metrics, etc.; provide links to metrics that will be used post-completion for review & pivot decisions). For each expected result, list what you will measure and when you will measure it (e.g. provide links to existing information or metrics that will be used post-completion for review, and specify when you will review the measurement, such as 60 days after the work is complete).

 

 

Post Completion Review – Actual Results

After completing the work (as determined by the "when" in Expected Results above), list the actual results observed / measured during Post Completion review(s).

 

Feature Overview (aka. Goal Summary)  

Upstream Kubernetes is following other SIGs by moving its in-tree cloud providers to an out-of-tree plugin format, the Cloud Controller Manager, at some point in a future Kubernetes release. OpenShift needs to be ready to action this change.

Goals (aka. expected user outcomes)

GA of the cloud controller manager for the GCP platform

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context that is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Epic Goal

  • We need to GA the GCP Cloud Controller Manager 

Why is this important?

  • Upstream is moving to out of tree cloud providers and we need to be ready

Scenarios

  1. ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Background

To make the CCM GA, we need to update the switch case in library-go to make sure the GCP CCM is always considered external.

We then need to update the vendored library-go in KCMO, CCMO, KASO and MCO.

Steps

  • Create a PR for updating library-go
  • Create PRs for updating the vendor in dependent repos
  • Leverage an engineer with merge rights (e.g. David Eads) to merge the library-go, KCMO and CCMO changes simultaneously
  • Merge the KASO and MCO changes

Stakeholders

  • Cluster Infra
  •  

Definition of Done

  • GCP CCM is enabled by default
  • Docs
  • N/A
  • Testing
  • <Explain testing that will be added>

Feature Overview

Create an Azure cloud specific spec.resourceTags entry in the infrastructure CRD. This should create and update tags (or labels in Azure) on any OpenShift cloud resource that we create and manage. The behaviour should also tag existing resources that do not have the tags yet, and once the tags in the infrastructure CRD are changed, all the resources should be updated accordingly.

Tag deletes continue to be out of scope, as the customer can still have custom tags applied to the resources that we do not want to delete.

Due to the ongoing in-tree/out-of-tree split on the cloud and CSI providers, this should not apply to clusters with in-tree providers (!= "external").

Once confident we have all components updated, we should introduce an end-to-end test that makes sure we never create resources that are untagged.

 
Goals

  • Functionality on Azure Tech Preview
  • inclusion in the cluster backups
  • flexibility of changing tags during cluster lifetime, without recreating the whole cluster

Requirements

  • A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.
Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

List any affected packages or components.

  • Installer
  • Cluster Infrastructure
  • Storage
  • Node
  • NetworkEdge
  • Internal Registry
  • CCO

This epic covers the work to apply user-defined tags to the Azure resources created for an OpenShift cluster, available as Tech Preview.

The user should be able to define the Azure tags to be applied to the resources created during cluster creation by the installer and by the other operators which manage those resources. The user will be able to define the required tags in install-config.yaml while preparing the user inputs for cluster creation. These tags will then be made available in the status sub-resource of the Infrastructure custom resource, which cannot be edited but is available for user reference and is used by the in-cluster operators for tagging when the resources are created.

Updating/deleting of tags added during cluster creation or adding new tags as Day-2 operation is out of scope of this epic.
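For illustration, a minimal sketch of how the user-supplied tags could look in install-config.yaml, assuming a userTags-style field under the Azure platform section (field names and values are illustrative, not the final API):

platform:
  azure:
    region: centralus
    baseDomainResourceGroupName: example-rg
    # user-defined tags to be applied to Azure resources created by the
    # installer and by in-cluster operators (field name assumed here)
    userTags:
      environment: dev
      cost-center: "1234"

The installer would then copy these values into the status sub-resource of the Infrastructure custom resource, where the in-cluster operators read them when creating resources.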

List any affected packages or components.

  • Installer
  • Cluster Infrastructure
  • Storage
  • Node
  • NetworkEdge
  • Internal Registry
  • CCO

Reference - https://issues.redhat.com/browse/RFE-2017

cluster-config-operator makes the Infrastructure CRD available to the installer; the CRD is included in its container image from the openshift/api package, so that package needs to be updated to pick up the latest CRD.

Epic Goal

  • Update all images that we ship with OpenShift to the latest upstream releases and libraries.
  • Exact content of what needs to be updated will be determined as new images are released upstream, which is not known at the beginning of OCP development work. We don't know what new features will be included and should be tested and documented. Especially new CSI drivers releases may bring new, currently unknown features. We expect that the amount of work will be roughly the same as in the previous releases. Of course, QE or docs can reject an update if it's too close to deadline and/or looks too big.

Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF). Trying no-feature-freeze in 4.12. We will try to do as much as we can before FF, but we're quite sure something will slip past FF as usual.

Why is this important?

  • We want to ship the latest software that contains new features and bugfixes.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Update all CSI sidecars to the latest upstream release from https://github.com/orgs/kubernetes-csi/repositories

  • external-attacher
  • external-provisioner
  • external-resizer
  • external-snapshotter
  • node-driver-registrar
  • livenessprobe

Corresponding downstream repos have `csi-` prefix, e.g. github.com/openshift/csi-external-attacher.

This includes an update of the VolumeSnapshot CRDs in the cluster-csi-snapshot-controller-operator assets and of the client API in go.mod, i.e. copy all snapshot CRDs from upstream to the operator assets and run go get -u github.com/kubernetes-csi/external-snapshotter/client/v6 in the operator repo.

Update all OCP and kubernetes libraries in storage operators to the appropriate version for OCP release.

This includes (but is not limited to):

  • Kubernetes:
    • client-go
    • controller-runtime
  • OCP:
    • library-go
    • openshift/api
    • openshift/client-go
    • operator-sdk

Operators:

  • aws-ebs-csi-driver-operator 
  • aws-efs-csi-driver-operator
  • azure-disk-csi-driver-operator
  • azure-file-csi-driver-operator
  • openstack-cinder-csi-driver-operator
  • gcp-pd-csi-driver-operator
  • gcp-filestore-csi-driver-operator
  • csi-driver-manila-operator
  • vmware-vsphere-csi-driver-operator
  • alibaba-disk-csi-driver-operator
  • ibm-vpc-block-csi-driver-operator
  • csi-driver-shared-resource-operator
  • ibm-powervs-block-csi-driver-operator

 

  • cluster-storage-operator
  • cluster-csi-snapshot-controller-operator
  • local-storage-operator
  • vsphere-problem-detector

EOL, do not upgrade:

  • github.com/oVirt/csi-driver-operator

Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

Epic Goal*

What is our purpose in implementing this?  What new capability will be available to customers?

 
Why is this important? (mandatory)

What are the benefits to the customer or Red Hat? Does it improve security, performance, supportability, etc.? Why is this work a priority?

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1.  

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic?

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Feature Overview (aka. Goal Summary)  

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

 

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Complete during New status.

 

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context that is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Problem:

As a developer of serverless functions, I don't have any samples to start from; we currently don't provide any.

Goal:

Provide Serverless Function samples in the sample catalog. These would utilize the Builder Image capabilities.
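As a rough sketch of how a sample could be surfaced (the image stream name, repository URL, and tag values are placeholders, and whether to reuse the builder tag or add a new serverless-function tag is still an open question in the notes below), the catalog is driven by annotations on image stream tags:

apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: nodejs-function        # placeholder builder image stream
  namespace: openshift
spec:
  tags:
    - name: latest
      from:
        kind: DockerImage
        name: registry.example.com/nodejs-function-builder:latest
      annotations:
        tags: builder,serverless-function      # drives catalog categorization
        sampleRepo: https://github.com/example/serverless-function-sample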

Why is it important?

Use cases:

  1. <case>

Acceptance criteria:

  1. <criteria>

Dependencies (External/Internal):

  • Serverless team would need to provide sample repo for serverless function
  • Samples operator would need to be updated

Design Artifacts:

Exploration:

Note:

  • Need to define the API and confirm with other stakeholders - need to support a serverless func image stream "tag"
  • Serverless team will need to provide updates to the existing Image Streams, as well as maintain the sample repositories which are referenced in the Image Streams.
  • Need to understand the relationship between ImageStream and Image Stream Tag
  • Should serverless function samples in the catalog have the "builder image" tag, or should it be "serverless function"?

Description

As an operator author, I want to provide additional samples that are tied to an operator version, not an OpenShift release. For that, I want to create a resource to add new samples to the web console.
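A minimal sketch of what such a resource could look like, assuming a ConsoleSample CRD in the console.openshift.io API group (field names are illustrative and subject to API review):

apiVersion: console.openshift.io/v1
kind: ConsoleSample
metadata:
  name: my-operator-sample          # shipped and versioned by the operator
spec:
  title: My Operator sample
  abstract: One-line summary shown on the sample card.
  description: |
    Longer markdown description of the sample.
  source:
    type: GitImport
    gitImport:
      repository:
        url: https://github.com/example/my-operator-sample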

Acceptance Criteria

  1. Update openshift/console-operator so that new clusters have the new ConsoleSample CRD
  2. Add RBAC permissions (roles and rolebinding?) so that all users have access to ConsoleSample resources

Additional Details:

Feature Overview (aka. Goal Summary)  

Add ControlPlaneMachineSet for vSphere

 

The OpenShift API needs to be updated to define VSphereFailureDomain.  A draft PR is here: https://github.com/openshift/api/pull/1539

Also, ensure that the client-go and openshift-cluster-config-operator projects are bumped once the API changes merge.

Epic Goal*

What is our purpose in implementing this?  What new capability will be available to customers?

 
Why is this important? (mandatory)

What are the benefits to the customer or Red Hat? Does it improve security, performance, supportability, etc.? Why is this work a priority?

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1.  

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic?

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Epic Goal

  • Update all images that we ship with OpenShift to the latest upstream releases and libraries.
  • Exact content of what needs to be updated will be determined as new images are released upstream, which is not known at the beginning of OCP development work. We don't know what new features will be included and should be tested and documented. Especially new CSI drivers releases may bring new, currently unknown features. We expect that the amount of work will be roughly the same as in the previous releases. Of course, QE or docs can reject an update if it's too close to deadline and/or looks too big.

Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF).

Why is this important?

  • We want to ship the latest software that contains new features and bugfixes.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Update all CSI sidecars to the latest upstream release from https://github.com/orgs/kubernetes-csi/repositories

  • external-attacher
  • external-provisioner
  • external-resizer
  • external-snapshotter
  • node-driver-registrar
  • livenessprobe

Corresponding downstream repos have `csi-` prefix, e.g. github.com/openshift/csi-external-attacher.

This includes an update of the VolumeSnapshot CRDs in the cluster-csi-snapshot-controller-operator assets and of the client API in go.mod, i.e. copy all snapshot CRDs from upstream to the operator assets and run go get -u github.com/kubernetes-csi/external-snapshotter/client/v6 in the operator repo.

< High-Level description of the feature ie: Executive Summary >

Goals

Cluster administrators need an in-product experience to discover and install new Red Hat offerings that can add high value to developer workflows.

Requirements

Requirements Notes IS MVP
Discover new offerings in Home Dashboard   Y
Access details outlining value of offerings   Y
Access step-by-step guide to install offering   N
Allow developers to easily find and use newly installed offerings   Y
Support air-gapped clusters   Y
(Optional) Use Cases

< What are we making, for who, and why/what problem are we solving?>

Out of scope

Discovering solutions that are not available for installation on cluster

Dependencies

No known dependencies

Background, and strategic fit

 

Assumptions

None

 

Customer Considerations

 

Documentation Considerations

Quick Starts 

What does success look like?

 

QE Contact

 

Impact

 

Related Architecture/Technical Documents

 

Done Checklist

  • Acceptance criteria are met
  • Non-functional properties of the Feature have been validated (such as performance, resource, UX, security or privacy aspects)
  • User Journey automation is delivered
  • Support and SRE teams are provided with enough skills to support the feature in production environment

Problem:

Cluster admins need to be guided to install RHDH on the cluster.

Goal:

Enable admins to discover RHDH, be guided to install it on the cluster, and verify its configuration.

Why is it important?

RHDH is a key multi-cluster offering for developers. This will enable customers to self-discover and install RHDH.

Acceptance criteria:

  1. Show RHDH card in Admin->Dashboard view
  2. Enable link to RHDH documentation from the card
  3. Quick start to install RHDH operator
  4. Guided flow to installation and configuration of operator from Quick Start
  5. RHDH UI link in top menu
  6. Successful log in to RHDH

Dependencies (External/Internal):

RHDH operator

Design Artifacts:

Exploration:

Note:

Description of problem:
The OpenShift Console QuickStart that promotes RHDH also includes Janus IDP information.

The Janus IDP quick start should be removed, along with all other references to Janus IDP.

Version-Release number of selected component (if applicable):
4.15

How reproducible:
Always

Steps to Reproduce:
Just navigate to Quick starts and select the "Install Red Hat Developer Hub (RHDH) with an Operator" quick start

Actual results:

  1. The RHDH Operator Quick start contains some information and links to Janus IDP.
  2. The Janus IDP Quick start exists and is similar to the RHDH one.

Expected results:

  1. The RHDH Operator Quick start must not contain information about Janus IDP.
  2. The Janus IDP Quick start should be removed

Additional info:
Initial PR: https://github.com/openshift/console-operator/pull/806

Description of problem:
The OpenShift Console QuickStart that promotes RHDH was written in generic terms and doesn't include information on how to use the CRD-based installation.

We have removed this specific information because the operator wasn't ready at that time. As soon as the RHDH operator is available in the OperatorHub we should update the QuickStarts with some more detailed information.

With a simple CR example and some info on how to customize the base URL or colors.

Version-Release number of selected component (if applicable):
4.15

How reproducible:
Always

Steps to Reproduce:
Just navigate to Quick starts and select the "Install Red Hat Developer Hub (RHDH) with an Operator" quick start

Actual results:
The RHDH Operator Quick start exists but is written in a generic way.

Expected results:
The RHDH Operator Quick start should contain some more specific information.

Additional info:
Initial PR: https://github.com/openshift/console-operator/pull/806

Description

As a cluster admin, I want to see and learn how to install Janus IDP / Red Hat Developer Hub (RHDH)

Acceptance Criteria

  1. Create a Quickstart that is part of the console operator that explains how to install Janus IDP / Red Hat Developer Hub (RHDH)

Additional Details:

Feature Overview (aka. Goal Summary)  

The MCO should properly report its state in a way that's consistent and able to be understood by customers, troubleshooters, and maintainers alike. 

Some customer cases have revealed scenarios where the MCO state reporting is misleading and therefore could be unreliable to base decisions and automation on.

In addition to correcting some incorrect states, the MCO will be enhanced for a more granular view of update rollouts across machines.

The MCO should properly report its state in a way that's consistent and able to be understood by customers, troubleshooters, and maintainers alike. 

For this epic, "state" means "what is the MCO doing?" – so the goal here is to try to make sure that it's always known what the MCO is doing. 

This includes: 

  • Conditions
  • Some Logging 
  • Possibly Some Events 

While this probably crosses a little bit into the "status" portion of certain MCO objects, as some state is definitely recorded there, this probably shouldn't turn into a "better status reporting" epic.  I'm interpreting "status" to mean "how is it going" so status is maybe a "detail attached to a state". 

 

Exploration here: https://docs.google.com/document/d/1j6Qea98aVP12kzmPbR_3Y-3-meJQBf0_K6HxZOkzbNk/edit?usp=sharing

 

https://docs.google.com/document/d/17qYml7CETIaDmcEO-6OGQGNO0d7HtfyU7W4OMA6kTeM/edit?usp=sharing

 

Ensure that the pod exists but the functionality behind the pod is not exposed by default in the release version this work ships in.

This can be done by creating a new featuregate in openshift/api, vendoring that into the cluster config operator, and then checking for this featuregate in the state controller code of the MCO.
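For reference, the new gate could then be toggled per cluster through the FeatureGate config API, for example via the CustomNoUpgrade feature set (the gate name below is a placeholder, not the final name):

apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: CustomNoUpgrade
  customNoUpgrade:
    enabled:
      - MachineConfigStateReporting      # placeholder gate name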

Note: Replace text in red with details of your feature request.

Feature Overview

Extend the Workload Partitioning feature to support multi-node clusters.

Goals

Customers running RAN workloads on C-RAN Hubs (i.e. multi-node clusters) that want to maximize the cores available to the workloads (DU) should be able to utilize WP to isolate CP processes to reserved cores.
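As a rough sketch of the mechanism (CPU ranges, profile name, and node selector are illustrative), the reserved/isolated split that workload partitioning relies on is expressed through a PerformanceProfile:

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: wp-multi-node            # illustrative name
spec:
  cpu:
    reserved: "0-1"              # platform and control plane processes pinned here
    isolated: "2-31"             # remaining cores left for DU workloads
  nodeSelector:
    node-role.kubernetes.io/worker: ""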

Requirements

A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts.  If a non MVP requirement slips, it does not shift the feature.

requirement Notes isMvp?
     
     
     

 

Describe Use Cases (if needed)

< How will the user interact with this feature? >

< Which users will use this and when will they use it? >

< Is this feature used as part of current user interface? >

Out of Scope

 

Background, and strategic fit

< What does the person writing code, testing, documenting need to know? >

Assumptions

< Are there assumptions being made regarding prerequisites and dependencies?>

< Are there assumptions about hardware, software or people resources?>

Customer Considerations

< Are there specific customer environments that need to be considered (such as working with existing h/w and software)?>

< Are there Upgrade considerations that customers need to account for or that the feature should address on behalf of the customer?>

<Does the Feature introduce data that could be gathered and used for Insights purposes?>

Documentation Considerations

< What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)? >

< What does success look like?>

< Does this feature have doc impact?  Possible values are: New Content, Updates to existing content,  Release Note, or No Doc Impact>

< If unsure and no Technical Writer is available, please contact Content Strategy. If yes, complete the following.>

  • <What concepts do customers need to understand to be successful in [action]?>
  • <How do we expect customers will use the feature? For what purpose(s)?>
  • <What reference material might a customer want/need to complete [action]?>
  • <Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available. >
  • <What is the doc impact (New Content, Updates to existing content, or Release Note)?>

Interoperability Considerations

< Which other products and versions in our portfolio does this feature impact?>

< What interoperability test scenarios should be factored by the layered product(s)?>

Questions

Question Outcome
   

 

 

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Feature Overview

Reduce the OpenShift platform and associated RH provided components to a single physical core on Intel Sapphire Rapids platform for vDU deployments on SingleNode OpenShift.

Goals

  • Reduce CaaS platform compute needs so that it can fit within a single physical core with Hyperthreading enabled. (i.e. 2 CPUs)
  • Ensure existing DU Profile components fit within reduced compute budget.
  • Ensure existing ZTP, TALM, Observability and ACM functionality is not affected.
  • Ensure largest partner vDU can run on Single Core OCP.

Requirements

Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES
Provide a mechanism to tune the platform to use only one physical core. Users need to be able to tune different platforms. YES
Allow for full zero touch provisioning of a node with the minimal core budget configuration. Node provisioned with SNO Far Edge provisioning method - i.e. ZTP via RHACM, using DU Profile. YES
Platform meets all MVP KPIs   YES

(Optional) Use Cases

  • Main success scenario: A telecommunications provider uses ZTP to provision a vDU workload on a Single Node OpenShift instance running on an Intel Sapphire Rapids platform. The SNO is managed by an ACM instance and its lifecycle is managed by TALM.

Questions to answer...

  • N/A

Out of Scope

  • Core budget reduction on the Remote Worker Node deployment model.

Background, and strategic fit

Assumptions

  • The more compute power available for RAN workloads directly translates to the volume of cell coverage that a Far Edge node can support.
  • Telecommunications providers want to maximize the cell coverage on Far Edge nodes.
  • To provide as much compute power as possible the OpenShift platform must use as little compute power as possible.
  • As newer generations of servers are deployed at the Far Edge and the core count increases, no additional cores will be given to the platform for basic operation, all resources will be given to the workloads.

Customer Considerations

  • ...

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
    • Administrators must know how to tune their Far Edge nodes to make them as computationally efficient as possible.
  • Does this feature have doc impact?
    • Possibly, there should be documentation describing how to tune the Far Edge node such that the platform uses as little compute power as possible.
  • New Content, Updates to existing content, Release Note, or No Doc Impact
    • Probably updates to existing content
  • If unsure and no Technical Writer is available, please contact Content Strategy. What concepts do customers need to understand to be successful in [action]?
    • Performance Addon Operator, tuned, MCO, Performance Profile Creator
  • How do we expect customers will use the feature? For what purpose(s)?
    • Customers will use the Performance Profile Creator to tune their Far Edge nodes. They will use RHACM (ZTP) to provision a Far Edge Single-Node OpenShift deployment with the appropriate Performance Profile.
  • What reference material might a customer want/need to complete [action]?
    • Performance Addon Operator, Performance Profile Creator
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
    • N/A
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?
    • Likely updates to existing content / unsure
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Incomplete Features

When this image was assembled, these features were not yet completed. Therefore, only the Jira Cards included here are part of this release

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

For users who are using OpenShift but have not yet begun to explore multicluster and what we offer them.

I'm investigating where Learning paths are today and what is required.

As a user I'd like to have a learning path for how to get started with Multicluster.
Install MCE
Create multiple clusters
Use HyperShift
Provide access to cluster creation to devs via templates
Scale up to ACM/ACS (OPP?)

Status
https://github.com/patternfly/patternfly-quickstarts/issues/37#issuecomment-1199840223

Goal: Resources provided via the Dynamic Resource Allocation Kubernetes mechanism can be consumed by VMs.

Details: Dynamic Resource Allocation

Goal

Come up with a design of how resources provided by Dynamic Resource Allocation can be consumed by KubeVirt VMs.

Description

The Dynamic Resource Allocation (DRA) feature is an alpha API in Kubernetes 1.26, which is the base for OpenShift 4.13.
This feature provides the ability to create ResourceClaim and ResourceClass objects to request access to resources. This is similar to the dynamic provisioning of a PersistentVolume via a PersistentVolumeClaim and a StorageClass.
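For orientation, the 1.26 alpha API is shaped roughly as follows (driver and class names are placeholders, and the alpha API is still subject to change):

apiVersion: resource.k8s.io/v1alpha1
kind: ResourceClass
metadata:
  name: vgpu.example.com            # placeholder class name
driverName: gpu.resource.example.com
---
apiVersion: resource.k8s.io/v1alpha1
kind: ResourceClaim
metadata:
  name: vgpu-claim
  namespace: vm-workloads
spec:
  resourceClassName: vgpu.example.com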

NVIDIA has been a lead contributor to the KEP and already has an initial implementation of a DRA driver and plugin, with a nice demo recording. NVIDIA is expecting to have this DRA driver available in CY23 Q3 or Q4, so likely in NVIDIA GPU Operator v23.9, around OpenShift 4.14.

When asked about the availability of MIG-backed vGPU for Kubernetes, NVIDIA said that the timeframe is not decided yet, because it will likely use DRA for the creation of MIG devices and their registration with the vGPU host driver. The MIG-based vGPU feature for OpenShift Virtualization will then likely require support of DRA to request vGPU resources for the VMs.

Not having MIG-backed vGPU is a risk for OpenShift Virtualization adoption in GPU use cases, such as virtual workstations for rendering with Windows-only software. Customers who want to have a mix of passthrough, time-based vGPU and MIG-backed vGPU will prefer competitors who offer the full range of options. And the certification of NVIDIA solutions like NVIDIA Omniverse will be blocked, despite a great potential to increase the OpenShift consumption, as it uses RTX/A40 GPUs for virtual workstations (not certified by NVIDIA on OpenShift Virtualization yet) and A100/H100 for physics simulation, both use cases probably leveraging vGPUs [7]. There are a lot of necessary conditions for that to happen, and MIG-backed vGPU support is one of them.

User Stories

  • GPU consumption optimization
    "As an Admin, I want to let NVIDIA GPU DRA driver provision vGPUs for OpenShift Virtualization, so that it optimizes the allocation with dynamic provisioning of time or MIG backed vGPUs"
  • GPU mixed types per server
    "As an Admin, I want to be able to mix different types of GPU to collocate different types of workloads on the same host, in order to improve multi-pod/stack performance.

Non-Requirements

  • List of things not included in this epic, to alleviate any doubt raised during the grooming process.

Notes

  • Any additional details or decisions made/needed

References

Done Checklist

Who What Reference
DEV Upstream roadmap issue (or individual upstream PRs) <link to GitHub Issue>
DEV Upstream documentation merged <link to meaningful PR>
DEV gap doc updated <name sheet and cell>
DEV Upgrade consideration <link to upgrade-related test or design doc>
DEV CEE/PX summary presentation label epic with cee-training and add a <link to your support-facing preso>
QE Test plans in Polarion <link or reference to Polarion>
QE Automated tests merged <link or reference to automated tests>
DOC Downstream documentation merged <link to meaningful PR>

Goal:

As an administrator, I would like to deploy OpenShift 4 clusters to AWS C2S region

Problem:

Customers were able to deploy to AWS C2S region in OCP 3.11, but our global configuration in OCP 4.1 doesn't support this.
  

Why is this important:

  • Many of our public sector customers would like to move off 3.11 and on to 4.1, but missing support for the AWS C2S region will prevent them from being able to migrate their environments.

Lifecycle Information:

  • Core

Previous Work:

Here are the relevant PRs from OCP 3.11.  You can see that these endpoints are not part of the standard SDK (they use an entirely separate SDK).  To support these regions the endpoints had to be configured explicitly.

Seth Jennings has put together a highly customized POC.
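For comparison, in the OCP 4 installer custom endpoints and a private CA are expressed along these lines in install-config.yaml (the endpoint URLs below are placeholders; the real values come from the customer's C2S environment):

platform:
  aws:
    region: us-iso-east-1                 # example C2S-style region name
    serviceEndpoints:
      - name: ec2
        url: https://ec2.us-iso-east-1.example.ic.gov
      - name: elasticloadbalancing
        url: https://elasticloadbalancing.us-iso-east-1.example.ic.gov
additionalTrustBundle: |
  -----BEGIN CERTIFICATE-----
  <CA certificate for the private endpoints>
  -----END CERTIFICATE-----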

Dependencies:

  • Custom API endpoint support w/ CA
    • Cloud / Machine API
    • Image Registry
    • Ingress
    • Kube Controller Manager
    • Cloud Credential Operator
    • others?
  • Require access to local/private/hidden AWS environment

 

Prioritized epics + deliverables (in scope / not in scope):

  • Allow AWS C2S region to be specified for OpenShift cluster deployment
  • Enable customers to use their own managed internal/cluster DNS solutions due to provider and operational restrictions
  • Document deploying OpenShift to AWS C2S region
  • Enable CI for the AWS C2S region

Related : https://jira.coreos.com/browse/CORS-1271

Estimate (XS, S, M, L, XL, XXL): L

 

Customers: North America Public Sector and Government Agencies

Open Questions:

 

 

 

Feature Overview

Plugin teams need a mechanism to extend the OCP console that is decoupled enough so they can deliver at the cadence of their projects and not be forced into the OCP Console release timelines.

The OCP Console Dynamic Plugin Framework will enable all our plugin teams to do the following:

  • Extend the Console
  • Deliver UI code with their Operator
  • Work in their own git Repo
  • Deliver at their own cadence

Goals

    • Operators can deliver console plugins separate from the console image and update plugins when the operator updates.
    • The dynamic plugin API is similar to the static plugin API to ease migration.
    • Plugins can use shared console components such as list and details page components.
    • Shared components from core will be part of a well-defined plugin API.
    • Plugins can use Patternfly 4 components.
    • Cluster admins control what plugins are enabled.
    • Misbehaving plugins should not break console.
    • Existing static plugins are not affected and will continue to work as expected.

Out of Scope

    • Initially we don't plan to make this a public API. The target use is for Red Hat operators. We might reevaluate later when dynamic plugins are more mature.
    • We can't avoid breaking changes in console dependencies such as Patternfly even if we don't break the console plugin API itself. We'll need a way for plugins to declare compatibility.
    • Plugins won't be sandboxed. They will have full JavaScript access to the DOM and network. Plugins won't be enabled by default, however. A cluster admin will need to enable the plugin.
    • This proposal does not cover allowing plugins to contribute backend console endpoints.

 

Requirements

 

Requirement Notes isMvp?
 UI to enable and disable plugins    YES 
 Dynamic Plugin Framework in place    YES 
Testing Infra up and running   YES 
 Docs and read me for creating and testing Plugins    YES 
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

 
 Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?  
  • New Content, Updates to existing content,  Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Console static plugins, maintained as part of the frontend monorepo, currently use static extension types (packages/console-plugin-sdk/src/typings) which directly reference various kinds of objects, including React components and arbitrary functions.

To ease the long-term transition from static to dynamic plugins, we should support a use case where an existing static plugin goes through the following stages:

  1. use static extensions only (we are here)
  2. use both static and dynamic extensions
  3. use dynamic extensions only (ideal target for 4.7)

Once a static plugin reaches the "use dynamic extensions only" stage, its maintainers can move it out of the Console monorepo - the plugin becomes dynamic, shipped via the corresponding operator and loaded by Console app at runtime.

Dynamic plugins will need changes to the console operator config to be enabled and disabled. We'll need either a new CRD or an annotation on CSVs for console to discover when plugins are available.

This story tracks any API updates needed to openshift/api and any operator updates needed to wire through dynamic plugins as console config.

See https://github.com/openshift/enhancements/pull/441 for design details.
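One possible shape for the enablement side (the field name is an assumption pending the enhancement review) is a list of enabled plugins on the console operator config that cluster admins can edit:

apiVersion: operator.openshift.io/v1
kind: Console
metadata:
  name: cluster
spec:
  plugins:
    - my-operator-plugin        # hypothetical plugin name registered by an operator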

Feature Overview

  • High-level description of the feature, i.e. an executive summary
  • Note: A Feature is a capability or a well defined set of functionality that delivers business value. Features can include additions or changes to existing functionality. Features can easily span multiple teams, and multiple releases.

 

Goals

  • Provide a high-level goal statement, providing user context and expected user outcome(s) for this feature

 

Requirements

  • A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.

 

Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

 

(Optional) Use Cases

This Section: 

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

 

Questions to answer…

  • ...

 

Out of Scope

 

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

 

Assumptions

  • ...

 

Customer Considerations

  • ...

 

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?  
  • New Content, Updates to existing content,  Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

When OCP is performing a cluster upgrade, users should be notified about this fact.

There are a few possibilities for how to surface the cluster upgrade to users:

  • Display a console notification throughout OCP web UI saying that the cluster is currently under upgrade.
  • Global notification throughout OCP web UI saying that the cluster is currently under upgrade.
  • Have an alert firing for all the users of OCP stating the cluster is undergoing an upgrade. 

 

AC:

  • Console-operator will create a ConsoleNotification CR when the cluster is being upgraded, and remove that CR once the upgrade is done; a minimal example of such a CR is sketched below. Whether the cluster is being upgraded is determined based on three statuses.
  • Add unit tests
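A minimal sketch of the CR the operator could manage (text and colors are illustrative; see the color question in the note below):

apiVersion: console.openshift.io/v1
kind: ConsoleNotification
metadata:
  name: cluster-upgrade            # created and removed by console-operator
spec:
  text: This cluster is currently being upgraded.
  location: BannerTop
  color: '#ffffff'
  backgroundColor: '#0088ce'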

 

Note: We need to decide if we want to distinguish this particular notification by a different color? ccing Ali Mobrem 

 

Created from: https://issues.redhat.com/browse/RFE-3024

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

During master node upgrades, when nodes are getting drained, there is currently no protection against two or more operands going down at once. If your component is required to be available during upgrades or other voluntary disruptions, please consider deploying a PDB to protect your operands.

The effort is tracked in https://issues.redhat.com/browse/WRKLDS-293.

Example:
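A minimal sketch of such a PDB for the console pods (label selector and values are illustrative and would need to match the console deployment):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: console
  namespace: openshift-console
spec:
  minAvailable: 1                  # keep at least one console pod up while nodes drain
  selector:
    matchLabels:
      app: console                 # assumed pod label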

 

Acceptance Criteria:
1. Create PDB controller in console-operator for both console and downloads pods
2. Add e2e tests for PDB in single-node and multi-node clusters

 

Note: We should consider backporting this to 4.10

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The work on this story is dependent on following changes:

 

The console already supports custom routes on the operator config. The newly proposed CustomDomains API introduces a unified way to configure custom domains for routes at install time, covering both the hostnames and the serving certs/keys that customers want to customise. From the console perspective those routes are:

  • openshift-console / console
  • openshift-console / downloads (CLI downloads)

 

The setup should be done on the Ingress config, where two new fields are introduced:

  • ComponentRouteSpec - contains the configuration for the custom domain (name, namespace, custom hostname, TLS secret reference); see the sketch after this list
  • ComponentRouteStatus - contains the status of the custom domain (conditions, previous hostname, RBAC needed to read the TLS secret, ...)
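A sketch of the resulting Ingress config entry for the console route (hostname and secret name are placeholders):

apiVersion: config.openshift.io/v1
kind: Ingress
metadata:
  name: cluster
spec:
  componentRoutes:
    - name: console
      namespace: openshift-console
      hostname: console.apps.example.com       # custom hostname chosen by the customer
      servingCertKeyPairSecret:
        name: console-custom-tls               # TLS secret referenced for the custom cert/key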

 

Console-operator will only consume the API and check for changes. If a custom domain is set for either the `console` or the `downloads` route in the `openshift-console` namespace, console-operator will read the setup and create a custom route accordingly. When a custom route is set up for any of the console's routes, the default route won't be deleted; instead it will be updated so that it redirects to the custom one. This is done for two reasons:

  1. we want to prevent somebody from stealing the default hostname of both routes (console, downloads)
  2. we want to prevent users from having unusable bookmarks that are pointing to the default hostname

 

Console-operator will still need to support the CustomDomain API that is available on its config.

Acceptance criteria:

  • Console supports the new CustomDomains API for configuring a custom domain for both `console` and `downloads` routes
  • Console falls back to the deprecated API in the console operator config if present
  • Console supports the original default domains and redirects to the new ones

 

Questions:

  • Which CustomDomain API takes precedence? Ingress config vs. Console-operator config. Can an upgrade cause any issues?

 

Feature Overview

Quick Starts are a key tool for helping our customers discover and understand how to take advantage of services that run on top of the OpenShift Platform. This feature will focus on making Quick Starts extensible. With this console extension, our customers and partners will be able to add their own Quick Starts to help drive a great user experience.

Enhancement PR: https://github.com/openshift/enhancements/pull/360 

Goals

  • Provide a supported API that our internal teams, external partners and customers can use to create Quick Starts for the OCP Console (QuickStart CRD)
  • Provide proper documentation and templates to make it as easy as possible to create Quick Starts
  • Update the Console Operator to generate Quick Starts for enabling key services on the OCP Clusters:
    • Serverless
    • Pipelines
    • Virtualization
    • OCS
    • ServiceMesh
  • Process to validate Quick Starts for each release (Internal Teams Only)
  • Support Internationalization

Requirements

 

Requirement Notes isMvp?
Define QuickStart CRD   YES
Console Operator: Out of the box support for installing Quick Starts for enabling Operators   YES
Process (Design/Review), Documentation + Template for providing out-of-the-box quick starts   YES
Process, Docs, for enabling Operators to add Quick Starts   YES
Migrate existing UI to work with CRD    
Move Existing Quick Starts to new CRD   YES
Support Internationalization   NO
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

 

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?  
  • New Content, Updates to existing content,  Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

Goal

Provide a dynamic extensible mechanism to add Guided Tours to the OCP Console.

  • Format of Guided Tours (Almost like Interactive Documentation)
    • Markup language
    • Made up of Long Description, Steps, Links(Internal, External), Images
    • Steps should be links to the page. User navigates to the next page by clicking a button
    • A tour may offer a link to a secondary tour
    • A tour may offer an external link to documentation, etc
    • Progress indicator, indicating the number of steps in the tour
    • Perspective
  • Tour information to be displayed on the Cards in the landing page
    • Title
    • Summary (aka Short Desc)
    • Image
    • Approx time
    • Perspective
    • Indication if already completed (local storage, maybe this is punted til User Pref)

User Stories/Scenarios
As a product owner, I need a mechanism to add guided tours that will help guide my users to enable my service.
As a product owner, I need a mechanism to add guided tours that will help guide my users to consume my service.
As an Operator, I need a mechanism to add guided tours that will help guide my users to consume my service.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

As a user, I would like to see a YAML sample for the new QuickStart CRD when I go and create my own Quick Starts.
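A trimmed-down example of the shape such a YAML sample could take (names and text are illustrative):

apiVersion: console.openshift.io/v1
kind: ConsoleQuickStart
metadata:
  name: my-quick-start
spec:
  displayName: Get started with My Operator
  durationMinutes: 10
  description: Short description shown on the quick start card.
  introduction: Markdown shown when the quick start is opened.
  tasks:
    - title: Install the operator
      description: |
        Step-by-step markdown instructions for this task.
  conclusion: What the user accomplished and where to go next.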

 

Epic Goal

  • ...

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview (aka. Goal Summary)  

Add ControlPlaneMachineSet for vSphere

 

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Promote vSphere control plane machinesets from tech preview to GA

Why is this important?

Scenarios

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • Promotion PRs collectively pass payload testing

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview (aka. Goal Summary)

Enable the OCP Console to send back user analytics to our existing endpoints in console.redhat.com. Please refer to the following doc for details of what we want to capture in the future:

Analytics Doc

Collect desired telemetry of user actions within OpenShift console to improve knowledge of user behavior.

Goals (aka. expected user outcomes)

OpenShift console should be able to send telemetry to a pre-configured Red Hat proxy that can be forwarded to 3rd party services for analysis.

Requirements (aka. Acceptance Criteria):

User analytics should respect the existing telemetry mechanism used to disable data being sent back

Need to update existing documentation with what user data we track from the OCP Console: https://docs.openshift.com/container-platform/4.14/support/remote_health_monitoring/about-remote-health-monitoring.html

Capture and send desired user analytics from OpenShift console to Red Hat proxy

Red Hat proxy to forward telemetry events to appropriate Segment workspace and Amplitude destination

Use existing setting to opt out of sending telemetry: https://docs.openshift.com/container-platform/4.14/support/remote_health_monitoring/opting-out-of-remote-health-reporting.html#opting-out-remote-health-reporting

Also, allow disabling just user analytics without affecting the rest of telemetry: add an annotation to the Console to disable just user analytics

Update docs to show this method as well.
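For illustration, the analytics-only opt-out would be an annotation on the console operator config, using the key referenced later in this document (the exact placement of the annotation is an assumption):

apiVersion: operator.openshift.io/v1
kind: Console
metadata:
  name: cluster
  annotations:
    # disables only console user analytics; cluster telemetry is unaffected
    telemetry.console.openshift.io/DISABLED: "true"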

We will require a mechanism to store all the segment values
We need to be able to pass back orgID that we receive from the OCM subscription API call

Use Cases (Optional):

 

Questions to Answer (Optional):

 

Out of Scope

Sending telemetry from OpenShift cluster nodes

Background

Console already has support for sending analytics to segment.io in Dev Sandbox and OSD environments. We should reuse this existing capability, but default to http://console.redhat.com/connections/api for analytics and http://console.redhat.com/connections/cdn to load the JavaScript in other environments. We must continue to allow Dev Sandbox and OSD clusters a way to configure their own segment key, whether telemetry is enabled, segment API host, and other options currently set as annotations on the console operator configuration resource.

Console will need a way to determine the org-id to send with telemetry events. Likely the console operator will need to read this from the cluster pull secret.

Customer Considerations

 

Documentation Considerations

 

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

Problem:

The console telemetry plugin needs to send data to a new Red Hat ingress point that will then forward it to Segment for analysis. 

Goal:

Update console telemetry plugin to send data to the appropriate ingress point.

Why is it important?

Use cases:

  1. <case>

Acceptance criteria:

  1. Update the URL to the Ingress point created for console.redhat.com
  2. Ensure telemetry data is flowing to the ingress point.

Dependencies (External/Internal):

Ingress point created for console.redhat.com

Design Artifacts:

Exploration:

Note:

We want to enable segment analytics by default on all (incl. self-managed) OCP clusters using a known segment API key and the console.redhat.com proxy. We'll still want to honor the segment-related annotations on the console operator config for overriding these values.

Most likely the segment key should be defaulted in the console operator, otherwise we would need a separate console flag for disabling analytics. If the operator provides the key, then the console backend can use the presence of the key to determine when to enable analytics. We can likely change the segment URL and CDN default values directly in the console code, however.

ODC-7517 tracks disabling segment analytics when cluster telemetry is disabled, which is a separate change, but required for this work.

OpenShift UI Telemetry Implementation details
 
These three keys should have new default values:

  1. SEGMENT_API_KEY
  2. SEGMENT_API_HOST
  3. SEGMENT_JS_HOST OR SEGMENT_JS_URL

See https://github.com/openshift/console/blob/master/frontend/packages/console-telemetry-plugin/src/listeners/segment.ts

Defaults:

stringData:
  SEGMENT_API_KEY: BnuS1RP39EmLQjP21ko67oDjhbl9zpNU
  SEGMENT_API_HOST: console.redhat.com/connections/api/v1
  SEGMENT_JS_HOST: console.redhat.com/connections/cdn

AC:

  • Add default values for the telemetry annotations into a ConfigMap in the openshift-console-operator namespace (sketched below).
  • Add and update unit tests and add e2e tests
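A rough sketch of such a ConfigMap, reusing the default values above (the ConfigMap name is an assumption; only the namespace and keys come from this card):

apiVersion: v1
kind: ConfigMap
metadata:
  name: telemetry-config        # assumed name
  namespace: openshift-console-operator
data:
  SEGMENT_API_KEY: BnuS1RP39EmLQjP21ko67oDjhbl9zpNU
  SEGMENT_API_HOST: console.redhat.com/connections/api/v1
  SEGMENT_JS_HOST: console.redhat.com/connections/cdn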

Description

As an administrator, I want to disable all telemetry on my cluster including UI analytics sent to segment.

We should honor the existing telemetry configuration so that we send no analytics when an admin opts out of telemetry. See the documentation here:

https://docs.openshift.com/container-platform/4.14/support/remote_health_monitoring/opting-out-of-remote-health-reporting.html#insights-operator-new-pull-secret_opting-out-remote-health-reporting

Simon Pasquier

 Yes, this is the officially supported way to disable telemetry, though we also have a hidden flag in the CMO configmap that CI clusters use to disable telemetry (it depends on whether you want to push analytics for CI clusters).
The CMO configmap is set to:

data:
  config.yaml: |-
    telemeterClient:
      enabled: false
 

the CMO code that reads the cloud.openshift.com token:
https://github.com/openshift/cluster-monitoring-operator/blob/b7e3f50875f2bb1fed912b23fb80a101d3a786c0/pkg/manifests/config.go#L358-L386
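For reference, the complete ConfigMap those CI clusters use would look roughly like this (a sketch assembled from the snippet above and check [2] below):

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |-
    telemeterClient:
      enabled: false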

Acceptance Criteria

  • No segment events are sent when
    1. Cluster is a CI cluster, which means at least one of these two conditions is met:
      • Cluster pull secret does not have "cloud.openshift.com" credentials [1]
      • Cluster monitoring config has 'telemeterClient: {enabled: false}' [2]
    2. Console operator config telemetry disabled annotation == true [3]
  • Add and update unit tests and also e2e

Additional Details:

Slack discussion https://redhat-internal.slack.com/archives/C0VMT03S5/p1707753976034809

 

# [1] Check cluster pull secret for cloud.openshift.com creds
oc get secret pull-secret -n openshift-config -o json | jq -r '.data.".dockerconfigjson"' | base64 -d | jq -r '.auths."cloud.openshift.com"'

# [2] Check cluster monitoring operator config for 'telemeterClient.enabled == false'
oc get configmap cluster-monitoring-config -n openshift-monitoring -o json | jq -r '.data."config.yaml"'

# [3] Check console operator config telemetry disabled annotation 
oc get console.operator.openshift.io cluster -o json | jq -r '.metadata.annotations."telemetry.console.openshift.io/DISABLED"' 

Epic Goal

  • Console should receive “organization.external_id”  from the OCM Subscription API call.
  • We need to store the ORG ID locally and pass it back to Segment.io to enhance the user analytics we are tracking

 

API the Console uses:  
const apiUrl = `/api/accounts_mgmt/v1/subscriptions?page=1&search=external_cluster_id%3D%27${clusterID}%27`;

Reference: Original Console PR

Why is this important?

High Level Feature Details can be found here

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Currently, we can get the organization ID from the OCM server by querying the subscription and adding the fetchOrganization=true query parameter, per the comment.

We should be passing this ID as SERVER_FLAG.telemetry.ORGANIZATION_ID to the frontend, and as organizationId to Segment.io

Fetching should be done by the console-operator due to its RBAC permissions. Once the organization ID is retrieved, the console operator should set it in console-config.yaml together with the other telemetry variables.
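A loosely hedged sketch of the resulting console-config.yaml fragment (the exact structure and key names are assumptions; only ORGANIZATION_ID and the idea of a telemetry block come from this card):

# console-config.yaml (illustrative fragment)
telemetry:
  ORGANIZATION_ID: "1234567"                                # placeholder value fetched from OCM with fetchOrganization=true
  SEGMENT_API_HOST: console.redhat.com/connections/api/v1   # other telemetry variables set alongside it
  SEGMENT_JS_HOST: console.redhat.com/connections/cdn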

 

AC: 

  • Update the console-operator's main controller to check whether the telemeter client is available on the cluster, which signals that it is a customer/production cluster
  • Consume the telemetry parameter ORG_ID and pass it as parameter to segment in the console

Feature Overview (aka. Goal Summary)

Volume Group Snapshots is a key new Kubernetes storage feature that allows multiple PVs to be grouped together and snapshotted at the same time. This enables customers to take consistent snapshots of applications that span multiple PVs.

This is also a key requirement for backup and DR solutions.

https://kubernetes.io/blog/2023/05/08/kubernetes-1-27-volume-group-snapshot-alpha/

https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/3476-volume-group-snapshot
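For orientation, the upstream alpha API (see the blog post above) groups PVCs by a label selector; a VolumeGroupSnapshot request looks roughly like this (class, namespace, and label names are illustrative):

apiVersion: groupsnapshot.storage.k8s.io/v1alpha1
kind: VolumeGroupSnapshot
metadata:
  name: my-app-group-snapshot
  namespace: my-app
spec:
  volumeGroupSnapshotClassName: csi-group-snap-class   # illustrative class name
  source:
    selector:
      matchLabels:
        app: my-app   # all PVCs carrying this label are snapshotted together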

Goals (aka. expected user outcomes)

Productize the volume group snapshots feature as Tech Preview, with docs, testing, and a feature gate to enable it so that customers and partners can test it in advance.

Requirements (aka. Acceptance Criteria):

The feature should graduate to beta upstream before becoming Tech Preview in OCP. Tests and CI must pass, and a feature gate should allow customers and partners to easily enable it. We should identify all OCP-shipped CSI drivers that support this feature and configure them accordingly.

Use Cases (Optional):

 

  1. As a storage vendor I want to have early access to the VolumeGroupSnapshot feature to test and validate my driver support.
  2. As a backup vendor I want to have early access to the VolumeGroupSnapshot feature to test and validate my backup solution.
  3. As a customer I want early access to test the VolumeGroupSnapshot feature in order to take consistent snapshots of my workloads that are relying on multiple PVs.

Out of Scope

CSI drivers development/support of this feature.

Background

Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Drivers must support this feature and enable it. Partners may need to change their operator and/or doc to support it.

Documentation Considerations

Document how to enable the feature, what this feature does and how to use it. Update the OCP driver's table to include this capability.

Interoperability Considerations

Can be leveraged by ODF and OCP virt, especially around backup and DR scenarios.

Epic Goal*

Create an OCP feature gate that allows customers and partners to use the VolumeGroupSnapshot feature while it is in alpha & beta upstream.

 
Why is this important? (mandatory)

Volume group snapshot is an important feature for ODF, OCP virt and backup partners. It requires driver support so partners need early access to the feature to confirm their driver works as expected before GA. The same applies to backup partners.

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1. As a storage vendor I want to have early access to the VolumeGroupSnapshot feature to test and validate my driver support.
  2. As a backup vendor I want to have early access to the VolumeGroupSnapshot feature to test and validate my backup solution.

 
Dependencies (internal and external) (mandatory)

This depends on driver support; the feature gate will enable it in the OCP-shipped drivers that support it.

The feature gate should (see the illustrative sketch after this list):

  • Configure the snapshotter to start with the right parameter to enable VolumeGroupSnapshot 
  • Create the necessary CRDs
  • Configure the OCP shipped CSI driver
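As a hedged illustration, enabling such a gate through the CustomNoUpgrade feature set might look like the following (the gate name is an assumption, and the feature could instead land in TechPreviewNoUpgrade; note that either feature set blocks cluster upgrades):

apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: CustomNoUpgrade
  customNoUpgrade:
    enabled:
    - VolumeGroupSnapshot   # assumed gate name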

 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - STOR
  • Documentation - N/A
  • QE - STOR
  • PX - 
  • Others -

Acceptance Criteria (optional)

By enabling the feature gate, partners should be able to use the VolumeGroupSnapshot API. Non-OCP-shipped drivers may need to be configured separately.

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Feature Overview

Console enhancements based on customer RFEs that improve customer user experience.

 

Goals

  • This Section: Provide high-level goal statement, providing user context and expected user outcome(s) for this feature

 

Requirements

  • This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.

 

Requirement | Notes | isMvp?
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES
Release Technical Enablement | Provide necessary release enablement details and documents. | YES

 

(Optional) Use Cases

This Section: 

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

 

Questions to answer…

  • ...

 

Out of Scope

 

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

 

Assumptions

  • ...

 

Customer Considerations

  • ...

 

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?  
  • New Content, Updates to existing content,  Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Based on https://issues.redhat.com/browse/RFE-3775, we should extend our proxy package timeout to match the browser's timeout, which is 5 minutes.

AC: Bump the 30-second timeout in the proxy pkg to 5 minutes.

Executive Summary

Image and artifact signing is a key part of a DevSecOps model. The Red Hat-sponsored sigstore project aims to simplify signing of cloud-native artifacts and sees increasing interest and uptake in the Kubernetes community. This document proposes to incrementally invest in OpenShift support for sigstore-style signed images and be public about it. The goal is to give customers a practical and scalable way to establish content trust. It will strengthen OpenShift’s security philosophy and value-add in light of the recent supply chain security crisis.

 

CRIO 

  1. Support customer image validation
  2. Support OpenShift release image validation

https://docs.google.com/document/d/12ttMgYdM6A7-IAPTza59-y2ryVG-UUHt-LYvLw4Xmq8/edit# 

 

 

Feature Overview (aka. Goal Summary)  

This feature will track upstream work from the OpenShift Control Plane teams - API, Auth, etcd, Workloads, and Storage.

Goals (aka. expected user outcomes)

To continue and develop meaningful contributions to the upstream community including feature delivery, bug fixes, and leadership contributions.

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

From https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/#topologyspreadconstraints-field:

Note: The matchLabelKeys field is a beta-level field and enabled by default in 1.27. You can disable it by disabling the MatchLabelKeysInPodTopologySpread [feature gate](https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/).
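For context, matchLabelKeys is combined with the labelSelector of a constraint so that only pods from the current rollout are counted when computing skew; a minimal sketch (names and image are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: web-0
  labels:
    app: web
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: web
    matchLabelKeys:
    - pod-template-hash   # spreads only pods of the current ReplicaSet revision
  containers:
  - name: web
    image: registry.example.com/web:latest   # illustrative image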

Removing from the TP as the feature is enabled by default.

Just a clean up work.

Upstream K8s deprecated PodSecurityPolicy and replaced it with a new built-in admission controller that enforces the Pod Security Standards (see here for the motivations for deprecation). There is an OpenShift-specific dedicated pod admission system called Security Context Constraints. Our aim is to keep the Security Context Constraints pod admission system while also allowing users to have access to the Kubernetes Pod Security Admission.

With OpenShift 4.11, we turned on Pod Security Admission with global "privileged" enforcement. Additionally, we set the "restricted" profile for warnings and audit. This configuration made it possible for users to opt their namespaces in to Pod Security Admission with the per-namespace labels. We also introduced a new mechanism that automatically synchronizes the Pod Security Admission "warn" and "audit" labels.

With OpenShift 4.15, we intend to move the global configuration to enforce the "restricted" pod security profile globally. With this change, the label synchronization mechanism will also switch into a mode where it synchronizes the "enforce" Pod Security Admission label rather than the "audit" and "warn". 

Epic Goal

Get Pod Security Admission to run in "restricted" mode globally by default alongside SCC admission.

When creating a custom SCC, it is possible to assign a priority that is higher than existing SCCs. This means that any SA with access to all SCCs might use the higher priority custom SCC, and this might mutate a workload in an unexpected/unintended way.

To protect platform workloads from such an effect (which, combined with PSa, might result in rejecting the workload once we start enforcing the "restricted" profile) we must pin the required SCC to all workloads in platform namespaces (openshift-, kube-, default).

Each workload should pin the SCC with the least privilege, except workloads in runlevel 0 namespaces, which should pin the "privileged" SCC (SCC admission is not enabled in these namespaces, but we should pin an SCC for tracking purposes).
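A minimal sketch of what pinning looks like, assuming the openshift.io/required-scc pod annotation is the mechanism used (workload, namespace, and image names are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-operator            # illustrative platform workload
  namespace: openshift-example      # illustrative platform namespace
spec:
  selector:
    matchLabels:
      app: example-operator
  template:
    metadata:
      labels:
        app: example-operator
      annotations:
        openshift.io/required-scc: restricted-v2   # pin the least-privileged SCC that fits the workload
    spec:
      containers:
      - name: operator
        image: registry.example.com/example-operator:latest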

The following table tracks progress:

namespace in review merged
openshift-apiserver-operator PR
openshift-authentication PR
openshift-authentication-operator PR
openshift-catalogd PR
openshift-cloud-controller-manager    
openshift-cloud-controller-manager-operator    
openshift-cloud-credential-operator PR
openshift-cloud-network-config-controller PR  
openshift-cluster-csi-drivers PR1, PR2
openshift-cluster-machine-approver    
openshift-cluster-node-tuning-operator PR
openshift-cluster-olm-operator PR
openshift-cluster-samples-operator PR
openshift-cluster-storage-operator PR1, PR2  
openshift-cluster-version PR
openshift-config-operator PR
openshift-console PR  
openshift-console-operator PR  
openshift-controller-manager PR
openshift-controller-manager-operator PR  
openshift-dns    
openshift-dns-operator    
openshift-etcd    
openshift-etcd-operator    
openshift-image-registry PR
openshift-ingress PR  
openshift-ingress-canary PR  
openshift-ingress-operator PR  
openshift-insights PR
openshift-kube-apiserver    
openshift-kube-apiserver-operator    
openshift-kube-controller-manager    
openshift-kube-controller-manager-operator    
openshift-kube-scheduler    
openshift-kube-scheduler-operator    
openshift-kube-storage-version-migrator PR
openshift-kube-storage-version-migrator-operator PR
openshift-machine-api PR1, PR2, PR3, PR4, PR5, PR6  
openshift-machine-config-operator PR  
openshift-marketplace PR  
openshift-monitoring PR
openshift-multus    
openshift-network-diagnostics PR  
openshift-network-node-identity PR  
openshift-network-operator    
openshift-oauth-apiserver PR
openshift-operator-controller PR
openshift-operator-lifecycle-manager PR
openshift-ovn-kubernetes    
openshift-route-controller-manager PR  
openshift-service-ca PR
openshift-service-ca-operator PR
openshift-user-workload-monitoring PR  

Feature Overview (aka. Goal Summary)  

As an OpenShift admin who wants to make my OCP cluster more secure and stable, I want to prevent anyone from scheduling their workloads on master nodes so that master nodes only run OCP management-related workloads.

 

Goals (aka. expected user outcomes)

Secure OCP master nodes by preventing customer workloads from being scheduled on them.

 

 

 

 

 

Anyone applying toleration(s) in a pod spec can unintentionally tolerate the master taints that protect master nodes from receiving application workloads when master nodes are configured to repel them. An admission plugin needs to be configured to protect master nodes from this scenario. Besides the taint/toleration, users can also set spec.nodeName directly, which this plugin should also protect master nodes against.
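For illustration, these are the two pod spec fields the admission plugin needs to guard against; either one can land a user pod on a master node today (names and image are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: user-workload
spec:
  nodeName: master-0                        # bypasses the scheduler, and therefore the NoSchedule taint check
  tolerations:
  - key: node-role.kubernetes.io/master     # tolerating the master taint also lets the pod be scheduled there
    operator: Exists
    effect: NoSchedule
  containers:
  - name: app
    image: registry.example.com/app:latest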

Feature Overview

Console enhancements based on customer RFEs that improve customer user experience.

 

Goals

  • This Section: Provide high-level goal statement, providing user context and expected user outcome(s) for this feature

 

Requirements

  • This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.

 

Requirement | Notes | isMvp?
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES
Release Technical Enablement | Provide necessary release enablement details and documents. | YES

 

(Optional) Use Cases

This Section: 

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

 

Questions to answer…

  • ...

 

Out of Scope

 

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

 

Assumptions

  • ...

 

Customer Considerations

  • ...

 

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?  
  • New Content, Updates to existing content,  Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Based on this old feature request,

https://issues.redhat.com/browse/RFE-1530

we do have impersonation in place for gaining access to other users' permissions via the console. But the only documentation we currently have is how to impersonate system:admin via the CLI; see

https://docs.openshift.com/container-platform/4.14/authentication/impersonating-system-admin.html

Please provide documentation for the console feature and the required prerequisites for the users/groups accordingly.

AC:

  • Create a ConsoleQuickStart CR that would help user with impersonate access understand the impersonation workflow and functionality
  • This quickstart should be available only to users with impersonate access
  • Created CR should be placed in the console-operator's repo, where the default quickstarts are placed

 

More info on the impersonate access role - https://github.com/openshift/console/pull/13345/files
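A rough sketch of such a CR, assuming the quick start's accessReviewResources field is what restricts visibility to users with impersonate access (all names and text below are illustrative):

apiVersion: console.openshift.io/v1
kind: ConsoleQuickStart
metadata:
  name: impersonate-a-user
spec:
  displayName: Impersonating users and groups
  durationMinutes: 10
  description: Learn how to troubleshoot permissions by impersonating another user in the console.
  accessReviewResources:            # assumed mechanism: only shown to users who pass this access review
  - group: ""
    resource: users
    verb: impersonate
  tasks:
  - title: Start impersonating a user
    description: Navigate to User Management, select a user, and choose "Impersonate User".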

Feature Overview (aka. Goal Summary)

When the internal oauth-server and oauth-apiserver are removed and replaced with an external OIDC issuer (like azure AD), the console must work for human users of the external OIDC issuer.

Goals (aka. expected user outcomes)

An end user can use the OpenShift console without a notable difference in experience. This must eventually work on both hypershift and standalone, but hypershift is the first priority if it impacts delivery.

Requirements (aka. Acceptance Criteria):

  1. User can log in and use the console
  2. User can get a kubeconfig that functions on the CLI with matching oc
  3. Both of those work on hypershift
  4. Both of those work on standalone.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • When the oauthclient API is not present, the operator must stop creating the oauthclient

Why is this important?

  • This is preventing the operator from creating its deployment

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Console needs to be able to authenticate against an external OIDC IDP. For that, the console-operator needs to configure it accordingly; a rough sketch of the resulting configuration follows the AC list below.

AC:

  • bump console-operator API pkg
  • sync oauth client secret from OIDC configuration for auth type OIDC
  • add OIDC config to the console configmap
  • add auth server CA to the deployment annotations and volumes when auth type OIDC
  • consume OIDC configuration in the console configmap and deployment
  • fix roles for watching oauthclients and authentications
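A loosely sketched console-config.yaml fragment for the OIDC case; every key name below is an illustrative assumption rather than the finalized schema:

# console-config.yaml (illustrative fragment; key names are assumptions)
auth:
  authType: oidc                                    # assumed switch between built-in OAuth and external OIDC
  clientID: console                                 # client registered with the external issuer
  clientSecretFile: /var/oauth-config/clientSecret  # synced from the OIDC client secret
  oauthEndpointCAFile: /var/auth-server-ca/ca.crt   # auth server CA mounted via the deployment volume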

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • When installed with external OIDC, the clientID and clientSecret need to be configurable to match the external (and unmanaged) OIDC server

Why is this important?

  • Without a configurable clientID and secret, I don't think the console can identify the user.
  • There must be a mechanism to do this on both hypershift and openshift, though the API may be very similar.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • The goal of this epic is to capture all the work and effort required to update the OpenShift control plane to upstream Kubernetes v1.29

Why is this important?

  • The rebase is a required process for every OCP release in order to leverage all the new features implemented upstream

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. Following epic captured the previous rebase work of k8s v1.28
    https://issues.redhat.com/browse/STOR-1425 

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Problem statement

DPDK applications require dedicated CPUs, isolated from any preemption (other processes, kernel threads, interrupts), and this can be achieved with the “static” policy of the CPU manager: the container resources need to include an integer number of CPUs of equal value in “limits” and “requests”. For instance, to get six exclusive CPUs:

spec:
  containers:
  - name: CNF
    image: myCNF
    resources:
      limits:
        cpu: "6"
      requests:
        cpu: "6"

 

The six CPUs are dedicated to that container; however, non-trivial (meaning real) DPDK applications do not use all of those CPUs, as there is always at least one CPU running the slow path, processing configuration, and printing logs (among DPDK coding rules: no syscall in PMD threads, or you are in trouble). Even the DPDK PMD drivers and core libraries include pthreads which are intended to sleep; they are infrastructure pthreads processing link change interrupts, for instance.

Can we envision going with two processes, one with isolated cores and one with the slow-path cores, so we can have two containers? Unfortunately no: going to a multi-process design, where only dedicated pthreads would run in a process, is not an option, as DPDK multi-process is being deprecated upstream and has never picked up because it never properly worked. Fixing it and changing the DPDK architecture to systematically have two processes is absolutely not possible within a year, and would require all DPDK applications to be rewritten. Knowing that the first and current multi-process implementation is a failure, nothing guarantees that a second one would be successful.

The slow-path CPUs consume only a fraction of a real CPU and can safely run on the “shared” CPU pool of the CPU Manager; however, container specifications do not allow requesting two kinds of CPUs, for instance:

 

spec:
  containers:
  - name: CNF
    image: myCNF
    resources:
      limits:
        cpu_dedicated: "4"
        cpu_shared: "20m"
      requests:
        cpu_dedicated: "4"
        cpu_shared: "20m"

Why do we care about allocating one extra CPU per container?

  • Allocating one extra CPU means allocating an additional physical core, as the CPUs running DPDK application should run on a dedicated physical core, in order to get maximum and deterministic performances, as caches and CPU units are shared between the two hyperthreads.
  • CNFs are built with a minimum of CPUs per container. Today this is still between 10 and 20, sometimes more, but the intent is to decrease the number of CPUs per container and increase the number of containers, as this is the “cloud native” way; having containers that are too large to schedule wastes resources, like in the VNF days (tetris effect).

Let’s take a realistic example, based on a real RAN CNF: running 6 containers with dedicated CPUs on a worker node, with a slow Path requiring 0.1 CPUs means that we waste 5 CPUs, meaning 3 physical cores. With real life numbers:

  • For a single datacenter composed of 100 nodes, we waste 300 physical cores
  • For a single datacenter composed of 500 nodes, we waste 1500 physical cores
  • For single node OpenShift deployed on 1 million nodes, we waste 3 million physical cores

Intel public CPU price per core is around 150 US$, not even taking into account the ecological aspect of the waste of (rare) materials and the electricity and cooling…

 

Goals

Requirements

  • This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.
Requirement | Notes | isMvp?
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES
Release Technical Enablement | Provide necessary release enablement details and documents. | YES

Questions to answer…

  • Would an implementation based on annotations be possible rather than an implementation requiring a container (so pod) definition change, like the CPU pooler does?

Out of Scope

Background, and strategic fit

This issue has been addressed lately by OpenStack.

Assumptions

  • ...

Customer Considerations

  • ...

Documentation Considerations

  • The feature needs documentation on how to configure OCP, create pods, and troubleshoot

Epic Goal

  • An NRI plugin that is invoked by CRI-O right before container creation and updates the container's cpuset and quota to match the mixed-cpus request.
  • The CPU pinning reconciliation operation must also execute the NRI API call on every update (so we can intercept the kubelet and it does not destroy our changes)
  • Dev Preview for 4.15

Why is this important?

  • This would unblock lots of options including mixed cpu workloads where some CPUs could be shared among containers / pods CNF-3706
  • This would also allow further research on dynamic (simulated) hyper threading CNF-3743

Scenarios

  1. ...

Acceptance Criteria

  • Have an NRI plugin which is called by the runtime and updates the container with the mutual CPUs.
  • The plugin must be able to override the CPU manager reconciliation loop and be immune to future CPU manager changes.
  • The plugin must be robust and handle node reboot and kubelet/CRI-O restart scenarios
  • upstream CI - MUST be running successfully with tests automated.
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • OCP adoption in relevant OCP version 
  • NTO shall be able to deploy the new plugin

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. https://issues.redhat.com/browse/CNF-3706 : Spike - mix of shared and pinned/dedicated cpus within a container
  2. https://issues.redhat.com/browse/CNF-3743 : Spike: Dynamic offlining of cpu siblings to simulate no-smt
  3. upstream Node Resource Interface project - https://github.com/containerd/nri 
  4. https://issues.redhat.com/browse/CNF-6082: [SPIKE] Cpus assigned hook point in CRI-O
  5. https://issues.redhat.com/browse/CNF-7603 

Open questions::

  N/A

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview

Telecommunications providers continue to deploy OpenShift at the Far Edge. The acceleration of this adoption and the nature of existing Telecommunication infrastructure and processes drive the need to improve OpenShift provisioning speed at the Far Edge site and the simplicity of preparation and deployment of Far Edge clusters, at scale.

Goals

  • Simplicity The folks preparing and installing OpenShift clusters (typically SNO) at the Far Edge range in technical expertise from technician to barista. The preparation and installation phases need to be reduced to a human-readable script that can be utilized by a variety of non-technical operators. There should be as few steps as possible in both the preparation and installation phases.
  • Minimize Deployment Time A telecommunications provider technician or brick-and-mortar employee who is installing an OpenShift cluster, at the Far Edge site, needs to be able to do it quickly. The technician has to wait for the node to become in-service (CaaS and CNF provisioned and running) before they can move on to installing another cluster at a different site. The brick-and-mortar employee has other job functions to fulfill and can't stare at the server for 2 hours. The install time at the far edge site should be in the order of minutes, ideally less than 20m.
  • Utilize Telco Facilities Telecommunication providers have existing Service Depots where they currently prepare SW/HW prior to shipping servers to Far Edge sites. They have asked RH to provide a simple method to pre-install OCP onto servers in these facilities. They want to do parallelized batch installation to a set of servers so that they can put these servers into a pool from which any server can be shipped to any site. They also would like to validate and update servers in these pre-installed server pools, as needed.
  • Validation before Shipment Telecommunications Providers incur a large cost if forced to manage software failures at the Far Edge due to the scale and physical disparate nature of the use case. They want to be able to validate the OCP and CNF software before taking the server to the Far Edge site as a last minute sanity check before shipping the platform to the Far Edge site.
  • IPSec Support at Cluster Boot Some far edge deployments occur on an insecure network, and for that reason access to the host’s BMC is not allowed; additionally, an IPSec tunnel must be established before any traffic leaves the cluster once it is at the Far Edge site. It is not possible to enable IPSec on the BMC NIC, and therefore even after OpenShift has booted, the BMC is still not accessible.

Requirements

  • Factory Depot: Install OCP with minimal steps
    • Telecommunications Providers don't want an installation experience, just pick a version and hit enter to install
    • Configuration w/ DU Profile (PTP, SR-IOV, see telco engineering for details) as well as customer-specific addons (Ignition Overrides, MachineConfig, and other operators: ODF, FEC SR-IOV, for example)
    • The installation cannot increase the in-service OCP compute budget (don't install anything other than what is needed for DU)
    • Provide ability to validate previously installed OCP nodes
    • Provide ability to update previously installed OCP nodes
    • 100 parallel installations at Service Depot
  • Far Edge: Deploy OCP with minimal steps
    • Provide site specific information via usb/file mount or simple interface
    • Minimize time spent at far edge site by technician/barista/installer
    • Register with desired RHACM Hub cluster for ongoing LCM
  • Minimal ongoing maintenance of solution
    • Some, but not all telco operators, do not want to install and maintain an OCP / ACM cluster at Service Depot
  • The current IPSec solution requires a libreswan container to run on the host so that all N/S OCP traffic is encrypted. With the current IPSec solution this feature would need to support provisioning host-based containers.

 

A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts.  If a non MVP requirement slips, it does not shift the feature.

requirement | Notes | isMvp?

 

Describe Use Cases (if needed)

Telecommunications Service Provider Technicians will be rolling out OCP w/ a vDU configuration to new Far Edge sites, at scale. They will be working from a service depot where they will pre-install/pre-image a set of Far Edge servers to be deployed at a later date. When ready for deployment, a technician will take one of these generic-OCP servers to a Far Edge site, enter the site specific information, wait for confirmation that the vDU is in-service/online, and then move on to deploy another server to a different Far Edge site.

 

Retail employees in brick-and-mortar stores will install SNO servers and it needs to be as simple as possible. The servers will likely be shipped to the retail store, cabled and powered by a retail employee and the site-specific information needs to be provided to the system in the simplest way possible, ideally without any action from the retail employee.

 

Out of Scope

Q: how challenging will it be to support multi-node clusters with this feature?

Background, and strategic fit

< What does the person writing code, testing, documenting need to know? >

Assumptions

< Are there assumptions being made regarding prerequisites and dependencies?>

< Are there assumptions about hardware, software or people resources?>

Customer Considerations

< Are there specific customer environments that need to be considered (such as working with existing h/w and software)?>

< Are there Upgrade considerations that customers need to account for or that the feature should address on behalf of the customer?>

<Does the Feature introduce data that could be gathered and used for Insights purposes?>

Documentation Considerations

< What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)? >

< What does success look like?>

< Does this feature have doc impact?  Possible values are: New Content, Updates to existing content,  Release Note, or No Doc Impact>

< If unsure and no Technical Writer is available, please contact Content Strategy. If yes, complete the following.>

  • <What concepts do customers need to understand to be successful in [action]?>
  • <How do we expect customers will use the feature? For what purpose(s)?>
  • <What reference material might a customer want/need to complete [action]?>
  • <Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available. >
  • <What is the doc impact (New Content, Updates to existing content, or Release Note)?>

Interoperability Considerations

< Which other products and versions in our portfolio does this feature impact?>

< What interoperability test scenarios should be factored by the layered product(s)?>

Questions

Question Outcome
   

 

 

Epic Goal

  • Install SNO within 10 minutes

Why is this important?

  • SNO installation takes around 40+ minutes.
  • This makes SNO less appealing when compared to k3s/microshift.
  • We should analyze the SNO installation, figure out why it takes so long, and come up with ways to optimize it

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

  1. https://docs.google.com/document/d/1ULmKBzfT7MibbTS6Sy3cNtjqDX1o7Q0Rek3tAe1LSGA/edit?usp=sharing

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>
I'm not sure this is a CVO issue, but I think CVO is the one creating the namespace; CVO also renders some manifests during bootkube, so it seems like the right component.

Description of problem:

The bootkube scripts spend ~1 minute failing to apply manifests while waiting for the openshift-config namespace to get created.

Version-Release number of selected component (if applicable):

4.12

How reproducible:

100%

Steps to Reproduce:

1.Run the POC using the makefile here https://github.com/eranco74/bootstrap-in-place-poc
2. Observe the bootkube logs (pre-reboot) 

Actual results:

Jan 12 17:37:09 master1 cluster-bootstrap[5156]: Failed to create "0000_00_cluster-version-operator_01_adminack_configmap.yaml" configmaps.v1./admin-acks -n openshift-config: namespaces "openshift-config" not found
....
Jan 12 17:38:27 master1 cluster-bootstrap[5156]: "secret-initial-kube-controller-manager-service-account-private-key.yaml": failed to create secrets.v1./initial-service-account-private-key -n openshift-config: namespaces "openshift-config" not found

Here are the logs from another installation showing that it's not 1 or 2 manifests that require this namespace to get created earlier:

Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-ca-bundle-configmap.yaml": failed to create configmaps.v1./etcd-ca-bundle -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-client-secret.yaml": failed to create secrets.v1./etcd-client -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-metric-client-secret.yaml": failed to create secrets.v1./etcd-metric-client -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-metric-serving-ca-configmap.yaml": failed to create configmaps.v1./etcd-metric-serving-ca -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-metric-signer-secret.yaml": failed to create secrets.v1./etcd-metric-signer -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-serving-ca-configmap.yaml": failed to create configmaps.v1./etcd-serving-ca -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-signer-secret.yaml": failed to create secrets.v1./etcd-signer -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "kube-apiserver-serving-ca-configmap.yaml": failed to create configmaps.v1./initial-kube-apiserver-server-ca -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "openshift-config-secret-pull-secret.yaml": failed to create secrets.v1./pull-secret -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "openshift-install-manifests.yaml": failed to create configmaps.v1./openshift-install-manifests -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "secret-initial-kube-controller-manager-service-account-private-key.yaml": failed to create secrets.v1./initial-service-account-private-key -n openshift-config: namespaces "openshift-config" not found

Expected results:

expected resources to get created successfully without having to wait for the namespace to get created.

Additional info:

 

The service-ca pod in the openshift-service-ca namespace takes a few minutes to start when installing SNO.

kubectl get events -n openshift-service-ca --sort-by='.metadata.creationTimestamp' -o custom-columns=FirstSeen:.firstTimestamp,LastSeen:.lastTimestamp,Count:.count,From:.source.component,Type:.type,Reason:.reason,Message:.message                      
FirstSeen              LastSeen               Count   From                                                                                              Type      Reason                 Message
2023-01-22T12:25:58Z   2023-01-22T12:25:58Z   1       deployment-controller                                                                             Normal    ScalingReplicaSet      Scaled up replica set service-ca-6dc5c758d to 1
2023-01-22T12:26:12Z   2023-01-22T12:27:53Z   9       replicaset-controller                                                                             Warning   FailedCreate           Error creating: pods "service-ca-6dc5c758d-" is forbidden: error fetching namespace "openshift-service-ca": unable to find annotation openshift.io/sa.scc.uid-range
2023-01-22T12:27:58Z   2023-01-22T12:27:58Z   1       replicaset-controller                                                                             Normal    SuccessfulCreate       Created pod: service-ca-6dc5c758d-k7bsd
2023-01-22T12:27:58Z   2023-01-22T12:27:58Z   1       default-scheduler                                                                                 Normal    Scheduled              Successfully assigned openshift-service-ca/service-ca-6dc5c758d-k7bsd to master1
 

It seems that creating the service-ca namespace early allows it to get the openshift.io/sa.scc.uid-range annotation and start running earlier. The service-ca pod is required for other pods (CVO and all the control plane pods) to start, since it creates the serving certs.

Description of problem:

While trying to figure out why it takes so long to install single node OpenShift, I noticed that the kube-controller-manager cluster operator is degraded for ~5 minutes due to:
GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp 172.30.119.108:9091: connect: connection refused
I don't understand how the prometheusClient is successfully initialized, but we get a connection refused once we try to query the rules.
Note that if the client initialization fails, the kube-controller-manager won't set GarbageCollectorDegraded to true.

Version-Release number of selected component (if applicable):

4.12

How reproducible:

100%

Steps to Reproduce:

1. install SNO with bootstrap in place (https://github.com/eranco74/bootstrap-in-place-poc)

2. monitor the cluster operators staus 

Actual results:

GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp 172.30.119.108:9091: connect: connection refused 

Expected results:

Expected the GarbageCollectorDegraded status to be false

Additional info:

It seems that for the PrometheusClient to be successfully initialized it needs to create a connection, yet we get connection refused once we make the query.
Note that installing SNO with this patch (https://github.com/eranco74/cluster-kube-controller-manager-operator/commit/26e644503a8f04aa6d116ace6b9eb7b9b9f2f23f) reduces the installation time by 3 minutes


Complete Epics

This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were completed when this image was assembled

 

Monitoring needs to be reliable and is very useful when trying to debug clusters in an already degraded state. We want to ensure that metrics scraping can always work if the scraper can reach the target, even if the kube-apiserver is unavailable or unreachable. To do this, we will combine a local authorizer (already merged in many binaries and the rbac-proxy) and client-cert based authentication to have a fully local authentication and authorization path for scraper targets.

If networking (or part of networking) is down and a scraper target cannot reach the kube-apiserver to verify a token and a subjectaccessreview, then the metrics scraper can be rejected. The subjectaccessreview (authorization) is already largely addressed, but service account tokens are still used for scraping targets. Tokens require an external network call that we can avoid by using client certificates. Gathering metrics, especially client metrics, from partially functionally clusters helps narrow the search area between kube-apiserver, etcd, kubelet, and SDN considerably.

In addition, this will significantly reduce the load on the kube-apiserver. We have observed in the CI cluster that token and subject access reviews are a significant percentage of all kube-apiserver traffic.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User story:

As cluster-policy-controller I automatically approve cert signing requests issued by monitoring.

DoD:

  • cert signing requests issued by the cluster-monitoring-operator service account are approved automatically.

Implementation hints: leverage approving logic implemented in https://github.com/openshift/library-go/pull/1083.

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

User Story

The samples operator's use of the reason field in its config object to track imagestream import completion has resulted in that singleton being a bottleneck and a source of update conflicts (we are talking 60 or 70 imagestreams potentially updating that one field concurrently).

Acceptance Criteria

  • Reduction in reconciliation errors when imagestream imports complete (success or failure).
  • ConfigMaps containing failing imagestream imports are added to must-gather
  • After all imagestreams successfully import, there are no error-recording ConfigMaps in the samples-operator namespace.

Notes

See this hackday PR

for an alternative approach which uses a configmap per imagestream

User Story

As a cluster administrator of a disconnected OCP cluster,
I want a list of possible sample images to mirror
So that I can configure my image mirror prior to installing OCP in a disconnected environment.

Acceptance Criteria

  • Publish the list of the sample images to mirror as a ConfigMap in the samples operator namespace.
  • Provide instructions on how to obtain the current image SHAs from the list above (via podman or skopeo).
  • Reference the ConfigMap name in our "import failing" alert.
  • [optional] Reference the ConfigMap name in our "Removed" condition message.

Notes

It is too onerous to find a connected cluster in order to obtain the list of possible sample images to mirror using the currently documented procedures.
I need a list made available to me in my disconnected cluster that I can reference after the initial install.

Epic Goal

  • Update OpenShift components that are owned by the Builds + Jenkins Team to use Kubernetes 1.25

Why is this important?

  • Our components need to be updated to ensure that they are using the latest bug/CVE fixes, features, and that they are API compatible with other OpenShift components.

Acceptance Criteria

  • Existing CI/CD tests must be passing

We should expand the set of Cypress e2e tests we started in 4.6. This can either be new tests, or migrating some of our more flaky protractor tests.

We still have some tests remaining in the Protractor CRUD scenario:

https://github.com/openshift/console/blob/master/frontend/integration-tests/tests/crud.scenario.ts

We should migrate these to Cypress.

Acceptance Criteria

  • All remaining CRUD tests have been migrated to Cypress and are running in CI.
  • The old CRUD scenario is removed from Protractor.
  • The new CRUD tests are running as part of the periodic release tests against console (should not require code change).

An epic we can duplicate for each release to ensure we have a place to catch things we ought to be doing regularly but can tend to fall by the wayside.

It has been a few releases since we've gone through and updated the various frontend dependencies for console. We should update major dependencies such as TypeScript, React, and webpack.

We should consider updating our TypeScript typings as well, which have gotten out of date.

An epic we can duplicate for each release to ensure we have a place to catch things we ought to be doing regularly but can tend to fall by the wayside.

Console operator should swap from using monis.app to openshift/operator-boilerplate-legacy. This will allow switching to klog/v2, which the shared libs (api,client-go,library-go) have already done.

This epic contains all the Dynamic Plugins related stories for OCP release-4.11 

Epic Goal

  • Track all the stories under a single epic

Acceptance Criteria

  •  

In the 4.11 release, a console.openshift.io/default-i18next-namespace annotation is being introduced. The annotation indicates whether the ConsolePlugin contains localization resources. If the annotation is set to "true", the localization resources from the i18n namespace named after the dynamic plugin (e.g. plugin__kubevirt) are loaded. If the annotation is set to any other value or is missing on the ConsolePlugin resource, localization resources are not loaded.

 

In case these resources are not present in the dynamic plugin, the initial console load will be slowed down. For more info check BZ#2015654

 

AC:

  • console-operator should be checking for the new console.openshift.io/use-i18n annotation, update the console-config.yaml accordingly and redeploy the console server
  • console server should pick up the changes in the console-config.yaml and only load the i18n namespaces that are available

 

Follow up of https://issues.redhat.com/browse/CONSOLE-3159

 

 

This epic contains all the Dynamic Plugins related stories for OCP release-4.12

Epic Goal

  • Track all the stories under a single epic

Acceptance Criteria

Currently the ConsolePlugin API version is v1alpha1. Since we are going GA with dynamic plugins, we should create a v1 version.

This would require updates in the following repositories:

  1. openshift/api (add the v1 version and generate a new CRD)
  2. openshift/client-go (pick up the changes in the openshift/api repo and generate clients & informers for the new v1 version)
  3. openshift/console-operator (use both the new v1 version and v1alpha1 in code and in the manifests folder)

AC:

  • both v1 and v1alpha1 ConsolePlugins should be passed to the console-config.yaml when the plugins are enabled and present on the cluster.

 

NOTE: This story does not include the conversion webhook change which will be created as a follow on story

when defining two proxy endpoints,

apiVersion: console.openshift.io/v1alpha1
kind: ConsolePlugin
metadata:
  ...
  name: forklift-console-plugin
spec:
  displayName: Console Plugin Template
  proxy:
  - alias: forklift-inventory
    authorize: true
    service:
      name: forklift-inventory
      namespace: konveyor-forklift
      port: 8443
    type: Service
  - alias: forklift-must-gather-api
    authorize: true
    service:
      name: forklift-must-gather-api
      namespace: konveyor-forklift
      port: 8443
    type: Service
  service:
    basePath: /
I get two proxy endpoints
/api/proxy/plugin/forklift-console-plugin/forklift-inventory
and
/api/proxy/plugin/forklift-console-plugin/forklift-must-gather-api

but both proxy to the `forklift-must-gather-api` service

e.g.
curl to:
[server url]/api/proxy/plugin/forklift-console-plugin/forklift-inventory
will point to the `forklift-must-gather-api` service, instead of the `forklift-inventory` service

This epic contains all the OLM related stories for OCP release-4.12

Epic Goal

  • Track all the stories under a single epic

This enhancement introduces support for provisioning and upgrading heterogeneous architecture clusters in phases.

 

We need to scan through the compute nodes and build a set of supported architectures from them. Each node in the cluster has an architecture label, e.g. kubernetes.io/arch=arm64, kubernetes.io/arch=amd64, etc. Based on the set of supported architectures, the console will need to surface in OperatorHub only those operators that are supported on our nodes.

 

AC: 

  1. Implement logic in the console-operator that will scan through all the nodes, build a set of all the architecture types that the cluster nodes run on, and pass it to the console-config.yaml (see the sketch below).
  2. Add unit and e2e test cases in the console-operator repository.

 

@jpoulin is good to ask about heterogeneous clusters.
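
A minimal sketch, assuming a client-go clientset, of the node-scanning logic described in this card; the package and function names are illustrative, not the actual console-operator implementation.

package nodearch

import (
	"context"
	"fmt"
	"sort"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// nodeArchitectures returns the sorted, de-duplicated set of architectures
// reported by the cluster's nodes via the kubernetes.io/arch label.
func nodeArchitectures(ctx context.Context, client kubernetes.Interface) ([]string, error) {
	nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, fmt.Errorf("listing nodes: %w", err)
	}
	set := map[string]struct{}{}
	for _, node := range nodes.Items {
		if arch, ok := node.Labels["kubernetes.io/arch"]; ok && arch != "" {
			set[arch] = struct{}{}
		}
	}
	archs := make([]string, 0, len(set))
	for arch := range set {
		archs = append(archs, arch)
	}
	sort.Strings(archs)
	// e.g. ["amd64", "arm64"]; this set would be written into console-config.yaml.
	return archs, nil
}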

An epic we can duplicate for each release to ensure we have a place to catch things we ought to be doing regularly but can tend to fall by the wayside.

The console-operator should switch from using bindata to using embedded assets, similar to what cluster-kube-apiserver-operator and other operators are doing, so we don't need to regenerate the bindata when YAML files are changed.

There is also an issue with generating bindata on ARM and other architectures; switching to assets makes that problem obsolete.

 

https://github.com/openshift/cluster-kube-apiserver-operator/blob/005a95607cf9f8db490e962b549811d8bc0c5eaf/bindata/assets.go
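
A minimal sketch of the embed-based assets approach, modeled loosely on the cluster-kube-apiserver-operator file linked above; the package name and the *.yaml embed pattern are illustrative.

package assets

import (
	"embed"
)

// Manifests are embedded at compile time; the glob is illustrative and must
// match at least one file in the package directory.
//go:embed *.yaml
var f embed.FS

// Asset reads and returns the content of the named manifest file.
func Asset(name string) ([]byte, error) {
	return f.ReadFile(name)
}

// MustAsset reads and returns the content of the named manifest file,
// panicking if the file cannot be read.
func MustAsset(name string) []byte {
	data, err := f.ReadFile(name)
	if err != nil {
		panic(err)
	}
	return data
}

Since the YAML files are embedded at compile time, there is nothing to regenerate when a manifest changes and no architecture-specific bindata generation step.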

Cluster-version operator (CVO) manifests declaring a runlevel are supposed to use 0000_<runlevel>_<dash-separated-component>_<manifest_filename>, but since api#598 added 0000_10-helm-chart-repository.crd.yaml and api#1084 added 0000_10-project-helm-chart-repository.crd.yaml, those have diverged from that pattern, so the cluster-version operator will fail to parse their runlevel. They are still sorted into a runlevel around 10 by this code, but unless there is a reason the CRD needs to be pushed early in an update, leaving the runlevel prefix off in the API repository gives the CVO the ability to parallelize reconciliation with more sibling resources (after which these COPY lines would need a matching bump).

This epic contains all the Dynamic Plugins related stories for OCP release-4.14 and implementing Core SDK utils.

Epic Goal

  • Track all the stories under a single epic

Acceptance Criteria

We need to enable the storage for the v1 version of our ConsolePlugin CRD in the API repository. ConsolePlugin v1 CRD was added in CONSOLE-3077.

 

AC: Enable the storage for the v1 version of ConsolePlugin CRD and disable the storage for v1alpha1 version

Epic Goal

Remove the code that was added through the ACM integration from all of the console's codebase repositories

Why is this important?

Since the decision was made to stop the ACM integration, we as a team decided that it would be better to remove the unused code in order to avoid any confusion or regressions.

Acceptance Criteria

  • Identify all the places from which we need to remove the code that was added during the ACM integration.
  • Come up with a plan for how to remove the code from our repositories and CI.
  • Remove the code from the console-operator repo.
  • Start with code removal from the console repository.

Remove all multicluster-related code from the console operator repo.

AC:

  • All multicluster code is removed
    • controllers
    • helpers
    • tests
    • bindata
    • api
    • types
    • manifests
  • No regressions are introduced
  • Remove ACM related conditions in the starter

Goal: Support OCI images.

Problem: Buildah and Podman use the OCI format by default, and the OpenShift image registry and ImageStream API don't understand it.

Why is this important: OCI images are supposed to replace Docker schema 2 images; OpenShift should be ready when OCI images become widely adopted.

Dependencies (internal and external):

Prioritized epics + deliverables (in scope / not in scope):

Estimate (XS, S, M, L, XL, XXL): XL

Previous Work:

Customers:

Open Questions:

User Story

As a user of OpenShift
I want the image pruner to be aware of OCI images
So that it doesn't delete their layers/configs

Acceptance Criteria

    • When
      • an OCI image is pushed/mirrored to the registry,
      • a schema 2 image is pushed/mirrored to the registry and shares its layers/config with the OCI image,
      • the schema 2 image is eligible to be pruned, and
      • the shared layers/config are not shared with other images,
    • the pruner
      • will delete the schema 2 image
      • will NOT delete the OCI image and its layers/config (see the sketch after this list)
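
A minimal, self-contained sketch of the pruning rule spelled out in the acceptance criteria above: a blob (layer or config) may only be deleted when no remaining image of any media type still references it. The Image type and helper below are illustrative, not the actual pruner data structures.

package prunesketch

// Image is a simplified stand-in for a manifest of either media type.
type Image struct {
	Name      string
	MediaType string   // e.g. OCI image manifest or Docker schema 2 manifest
	Blobs     []string // layer and config digests referenced by the manifest
	Prunable  bool     // eligible for pruning by the usual age/limit policies
}

// BlobsSafeToDelete returns the digests referenced only by prunable images,
// so layers/configs shared with a kept OCI image are never deleted.
func BlobsSafeToDelete(images []Image) []string {
	keep := map[string]struct{}{}
	candidates := map[string]struct{}{}
	for _, img := range images {
		for _, digest := range img.Blobs {
			if img.Prunable {
				candidates[digest] = struct{}{}
			} else {
				keep[digest] = struct{}{}
			}
		}
	}
	var safe []string
	for digest := range candidates {
		if _, referenced := keep[digest]; !referenced {
			safe = append(safe, digest)
		}
	}
	return safe
}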

Launch Checklist

Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated

Notes

Add pertinent notes here:

  • Enhancement proposal link
  • Previous product docs
  • Best practices
  • Known issues

Guiding Questions

User Story

  • Is this intended for an administrator, application developer, or other type of OpenShift user?
  • What experience level is this intended for? New, experienced, etc.?
  • Why is this story important? What problems does this solve? What benefit(s) will the customer experience?
  • Is this part of a larger epic or initiative? If so, ensure that the story is linked to the appropriate epic and/or initiative.

Acceptance Criteria

  • How should a customer use and/or configure the feature?
  • Are there any prerequisites for using/enabling the feature?

Notes

  • Is this a new feature, or an enhancement of an existing feature? If the latter, list the feature and docs reference.
  • Are there any new terms, abbreviations, or commands introduced with this story? Ex: a new command line argument, a new custom resource.
  • Are there any recommended best practices when using this feature?
  • On feature completion, are there any known issues that customers should be aware of?

User Story

As a user of OpenShift
I want to import OCI images
So that I can use images that are built by new tools

Acceptance Criteria

  • oc import-image works with OCI images

Launch Checklist

Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated

Notes

Add pertinent notes here:


Guiding Questions

User Story

  • Is this intended for an administrator, application developer, or other type of OpenShift user?
  • What experience level is this intended for? New, experienced, etc.?
  • Why is this story important? What problems does this solve? What benefit(s) will the customer experience?
  • Is this part of a larger epic or initiative? If so, ensure that the story is linked to the appropriate epic and/or initiative.

Acceptance Criteria

  • How should a customer use and/or configure the feature?
  • Are there any prerequisites for using/enabling the feature?

Notes

  • Is this a new feature, or an enhancement of an existing feature? If the latter, list the feature and docs reference.
  • Are there any new terms, abbreviations, or commands introduced with this story? Ex: a new command line argument, a new custom resource.
  • Are there any recommended best practices when using this feature?
  • On feature completion, are there any known issues that customers should be aware of?

User Story

As a user of OpenShift
I want to push OCI images to the registry
So that I can use buildah and podman with their defaults to push images

Acceptance Criteria

  • An OCI image can be pushed to the registry by buildah or podman
  • An imported OCI image can be pulled from the registry
  • The registry should be able to pull-through OCI images from other registries

Launch Checklist

Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated

Notes

Add pertinent notes here:

  • Enhancement proposal link
  • Previous product docs
  • Best practices
  • Known issues

Guiding Questions

User Story

  • Is this intended for an administrator, application developer, or other type of OpenShift user?
  • What experience level is this intended for? New, experienced, etc.?
  • Why is this story important? What problems does this solve? What benefit(s) will the customer experience?
  • Is this part of a larger epic or initiative? If so, ensure that the story is linked to the appropriate epic and/or initiative.

Acceptance Criteria

  • How should a customer use and/or configure the feature?
  • Are there any prerequisites for using/enabling the feature?

Notes

  • Is this a new feature, or an enhancement of an existing feature? If the latter, list the feature and docs reference.
  • Are there any new terms, abbreviations, or commands introduced with this story? Ex: a new command line argument, a new custom resource.
  • Are there any recommended best practices when using this feature?
  • On feature completion, are there any known issues that customers should be aware of?

Epic Goal

  • Improve CI testing of the image registry components.

Why is this important?

  • The image registry, image API and the image pruner had a lot of tests removed during the transition to 4.0. This may make the platform less stable and/or slow down the team.

Scenarios

  1. ...

Acceptance Criteria

  • CI - tests should be more stable and have broader coverage

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.

The integration tests for the image registry expect that OpenShift and the tests run on the same machine (i.e. OpenShift can connect to sockets that the tests listen on). This is not the case with e2e tests.

Acceptance Criteria

  • every integration test is converted into an e2e test or a techdebt story
  • image-registry tests are green

As a registry developer
I want e2e-upgrade jobs to monitor the availability of the registry during upgrades
So that I can be sure that clients can use the registry without disruptions.

Acceptance Criteria

  • A new test in openshift/origin repo.

Notes

https://github.com/openshift/origin/blob/e6b3d1ece61d7c3ab5a23151c9875e1f9ad36838/test/extended/util/disruption/controlplane/controlplane.go#L69

https://bugzilla.redhat.com/show_bug.cgi?id=1884380

https://github.com/openshift/origin/pull/25475/files marked our tests for ISI as Disruptive.

Tests should wait until operators become stable; otherwise other tests will run on an unstable cluster and cause flakes (a sketch of such a wait follows the acceptance criteria below).

Acceptance Criteria

  • The tests wait until operators are stable after image.config changes.
  • The tests are no longer [Disruptive].
  • If the tests are slow (this depends on other operators, but MCO tends to be slow), they should be marked [Slow].
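
A minimal sketch, assuming client-go and the openshift/client-go config clientset, of waiting for a ClusterOperator to settle after an image.config change; the function name, poll interval, and timeout are illustrative.

package e2esketch

import (
	"context"
	"time"

	configv1 "github.com/openshift/api/config/v1"
	configclient "github.com/openshift/client-go/config/clientset/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForOperatorStable polls a ClusterOperator until it reports
// Available=True, Progressing=False and Degraded=False.
func waitForOperatorStable(ctx context.Context, client configclient.Interface, name string) error {
	return wait.PollImmediateWithContext(ctx, 10*time.Second, 20*time.Minute,
		func(ctx context.Context) (bool, error) {
			co, err := client.ConfigV1().ClusterOperators().Get(ctx, name, metav1.GetOptions{})
			if err != nil {
				return false, nil // tolerate transient errors while the operator rolls out
			}
			want := map[configv1.ClusterStatusConditionType]configv1.ConditionStatus{
				configv1.OperatorAvailable:   configv1.ConditionTrue,
				configv1.OperatorProgressing: configv1.ConditionFalse,
				configv1.OperatorDegraded:    configv1.ConditionFalse,
			}
			matched := 0
			for _, cond := range co.Status.Conditions {
				if status, ok := want[cond.Type]; ok && cond.Status == status {
					matched++
				}
			}
			return matched == len(want), nil
		})
}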

Goal: Rebase registry to Docker Distribution 

Problem: The registry is currently based on an outdated version of the upstream docker/distribution project. The base does not even have a version associated with it - DevEx last rebased on an untagged commit.

Why is this important: Update the registry with improvements and bug fixes from the upstream community.

Dependencies (internal and external):

Prioritized epics + deliverables (in scope / not in scope):

Estimate (XS, S, M, L, XL, XXL): M

Previous Work:

Customers:

 

Open questions:

User Story

As a user of OpenShift
I want the image registry to be rebased on the latest docker/distribution release (v2.7.1)
So that the image registry has the latest upstream bugfixes and enhancements

Acceptance Criteria

  • Image registry is based on docker/distribution v2.7.1

Launch Checklist

Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated

Notes

Add pertinent notes here:

  • Enhancement proposal link
  • Previous product docs
  • Best practices
  • Known issues

Guiding Questions

User Story

  • Is this intended for an administrator, application developer, or other type of OpenShift user?
  • What experience level is this intended for? New, experienced, etc.?
  • Why is this story important? What problems does this solve? What benefit(s) will the customer experience?
  • Is this part of a larger epic or initiative? If so, ensure that the story is linked to the appropriate epic and/or initiative.

Acceptance Criteria

  • How should a customer use and/or configure the feature?
  • Are there any prerequisites for using/enabling the feature?

Notes

  • Is this a new feature, or an enhancement of an existing feature? If the latter, list the feature and docs reference.
  • Are there any new terms, abbreviations, or commands introduced with this story? Ex: a new command line argument, a new custom resource.
  • Are there any recommended best practices when using this feature?
  • On feature completion, are there any known issues that customers should be aware of?

Epic Goal

Why is this important?

  • For autoscaling, OpenShift needs a resource metrics implementation. Currently this depends on CMO's Prometheus stack.
  • Some users would like to opt out of running a fully fledged Prometheus stack, see https://issues.redhat.com/browse/MON-3152

Scenarios

  1. A cluster admin decides not to deploy a full monitoring stack; autoscaling must still work.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. https://github.com/jan--f/cluster-monitoring-operator/tree/metrics-server 

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Points to consider when implementing the switch to metrics-server:

  • Add this feature in Tech Preview mode
  • Config option to switch to metrics-server: https://issues.redhat.com/browse/MON-3214
    • enable metrics-server by switching to TechPreviewNoUpgrade mode
  • Flow of creating/deleting objects:
    • Deploy metrics-server
    • Verify that the metrics-server resource metrics API is working (see the sketch after this list)
    • Update the APIService object to point to metrics-server
    • Delete the prometheus-adapter resources
  • Deploy in HA mode (2 replicas, as is the case for all components we deploy)
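
A minimal sketch, assuming client-go and the kube-aggregator clientset, of the verification step flagged in the list above: it checks that the v1beta1.metrics.k8s.io APIService reports the Available condition once metrics-server serves it.

package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"
	apiregistrationv1 "k8s.io/kube-aggregator/pkg/apis/apiregistration/v1"
	aggregatorclient "k8s.io/kube-aggregator/pkg/client/clientset_generated/clientset"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := aggregatorclient.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	// The resource metrics API is served through this aggregated APIService.
	apiService, err := client.ApiregistrationV1().APIServices().Get(
		context.TODO(), "v1beta1.metrics.k8s.io", metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, cond := range apiService.Status.Conditions {
		if cond.Type == apiregistrationv1.Available {
			fmt.Printf("resource metrics API available: %s (reason: %s)\n", cond.Status, cond.Reason)
		}
	}
}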

 

Acceptance Criteria:

  • we have a PR in the CMO repo
  • we have a POC
  • people can start deploying it in their cluster

Vendor openshift/api into openshift/cluster-config-operator to bring in the featuregate changes.

 

https://github.com/openshift/api/pull/1615 is merged, but this change won't be available unless we vendor the change in openshift/cluster-config-operator.

Epic Goal

  • We need the installer to accept an LB type from the user, and then we can set the type of LB in the following object:
    oc get ingress.config.openshift.io/cluster -o yaml
    Then we can fetch info from this object and reconcile the operator to have the NLB changes reflected.

 

This is an API change and we will consider this as a feature request.

Why is this important?

https://issues.redhat.com/browse/NE-799 Please check this for more details

 

Scenarios

https://issues.redhat.com/browse/NE-799 Please check this for more details

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. installer
  2. ingress operator

Previous Work (Optional):

 No

Open questions::

N/A

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Epic Goal

  • Update OpenShift components that are owned by the Builds + Jenkins Team to use Kubernetes 1.27

Why is this important?

  • Our components need to be updated to ensure that they are using the latest bug/CVE fixes, features, and that they are API compatible with other OpenShift components.

Acceptance Criteria

  • Existing CI/CD tests must be passing

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Rebase OpenShift components to k8s v1.22
  • Rebase Jenkins and plugins to latest long term support versions

Why is this important?

  • Rebasing ensures components work with the upcoming release of Kubernetes
  • Address tech debt related to upstream deprecations and removals.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. k8s 1.22 release - expected August 4th 2021

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story

Rebase samples operator to k8s 1.22

Acceptance Criteria

  • Samples operator deploys with k8s 1.22 libraries
  • Core components continue to function (CI tests pass, including build suite).

Docs Impact

None

Notes

We need tests for the ovirt-csi-driver and the cluster-api-provider-ovirt. These tests help us to

  • minimize bugs,
  • reproduce and fix them faster and
  • pin down current behavior of the driver

Also, having dedicated tests on lower levels with a smaller scope (unit, integration, ...) has the following benefits:

  • fast feedback cycle (local test execution)
  • developer in-code documentation
  • easier onboarding for new contributors
  • lower resource consumption

Overview

Today in the Dev Perspective navigation, we have pre-pinned resources in the "third section" of the vertical navigation. Customers have asked if there is a way for admins to provide defaults for those pre-pinned resources for all users.

Acceptance Criteria

  1. Add a section to the Cluster/Console menu for Developer pre-pinned resources
  2. Admin can define which nav items exist by default for all users & is able to re-order them
  3. These defaults should be used as the default pre-pinned items for all new users
  4. Users whose nav items have not been customized will inherit these defaults the next time they log in.

Exploration Results

Miro Board

Slack Channel

tbd

Description

As an admin, I want to define the pre-pinned resources on the developer perspective navigation

Based on the https://issues.redhat.com/browse/ODC-7181 enhancement proposal, it is required to extend the console configuration CRD to enable the cluster admins to configure this data in the console resource

Acceptance Criteria

  1. Extend the "customization" spec type definition for the CRD in the openshift/api project

Additional Details:

Previous customization work:

  1. https://issues.redhat.com/browse/ODC-5416
  2. https://issues.redhat.com/browse/ODC-5020
  3. https://issues.redhat.com/browse/ODC-5447

Goal

Provide a form-driven experience that allows cluster admins to manage the perspectives to meet the ACs below.

Problem:

We have heard the following requests from customers and developer advocates:

  • Some admins do not want to provide access to the Developer Perspective from the console
  • Some admins do not want to provide non-priv users access to the Admin Perspective from the console

Acceptance criteria:

  1. Cluster administrator is able to "hide" the admin perspective for non-priv users
  2. Cluster administrator is able to "hide" the developer perspective for all users
  3. Be sure that User Preferences for individual users behave appropriately. If only one perspective is available, the perspective switcher is not needed.

Dependencies (External/Internal):

Design Artifacts:

Exploration:

Note:

Description

As an admin, I want to hide the admin perspective for non-privileged users or hide the developer perspective for all users

Based on the https://issues.redhat.com/browse/ODC-6730 enhancement proposal, it is required to extend the console configuration CRD to enable the cluster admins to configure this data in the console resource

Acceptance Criteria

  1. Extend the "customization" spec type definition for the CRD in the openshift/api project

Additional Details:

Previous customization work:

  1. https://issues.redhat.com/browse/ODC-5416
  2. https://issues.redhat.com/browse/ODC-5020
  3. https://issues.redhat.com/browse/ODC-5447

Problem:

Currently we are only able to get limited telemetry from the Dev Sandbox, but not from any of our managed clusters or on prem clusters.

Goals:

  1. Enable gathering segment telemetry whenever cluster telemetry is enabled on OSD clusters
  2. Have our OSD clusters opt into telemetry by default
  3. Work with PM & UX to identify additional metrics to capture in addition to what we have enabled currently on Sandbox.
  4. Ability to get a single report from woopra across all of our Sandbox and OSD clusters.
  5. Be able to generate a report including metrics of a single cluster or all clusters of a certain type ( sandbox, or OSD)

Why is it important?

In order to properly analyze usage and the user experience, we need to be able to gather as much data as possible.

Acceptance Criteria

  1. Extend console backend (bridge) to provide configuration as SERVER_FLAGS
    // JS type
    telemetry?: Record<string, string>
    
    1. Read the annotation of the cluster ConfigMap for telemetry data and pass them into the internal serverconfig.
    2. Pass through this internal serverconfig and export it as SERVER_FLAGS.
    3. Add a new --telemetry CLI option so that the telemetry options can be tested in a dev environment (see the sketch after this list):
      ./bin/bridge --telemetry SEGMENT_API_KEY=a-key-123-xzy
      ./bin/bridge --telemetry CONSOLE_LOG=debug
      
  2. TBD: In the best case, the new annotation could be read from the cluster ConfigMap...
    1. Otherwise update the console-operator to pass the annotation from the console cluster configuration to the console ConfigMap.
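
A minimal sketch of how a repeatable --telemetry key=value option could be collected into the Record<string, string> server flag described above; the flag handling shown here is illustrative, not the actual bridge code.

package main

import (
	"flag"
	"fmt"
	"strings"
)

// telemetryFlags accumulates repeated --telemetry key=value pairs.
type telemetryFlags map[string]string

func (t telemetryFlags) String() string { return fmt.Sprint(map[string]string(t)) }

func (t telemetryFlags) Set(value string) error {
	kv := strings.SplitN(value, "=", 2)
	if len(kv) != 2 {
		return fmt.Errorf("expected key=value, got %q", value)
	}
	t[kv[0]] = kv[1]
	return nil
}

func main() {
	telemetry := telemetryFlags{}
	flag.Var(telemetry, "telemetry", "telemetry option in key=value form (repeatable)")
	flag.Parse()

	// The collected map would then be exported to the frontend via SERVER_FLAGS,
	// e.g. {"SEGMENT_API_KEY": "a-key-123-xzy", "CONSOLE_LOG": "debug"}.
	fmt.Printf("SERVER_FLAGS.telemetry = %v\n", telemetry)
}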

Additional Details:

  1. More information about the integration with the backend could be found in the Telemetry on OSD clusters Google Doc

Problem:

Customers don't want their users to have access to some/all of the items which are available in the Developer Catalog.  The request is to change access for the cluster, not per user or persona.

Goal:

Provide a form-driven experience that allows cluster admins to easily disable the Developer Catalog, or one or more of the sub-catalogs in the Developer Catalog.

Why is it important?

Multiple customer requests.

Acceptance criteria:

  1. As a cluster admin, I can hide/disable access to the developer catalog for all users across all namespaces.
  2. As a cluster admin, I can hide/disable access to a specific sub-catalog in the developer catalog for all users across all namespaces.
    1. Builder Images
    2. Templates
    3. Helm Charts
    4. Devfiles
    5. Operator Backed

Notes

We need to consider how this will work with subcatalogs which are installed by operators: VMs, Event Sources, Event Catalogs, Managed Services, Cloud based services

Dependencies (External/Internal):

Design Artifacts:

Exploration:

Note:

Description

As an admin, I want to hide/disable access to specific sub-catalogs in the developer catalog or the complete dev catalog for all users across all namespaces.

Based on the https://issues.redhat.com/browse/ODC-6732 enhancement proposal, it is required to extend the console configuration CRD to enable the cluster admins to configure this data in the console resource

Acceptance Criteria

Extend the "customization" spec type definition for the CRD in the openshift/api project

Additional Details:

Previous customization work:

  1. https://issues.redhat.com/browse/ODC-5416
  2. https://issues.redhat.com/browse/ODC-5020
  3. https://issues.redhat.com/browse/ODC-5447

Goal

This epic has 3 main goals

  1. Improve segment implementation so that we can easily enable additional telemetry pieces (hotjar, etc) for particular cluster types (starting with sandbox, maybe expanding to RHPDS). This will help us better understand where errors and drop off occurs in our trial and workshop clusters, thus being able to (1) help conversion and (2) proactively detect issues before they are "reported" by customers.
  2. Improve telemetry so we can START capturing console usage across the fleet
  3. Additional improvements to segment, to enable proper gathering of user telemetry and analysis

Problem

Currently we have no accurate telemetry of usage of the OpenShift Console across all clusters in the fleet. We should be able to utilize the auth and console telemetry to glean details which will allow us to get a picture of console usage by our customers.

Acceptance criteria

Let's do a spike to validate, and possibly have to update this list after the spike:

Need to verify HOW we define a cluster admin -> listing all namespaces in a cluster? Installing operators? Make sure that we consider OSD cluster admins as well (this should be aligned with how we send people to the dev perspective, in my mind).

Capture additional information via console plugin ( and possibly the auth operator )

  1. Average number of users per cluster
  2. Average number of cluster admin users per cluster
  3. Average number of dev users per cluster
  4. Average # of page views across the fleet
  5. Average # of page views per perspective across the fleet
  6. # of cluster which have disabled the admin perspective for any users
  7. # of cluster which have disabled the dev perspective for any users
  8. # of cluster which have disabled the “any” perspective for any users
  9. # of clusters which have plugin “x” installed
  10. Total number of unique users across the fleet
  11. Total number of cluster admin users across the fleet
  12. Total number of developer users across the fleet

Dependencies (External/Internal):

Understanding how to capture telemetry via the console operator

Exploration:

Note:

We have removed the following ACs for this release:

  1. (p2) Average total active time spent per User in console (per cluster for all users)
    1. per Cluster Admins
    2. per non-Cluster Admins
  2. (p2) Average active time spent in Dev Perspective [implies we can calculate this for admin perspective]
    1. per Cluster Admins
    2. per non-Cluster Admins-
  3. (p3) Average # of times they change the perspective (per cluster for all users)

As Red Hat, we want to understand the usage of the (dev) console. For that, we want to add new Prometheus metrics (how many users a cluster has, etc.) and collect them later (as telemetry data) via cluster-monitoring-operator.

Either the console-operator or cluster-monitoring-operator needs to apply a PrometheusRule to collect the right data and make it available later in Superset DataHat or Tableau.

We need to bump the version of Kubernetes and run a library sync for OCP 4.13. Two stories will be created, one for each activity.

Owner: Architect:

Story (Required)

As a Sample Operator Developer, I would like to run the library sync process, so the new libraries can be pushed to OCP 4.15.

Background (Required)

This is a runbook we need to execute on every release of OpenShift

Glossary

NA

Out of scope

NA

In Scope

NA

Approach(Required)

Follow instructions here: https://source.redhat.com/groups/public/appservices/wiki/cluster_samples_operator_release_activities

Dependencies

Library Repo

Edge Case

Acceptance Criteria

 Library sync PR is merged in master

INVEST Checklist

 Dependencies identified
 Blockers noted and expected delivery timelines set
 Design is implementable
 Acceptance criteria agreed upon
 Story estimated

Legend

 Unknown
 Verified
 Unsatisfied

 

Incomplete Epics

This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were not completed when this image was assembled

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Stop setting `-cloud-provider` and `-cloud-config` arguments on KAS, KCM and MCO
  • Remove `CloudControllerOwner` condition from CCM and KCM ClusterOperators
  • Remove feature gating reliance in library-go IsCloudProviderExternal
  • Remove CloudProvider feature gates from openshift/api

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Background

Code in library-go currently uses feature gates to determine if Azure and GCP clusters should be external or not. They have been promoted for at least one release and we do not see ourselves going back.

In 4.17 the code is expected to be deleted completely.

We should remove the reliance on the feature gate from this part of the code and clean up references to feature gate access at the call sites.

Steps

  • Update library go to remove reliance on feature gates
  • Update callers to no longer rely on feature gate accessor (KCMO, KASO, MCO, CCMO)
  • Remove feature gates from API repo

Stakeholders

  • Cluster Infra
  • MCO team
  • Workloads team
  • API server team

Definition of Done

  • Feature gates for external cloud providers are removed from the product
  • Docs
  • <Add docs requirements for this card>
  • Testing
  • <Explain testing that will be added>

Background

As part of the migration to external cloud providers, the CCMO and KCMO used a CloudControllerOwner condition to indicate which operator owned the cloud controllers.

This is no longer required and can be removed.

Steps

  • Remove code from CCMO that looks for and gates on the KCMO condition
  • Ensure CCMO clears the condition
  • Ensure KCMO clears the condition

Stakeholders

  • Cluster Infra
  • Workloads team

Definition of Done

  • Clusters upgraded to 4.16 do not have a CloudControllerOwner condition set on the KCMO or CCMO ClusterOperators
  • Docs
  • <Add docs requirements for this card>
  • Testing
  • <Explain testing that will be added>

We need to bump the version of Kubernetes and run a library sync for OCP 4.13. Two stories will be created, one for each activity.

Owner: Architect:

Story (Required)

As a Sample Operator Developer, I would like to run the library sync process, so the new libraries can be pushed to OCP 4.16.

Background (Required)

This is a runbook we need to execute on every release of OpenShift

Glossary

NA

Out of scope

NA

In Scope

NA

Approach(Required)

Follow instructions here: https://source.redhat.com/groups/public/appservices/wiki/cluster_samples_operator_release_activities

Dependencies

Library Repo

Edge Case

Acceptance Criteria

 Library sync PR is merged in master

INVEST Checklist

 Dependencies identified
 Blockers noted and expected delivery timelines set
 Design is implementable
 Acceptance criteria agreed upon
 Story estimated

Legend

 Unknown
 Verified
 Unsatisfied

 

Epic Goal

  • Update all images that we ship with OpenShift to the latest upstream releases and libraries.
  • The exact content of what needs to be updated will be determined as new images are released upstream, which is not known at the beginning of OCP development work. We don't know what new features will be included and should be tested and documented. In particular, new CSI driver releases may bring new, currently unknown features. We expect that the amount of work will be roughly the same as in previous releases. Of course, QE or docs can reject an update if it's too close to the deadline and/or looks too big.

Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF).

Why is this important?

  • We want to ship the latest software that contains new features and bugfixes.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Update all CSI sidecars to the latest upstream release from https://github.com/orgs/kubernetes-csi/repositories

  • external-attacher
  • external-provisioner
  • external-resizer
  • external-snapshotter
  • node-driver-registrar
  • livenessprobe

Corresponding downstream repos have `csi-` prefix, e.g. github.com/openshift/csi-external-attacher.

This includes an update of the VolumeSnapshot CRDs in the cluster-csi-snapshot-controller-operator assets and of the client API in go.mod, i.e. copy all snapshot CRDs from upstream to the operator assets and run go get -u github.com/kubernetes-csi/external-snapshotter/client/v6 in the operator repo.

Other Complete

This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were completed when this image was assembled

Please review the following PR: https://github.com/openshift/console-operator/pull/794

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

This PR fails HyperShift CI with:

=== RUN TestAutoscaling/EnsureNoPodsWithTooHighPriority
util.go:411: pod csi-snapshot-controller-7bb4b877b4-q5457 with priorityClassName system-cluster-critical has a priority of 2000000000 with exceeds the max allowed of 100002000
util.go:411: pod csi-snapshot-webhook-644b6dbfb-v4lj7 with priorityClassName system-cluster-critical has a priority of 2000000000 with exceeds the max allowed of 100002000

How reproducible:

always

Steps to Reproduce:

  1. Install HyperShift + create a guest cluster with CSI Snapshot Controller and/or Cluster Storage Operator / AWS EBS CSI driver operator running in the HyperShift managed cluster
  2. Check priorityClass of the guest control plane pods in the hosted cluster.

Alternatively, ci/prow/e2e-aws in https://github.com/openshift/hypershift/pull/1698 and https://github.com/openshift/hypershift/pull/1748 must pass.

Goal

We have several use cases where dynamic plugins need to proxy to another service on the cluster. One example is the Helm plugin. We would like to move the backend code for Helm to a separate service on the cluster, and the Helm plugin could proxy to that service for its requests. This is required to make Helm a dynamic plugin. Similarly if we want to have ACM contribute any views through dynamic plugins, we will need a way for ACM to proxy to its services (e.g., for Search).

It's possible for plugins to make requests to services exposed through routes today, but that has several problems:

  1. It requires that the service be exposed outside the cluster, which is not always desired.
  2. It requires the service support CORS headers for the console.
  3. There is no way to specify a CA file for the route if it's not trusted by the browser.
  4. Plugins will not have access to the user's access token on the client, which means that there is no simple way to handle auth.

Plugins need a way to declare in-cluster services that they need to connect to. The console backend will need to set up proxies to those services on console load. This also requires that the console operator be updated to pass the configuration to the console backend.
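
A minimal sketch of an in-cluster service proxy with a per-service CA, of the kind the console backend would need for plugin proxy endpoints; the service name, proxy path, and CA file location are illustrative assumptions, not the actual console implementation.

package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"
)

// newServiceProxy builds a reverse proxy to an in-cluster service, trusting
// the given CA bundle; falling back to the service-serving CA when no CA is
// specified is a policy decision left to the caller.
func newServiceProxy(serviceURL, caFile string) (*httputil.ReverseProxy, error) {
	target, err := url.Parse(serviceURL)
	if err != nil {
		return nil, err
	}
	caCert, err := os.ReadFile(caFile)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caCert)

	proxy := httputil.NewSingleHostReverseProxy(target)
	proxy.Transport = &http.Transport{
		TLSClientConfig: &tls.Config{RootCAs: pool},
	}
	return proxy, nil
}

func main() {
	// e.g. proxy plugin requests to a hypothetical Helm backend service using
	// a CA bundle mounted into the console pod.
	proxy, err := newServiceProxy(
		"https://helm-backend.helm-namespace.svc:8443",
		"/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt",
	)
	if err != nil {
		log.Fatal(err)
	}
	http.Handle("/api/proxy/plugin/helm-plugin/helm-backend/", proxy)
	log.Fatal(http.ListenAndServe(":9000", nil))
}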

 

This work will apply only to single clusters.

 

Open Questions

  • What happens when a multitenant isolated network policy is configured on the cluster?

https://docs.openshift.com/container-platform/4.7/networking/network_policy/multitenant-network-policy.html

  • How do we (and can we?) support this for multi-cluster where console is running on a different hub cluster?
  • Do we need to auth for all requests?

Acceptance Criteria

  • Plugins can declare a service to proxy to in the ConsolePlugin resource
  • Plugins can specify a CA cert for the service
  • Console falls back to the service signing CA if none is specified
  • Plugins have a way of specifying whether the user's authentication token is included in requests through the service proxy
  • Dynamic plugin enhancement is updated with the implementation details
  • Support for server-side events (SSE) for ACM
  • Add support, or a flag, for whether auth is needed for each request.

cc Ali Mobrem [~christianmvogt]

Please review the following PR: https://github.com/openshift/service-ca-operator/pull/221

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

When creating a pod with an additional network that contains a `spec.config.ipam.exclude` range, any address within the excluded range is still iterated while searching for a suitable IP candidate. As a result, pod creation times out when large exclude ranges are used.

Version-Release number of selected component (if applicable):

 

How reproducible:

with big exclude ranges, 100%

Steps to Reproduce:

1. create network-attachment-definition with a large range:

$ cat <<EOF| oc apply -f -       
apiVersion: k8s.cni.cncf.io/v1                                            
kind: NetworkAttachmentDefinition
metadata:
  name: nad-w-excludes
spec:
  config: |-
    {
      "cniVersion": "0.3.1",
      "name": "macvlan-net",
      "type": "macvlan",
      "master": "ens3",
      "mode": "bridge",
      "ipam": {
         "type": "whereabouts",
         "range": "fd43:01f1:3daa:0baa::/64",
         "exclude": [ "fd43:01f1:3daa:0baa::/100" ],
         "log_file": "/tmp/whereabouts.log",
         "log_level" : "debug"
      }
    }
EOF
2. create a pod with the network attached:

$ cat <<EOF|oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-exclude-range
  annotations:
    k8s.v1.cni.cncf.io/networks: nad-w-excludes
spec:
  containers:
  - name: pod-1
    image: openshift/hello-openshift
EOF

3. check pod status, event log and whereabouts logs after a while: 

$ oc get pods
NAME                        READY   STATUS              RESTARTS   AGE
pod-with-exclude-range      0/1     ContainerCreating   0          2m23s

$ oc get events
<...>
6m39s       Normal    Scheduled                                    pod/pod-with-exclude-range                   Successfully assigned default/pod-with-exclude-range to <worker-node>
6m37s       Normal    AddedInterface                               pod/pod-with-exclude-range                   Add eth0 [10.129.2.49/23] from openshift-sdn
2m39s       Warning   FailedCreatePodSandBox                       pod/pod-with-exclude-range                   Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded

$ oc debug node/<worker-node> -- tail /host/tmp/whereabouts.log
Starting pod/<worker-node>-debug ...
To use host binaries, run `chroot /host`
2022-10-27T14:14:50Z [debug] Finished leader election
2022-10-27T14:14:50Z [debug] IPManagement: {fd43:1f1:3daa:baa::1 ffffffffffffffff0000000000000000} , <nil>
2022-10-27T14:14:59Z [debug] Used defaults from parsed flat file config @ /etc/kubernetes/cni/net.d/whereabouts.d/whereabouts.conf
2022-10-27T14:14:59Z [debug] ADD - IPAM configuration successfully read: {Name:macvlan-net Type:whereabouts Routes:[] Datastore:kubernetes Addresses:[] OmitRanges:[fd43:01f1:3daa:0baa::/80] DNS: {Nameservers:[] Domain: Search:[] Options:[]} Range:fd43:1f1:3daa:baa::/64 RangeStart:fd43:1f1:3daa:baa:: RangeEnd:<nil> GatewayStr: EtcdHost: EtcdUsername: EtcdPassword:********* EtcdKeyFile: EtcdCertFile: EtcdCACertFile: LeaderLeaseDuration:1500 LeaderRenewDeadline:1000 LeaderRetryPeriod:500 LogFile:/tmp/whereabouts.log LogLevel:debug OverlappingRanges:true SleepForRace:0 Gateway:<nil> Kubernetes: {KubeConfigPath:/etc/kubernetes/cni/net.d/whereabouts.d/whereabouts.kubeconfig K8sAPIRoot:} ConfigurationPath:PodName:pod-with-exclude-range PodNamespace:default} 
2022-10-27T14:14:59Z [debug] Beginning IPAM for ContainerID: f4ffd0e07d6c1a2b6ffb0fa29910c795258792bb1a1710ff66f6b48fab37af82
2022-10-27T14:14:59Z [debug] Started leader election
2022-10-27T14:14:59Z [debug] OnStartedLeading() called
2022-10-27T14:14:59Z [debug] Elected as leader, do processing
2022-10-27T14:14:59Z [debug] IPManagement - mode: 0 / containerID:f4ffd0e07d6c1a2b6ffb0fa29910c795258792bb1a1710ff66f6b48fab37af82 / podRef: default/pod-with-exclude-range
2022-10-27T14:14:59Z [debug] IterateForAssignment input >> ip: fd43:1f1:3daa:baa:: | ipnet: {fd43:1f1:3daa:baa:: ffffffffffffffff0000000000000000} | first IP: fd43:1f1:3daa:baa::1 | last IP: fd43:1f1:3daa:baa:ffff:ffff:ffff:ffff

Actual results:

Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded

Expected results:

additional network gets attached to the pod

Additional info:
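
A minimal sketch of why large exclude ranges are slow when handled per-address and how skipping directly past an excluded prefix avoids iterating every address inside it; this is illustrative only, not the actual whereabouts code.

package iterskip

import "net/netip"

// nextOutsideExcludes returns ip unchanged if it is not excluded; otherwise it
// jumps directly past the excluded prefix that contains it, instead of
// stepping through the excluded range one address at a time.
func nextOutsideExcludes(ip netip.Addr, excludes []netip.Prefix) netip.Addr {
	for {
		skipped := false
		for _, p := range excludes {
			if p.Contains(ip) {
				ip = lastAddr(p).Next()
				skipped = true
			}
		}
		if !skipped {
			return ip
		}
	}
}

// lastAddr computes the highest address within a prefix.
func lastAddr(p netip.Prefix) netip.Addr {
	a := p.Masked().Addr().As16()
	bits := p.Bits()
	if p.Addr().Is4() {
		bits += 96 // map the IPv4 prefix length into the 16-byte form
	}
	for b := bits; b < 128; b++ {
		a[b/8] |= 1 << uint(7-b%8) // set all host bits
	}
	addr := netip.AddrFrom16(a)
	if p.Addr().Is4() {
		return addr.Unmap()
	}
	return addr
}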

Description of problem:

OCM-o does not support obtaining verbosity through the OpenShiftControllerManager.operatorLogLevel object

Version-Release number of selected component (if applicable):

 

How reproducible:

Modify the OpenShiftControllerManager.operatorLogLevel, and the OCM-o operator will not display the corresponding logs.

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

We should adjust the CSI RPC call timeout from the sidecars to the CSI driver. We seem to be using default values that are just too short and hence can cause unintended side effects.

Description of problem:

https://github.com/openshift/api/pull/1186 - https://issues.redhat.com/browse/CONSOLE-3069 promoted ConsolePlugin CRD to v1.

The PR also introduces a conversion webhook from v1alpha1 to v1.

In the new CRD version, the I18n field (ConsolePluginI18n) is marked as optional.
The conversion webhook will not set a valid default value ("Lazy"/"Preload") when writing the v1 object, and a v1 object completely omitting spec.i18n will be accepted with no valid default value as well.

On the other side, at garbage collection time the object will be stuck forever due to the lack of a valid value for spec.i18n.loadType.

Example,
create a v1 ConsolePlugin object:

cat <<EOF | oc apply -f -
apiVersion: console.openshift.io/v1
kind: ConsolePlugin
metadata:
  name: test472
spec:
  backend:
    service:
      basePath: /
      name: test472-service
      namespace: kubevirt-hyperconverged
      port: 9443
    type: Service
  displayName: Test 472 Plugin
EOF

Delete it in foreground mode:
stirabos@t14s:~$ oc delete consoleplugin test472 --timeout=30s --cascade='foreground' -v 7
I1011 18:20:03.255605   31610 loader.go:372] Config loaded from file:  /home/stirabos/.kube/config
I1011 18:20:03.266567   31610 round_trippers.go:463] DELETE https://api.ci-ln-krdzphb-72292.gcp-2.ci.openshift.org:6443/apis/console.openshift.io/v1/consoleplugins/test472
I1011 18:20:03.266581   31610 round_trippers.go:469] Request Headers:
I1011 18:20:03.266588   31610 round_trippers.go:473]     Accept: application/json
I1011 18:20:03.266594   31610 round_trippers.go:473]     Content-Type: application/json
I1011 18:20:03.266600   31610 round_trippers.go:473]     User-Agent: oc/4.11.0 (linux/amd64) kubernetes/fcf512e
I1011 18:20:03.266606   31610 round_trippers.go:473]     Authorization: Bearer <masked>
I1011 18:20:03.688569   31610 round_trippers.go:574] Response Status: 200 OK in 421 milliseconds
consoleplugin.console.openshift.io "test472" deleted
I1011 18:20:03.688911   31610 round_trippers.go:463] GET https://api.ci-ln-krdzphb-72292.gcp-2.ci.openshift.org:6443/apis/console.openshift.io/v1/consoleplugins?fieldSelector=metadata.name%3Dtest472
I1011 18:20:03.688919   31610 round_trippers.go:469] Request Headers:
I1011 18:20:03.688928   31610 round_trippers.go:473]     Authorization: Bearer <masked>
I1011 18:20:03.688935   31610 round_trippers.go:473]     Accept: application/json
I1011 18:20:03.688941   31610 round_trippers.go:473]     User-Agent: oc/4.11.0 (linux/amd64) kubernetes/fcf512e
I1011 18:20:03.840103   31610 round_trippers.go:574] Response Status: 200 OK in 151 milliseconds
I1011 18:20:03.840825   31610 round_trippers.go:463] GET https://api.ci-ln-krdzphb-72292.gcp-2.ci.openshift.org:6443/apis/console.openshift.io/v1/consoleplugins?fieldSelector=metadata.name%3Dtest472&resourceVersion=175205&watch=true
I1011 18:20:03.840848   31610 round_trippers.go:469] Request Headers:
I1011 18:20:03.840884   31610 round_trippers.go:473]     Accept: application/json
I1011 18:20:03.840907   31610 round_trippers.go:473]     User-Agent: oc/4.11.0 (linux/amd64) kubernetes/fcf512e
I1011 18:20:03.840928   31610 round_trippers.go:473]     Authorization: Bearer <masked>
I1011 18:20:03.972219   31610 round_trippers.go:574] Response Status: 200 OK in 131 milliseconds
error: timed out waiting for the condition on consoleplugins/test472

and in kube-controller-manager logs we see:

2022-10-11T16:25:32.192864016Z I1011 16:25:32.192788       1 garbagecollector.go:501] "Processing object" object="test472" objectUID=0cc46a01-113b-4bbe-9c7a-829a97d6867c kind="ConsolePlugin" virtual=false
2022-10-11T16:25:32.282303274Z I1011 16:25:32.282161       1 garbagecollector.go:623] remove DeleteDependents finalizer for item [console.openshift.io/v1/ConsolePlugin, namespace: , name: test472, uid: 0cc46a01-113b-4bbe-9c7a-829a97d6867c]
2022-10-11T16:25:32.304835330Z E1011 16:25:32.304730       1 garbagecollector.go:379] error syncing item &garbagecollector.node{identity:garbagecollector.objectReference{OwnerReference:v1.OwnerReference{APIVersion:"console.openshift.io/v1", Kind:"ConsolePlugin", Name:"test472", UID:"0cc46a01-113b-4bbe-9c7a-829a97d6867c", Controller:(*bool)(nil), BlockOwnerDeletion:(*bool)(nil)}, Namespace:""}, dependentsLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:1, readerWait:0}, dependents:map[*garbagecollector.node]struct {}{}, deletingDependents:true, deletingDependentsLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, beingDeleted:true, beingDeletedLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, virtual:false, virtualLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, owners:[]v1.OwnerReference(nil)}: ConsolePlugin.console.openshift.io "test472" is invalid: spec.i18n.loadType: Unsupported value: "": supported values: "Preload", "Lazy"

Version-Release number of selected component (if applicable):

OCP 4.12.0 ec4

How reproducible:

100% 

Steps to Reproduce:

1. cat <<EOF | oc apply -f -
apiVersion: console.openshift.io/v1
kind: ConsolePlugin
metadata:
  name: test472
spec:
  backend:
    service:
      basePath: /
      name: test472-service
      namespace: kubevirt-hyperconverged
      port: 9443
    type: Service
  displayName: Test 472 Plugin
EOF
2. oc delete consoleplugin test472 --timeout=30s --cascade='foreground' -v 7

Actual results:

2022-10-11T16:25:32.192864016Z I1011 16:25:32.192788       1 garbagecollector.go:501] "Processing object" object="test472" objectUID=0cc46a01-113b-4bbe-9c7a-829a97d6867c kind="ConsolePlugin" virtual=false
2022-10-11T16:25:32.282303274Z I1011 16:25:32.282161       1 garbagecollector.go:623] remove DeleteDependents finalizer for item [console.openshift.io/v1/ConsolePlugin, namespace: , name: test472, uid: 0cc46a01-113b-4bbe-9c7a-829a97d6867c]
2022-10-11T16:25:32.304835330Z E1011 16:25:32.304730       1 garbagecollector.go:379] error syncing item &garbagecollector.node{identity:garbagecollector.objectReference{OwnerReference:v1.OwnerReference{APIVersion:"console.openshift.io/v1", Kind:"ConsolePlugin", Name:"test472", UID:"0cc46a01-113b-4bbe-9c7a-829a97d6867c", Controller:(*bool)(nil), BlockOwnerDeletion:(*bool)(nil)}, Namespace:""}, dependentsLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:1, readerWait:0}, dependents:map[*garbagecollector.node]struct {}{}, deletingDependents:true, deletingDependentsLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, beingDeleted:true, beingDeletedLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, virtual:false, virtualLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, owners:[]v1.OwnerReference(nil)}: ConsolePlugin.console.openshift.io "test472" is invalid: spec.i18n.loadType: Unsupported value: "": supported values: "Preload", "Lazy"

Expected results:

Object correctly deleted

Additional info:

The issue doesn't happen with --cascade='background' which is the default on the CLI client

Discovered in the must gather kubelet_service.log from https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-gcp-sdn-upgrade/1586093220087992320

It appears the guard pod names are too long and are being truncated down to where they will collide with those from the other masters.

From kubelet logs in this run:

❯ grep openshift-kube-scheduler-guard-ci-op-3hj6pnwf-4f6ab-lv57z-maste kubelet_service.log
Oct 28 23:58:55.693391 ci-op-3hj6pnwf-4f6ab-lv57z-master-1 kubenswrapper[1657]: E1028 23:58:55.693346    1657 kubelet_pods.go:413] "Hostname for pod was too long, truncated it" podName="openshift-kube-scheduler-guard-ci-op-3hj6pnwf-4f6ab-lv57z-master-1" hostnameMaxLen=63 truncatedHostname="openshift-kube-scheduler-guard-ci-op-3hj6pnwf-4f6ab-lv57z-maste"
Oct 28 23:59:03.735726 ci-op-3hj6pnwf-4f6ab-lv57z-master-0 kubenswrapper[1670]: E1028 23:59:03.735671    1670 kubelet_pods.go:413] "Hostname for pod was too long, truncated it" podName="openshift-kube-scheduler-guard-ci-op-3hj6pnwf-4f6ab-lv57z-master-0" hostnameMaxLen=63 truncatedHostname="openshift-kube-scheduler-guard-ci-op-3hj6pnwf-4f6ab-lv57z-maste"
Oct 28 23:59:11.168082 ci-op-3hj6pnwf-4f6ab-lv57z-master-2 kubenswrapper[1667]: E1028 23:59:11.168041    1667 kubelet_pods.go:413] "Hostname for pod was too long, truncated it" podName="openshift-kube-scheduler-guard-ci-op-3hj6pnwf-4f6ab-lv57z-master-2" hostnameMaxLen=63 truncatedHostname="openshift-kube-scheduler-guard-ci-op-3hj6pnwf-4f6ab-lv57z-maste"

This also looks to be happening for openshift-kube-scheduler-guard, kube-controller-manager-guard, possibly others.

Looks like they should be truncated further to make room for random suffixes in https://github.com/openshift/library-go/blame/bd9b0e19121022561dcd1d9823407cd58b2265d0/pkg/operator/staticpod/controller/guard/guard_controller.go#L97-L98

Unsure of the implications here, it looks a little scary.
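
One possible shape of such a truncation, sketched only to illustrate the idea and not the actual library-go change: shorten the generated prefix rather than the node name, so the node-specific suffix that keeps guard pod names unique across masters is preserved.

package guardsketch

const maxHostnameLen = 63

// guardPodName builds "<prefix>-guard-<nodeName>" but shortens the prefix,
// not the node name, when the result would exceed the hostname limit.
func guardPodName(prefix, nodeName string) string {
	suffix := "-guard-" + nodeName
	if len(prefix)+len(suffix) > maxHostnameLen {
		keep := maxHostnameLen - len(suffix)
		if keep < 1 {
			keep = 1
		}
		prefix = prefix[:keep]
	}
	return prefix + suffix
}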

Description of problem:

prometheus-k8s-0 ends in CrashLoopBackOff with evel=error err="opening storage failed: /prometheus/chunks_head/000002: invalid magic number 0" on SNO after hard reboot tests

Version-Release number of selected component (if applicable):

4.11.6

How reproducible:

Not always, after ~10 attempts

Steps to Reproduce:

1. Deploy SNO with Telco DU profile applied
2. Hard reboot node via out of band interface
3. oc -n openshift-monitoring get pods prometheus-k8s-0 

Actual results:

NAME               READY   STATUS             RESTARTS          AGE
prometheus-k8s-0   5/6     CrashLoopBackOff   125 (4m57s ago)   5h28m

Expected results:

Running

Additional info:

Attaching must-gather.

The pod recovers successfully after deleting/re-creating.


[kni@registry.kni-qe-0 ~]$ oc -n openshift-monitoring logs prometheus-k8s-0
ts=2022-09-26T14:54:01.919Z caller=main.go:552 level=info msg="Starting Prometheus Server" mode=server version="(version=2.36.2, branch=rhaos-4.11-rhel-8, revision=0d81ba04ce410df37ca2c0b1ec619e1bc02e19ef)"
ts=2022-09-26T14:54:01.919Z caller=main.go:557 level=info build_context="(go=go1.18.4, user=root@371541f17026, date=20220916-14:15:37)"
ts=2022-09-26T14:54:01.919Z caller=main.go:558 level=info host_details="(Linux 4.18.0-372.26.1.rt7.183.el8_6.x86_64 #1 SMP PREEMPT_RT Sat Aug 27 22:04:33 EDT 2022 x86_64 prometheus-k8s-0 (none))"
ts=2022-09-26T14:54:01.919Z caller=main.go:559 level=info fd_limits="(soft=1048576, hard=1048576)"
ts=2022-09-26T14:54:01.919Z caller=main.go:560 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2022-09-26T14:54:01.921Z caller=web.go:553 level=info component=web msg="Start listening for connections" address=127.0.0.1:9090
ts=2022-09-26T14:54:01.922Z caller=main.go:989 level=info msg="Starting TSDB ..."
ts=2022-09-26T14:54:01.924Z caller=tls_config.go:231 level=info component=web msg="TLS is disabled." http2=false
ts=2022-09-26T14:54:01.926Z caller=main.go:848 level=info msg="Stopping scrape discovery manager..."
ts=2022-09-26T14:54:01.926Z caller=main.go:862 level=info msg="Stopping notify discovery manager..."
ts=2022-09-26T14:54:01.926Z caller=manager.go:951 level=info component="rule manager" msg="Stopping rule manager..."
ts=2022-09-26T14:54:01.926Z caller=manager.go:961 level=info component="rule manager" msg="Rule manager stopped"
ts=2022-09-26T14:54:01.926Z caller=main.go:899 level=info msg="Stopping scrape manager..."
ts=2022-09-26T14:54:01.926Z caller=main.go:858 level=info msg="Notify discovery manager stopped"
ts=2022-09-26T14:54:01.926Z caller=main.go:891 level=info msg="Scrape manager stopped"
ts=2022-09-26T14:54:01.926Z caller=notifier.go:599 level=info component=notifier msg="Stopping notification manager..."
ts=2022-09-26T14:54:01.926Z caller=main.go:844 level=info msg="Scrape discovery manager stopped"
ts=2022-09-26T14:54:01.926Z caller=manager.go:937 level=info component="rule manager" msg="Starting rule manager..."
ts=2022-09-26T14:54:01.926Z caller=main.go:1120 level=info msg="Notifier manager stopped"
ts=2022-09-26T14:54:01.926Z caller=main.go:1129 level=error err="opening storage failed: /prometheus/chunks_head/000002: invalid magic number 0"

Description of problem:

The issue has the same root cause as https://issues.redhat.com/browse/OCPBUGS-9026, but I'd like to open a new one because it becomes very critical now that ROSA uses NLB as the default since 4.14. HCP (HyperShift) private clusters without infra nodes are the most serious victims, because they only have worker nodes and there is no workaround available for them right now.

If we decide the old bug is sufficient to track this issue, please close this one.


Version-Release number of selected component (if applicable):

4.14.1
HyperShift Private cluster

How reproducible:

100%

Steps to Reproduce:

1. create ROSA HCP(HyperShift) cluster
2. run qe-e2e-test on this cluster, or curl route from one pod inside the cluster
3.

Actual results:

1. co/console status is flapping because the route is only intermittently accessible
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.1    True        False         4h56m   Error while reconciling 4.14.1: the cluster operator console is not available


2. check node and router pods running on both worker nodes
$ oc get node
NAME                          STATUS   ROLES    AGE    VERSION
ip-10-0-49-184.ec2.internal   Ready    worker   5h5m   v1.27.6+f67aeb3
ip-10-0-63-210.ec2.internal   Ready    worker   5h8m   v1.27.6+f67aeb3

$ oc -n openshift-ingress get pod -owide
NAME                              READY   STATUS    RESTARTS   AGE    IP           NODE                          NOMINATED NODE   READINESS GATES
router-default-86d569bf84-bq66f   1/1     Running   0          5h8m   10.130.0.7   ip-10-0-49-184.ec2.internal   <none>           <none>
router-default-86d569bf84-v54hp   1/1     Running   0          5h8m   10.128.0.9   ip-10-0-63-210.ec2.internal   <none>           <none>

3. check ingresscontroller LB setting, it uses Internal NLB

spec:
  endpointPublishingStrategy:
    loadBalancer:
      dnsManagementPolicy: Managed
      providerParameters:
        aws:
          networkLoadBalancer: {}
          type: NLB
        type: AWS
      scope: Internal
    type: LoadBalancerService

4. continue to curl the route from a pod inside the cluster
$ oc rsh console-operator-86786df488-w6fks
Defaulted container "console-operator" out of: console-operator, conversion-webhook-server

sh-4.4$ curl https://console-openshift-console.apps.rosa.ci-rosa-h-d53b.ptk5.p3.openshiftapps.com -k -I
HTTP/1.1 200 OK

sh-4.4$ curl https://console-openshift-console.apps.rosa.ci-rosa-h-d53b.ptk5.p3.openshiftapps.com -k -I
Connection timed out
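
A small sketch to repeat the in-cluster check from the same pod (route URL taken from the output above) and observe the intermittent failures:

$ for i in $(seq 1 10); do curl -sk -o /dev/null -w "%{http_code}\n" --max-time 10 https://console-openshift-console.apps.rosa.ci-rosa-h-d53b.ptk5.p3.openshiftapps.com; done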

Expected results:

1. co/console should be stable, curl console route should be always OK.
2. qe-e2e-test should not fail

Additional info:

qe-e2e-test on the cluster:

https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/pr-logs/pull/openshift_release/45369/rehearse-45369-periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-stable-aws-rosa-sts-hypershift-sec-guest-prod-private-link-full-f2/1724307074235502592
 

Description of problem:

Support for tech preview API extensions was introduced in https://github.com/openshift/installer/pull/6336 and https://github.com/openshift/api/pull/1274. In the case of https://github.com/openshift/api/pull/1278, config/v1/0000_10_config-operator_01_infrastructure-TechPreviewNoUpgrade.crd.yaml was introduced, which seems to result in both 0000_10_config-operator_01_infrastructure-TechPreviewNoUpgrade.crd.yaml and 0000_10_config-operator_01_infrastructure-Default.crd.yaml being rendered by the bootstrap. As a result, both CRD manifests are applied during bootstrap, but one of them (in this case the tech preview CRD) fails to be created.

We may need to modify the render command to be aware of feature gates when rendering manifests during bootstrap. I'm also open to hearing other views on how this might work.

Version-Release number of selected component (if applicable):

https://github.com/openshift/cluster-config-operator/pull/269 built and running on 4.12-ec5 

How reproducible:

consistently

Steps to Reproduce:

1. bump the version of OpenShift API to one including a tech preview version of the infrastructure CRD
2. install openshift with the infrastructure manifest modified to incorporate tech preview fields
3. those fields will not be populated upon installation

Also, checking the logs from bootkube will show both being installed, but one of them fails.

Actual results:

 

Expected results:

 

Additional info:

Excerpts from bootkube log
Nov 02 20:40:01 localhost.localdomain bootkube.sh[4216]: Writing asset: /assets/config-bootstrap/manifests/0000_10_config-operator_01_infrastructure-TechPreviewNoUpgrade.crd.yaml
Nov 02 20:40:01 localhost.localdomain bootkube.sh[4216]: Writing asset: /assets/config-bootstrap/manifests/0000_10_config-operator_01_infrastructure-Default.crd.yaml


Nov 02 20:41:23 localhost.localdomain bootkube.sh[5710]: Created "0000_10_config-operator_01_infrastructure-Default.crd.yaml" customresourcedefinitions.v1.apiextensions.k8s.io/infrastructures.config.openshift.io -n
Nov 02 20:41:23 localhost.localdomain bootkube.sh[5710]: Skipped "0000_10_config-operator_01_infrastructure-TechPreviewNoUpgrade.crd.yaml" customresourcedefinitions.v1.apiextensions.k8s.io/infrastructures.config.openshift.io -n  as it already exists
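
A minimal post-bootstrap check sketch to see which variant of the infrastructure CRD ended up on the cluster (assuming the applied manifest carries some feature-set marker in its annotations):

$ oc get crd infrastructures.config.openshift.io -o yaml | grep -i feature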

Please review the following PR: https://github.com/openshift/console-operator/pull/737

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

based on bugs from ART team, example: https://issues.redhat.com/browse/OCPBUGS-12347, 4.14 image should be built with go 1.20, but prometheus container image is built by go1.19.6

$ token=`oc create token prometheus-k8s -n openshift-monitoring`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/label/goversion/values' | jq
{
  "status": "success",
  "data": [
    "go1.19.6",
    "go1.20.3"
  ]
}

searched from thanos API

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query={__name__=~".*",goversion="go1.19.6"}' | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "prometheus_build_info",
          "branch": "rhaos-4.14-rhel-8",
          "container": "kube-rbac-proxy",
          "endpoint": "metrics",
          "goarch": "amd64",
          "goos": "linux",
          "goversion": "go1.19.6",
          "instance": "10.128.2.19:9092",
          "job": "prometheus-k8s",
          "namespace": "openshift-monitoring",
          "pod": "prometheus-k8s-0",
          "prometheus": "openshift-monitoring/k8s",
          "revision": "fe01b9f83cb8190fc8f04c16f4e05e87217ab03e",
          "service": "prometheus-k8s",
          "tags": "unknown",
          "version": "2.43.0"
        },
        "value": [
          1682576802.496,
          "1"
        ]
      },
...

prometheus-k8s-0 container names: [prometheus config-reloader thanos-sidecar prometheus-proxy kube-rbac-proxy kube-rbac-proxy-thanos]; the prometheus image is built with go1.19.6

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- prometheus --version
prometheus, version 2.43.0 (branch: rhaos-4.14-rhel-8, revision: fe01b9f83cb8190fc8f04c16f4e05e87217ab03e)
  build user:       root@402ffbe02b57
  build date:       20230422-00:43:08
  go version:       go1.19.6
  platform:         linux/amd64
  tags:             unknown

$ oc -n openshift-monitoring exec -c config-reloader prometheus-k8s-0 -- prometheus-config-reloader --version
prometheus-config-reloader, version 0.63.0 (branch: rhaos-4.14-rhel-8, revision: ce71a7d)
  build user:       root
  build date:       20230424-15:53:51
  go version:       go1.20.3
  platform:         linux/amd64

$ oc -n openshift-monitoring exec -c thanos-sidecar prometheus-k8s-0 -- thanos --version
thanos, version 0.31.0 (branch: rhaos-4.14-rhel-8, revision: d58df6d218925fd007e16965f50047c9a4194c42)
  build user:       root@c070c5e6af32
  build date:       20230422-00:44:21
  go version:       go1.20.3
  platform:         linux/amd64


# owned by the oauth team, not the responsibility of Monitoring
$ oc -n openshift-monitoring exec -c prometheus-proxy prometheus-k8s-0 -- oauth-proxy --version
oauth2_proxy was built with go1.18.10

# the issue below is tracked by bug OCPBUGS-12821
$ oc -n openshift-monitoring exec -c kube-rbac-proxy prometheus-k8s-0 -- kube-rbac-proxy --version
Kubernetes v0.0.0-master+$Format:%H$

$ oc -n openshift-monitoring exec -c kube-rbac-proxy-thanos prometheus-k8s-0 -- kube-rbac-proxy --version
Kubernetes v0.0.0-master+$Format:%H$

should fix files
https://github.com/openshift/prometheus/blob/master/.ci-operator.yaml#L4
https://github.com/openshift/prometheus/blob/master/Dockerfile.ocp#L1

 

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-04-26-154754  

How reproducible:

always

Actual results:

4.14 prometheus is built with go1.19.6

Expected results:

4.14 prometheus image should be built with go1.20

Additional info:

no functional impact

Please review the following PR: https://github.com/openshift/prometheus-alertmanager/pull/75

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/configmap-reload/pull/56

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    Before: Warning  FailedCreatePodSandBox  8s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "82187d55b1379aad1e6c02b3394df7a8a0c84cc90902af413c1e0d9d56ddafb0": plugin type="multus" name="multus-cni-network" failed (add): [default/netshoot-deployment-59898b5dd9-hhvfn/89e6349b-9797-4e03-8828-ebafe224dfaf:whereaboutsexample]: error adding container to network "whereaboutsexample": error at storage engine: Could not allocate IP in range: ip: 2000::1 / - 2000::ffff:ffff:ffff:fffe / range: net.IPNet{IP:net.IP{0x20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, Mask:net.IPMask{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}

After:  Type     Reason                  Age   From               Message
  ----     ------                  ----  ----               -------
  Normal   Scheduled               6s    default-scheduler  Successfully assigned default/netshoot-deployment-59898b5dd9-kk2zm to whereabouts-worker
  Normal   AddedInterface          6s    multus             Add eth0 [10.244.2.2/24] from kindnet
  Warning  FailedCreatePodSandBox  6s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "23dd45e714db09380150b5df74be37801bf3caf73a5262329427a5029ef44db1": plugin type="multus" name="multus-cni-network" failed (add): [default/netshoot-deployment-59898b5dd9-kk2zm/142de5eb-9f8a-4818-8c5c-6c7c85fe575e:whereaboutsexample]: error adding container to network "whereaboutsexample": error at storage engine: Could not allocate IP in range: ip: 2000::1 / - 2000::ffff:ffff:ffff:fffe / range: 2000::/64 / excludeRanges: [2000::/32]

Fixed upstream in #366 https://github.com/k8snetworkplumbingwg/whereabouts/pull/366

Please review the following PR: https://github.com/openshift/cluster-openshift-controller-manager-operator/pull/304

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cluster-openshift-controller-manager-operator/pull/337

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Azure CCM should be GA before the end of 4.14. When we previously tried to promote it there were issues, so we need to improve the feature gate promotion so that we can promote all components in a single release, and then promote the CCM to GA once those changes are in place.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Sanitize OWNERS/OWNER_ALIASES:

1) OWNERS must have:

component: "Storage / Kubernetes External Components"

2) OWNER_ALIASES must have all team members of Storage team.

Description of problem:

We discovered (in https://coreos.slack.com/archives/CC3CZCQHM/p1667571136730989) that we are shipping unnecessary RBAC.

This RBAC was only used in 4.2 and 4.3 for:

  • making the switch from configMaps to leases in leader election

and we should remove it.

Version-Release number of selected component (if applicable):

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Please review the following PR: https://github.com/openshift/configmap-reload/pull/58

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

In 4.12.0-rc.0 some API-server components declare flowcontrol/v1beta1 release manifests:

$ oc adm release extract --to manifests quay.io/openshift-release-dev/ocp-release:4.12.0-rc.0-x86_64
$ grep -r flowcontrol.apiserver.k8s.io manifests
manifests/0000_50_cluster-authentication-operator_09_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
manifests/0000_50_cluster-authentication-operator_09_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
manifests/0000_50_cluster-authentication-operator_09_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
manifests/0000_50_cluster-authentication-operator_09_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
manifests/0000_20_etcd-operator_10_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
manifests/0000_20_kube-apiserver-operator_08_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
manifests/0000_20_kube-apiserver-operator_08_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
manifests/0000_20_kube-apiserver-operator_08_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
manifests/0000_50_cluster-openshift-apiserver-operator_09_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
manifests/0000_50_cluster-openshift-apiserver-operator_09_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
manifests/0000_50_cluster-openshift-apiserver-operator_09_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
manifests/0000_50_cluster-openshift-controller-manager-operator_10_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1beta1

The APIs are scheduled for removal in Kube 1.26, which will ship with OpenShift 4.13. We want the 4.12 CVO to move to modern APIs so the APIRemovedInNext.*ReleaseInUse alerts are not firing on 4.12. This ticket tracks removing those manifests or replacing them with a more modern resource type. The definition of done is that new 4.13 (and, with backports, 4.12) nightlies no longer include flowcontrol.apiserver.k8s.io/v1beta1 manifests.
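
A minimal verification sketch for the definition of done; the release pullspec is a placeholder for a fixed nightly:

$ oc adm release extract --to manifests <fixed-nightly-pullspec>
$ grep -r 'flowcontrol.apiserver.k8s.io/v1beta1' manifests || echo "no v1beta1 flowcontrol manifests"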

This can be noticed in https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/27560/pull-ci-openshift-origin-master-e2e-gcp-ovn/1593697975584952320/artifacts/e2e-gcp-ovn/openshift-e2e-test/build-log.txt:

[It] clients should not use APIs that are removed in upcoming releases [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]
  github.com/openshift/origin/test/extended/apiserver/api_requests.go:27
Nov 18 21:59:06.261: INFO: api flowschemas.v1beta1.flowcontrol.apiserver.k8s.io, removed in release 1.26, was accessed 254 times
Nov 18 21:59:06.261: INFO: api horizontalpodautoscalers.v2beta2.autoscaling, removed in release 1.26, was accessed 10 times
Nov 18 21:59:06.261: INFO: api prioritylevelconfigurations.v1beta1.flowcontrol.apiserver.k8s.io, removed in release 1.26, was accessed 22 times
Nov 18 21:59:06.261: INFO: user/system:serviceaccount:openshift-cluster-version:default accessed flowschemas.v1beta1.flowcontrol.apiserver.k8s.io 224 times
Nov 18 21:59:06.261: INFO: user/system:serviceaccount:openshift-cluster-version:default accessed prioritylevelconfigurations.v1beta1.flowcontrol.apiserver.k8s.io 22 times
Nov 18 21:59:06.261: INFO: user/system:serviceaccount:openshift-kube-storage-version-migrator:kube-storage-version-migrator-sa accessed flowschemas.v1beta1.flowcontrol.apiserver.k8s.io 16 times
Nov 18 21:59:06.261: INFO: user/system:admin accessed flowschemas.v1beta1.flowcontrol.apiserver.k8s.io 14 times
Nov 18 21:59:06.261: INFO: user/system:serviceaccount:openshift-monitoring:kube-state-metrics accessed horizontalpodautoscalers.v2beta2.autoscaling 10 times
Nov 18 21:59:06.261: INFO: api flowschemas.v1beta1.flowcontrol.apiserver.k8s.io, removed in release 1.26, was accessed 254 times
api horizontalpodautoscalers.v2beta2.autoscaling, removed in release 1.26, was accessed 10 times
api prioritylevelconfigurations.v1beta1.flowcontrol.apiserver.k8s.io, removed in release 1.26, was accessed 22 times
user/system:admin accessed flowschemas.v1beta1.flowcontrol.apiserver.k8s.io 14 times
user/system:serviceaccount:openshift-cluster-version:default accessed flowschemas.v1beta1.flowcontrol.apiserver.k8s.io 224 times
user/system:serviceaccount:openshift-cluster-version:default accessed prioritylevelconfigurations.v1beta1.flowcontrol.apiserver.k8s.io 22 times
user/system:serviceaccount:openshift-kube-storage-version-migrator:kube-storage-version-migrator-sa accessed flowschemas.v1beta1.flowcontrol.apiserver.k8s.io 16 times
user/system:serviceaccount:openshift-monitoring:kube-state-metrics accessed horizontalpodautoscalers.v2beta2.autoscaling 10 times
Nov 18 21:59:06.261: INFO: api flowschemas.v1beta1.flowcontrol.apiserver.k8s.io, removed in release 1.26, was accessed 254 times
api horizontalpodautoscalers.v2beta2.autoscaling, removed in release 1.26, was accessed 10 times
api prioritylevelconfigurations.v1beta1.flowcontrol.apiserver.k8s.io, removed in release 1.26, was accessed 22 times
user/system:admin accessed flowschemas.v1beta1.flowcontrol.apiserver.k8s.io 14 times
user/system:serviceaccount:openshift-cluster-version:default accessed flowschemas.v1beta1.flowcontrol.apiserver.k8s.io 224 times
user/system:serviceaccount:openshift-cluster-version:default accessed prioritylevelconfigurations.v1beta1.flowcontrol.apiserver.k8s.io 22 times
user/system:serviceaccount:openshift-kube-storage-version-migrator:kube-storage-version-migrator-sa accessed flowschemas.v1beta1.flowcontrol.apiserver.k8s.io 16 times
user/system:serviceaccount:openshift-monitoring:kube-state-metrics accessed horizontalpodautoscalers.v2beta2.autoscaling 10 times
[AfterEach] [sig-arch][Late]
  github.com/openshift/origin/test/extended/util/client.go:158
[AfterEach] [sig-arch][Late]
  github.com/openshift/origin/test/extended/util/client.go:159
flake: api flowschemas.v1beta1.flowcontrol.apiserver.k8s.io, removed in release 1.26, was accessed 254 times
api horizontalpodautoscalers.v2beta2.autoscaling, removed in release 1.26, was accessed 10 times
api prioritylevelconfigurations.v1beta1.flowcontrol.apiserver.k8s.io, removed in release 1.26, was accessed 22 times
user/system:admin accessed flowschemas.v1beta1.flowcontrol.apiserver.k8s.io 14 times
user/system:serviceaccount:openshift-cluster-version:default accessed flowschemas.v1beta1.flowcontrol.apiserver.k8s.io 224 times
user/system:serviceaccount:openshift-cluster-version:default accessed prioritylevelconfigurations.v1beta1.flowcontrol.apiserver.k8s.io 22 times
user/system:serviceaccount:openshift-kube-storage-version-migrator:kube-storage-version-migrator-sa accessed flowschemas.v1beta1.flowcontrol.apiserver.k8s.io 16 times
user/system:serviceaccount:openshift-monitoring:kube-state-metrics accessed horizontalpodautoscalers.v2beta2.autoscaling 10 times
Ginkgo exit error 4: exit with code 4

This is required to unblock https://github.com/openshift/origin/pull/27561

Description of problem:

Update to use Jenkins 4.13 images to address CVEs

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Please review the following PR: https://github.com/openshift/console-operator/pull/823

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/csi-livenessprobe/pull/56

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

In 4.7, we have the queries to calculate the dotted line for projects and overall cluster:

sum by (exported_namespace) (kube_pod_resource_request{resource="cpu"})
sum by (exported_namespace) (kube_pod_resource_request{resource="memory"})

We should add the dotted lines to the cluster and project dashboards showing total requests for CPU and memory.
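
A sketch of the corresponding cluster-wide totals, simply dropping the per-namespace grouping from the queries above:

sum(kube_pod_resource_request{resource="cpu"})
sum(kube_pod_resource_request{resource="memory"})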

Description of problem:

$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.20 True False 43h Cluster version is 4.11.20

$ oc get clusterrolebinding system:openshift:kube-controller-manager:gce-cloud-provider -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: "2023-01-11T13:16:47Z"
  name: system:openshift:kube-controller-manager:gce-cloud-provider
  resourceVersion: "6079"
  uid: 82a81635-4535-4a51-ab83-d2a1a5b9a473
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:openshift:kube-controller-manager:gce-cloud-provider
subjects:
- kind: ServiceAccount
  name: cloud-provider
  namespace: kube-system

$ oc get sa cloud-provider -n kube-system
Error from server (NotFound): serviceaccounts "cloud-provider" not found

The serviceAccount cloud-provider does not exist. Neither in kube-system nor in any other namespace.

It's therefore not clear what this ClusterRoleBinding does, what use case it fulfills, and why it references a non-existing serviceAccount.

From a security point of view, it's recommended to remove references to non-existing serviceAccounts from ClusterRoleBindings, as a potential attacker could abuse the current state by creating the necessary serviceAccount and gaining undesired permissions.

Version-Release number of selected component (if applicable):

OpenShift Container Platform 4 (all version from what we have found)

How reproducible:

Always

Steps to Reproduce:

1. Install OpenShift Container Platform 4
2. Run oc get clusterrolebinding system:openshift:kube-controller-manager:gce-cloud-provider -o yaml

Actual results:

$ oc get clusterrolebinding system:openshift:kube-controller-manager:gce-cloud-provider -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: "2023-01-11T13:16:47Z"
  name: system:openshift:kube-controller-manager:gce-cloud-provider
  resourceVersion: "6079"
  uid: 82a81635-4535-4a51-ab83-d2a1a5b9a473
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:openshift:kube-controller-manager:gce-cloud-provider
subjects:
- kind: ServiceAccount
  name: cloud-provider
  namespace: kube-system

$ oc get sa cloud-provider -n kube-system
Error from server (NotFound): serviceaccounts "cloud-provider" not found

Expected results:

The serviceAccount called cloud-provider should exist, or otherwise the ClusterRoleBinding should be removed.
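
A minimal remediation sketch for the second option above (removing the binding rather than creating the serviceAccount):

$ oc delete clusterrolebinding system:openshift:kube-controller-manager:gce-cloud-provider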

Additional info:

Finding related to a Security review done on the OpenShift Container Platform 4 - Platform

Description of problem:

We need to update the operator to be in sync with the Kubernetes API version used by OCP 4.14. We also need to sync our samples libraries with the latest available libraries. Any deprecated libraries should be removed as well.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Poking around in 4.12.0-ec.3 CI:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-sdn-serial/1570267348072402944/artifacts/e2e-aws-sdn-serial/gather-extra/artifacts/pods.json | jq -r '.items[] | .metadata.name as $n | .spec.tolerations[] | select(.key == "node.kubernetes.io/not-reachable") | $n'
console-7fffd859d6-j784q
console-7fffd859d6-m8fgj
downloads-8449c756f8-47ppj
downloads-8449c756f8-b7w26
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-sdn-serial/1570267348072402944/artifacts/e2e-aws-sdn-serial/gather-extra/artifacts/pods.json | jq -r '.items[] | .metadata.name as $n | select($n | startswith("console-") or startswith("downloads-")).spec.tolerations[] | $n + " " + tostring' | grep -v console-operator
console-7fffd859d6-j784q {"effect":"NoSchedule","key":"node-role.kubernetes.io/master","operator":"Exists"}
console-7fffd859d6-j784q {"effect":"NoExecute","key":"node.kubernetes.io/unreachable","operator":"Exists","tolerationSeconds":120}
console-7fffd859d6-j784q {"effect":"NoExecute","key":"node.kubernetes.io/not-reachable","operator":"Exists","tolerationSeconds":120}
console-7fffd859d6-j784q {"effect":"NoExecute","key":"node.kubernetes.io/not-ready","operator":"Exists","tolerationSeconds":300}
console-7fffd859d6-j784q {"effect":"NoSchedule","key":"node.kubernetes.io/memory-pressure","operator":"Exists"}
console-7fffd859d6-m8fgj {"effect":"NoSchedule","key":"node-role.kubernetes.io/master","operator":"Exists"}
console-7fffd859d6-m8fgj {"effect":"NoExecute","key":"node.kubernetes.io/unreachable","operator":"Exists","tolerationSeconds":120}
console-7fffd859d6-m8fgj {"effect":"NoExecute","key":"node.kubernetes.io/not-reachable","operator":"Exists","tolerationSeconds":120}
console-7fffd859d6-m8fgj {"effect":"NoExecute","key":"node.kubernetes.io/not-ready","operator":"Exists","tolerationSeconds":300}
console-7fffd859d6-m8fgj {"effect":"NoSchedule","key":"node.kubernetes.io/memory-pressure","operator":"Exists"}
downloads-8449c756f8-47ppj {"effect":"NoSchedule","key":"node-role.kubernetes.io/master","operator":"Exists"}
downloads-8449c756f8-47ppj {"effect":"NoExecute","key":"node.kubernetes.io/unreachable","operator":"Exists","tolerationSeconds":120}
downloads-8449c756f8-47ppj {"effect":"NoExecute","key":"node.kubernetes.io/not-reachable","operator":"Exists","tolerationSeconds":120}
downloads-8449c756f8-47ppj {"effect":"NoExecute","key":"node.kubernetes.io/not-ready","operator":"Exists","tolerationSeconds":300}
downloads-8449c756f8-47ppj {"effect":"NoSchedule","key":"node.kubernetes.io/memory-pressure","operator":"Exists"}
downloads-8449c756f8-b7w26 {"effect":"NoSchedule","key":"node-role.kubernetes.io/master","operator":"Exists"}
downloads-8449c756f8-b7w26 {"effect":"NoExecute","key":"node.kubernetes.io/unreachable","operator":"Exists","tolerationSeconds":120}
downloads-8449c756f8-b7w26 {"effect":"NoExecute","key":"node.kubernetes.io/not-reachable","operator":"Exists","tolerationSeconds":120}
downloads-8449c756f8-b7w26 {"effect":"NoExecute","key":"node.kubernetes.io/not-ready","operator":"Exists","tolerationSeconds":300}
downloads-8449c756f8-b7w26 {"effect":"NoSchedule","key":"node.kubernetes.io/memory-pressure","operator":"Exists"}

node.kubernetes.io/unreachable is a well-known taint. But I haven't noticed node.kubernetes.io/not-reachable before. It seems like these console operands are the only pods to mention it. And it seems to have entered the console in co#224 without much motivational context (but I may just have missed finding a thread somewhere where the motivation was discussed).

I don't think the toleration will cause any problems, but to avoid user confusion (as I experienced before working up this ticket), it is probably worth removing node.kubernetes.io/not-reachable, both from new clusters created after the fix lands and from old clusters born before the fix and updated into the fixed release. Both of those use-cases should be available in presubmit CI for console-operator changes.
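
A minimal verification sketch, reusing the jq approach above, that should come back empty once the toleration is removed in both the fresh-install and upgraded-cluster cases:

$ oc -n openshift-console get pods -o json | jq -r '.items[] | .metadata.name as $n | .spec.tolerations[]? | select(.key == "node.kubernetes.io/not-reachable") | $n'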

Please review the following PR: https://github.com/openshift/whereabouts-cni/pull/223

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Since the operator watches plugins to enable dynamic plugins, it should list that resource under `status.relatedObjects` in its ClusterOperator.

Additional info:

Migrated from bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044588

Hypershift needs to be able to specify a different release payload for control plane components without redeploying anything in the hosted cluster.

The csi-driver-node DaemonSet pods in the hosted cluster and the csi-driver-controller Deployment that runs in the control plane both use AWS_EBS_DRIVER_IMAGE and LIVENESS_PROBE_IMAGE:

https://github.com/openshift/hypershift/blob/fc42313fc93125799f7eba5361190043cc2f6561/control-plane-operator/controllers/hostedcontrolplane/storage/envreplace.go#L9-L48

We need a way to specify these images separately for csi-driver-node and csi-driver-controller.

Description of problem:

The console-operator's config file gets updated every couple of seconds, with only the `resourceVersion` field changing.
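
A minimal observation sketch, assuming the churning object is the console operator config resource (adjust the target if a different config object is meant):

$ oc get console.operator.openshift.io cluster -w -o jsonpath='{.metadata.resourceVersion}{"\n"}'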

Version-Release number of selected component (if applicable):

4.14-ec-2

How reproducible:

Always

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Please review the following PR: https://github.com/openshift/csi-external-resizer/pull/152

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

When the Build and DeploymentConfig capabilities are not installed, running `oc new-app registry.redhat.io/<namespace>/<image>:<tag>` creates a Deployment with an empty spec.containers[0].image, and the Deployment then fails to start its pod.

Version-Release number of selected component (if applicable):

oc version
Client Version: 4.14.0-0.nightly-2023-08-22-221456
Kustomize Version: v5.0.1
Server Version: 4.14.0-0.nightly-2023-09-02-132842
Kubernetes Version: v1.27.4+2c83a9f

How reproducible:

Always

Steps to Reproduce:

1. Installed cluster without build/deploymentconfig function
Set "baselineCapabilitySet: None" in install-config
2.Create a deploy using 'new-app' cmd
oc new-app registry.redhat.io/ubi8/httpd-24:latest
3.

Actual results:

2.
$oc new-app registry.redhat.io/ubi8/httpd-24:latest
--> Found container image c412709 (11 days old) from registry.redhat.io for "registry.redhat.io/ubi8/httpd-24:latest"

    Apache httpd 2.4
    ----------------
    Apache httpd 2.4 available as container, is a powerful, efficient, and extensible web server. Apache supports a variety of features, many implemented as compiled modules which extend the core functionality. These can range from server-side programming language support to authentication schemes. Virtual hosting allows one Apache installation to serve many different Web sites.

    Tags: builder, httpd, httpd-24

    * An image stream tag will be created as "httpd-24:latest" that will track this image

--> Creating resources ...
    imagestream.image.openshift.io "httpd-24" created
    deployment.apps "httpd-24" created
    service "httpd-24" created
--> Success
    Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
     'oc expose service/httpd-24'
    Run 'oc status' to view your app

3. oc get deploy -o yaml
 apiVersion: v1
items:
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    annotations:
      deployment.kubernetes.io/revision: "1"
      image.openshift.io/triggers: '[{"from":{"kind":"ImageStreamTag","name":"httpd-24:latest"},"fieldPath":"spec.template.spec.containers[?(@.name==\"httpd-24\")].image"}]'
      openshift.io/generated-by: OpenShiftNewApp
    creationTimestamp: "2023-09-04T07:44:01Z"
    generation: 1
    labels:
      app: httpd-24
      app.kubernetes.io/component: httpd-24
      app.kubernetes.io/instance: httpd-24
    name: httpd-24
    namespace: wxg
    resourceVersion: "115441"
    uid: 909d0c4e-180c-4f88-8fb5-93c927839903
  spec:
    progressDeadlineSeconds: 600
    replicas: 1
    revisionHistoryLimit: 10
    selector:
      matchLabels:
        deployment: httpd-24
    strategy:
      rollingUpdate:
        maxSurge: 25%
        maxUnavailable: 25%
      type: RollingUpdate
    template:
      metadata:
        annotations:
          openshift.io/generated-by: OpenShiftNewApp
        creationTimestamp: null
        labels:
          deployment: httpd-24
      spec:
        containers:
        - image: ' '
          imagePullPolicy: IfNotPresent
          name: httpd-24
          ports:
          - containerPort: 8080
            protocol: TCP
          - containerPort: 8443
            protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        terminationGracePeriodSeconds: 30
  status:
    conditions:
    - lastTransitionTime: "2023-09-04T07:44:01Z"
      lastUpdateTime: "2023-09-04T07:44:01Z"
      message: Created new replica set "httpd-24-7f6b55cc85"
      reason: NewReplicaSetCreated
      status: "True"
      type: Progressing
    - lastTransitionTime: "2023-09-04T07:44:01Z"
      lastUpdateTime: "2023-09-04T07:44:01Z"
      message: Deployment does not have minimum availability.
      reason: MinimumReplicasUnavailable
      status: "False"
      type: Available
    - lastTransitionTime: "2023-09-04T07:44:01Z"
      lastUpdateTime: "2023-09-04T07:44:01Z"
      message: 'Pod "httpd-24-7f6b55cc85-pvvgt" is invalid: spec.containers[0].image:
        Invalid value: " ": must not have leading or trailing whitespace'
      reason: FailedCreate
      status: "True"
      type: ReplicaFailure
    observedGeneration: 1
    unavailableReplicas: 1
kind: List
metadata:

Expected results:

Should set spec.containers[0].image to registry.redhat.io/ubi8/httpd-24:latest

Additional info:
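
A minimal workaround sketch until the generated image reference is fixed, using the deployment and container names from the output above:

$ oc set image deployment/httpd-24 httpd-24=registry.redhat.io/ubi8/httpd-24:latest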

 

Please review the following PR: https://github.com/openshift/csi-livenessprobe/pull/55

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

library-go should use Lease for leader election by default. In 4.10 we switched from configmaps to configmapsleases; now we can switch to leases.

Change library-go to use leases by default; we already have an open PR for that: https://github.com/openshift/library-go/pull/1448

Once the PR merges, we should revendor library-go for the following operators (see the verification sketch after this list):
- kas operator
- oas operator
- etcd operator
- kcm operator
- openshift controller manager operator
- scheduler operator
- auth operator
- cluster policy controller
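
A minimal verification sketch after the revendor, using the kube-apiserver operator namespace as an example (the other operators would be checked the same way in their own namespaces); the operator should hold a coordination.k8s.io Lease rather than a ConfigMap-based lock:

$ oc -n openshift-kube-apiserver-operator get leases.coordination.k8s.io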
 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Please review the following PR: https://github.com/openshift/aws-ebs-csi-driver-operator/pull/268

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The ConfigObserver controller waits until all of the given informers are marked as synced, including the build informer. However, when the Build capability is disabled the build informer never syncs, which blocks the ConfigObserver so it never runs.

This likely only happens on 4.15, because the capability-watching mechanism was bound to ConfigObserver in 4.15.

Version-Release number of selected component (if applicable):

4.15

How reproducible:

Launch cluster-bot cluster via "launch 4.15.0-0.nightly-2023-11-05-192858,openshift/cluster-openshift-controller-manager-operator#315 no-capabilities"

Steps to Reproduce:

1.
2.
3.

Actual results:

ConfigObserver controller stuck in failure 

Expected results:

The ConfigObserver controller runs and successfully clears all deployer service accounts when the DeploymentConfig capability is disabled.

Additional info:

 

Description of problem:

The oauthclients degraded condition never gets removed, meaning once it is set due to an issue on a cluster, it won't be unset.

Version-Release number of selected component (if applicable):

    

How reproducible:

Sporadically, when the AuthStatusHandlerFailedApply condition is set on the console operator status conditions.
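
A minimal check sketch to list the degraded-type conditions on the console operator config, which is where the stuck oauthclients condition would show up (condition naming is an assumption based on the description above):

$ oc get console.operator.openshift.io cluster -o json | jq '.status.conditions[] | select(.type | endswith("Degraded"))'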

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

When setting allowedRegistries as in the example below, the openshift-samples operator becomes degraded:

oc get image.config.openshift.io/cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Image
metadata:
  annotations:
    release.openshift.io/create-only: "true"
  creationTimestamp: "2020-12-16T15:48:20Z"
  generation: 2
  name: cluster
  resourceVersion: "422284920"
  uid: d406d5a0-c452-4a84-b6b3-763abb51d7a5
spec:
  additionalTrustedCA:
    name: registry-ca
  allowedRegistriesForImport:
  - domainName: quay.io
    insecure: false
  - domainName: registry.redhat.io
    insecure: false
  - domainName: registry.access.redhat.com
    insecure: false
  - domainName: registry.redhat.io/redhat/redhat-operator-index
    insecure: true
  - domainName: registry.redhat.io/redhat/redhat-marketplace-index
    insecure: true
  - domainName: registry.redhat.io/redhat/certified-operator-index
    insecure: true
  - domainName: registry.redhat.io/redhat/community-operator-index
    insecure: true
  registrySources:
    allowedRegistries:
    - quay.io
    - registry.redhat.io
    - registry.rijksapps.nl
    - registry.access.redhat.com
    - registry.redhat.io/redhat/redhat-operator-index
    - registry.redhat.io/redhat/redhat-marketplace-index
    - registry.redhat.io/redhat/certified-operator-index
    - registry.redhat.io/redhat/community-operator-index


oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.10.21   True        False         False      5d13h   
baremetal                                  4.10.21   True        False         False      450d    
cloud-controller-manager                   4.10.21   True        False         False      94d     
cloud-credential                           4.10.21   True        False         False      624d    
cluster-autoscaler                         4.10.21   True        False         False      624d    
config-operator                            4.10.21   True        False         False      624d    
console                                    4.10.21   True        False         False      42d     
csi-snapshot-controller                    4.10.21   True        False         False      31d     
dns                                        4.10.21   True        False         False      217d    
etcd                                       4.10.21   True        False         False      624d    
image-registry                             4.10.21   True        False         False      94d     
ingress                                    4.10.21   True        False         False      94d     
insights                                   4.10.21   True        False         False      104s    
kube-apiserver                             4.10.21   True        False         False      624d    
kube-controller-manager                    4.10.21   True        False         False      624d    
kube-scheduler                             4.10.21   True        False         False      624d    
kube-storage-version-migrator              4.10.21   True        False         False      31d     
machine-api                                4.10.21   True        False         False      624d    
machine-approver                           4.10.21   True        False         False      624d    
machine-config                             4.10.21   True        False         False      17d     
marketplace                                4.10.21   True        False         False      258d    
monitoring                                 4.10.21   True        False         False      161d    
network                                    4.10.21   True        False         False      624d    
node-tuning                                4.10.21   True        False         False      31d     
openshift-apiserver                        4.10.21   True        False         False      42d     
openshift-controller-manager               4.10.21   True        False         False      22d     
openshift-samples                          4.10.21   True        True          True       31d     Samples installation in error at 4.10.21: &errors.errorString{s:"global openshift image configuration prevents the creation of imagestreams using the registry "}
operator-lifecycle-manager                 4.10.21   True        False         False      624d    
operator-lifecycle-manager-catalog         4.10.21   True        False         False      624d    
operator-lifecycle-manager-packageserver   4.10.21   True        False         False      31d     
service-ca                                 4.10.21   True        False         False      624d    
storage                                    4.10.21   True        False         False      113d  


After applying the fix described here ( https://access.redhat.com/solutions/6547281 ), the issue is resolved:
oc patch configs.samples.operator.openshift.io cluster --type merge --patch '{"spec": {"samplesRegistry": "registry.redhat.io"}}'

But according to the BZ https://bugzilla.redhat.com/show_bug.cgi?id=2027745 this should be fixed in 4.10.3, yet the issue still occurs in our 4.10.21 cluster:

oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.21   True        False         31d     Error while reconciling 4.10.21: the cluster operator openshift-samples is degraded

Version-Release number of selected component (if applicable):

 

How reproducible:

100%

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

Setting the key "a" for platform.gcp.userLabels results in an error message that doesn't explain what exactly is wrong.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-15-164249

How reproducible:

Always

Steps to Reproduce:

1. "create install-config"
2. edit the install-config.yaml to insert userLabels settings (see [1])
3. "create cluster" 

Actual results:

An error message shows up saying the label key "a" is invalid.

Expected results:

There should be no error, according to the statement "A label key can have a maximum of 63 characters and cannot be empty. Label must begin with a lowercase letter, and must contain only lowercase letters, numeric characters, and the following special characters `_-`".

Additional info:

$ openshift-install version
openshift-install 4.14.0-0.nightly-2023-10-15-164249
built from commit 359866f9f6d8c86e566b0aea7506dad22f59d860
release image registry.ci.openshift.org/ocp/release@sha256:3c5976a39479e11395334f1705dbd3b56580cd1dcbd514a34d9c796b0a0d9f8e
release architecture amd64
$ openshift-install explain installconfig.platform.gcp.userLabels
KIND:     InstallConfig
VERSION:  v1

RESOURCE: <[]object>
  userLabels has additional keys and values that the installer will add as labels to all resources that it creates on GCP. Resources created by the cluster itself may not include these labels. This is a TechPreview feature and requires setting CustomNoUpgrade featureSet with GCPLabelsTags featureGate enabled or TechPreviewNoUpgrade featureSet to configure labels.

FIELDS:
    key <string> -required-
      key is the key part of the label. A label key can have a maximum of 63 characters and cannot be empty. Label must begin with a lowercase letter, and must contain only lowercase letters, numeric characters, and the following special characters `_-`.
    value <string> -required-
      value is the value part of the label. A label value can have a maximum of 63 characters and cannot be empty. Value must contain only lowercase letters, numeric characters, and the following special characters `_-`.

$ 

[1]
$ yq-3.3.0 r test12/install-config.yaml platform
gcp:
  projectID: openshift-qe
  region: us-central1
  userLabels:
  - key: createdby
    value: installer-qe
  - key: a
    value: hello
$ yq-3.3.0 r test12/install-config.yaml featureSet
TechPreviewNoUpgrade
$ yq-3.3.0 r test12/install-config.yaml credentialsMode
Passthrough
$ openshift-install create cluster --dir test12
ERROR failed to fetch Metadata: failed to load asset "Install Config": failed to create install config: invalid "install-config.yaml" file: platform.gcp.userLabels[a]: Invalid value: "hello": label key is invalid or contains invalid characters. Label key can have a maximum of 63 characters and cannot be empty. Label key must begin with a lowercase letter, and must contain only lowercase letters, numeric characters, and the following special characters `_-` 
$  

Description of problem:

The system:openshift:openshift-controller-manager:leader-locking-ingress-to-route-controller role and role binding should not be present in the openshift-route-controller-manager namespace. They are not needed since the leader-locking responsibility was moved to route-controller-manager, which is managed by leader-locking-openshift-route-controller-manager.

This was added in and used by https://github.com/openshift/openshift-controller-manager/pull/230/files#diff-2ddbbe8d5a13b855786852e6dc0c6213953315fd6e6b813b68dbdf9ffebcf112R20
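
A minimal check sketch for the leftover objects named above:

$ oc -n openshift-route-controller-manager get role,rolebinding | grep leader-locking-ingress-to-route-controller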

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

Bump Kubernetes to 0.27.1 and bump dependencies

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

User Story

Pull in the latest openshift/library content into the samples operator
If image eco e2e tests fail, work with upstream SCL to address the failures

Acceptance Criteria

  • Samples operator installs current official content in openshift/library
  • List of removed/EOL images is prepared for docs update

Docs Impact

List of EOL images needs to be sent to the Docs team and added to the release notes.

Description of problem:

Pods assigned a Multus Whereabouts IP get stuck in the ContainerCreating state after upgrading OCP from 4.12.15 to 4.12.22. It is not clear whether the upgrade itself or the node reboot directly causes the issue.

The error message is:
(combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox mypod-0-0-1-0_testproject_8c8500e1-1643-4716-8fd7-e032292c62ab_0(2baa045a1b19291769ed56bab288b60802179ff3138ffe0d16a14e78f9cb5e4f): error adding pod testproject_mypod-0-0-1-0 to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [testproject/mypod-0-0-1-0/8c8500e1-1643-4716-8fd7-e032292c62ab:testproject-net-svc-kernel-bond]: error adding container to network "testproject-net-svc-kernel-bond": error at storage engine: k8s get error: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline



Version-Release number of selected component (if applicable):

 

How reproducible:

Not sure if it is reproducible

Steps to Reproduce:

1.
2.
3.

Actual results:

Pods stuck in ContainerCreating state 

Expected results:

Pods creates normally

Additional info:

The customer reported that deleting and recreating the StatefulSet didn't work.
The pods can be created normally after manually deleting the corresponding ippools.whereabouts.cni.cncf.io:
$ oc delete ippools.whereabouts.cni.cncf.io 172.21.24.0-22 -n openshift-multus

Description of problem:

A cluster installed with baselineCapabilitySet: None has the build resource available while the build capability is disabled.


❯ oc get -o json clusterversion version | jq '.spec.capabilities'                      
{
  "baselineCapabilitySet": "None"
}

❯ oc get -o json clusterversion version | jq '.status.capabilities.enabledCapabilities'
null

❯ oc get build -A                   
NAME      AGE
cluster   5h23m

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-04-143709

How reproducible:

100%

Steps to Reproduce:

1.install a cluster with baselineCapabilitySet: None

Actual results:

❯ oc get build -A                   
NAME      AGE
cluster   5h23m

Expected results:

❯ oc get -A build
error: the server doesn't have a resource type "build"

 

slack thread with more info: https://redhat-internal.slack.com/archives/CF8SMALS1/p1696527133380269

Please review the following PR: https://github.com/openshift/cluster-api-provider-baremetal/pull/205

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:
In OCP 4.13 (and the same applies to OCP 4.14), when trying to reach the Prometheus UI via port-forward, e.g. `oc port-forward prometheus-k8s-0`, the UI URL ($HOST:9090/graph) returns `Error opening React index.html: open web/ui/static/react/index.html: no such file or directory`.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-01-24-061922

How reproducible:

100%

Steps to Reproduce:

1.  oc -n openshift-monitoring port-forward prometheus-k8s-0 9090:9090 --address='0.0.0.0' 

2. curl http://localhost:9090/graph

Actual results:

Error opening React index.html: open web/ui/static/react/index.html: no such file or directory

Expected results:

Prometheus UI is loaded

Additional info:

 The UI loads fine when following the same steps in 4.12.

Please review the following PR: https://github.com/openshift/csi-livenessprobe/pull/59

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

console-operator may panic when IncludeNamesFilter receives an object from a shared informer event of type cache.DeletedFinalStateUnknown.

Example job with panic: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-serial/1687876857824808960

Specific log that shows the full stack trace: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-serial/1687876857824808960/artifacts/e2e-aws-sdn-serial/gather-extra/artifacts/pods/openshift-console-operator_console-operator-748d7c6cdd-vwxmx_console-operator.log
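
Additional context: a minimal sketch (helper name assumed, not the operator's actual code) of the usual client-go pattern that avoids this class of panic — unwrap cache.DeletedFinalStateUnknown tombstones before interpreting the object, so a name-based filter fed by a delete event never panics:

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/client-go/tools/cache"
)

// objectName is a hypothetical filter helper; it returns false instead of
// panicking when the object cannot be interpreted.
func objectName(obj interface{}) (string, bool) {
	if tombstone, ok := obj.(cache.DeletedFinalStateUnknown); ok {
		obj = tombstone.Obj
	}
	accessor, err := meta.Accessor(obj)
	if err != nil {
		return "", false
	}
	return accessor.GetName(), true
}

func main() {
	name, ok := objectName(cache.DeletedFinalStateUnknown{Key: "ns/obj", Obj: nil})
	fmt.Println(name, ok) // "", false — no panic on a tombstone carrying a nil object
}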

Version-Release number of selected component (if applicable):

 

How reproducible:

Sporadically

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Please review the following PR: https://github.com/openshift/prometheus/pull/187

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/whereabouts-cni/pull/192

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/csi-livenessprobe/pull/47

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

console.openshift.io/use-i18n: false in the v1alpha1 API is converted to "" in the v1 API, which is not a valid value for the enum type declared in the code.

Version-Release number of selected component (if applicable):

 4.12.0-0.nightly-2022-09-25-071630

How reproducible:

Always

Steps to Reproduce:

1. Load a dynamic plugin with v1alpha API console.openshift.io/use-i18n set to 'false'
2. In the v1 API, loadType is set to the empty string ({"spec":{"i18n":{"loadType":""}}}), which is not a valid value as defined here: https://github.com/jhadvig/api/blob/22d69793277ffeb618d642724515f249262959a5/console/v1/types_console_plugin.go#L46
https://github.com/openshift/api/pull/1186/files# 

Actual results:

{"spec":{"i18n":{"loadType":""}}}

Expected results:

{"spec":{"i18n":{"loadType":"Lazy"}}}

Additional info:
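
A hypothetical conversion sketch (the local type and constants mirror, by assumption, the v1 enum referenced above): map the v1alpha1 boolean annotation onto a valid v1 value instead of leaving loadType as an empty string.

package main

import "fmt"

// LoadType mirrors (by assumption) the v1 enum linked above; the real
// constants live in openshift/api console/v1.
type LoadType string

const (
	LoadTypePreload LoadType = "Preload"
	LoadTypeLazy    LoadType = "Lazy"
)

// convertUseI18n is a hypothetical helper showing the conversion the bug
// expects: "false" (or an unset annotation) becomes "Lazy" rather than "".
func convertUseI18n(annotation string) LoadType {
	if annotation == "true" {
		return LoadTypePreload
	}
	return LoadTypeLazy
}

func main() {
	fmt.Println(convertUseI18n("false")) // Lazy, never an empty string
}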

 

Description of problem:

WebhookConfiguration caBundle injection is incorrect when some webhooks are already configured with a caBundle.

The behavior appears to be that only the first n webhooks in the `.webhooks` array have the caBundle injected, where n is the number of webhooks that do not have a caBundle set.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1. Create a validatingwebhookconfigurations or mutatingwebhookconfigurations with `service.beta.openshift.io/inject-cabundle: "true"` annotation.

2. oc edit validatingwebhookconfigurations (or oc edit mutatingwebhookconfigurations)

3. Add a new webhook to the end of the list `.webhooks`. It will not have caBundle set manually as service-ca should inject it. 

4. Observe new webhook does not get caBundle injected.

Note: it is important in step 3 that the new webhook is added to the end of the list.

 

Actual results:

Only the first n webhooks have caBundle injected where n is the number of webhooks without caBundle set.

Expected results:

All webhooks have caBundle injected when they do not have it set.

Additional info:

Open PR here: https://github.com/openshift/service-ca-operator/pull/207

The issue appears to be a mistake with Go's for-range syntax, where the loop counter "i" is used directly instead of the stored index it refers to.

tl;dr: the code should use the value of the int stored in the array (the webhook index to update), not the position of that int in the array.
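
A minimal Go illustration (types and names are made up for this sketch) of the range mistake described above and its fix: collect the indices of the webhooks that still need a caBundle, then index back through that collected list rather than reusing the loop counter against the webhook array.

package main

import "fmt"

type webhook struct {
	name     string
	caBundle []byte
}

func inject(webhooks []webhook, ca []byte) {
	var missing []int
	for i := range webhooks {
		if len(webhooks[i].caBundle) == 0 {
			missing = append(missing, i)
		}
	}
	// Buggy version: for i := range missing { webhooks[i].caBundle = ca }
	// which only patches the first len(missing) entries of the webhook list.
	for _, idx := range missing {
		webhooks[idx].caBundle = ca
	}
}

func main() {
	hooks := []webhook{{name: "a", caBundle: []byte("x")}, {name: "b"}, {name: "c"}}
	inject(hooks, []byte("new-ca"))
	fmt.Println(hooks[2].caBundle != nil) // true: the last webhook is patched too
}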

Various resources have a large `metadata.managedFields` stanza that isn't important for users. We should see if we can collapse that section by default in the YAML editor.

I didn't see an obvious API in Monaco editor for doing this, but I could have missed it. We can reach out to the dev tools team who own the YAML language server to see if it's possible.

cc Ali Mobrem Eric Paris

Due to the removal of the in-tree AWS provider (https://github.com/kubernetes/kubernetes/pull/115838) we need to ensure that KCM is setting the --external-cloud-volume-plugin flag accordingly, especially since CSI migration was GA'ed in 4.12/1.25.

The original PR that fixed this (https://github.com/openshift/cluster-kube-controller-manager-operator/pull/721) got reverted by mistake. We need to bring it back to unblock the kube rebase.

When installing an OCP cluster with the worker node VM type specified as high performance, some of the configuration settings of those VMs do not match the settings a high-performance VM should have.

Specific configurations that do not match are described in subtasks.

 

Default configuration settings of high performance VMs:
https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html-single/virtual_machine_management_guide/index?extIdCarryOver=true&sc_cid=701f2000001Css5AAC#Configuring_High_Performance_Virtual_Machines_Templates_and_Pools

When installing an OCP cluster with the worker node VM type specified as high performance, both manual and automatic migration are enabled on those VMs.
However, high-performance worker VMs are created with the engine's default values, so only manual migration should be enabled.

Default configuration settings of high performance VMs:
https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html-single/virtual_machine_management_guide/index?extIdCarryOver=true&sc_cid=701f2000001Css5AAC#Configuring_High_Performance_Virtual_Machines_Templates_and_Pools

How reproducible: 100%

How to reproduce:

1. Create install-config.yaml with a vmType field and set it to high performance, i.e.:

apiVersion: v1
baseDomain: basedomain.com
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    ovirt:
      affinityGroupsNames: []
      vmType: high_performance
  replicas: 2
...

2. Run installation

./openshift-install create cluster --dir=resources --log-level=debug

3. Check worker VM's configuration in the RHV webconsole.

Expected:
Only manual migration (under Host) should be enabled.

Actual:
Manual and automatic migration is enabled.

Description of problem:

This is a wrapper bug for the 4.12 library sync.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

Following signing-key deletion, there is a service CA rotation process which might temporarily disrupt cluster operators, but eventually all of them should regenerate. In recent 4.14 nightlies, however, this is no longer the case. Following a deletion of the signing key using
oc delete secret/signing-key -n openshift-service-ca
operators will progress for a while, but eventually console as well as monitoring end up in Available=False and Degraded=True, which is only recoverable by manually deleting all the pods in the cluster.
console                                    4.14.0-0.nightly-2023-06-30-131338   False       False         True       159m    RouteHealthAvailable: route not yet available, https://console-openshift-console.apps.evakhoni-0412.qe.gcp.devcluster.openshift.com returns '503 Service Unavailable' 
monitoring                                 4.14.0-0.nightly-2023-06-30-131338   False       True          True       161m    reconciling Console Plugin failed: retrieving ConsolePlugin object failed: conversion webhook for console.openshift.io/v1alpha1, Kind=ConsolePlugin failed: Post "https://webhook.openshift-console-operator.svc:9443/crdconvert?timeout=30s": tls: failed to verify certificate: x509: certificate signed by unknown authority
The same deletion in previous versions (4.14-ec.2 or earlier) does not have this issue, and the cluster is able to recover eventually without any manual pod deletion. I believe this to be a regression.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-06-30-131338 and other recent 4.14 nightlies

How reproducible:

100%

Steps to Reproduce:

1.oc delete secret/signing-key -n openshift-service-ca
2. wait at least 30+ minutes
3. observe oc get co

Actual results:

console and monitoring degraded and not recovering

Expected results:

able to recover eventually as in previous versions

Additional info:

By manually deleting all pods it is possible to recover the cluster from this state as follows:
for I in $(oc get ns -o jsonpath='{range .items[*]} {.metadata.name}{"\n"} {end}'); \
      do oc delete pods --all -n $I; \
      sleep 1; \
      done

 

must-gather:
https://drive.google.com/file/d/1Y3RrYZlz0EncG-Iqt8USFPsTd-br36Zt/view?usp=sharing

 

Description of problem:

[AWS EBS CSI Driver Operator] should not add the default storage class annotation back after customers remove it

Version-Release number of selected component (if applicable):

Server Version: 4.14.0-0.nightly-2023-06-08-102710

How reproducible:

Always

Steps to Reproduce:

1. Install an aws openshift cluster
2. Create 6 extra storage classes (any storage class is fine)
3. Overwrite all the storage classes with storageclass.kubernetes.io/is-default-class=false and check that none of them is marked as default
4. Overwrite all the storage classes with storageclass.kubernetes.io/is-default-class=true
5. Repeat steps 3-4 several times

Actual results:

After overwriting all the storage classes with storageclass.kubernetes.io/is-default-class=false, the annotation is sometimes reverted by the driver operator.

Expected results:

Overwriting all the storage classes with storageclass.kubernetes.io/is-default-class=false should always persist.

Additional info:
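
A hypothetical sketch of the expected behaviour (the helper name is illustrative): the operator should only add the default-class annotation when it is missing entirely, and leave an explicit "false" set by the admin untouched.

package main

import "fmt"

const defaultClassAnnotation = "storageclass.kubernetes.io/is-default-class"

// shouldSetDefault decides whether the operator may (re)apply the default
// annotation: only when the admin has not set it at all.
func shouldSetDefault(annotations map[string]string) bool {
	_, explicitlySet := annotations[defaultClassAnnotation]
	return !explicitlySet
}

func main() {
	fmt.Println(shouldSetDefault(map[string]string{}))                                // true: annotation absent
	fmt.Println(shouldSetDefault(map[string]string{defaultClassAnnotation: "false"})) // false: admin opted out
}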

 

Currently the ConsoleNotification CR that is responsible for notifying the user that the cluster is being upgraded has an accessibility issue where we are surfacing white text on a yellow background. We should be using a black text colour in this case.

 

 

Please review the following PR: https://github.com/openshift/cluster-update-keys/pull/51

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

When baselineCapabilitySet is set to None, a ServiceAccount named `deployer-controller` is still present in the cluster.

Steps to Reproduce:

=================

1. Install 4.15 cluster with baselineCapabilitySet to None

2. Run command `oc get sa -A | grep deployer`

 

Actual Results:

================

[knarra@knarra openshift-tests-private]$ oc get sa -A | grep deployer
openshift-infra deployer-controller 0 63m

Expected Results:

==================

No SA related to deployer should be returned

Please review the following PR: https://github.com/openshift/whereabouts-cni/pull/212

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Installation fails when setting featureSet: LatencySensitive or featureSet: CustomNoUpgrade.
When setting featureSet: CustomNoUpgrade in the install-config and creating the cluster, the following error is seen:
[core@bootstrap ~]$ journalctl -b -f -u release-image.service -u bootkube.service
Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670367]:         github.com/spf13/cobra@v1.6.0/command.go:968
Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670367]: k8s.io/component-base/cli.run(0xc00025c300)
Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670367]:         k8s.io/component-base@v0.26.1/cli/run.go:146 +0x317
Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670367]: k8s.io/component-base/cli.Run(0x2ce59e8?)
Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670367]:         k8s.io/component-base@v0.26.1/cli/run.go:46 +0x1d
Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670367]: main.main()
Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670367]:         github.com/openshift/cluster-kube-controller-manager-operator/cmd/cluster-kube-controller-manager-operator/main.go:24 +0x2c
Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com systemd[1]: bootkube.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com systemd[1]: bootkube.service: Failed with result 'exit-code'.
Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com systemd[1]: bootkube.service: Consumed 1.935s CPU time.
Apr 26 07:02:54 bootstrap.wwei-426g.qe.devcluster.openshift.com systemd[1]: bootkube.service: Scheduled restart job, restart counter is at 343.
Apr 26 07:02:54 bootstrap.wwei-426g.qe.devcluster.openshift.com systemd[1]: Stopped Bootstrap a Kubernetes cluster.
Apr 26 07:02:54 bootstrap.wwei-426g.qe.devcluster.openshift.com systemd[1]: bootkube.service: Consumed 1.935s CPU time.
Apr 26 07:02:54 bootstrap.wwei-426g.qe.devcluster.openshift.com systemd[1]: Started Bootstrap a Kubernetes cluster.
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670489]: Rendering Kubernetes Controller Manager core manifests...
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: panic: interface conversion: interface {} is nil, not []interface {}
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: goroutine 1 [running]:
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/targetconfigcontroller.GetKubeControllerManagerArgs(0xc000746100?)
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]:         github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/targetconfigcontroller/targetconfigcontroller.go:696 +0x379
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/openshift/cluster-kube-controller-manager-operator/pkg/cmd/render.(*renderOpts).Run(0xc0008d22c0)
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]:         github.com/openshift/cluster-kube-controller-manager-operator/pkg/cmd/render/render.go:269 +0x85c
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/openshift/cluster-kube-controller-manager-operator/pkg/cmd/render.NewRenderCommand.func1.1(0x0?)
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]:         github.com/openshift/cluster-kube-controller-manager-operator/pkg/cmd/render/render.go:48 +0x32
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/openshift/cluster-kube-controller-manager-operator/pkg/cmd/render.NewRenderCommand.func1(0xc000bee600?, {0x285dffa?, 0x8?, 0x8?})
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]:         github.com/openshift/cluster-kube-controller-manager-operator/pkg/cmd/render/render.go:58 +0xc8
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/spf13/cobra.(*Command).execute(0xc000bee600, {0xc00071cb00, 0x8, 0x8})
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]:         github.com/spf13/cobra@v1.6.0/command.go:920 +0x847
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/spf13/cobra.(*Command).ExecuteC(0xc000bee000)
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]:         github.com/spf13/cobra@v1.6.0/command.go:1040 +0x3bd
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/spf13/cobra.(*Command).Execute(...)


When setting featureSet: LatencySensitive in the install-config and creating the cluster, the following error is seen:
[core@bootstrap ~]$ journalctl -b -f -u release-image.service -u bootkube.service
Apr 26 07:07:09 bootstrap.wwei-426h.qe.devcluster.openshift.com bootkube.sh[16835]: "cluster-infrastructure-02-config.yml": failed to create infrastructures.v1.config.openshift.io/cluster -n : the server could not find the requested resource
Apr 26 07:07:09 bootstrap.wwei-426h.qe.devcluster.openshift.com bootkube.sh[16835]: Failed to create "cluster-infrastructure-02-config.yml" infrastructures.v1.config.openshift.io/cluster -n : the server could not find the requested resource
Apr 26 07:07:09 bootstrap.wwei-426h.qe.devcluster.openshift.com bootkube.sh[16835]: [#1105] failed to create some manifests:
Apr 26 07:07:09 bootstrap.wwei-426h.qe.devcluster.openshift.com bootkube.sh[16835]: "cluster-infrastructure-02-config.yml": failed to create infrastructures.v1.config.openshift.io/cluster -n : the server could not find the requested resource
Apr 26 07:07:09 bootstrap.wwei-426h.qe.devcluster.openshift.com bootkube.sh[16835]: Failed to create "cluster-infrastructure-02-config.yml" infrastructures.v1.config.openshift.io/cluster -n : the server could not find the requested resource
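
As a rough sketch (not the operator's actual code, and the config key is hypothetical), the CustomNoUpgrade panic above is the classic unguarded type assertion on a value read from an unstructured config; checking the comma-ok result avoids it:

package main

import "fmt"

// argsFrom is a hypothetical helper: "extendedArguments" is an assumed key,
// not necessarily the field the real render code reads. The point is the
// guarded assertion: a nil or missing value falls back to defaults instead
// of panicking with "interface conversion: interface {} is nil".
func argsFrom(config map[string]interface{}) []string {
	raw, ok := config["extendedArguments"].([]interface{})
	if !ok {
		return nil
	}
	args := make([]string, 0, len(raw))
	for _, v := range raw {
		if s, ok := v.(string); ok {
			args = append(args, s)
		}
	}
	return args
}

func main() {
	fmt.Println(argsFrom(map[string]interface{}{})) // [] — no panic when the key is absent or nil
}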

Version-Release number of selected component (if applicable):

OCP version: 4.13.0-0.nightly-2023-04-21-084440

How reproducible:

always

Steps to Reproduce:

1. Create install-config.yaml like the example below (LatencySensitive):
  apiVersion: v1
  controlPlane:
    architecture: amd64
    hyperthreading: Enabled
    name: master
    replicas: 3
  compute:
  - architecture: amd64
    hyperthreading: Enabled
    name: worker
    replicas: 2
  metadata:
    name: wwei-426h
  platform:
   none: {}
  pullSecret: xxxxx
  featureSet: LatencySensitive
  networking:
    clusterNetwork:
    - cidr: xxxxx
      hostPrefix: 23
    serviceNetwork:
    - xxxxx
    networkType: OpenShiftSDN
  publish: External
  baseDomain: xxxxxx
  sshKey: xxxxxxx

2. Then continue to install the cluster:
openshift-install create cluster --dir <install_folder> --log-level debug

3. Create install-config.yaml like the example below (CustomNoUpgrade):
  apiVersion: v1
  controlPlane:
    architecture: amd64
    hyperthreading: Enabled
    name: master
    replicas: 3
  compute:
  - architecture: amd64
    hyperthreading: Enabled
    name: worker
    replicas: 2
  metadata:
    name: wwei-426h
  platform:
   none: {}
  pullSecret: xxxxx
  featureSet: CustomNoUpgrade
  networking:
    clusterNetwork:
    - cidr: xxxxx
      hostPrefix: 23
    serviceNetwork:
    - xxxxx
    networkType: OpenShiftSDN
  publish: External
  baseDomain: xxxxxx
  sshKey: xxxxxxx

4. Then continue to install the cluster:
openshift-install create cluster --dir <install_folder> --log-level debug

Actual results:

Installation failed.

Expected results:

Installation succeeded.

Additional info:

The log bundle can be obtained from the link below: https://drive.google.com/drive/folders/1kg1EeYR6ApWXbeRZTiM4DV205nwMfSQv?usp=sharing

Please review the following PR: https://github.com/openshift/cluster-update-keys/pull/52

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cluster-samples-operator/pull/527

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The console will panic when a duplicate entry is set in spec.plugins.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2022-12-19-122634

How reproducible:

Always

Steps to Reproduce:

1. Create console-demo-plugin manifests
$ oc apply -f dynamic-demo-plugin/oc-manifest.yaml 
namespace/console-demo-plugin created
deployment.apps/console-demo-plugin created
service/console-demo-plugin created
consoleplugin.console.openshift.io/console-demo-plugin created 
2. Enable console-demo-plugin
$ oc patch consoles.operator.openshift.io cluster --patch '{ "spec": { "plugins": ["console-demo-plugin"] } }' --type=merge 
console.operator.openshift.io/cluster patched
3. Add a duplicate entry in spec.plugins in consoles.operator/cluster 
$ oc patch consoles.operator.openshift.io cluster --patch '{ "spec": { "plugins": ["console-demo-plugin", "console-demo-plugin"] } }' --type=merge
console.operator.openshift.io/cluster patched
$ oc get consoles.operator cluster -o json | jq .spec.plugins
[
  "console-demo-plugin",
  "console-demo-plugin"
]
4. check console pods status
$ oc get pods -n openshift-console                        
NAME                         READY   STATUS             RESTARTS      AGE
console-6bcc87c7b4-6g2cf     0/1     CrashLoopBackOff   1 (21s ago)   50s
console-6bcc87c7b4-9g6kk     0/1     CrashLoopBackOff   3 (3s ago)    50s
console-7dc78ffd78-sxvcv     1/1     Running            0             2m58s
downloads-758fc74758-9k426   1/1     Running            0             3h18m
downloads-758fc74758-k4q72   1/1     Running            0             3h21m

Actual results:

3. console pods will be in CrashLoopBackOff status
$ oc logs console-6bcc87c7b4-9g6kk -n openshift-console
W1220 06:48:37.279871       1 main.go:228] Flag inactivity-timeout is set to less then 300 seconds and will be ignored!
I1220 06:48:37.279889       1 main.go:238] The following console plugins are enabled:
I1220 06:48:37.279898       1 main.go:240]  - console-demo-plugin
I1220 06:48:37.279911       1 main.go:354] cookies are secure!
I1220 06:48:37.331802       1 server.go:607] The following console endpoints are now proxied to these services:
I1220 06:48:37.331843       1 server.go:610]  - /api/proxy/plugin/console-demo-plugin/thanos-querier/ -> https://thanos-querier.openshift-monitoring.svc.cluster.local:9091
I1220 06:48:37.331884       1 server.go:610]  - /api/proxy/plugin/console-demo-plugin/thanos-querier/ -> https://thanos-querier.openshift-monitoring.svc.cluster.local:9091
panic: http: multiple registrations for /api/proxy/plugin/console-demo-plugin/thanos-querier/
goroutine 1 [running]:
net/http.(*ServeMux).Handle(0xc0005b6600, {0xc0005d9a40, 0x35}, {0x35aaf60?, 0xc000735260})
    /usr/lib/golang/src/net/http/server.go:2503 +0x239
github.com/openshift/console/pkg/server.(*Server).HTTPHandler.func1({0xc0005d9940?, 0x35?}, {0x35aaf60, 0xc000735260})
    /go/src/github.com/openshift/console/pkg/server/server.go:245 +0x149
github.com/openshift/console/pkg/server.(*Server).HTTPHandler(0xc000056c00)
    /go/src/github.com/openshift/console/pkg/server/server.go:621 +0x330b
main.main()
    /go/src/github.com/openshift/console/cmd/bridge/main.go:785 +0x5ff5

Expected results:

3. console pods should be running well

Additional info:
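
A minimal sketch (assumed, not the console's actual code) of one way to guard the proxy registration shown in the panic above: de-duplicate the plugin list before registering handlers, because http.ServeMux panics on a second registration of the same path.

package main

import (
	"fmt"
	"net/http"
)

func registerPlugins(mux *http.ServeMux, plugins []string) {
	seen := map[string]bool{}
	for _, p := range plugins {
		if seen[p] {
			continue // skip duplicates such as two "console-demo-plugin" entries
		}
		seen[p] = true
		path := fmt.Sprintf("/api/proxy/plugin/%s/thanos-querier/", p)
		mux.Handle(path, http.NotFoundHandler()) // placeholder handler for the sketch
	}
}

func main() {
	mux := http.NewServeMux()
	registerPlugins(mux, []string{"console-demo-plugin", "console-demo-plugin"})
	fmt.Println("registered without panicking")
}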

 

 

 

 

Please review the following PR: https://github.com/openshift/cluster-api-provider-baremetal/pull/207

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/csi-livenessprobe/pull/58

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Component Readiness has found a potential regression in [sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers.

Probability of significant regression: 100.00%

Sample (being evaluated) Release: 4.15
Start Time: 2024-02-08T00:00:00Z
End Time: 2024-02-14T23:59:59Z
Success Rate: 91.30%
Successes: 63
Failures: 6
Flakes: 0

Base (historical) Release: 4.14
Start Time: 2023-10-04T00:00:00Z
End Time: 2023-10-31T23:59:59Z
Success Rate: 100.00%
Successes: 735
Failures: 0
Flakes: 0

View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&arch=amd64&baseEndTime=2023-10-31%2023%3A59%3A59&baseRelease=4.14&baseStartTime=2023-10-04%2000%3A00%3A00&capability=Other&component=Unknown&confidence=95&environment=ovn%20upgrade-micro%20amd64%20azure%20standard&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&network=ovn&pity=5&platform=azure&platform=azure&sampleEndTime=2024-02-14%2023%3A59%3A59&sampleRelease=4.15&sampleStartTime=2024-02-08%2000%3A00%3A00&testId=openshift-tests-upgrade%3A37f1600d4f8d75c47fc5f575025068d2&testName=%5Bsig-cluster-lifecycle%5D%20pathological%20event%20should%20not%20see%20excessive%20Back-off%20restarting%20failed%20containers&upgrade=upgrade-micro&upgrade=upgrade-micro&variant=standard&variant=standard

Note: When you look at the link above you will notice some of the failures mention the bare metal operator.  That's being investigated as part of https://issues.redhat.com/browse/OCPBUGS-27760.  There have been 3 cases in the last week where the console was in a fail loop.  Here's an example:

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade/1757637415561859072

 

We need help understanding why this is happening and what needs to be done to avoid it.

Description of problem:

Recently, the passing rate for test "static pods should start after being created" has dropped significantly for some platforms: 

https://sippy.dptools.openshift.org/sippy-ng/tests/4.15/analysis?test=%5Bsig-node%5D%20static%20pods%20should%20start%20after%20being%20created&filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22%5Bsig-node%5D%20static%20pods%20should%20start%20after%20being%20created%22%7D%5D%2C%22linkOperator%22%3A%22and%22%7D

Take a look at this example: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-sdn-techpreview/1712803313642115072

The test failed with the following message:
{  static pod lifecycle failure - static pod: "kube-controller-manager" in namespace: "openshift-kube-controller-manager" for revision: 6 on node: "ci-op-2z99zzqd-7f99c-rfp4q-master-0" didn't show up, waited: 3m0s}

Seemingly revision 6 was never reached. But if we look at the log from kube-controller-manager-operator, it jumps from revision 5 to revision 7: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-sdn-techpreview/1712803313642115072/artifacts/e2e-azure-sdn-techpreview/gather-extra/artifacts/pods/openshift-kube-controller-manager-operator_kube-controller-manager-operator-7cd978d745-bcvkm_kube-controller-manager-operator.log

The log also indicates that there is a possibility of race:

W1013 12:59:17.775274       1 staticpod.go:38] revision 7 is unexpectedly already the latest available revision. This is a possible race!

This might be a static pod controller issue, but I am starting with the kube-controller-manager component for this case. Feel free to reassign.

Here is a slack thread related to this:
https://redhat-internal.slack.com/archives/C01CQA76KMX/p1697472297510279

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Please review the following PR: https://github.com/openshift/cluster-kube-controller-manager-operator/pull/772

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The Jenkins and Jenkins Agent Base image versions need to be updated to use the latest images to mitigate known CVEs in plugins and Jenkins versions.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

The node selector for the console deployment requires deploying it on the master nodes, while the replica count is determined by infrastructureTopology, which primarily tracks the workers' setup.

When an OpenShift cluster is installed with a single master node and multiple workers, this leads the console deployment to request 2 replicas, because infrastructureTopology is set to HighlyAvailable while controlPlaneTopology is set to SingleReplica as expected.
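
A minimal sketch (a hypothetical helper, not the operator's actual code) of the replica selection this report argues for: since the console runs on control-plane nodes, derive the replica count from controlPlaneTopology rather than infrastructureTopology.

package main

import "fmt"

// consoleReplicas derives the console replica count from the control-plane
// topology reported in the Infrastructure status, so a single-master cluster
// with HighlyAvailable infrastructureTopology still gets one replica.
func consoleReplicas(controlPlaneTopology string) int32 {
	if controlPlaneTopology == "SingleReplica" {
		return 1
	}
	return 2
}

func main() {
	fmt.Println(consoleReplicas("SingleReplica"))   // 1
	fmt.Println(consoleReplicas("HighlyAvailable")) // 2
}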

Version-Release number of selected component (if applicable):

4.16

How reproducible:

Always    

Steps to Reproduce:

    1. Install an openshift cluster with 1 master and 2 workers

Actual results:

The installation fails because the replica count for the console deployment is set to 2.

  apiVersion: config.openshift.io/v1
  kind: Infrastructure
  metadata:
    creationTimestamp: "2024-01-18T08:34:47Z"
    generation: 1
    name: cluster
    resourceVersion: "517"
    uid: d89e60b4-2d9c-4867-a2f8-6e80207dc6b8
  spec:
    cloudConfig:
      key: config
      name: cloud-provider-config
    platformSpec:
      aws: {}
      type: AWS
  status:
    apiServerInternalURI: https://api-int.adstefa-a12.qe.devcluster.openshift.com:6443
    apiServerURL: https://api.adstefa-a12.qe.devcluster.openshift.com:6443
    controlPlaneTopology: SingleReplica
    cpuPartitioning: None
    etcdDiscoveryDomain: ""
    infrastructureName: adstefa-a12-6wlvm
    infrastructureTopology: HighlyAvailable
    platform: AWS
    platformStatus:
      aws:
        region: us-east-2
      type: AWS


apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
   .... 
  creationTimestamp: "2024-01-18T08:54:23Z"
  generation: 3
  labels:
    app: console
    component: ui
  name: console
  namespace: openshift-console
spec:
  progressDeadlineSeconds: 600
  replicas: 2


Expected results:

The replica count is set to 1, tracking the controlPlaneTopology value instead of the infrastructureTopology.

Additional info:

    

Please review the following PR: https://github.com/openshift/prometheus/pull/195

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

STS cluster awareness was in Tech Preview for testing and quality assurance before release. The unit tests created for it, and their runs, have indicated no change in cluster operation. QE has reported several bugs and they have been fixed. A periodic e2e test, which verifies that a Secret is generated when an STS cluster is detected and the proper AWS resource access tokens are present in the CredentialsRequest, has been passing and has also passed when run manually on several follow-on PRs.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Please review the following PR: https://github.com/openshift/configmap-reload/pull/51

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

We need to update the operator to be in sync with the Kubernetes API version used by OCP 4.13. We also need to sync our samples libraries with the latest available libraries. Any deprecated libraries should be removed as well.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

In looking at jobs on an accepted payload at https://amd64.ocp.releases.ci.openshift.org/releasestream/4.12.0-0.ci/release/4.12.0-0.ci-2022-08-30-122201 , I observed this job https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-serial/1564589538850902016 with "Undiagnosed panic detected in pod" "pods/openshift-controller-manager-operator_openshift-controller-manager-operator-74bf985788-8v9qb_openshift-controller-manager-operator.log.gz:E0830 12:41:48.029165       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)" 

Version-Release number of selected component (if applicable):

4.12

How reproducible:

probably relatively easy to reproduce (but not consistently) given it's happened several times according to this search: https://search.ci.openshift.org/?search=Observed+a+panic%3A+%22invalid+memory+address+or+nil+pointer+dereference%22&maxAge=48h&context=1&type=junit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Steps to Reproduce:

1. let nightly payloads run or run one of the presubmit jobs mentioned in the search above
2.
3.

Actual results:

Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)}

Expected results:

no panics

Additional info:

 

Description of problem:

    If authentication.config/cluster has Type == "" but the OAuth/User APIs are already missing, the console-operator won't update the authentication.config/cluster status with its own client, because it crashes on being unable to retrieve OAuthClients.

Version-Release number of selected component (if applicable):

    4.15.0

How reproducible:

    100%

Steps to Reproduce:

    1. scale oauth-apiserver to 0
    2. set featuregates to TechPreviewNotUpgradable
    3. watch the authentication.config/cluster .status.oidcClients

Actual results:

    The client for the console does not appear.

Expected results:

    The client for the console should appear.    

Additional info:

    


Please review the following PR: https://github.com/openshift/cluster-samples-operator/pull/517

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    Manifests are duplicated with the cluster-config-api image.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

 

Observation from the CIS v1.4 PDF:
1.1.3 Ensure that the controller manager pod specification file has permissions of 600 or more restrictive.

The description of this control in the CIS v1.4 PDF is as follows:
"Ensure that the controller manager pod specification file has permissions of 600 or more
restrictive.
 
OpenShift 4 deploys two API servers: the OpenShift API server and the Kube API server. The OpenShift API server delegates requests for Kubernetes objects to the Kube API server.
The OpenShift API server is managed as a deployment. The pod specification yaml for openshift-apiserver is stored in etcd.
The Kube API Server is managed as a static pod. The pod specification file for the kube-apiserver is created on the control plane nodes at /etc/kubernetes/manifests/kube-apiserver-pod.yaml. The kube-apiserver is mounted via hostpath to the kube-apiserver pods via /etc/kubernetes/static-pod-resources/kube-apiserver-pod.yaml with permissions 600."
 
To conform with the CIS benchmark, the permissions of the controller manager pod specification file should be updated to 600.

$ for i in $( oc get pods -n openshift-kube-controller-manager -o name -l app=kube-controller-manager)
do                          
oc exec -n openshift-kube-controller-manager $i -- stat -c %a /etc/kubernetes/static-pod-resources/kube-controller-manager-pod.yaml  
done                                                                    
644
644
644

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-07-20-215234

How reproducible:

Always

Steps to Reproduce:

1.
2.
3.

Actual results:

The controller manager pod specification file has permissions of 644.

Expected results:

The controller manager pod specification file has permissions of 600.

Additional info:

https://github.com/openshift/library-go/commit/19a42d2bae8ba68761cfad72bf764e10d275ad6e

Description of problem:

Update the sample imagestreams with the latest 4.11 images using specific image tag references.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

The search icon & clear icon get shifted to the next line when text is entered.

Prerequisites (if any, like setup, operators/versions):

None

Steps to Reproduce

  1. Enter text in the quick start search bar.

 

Reproducibility (Always/Intermittent/Only Once):

Always

Build Details:

4.7.0-0.nightly-2021-02-04-031352

Additional info:

Appears to be a Patternfly bug: https://github.com/patternfly/patternfly-react/issues/5416

Originated from pf upgrade to fix another issue: https://github.com/openshift/console/pull/7899

Sanitize OWNERS/OWNER_ALIASES:

1) OWNERS must have:

component: "Storage / Kubernetes External Components"

2) OWNER_ALIASES must have all team members of Storage team.

Description of problem:

Attempted to upgrade 3480 SNOs deployed with 4.13.11 to 4.14.0-rc.0, and 15 SNOs ended up stuck in a partial upgrade because the cluster console operator was not available.

# cat 4.14.0-rc.0-partial.console | xargs -I % sh -c "echo -n '% '; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get clusterversion --no-headers"
vm00255 version   4.13.11   True   True   21h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm00320 version   4.13.11   True   True   21h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm00327 version   4.13.11   True   True   21h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm00405 version   4.13.11   True   True   21h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm00705 version   4.13.11   True   True   21h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm01224 version   4.13.11   True   True   19h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm01310 version   4.13.11   True   True   19h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm01320 version   4.13.11   True   True   19h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm01928 version   4.13.11   True   True   19h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm02052 version   4.13.11   True   True   19h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm02588 version   4.13.11   True   True   17h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm02704 version   4.13.11   True   True   17h   Unable to apply 4.14.0-rc.0: wait has exceeded 40 minutes for these operators: console
vm02835 version   4.13.11   True   True   17h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm03110 version   4.13.11   True   True   15h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm03322 version   4.13.11   True   True   15h   Unable to apply 4.14.0-rc.0: wait has exceeded 40 minutes for these operators: console

Version-Release number of selected component (if applicable):

SNO OCP (managed clusters being upgraded) 4.13.11 upgraded to 4.14.0-rc.0
Hub OCP 4.13.12
ACM - 2.9.0-DOWNSTREAM-2023-09-07-04-47-52

How reproducible:

15 out of the 3489 SNOs being upgraded hit this issue; however, these represented 15 of the 41 partial upgrade failures overall (~36% of the failures).

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Please review the following PR: https://github.com/openshift/csi-external-resizer/pull/155

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cluster-api-provider-baremetal/pull/196

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cluster-api-provider-baremetal/pull/206

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/csi-external-resizer/pull/144

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

We need to fix and bump library-go for http2 vulnerability CVE-2023-44487. This effectively turns off HTTP/2 in library-go http endpoints, i.e. metrics and health.
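
As a rough illustration (not the actual library-go change), one common way to disable HTTP/2 on a Go HTTPS endpoint such as a metrics or health server is to give http.Server a non-nil, empty TLSNextProto map, which prevents the automatic h2 upgrade:

package main

import (
	"crypto/tls"
	"net/http"
)

func newHTTP1OnlyServer(addr string, handler http.Handler) *http.Server {
	return &http.Server{
		Addr:    addr,
		Handler: handler,
		// A non-nil, empty TLSNextProto map tells net/http not to enable
		// HTTP/2 automatically, so the endpoint only negotiates HTTP/1.1.
		TLSNextProto: map[string]func(*http.Server, *tls.Conn, http.Handler){},
	}
}

func main() {
	srv := newHTTP1OnlyServer(":8443", http.NotFoundHandler())
	_ = srv // in real use: srv.ListenAndServeTLS(certFile, keyFile)
}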

Description of problem:


OCP deployments are failing because the machine-api-controllers pod keeps crashing.

Version-Release number of selected component (if applicable):

OCP 4.14.0-ec.3 

How reproducible:

Always

Steps to Reproduce:

1. Deploy a Baremetal cluster
2. After bootstrap is completed, check the pods running in the openshift-machine-api namespace
3. Check machine-api-controllers-* pod status (it goes from Running to Crashing all the time)
4. Deployment eventually times out and stops with only the master nodes getting deployed.

Actual results:

machine-api-controllers-* pod remains in a crashing loop and OCP 4.14.0-ec.3 deployments fail.

Expected results:

machine-api-controllers-* pod remains running and OCP 4.14.0-ec.3 deployments are completed 

Additional info:

Jobs with older nightly releases in 4.14 are passing, but since Saturday Jul 10th, our CI jobs are failing

$ oc version
Client Version: 4.14.0-ec.3
Kustomize Version: v5.0.1
Kubernetes Version: v1.27.3+e8b13aa

$ oc get nodes
NAME       STATUS   ROLES                  AGE   VERSION
master-0   Ready    control-plane,master   37m   v1.27.3+e8b13aa
master-1   Ready    control-plane,master   37m   v1.27.3+e8b13aa
master-2   Ready    control-plane,master   38m   v1.27.3+e8b13aa

$ oc -n openshift-machine-api get pods -o wide
NAME                                                  READY   STATUS             RESTARTS        AGE   IP              NODE       NOMINATED NODE   READINESS GATES
cluster-autoscaler-operator-75b96869d8-gzthq          2/2     Running            0               48m   10.129.0.6      master-0   <none>           <none>
cluster-baremetal-operator-7c9cb8cd69-6bqcg           2/2     Running            0               48m   10.129.0.7      master-0   <none>           <none>
control-plane-machine-set-operator-6b65b5b865-w996m   1/1     Running            0               48m   10.129.0.22     master-0   <none>           <none>
machine-api-controllers-59694ff965-v4kxb              6/7     CrashLoopBackOff   7 (2m31s ago)   46m   10.130.0.12     master-2   <none>           <none>
machine-api-operator-58b54d7c86-cnx4w                 2/2     Running            0               48m   10.129.0.8      master-0   <none>           <none>
metal3-6ffbb8dcd4-drlq5                               6/6     Running            0               45m   192.168.62.22   master-1   <none>           <none>
metal3-baremetal-operator-bd95b6695-q6k7c             1/1     Running            0               45m   10.130.0.16     master-2   <none>           <none>
metal3-image-cache-4p7ln                              1/1     Running            0               45m   192.168.62.22   master-1   <none>           <none>
metal3-image-cache-lfmb4                              1/1     Running            0               45m   192.168.62.23   master-2   <none>           <none>
metal3-image-cache-txjg5                              1/1     Running            0               45m   192.168.62.21   master-0   <none>           <none>
metal3-image-customization-65cf987f5c-wgqs7           1/1     Running            0               45m   10.128.0.17     master-1   <none>           <none>
$ oc -n openshift-machine-api logs machine-api-controllers-59694ff965-v4kxb -c machine-controller | less
...
E0710 15:55:08.230413       1 logr.go:270] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"Metal3Remediation\" in version \"infrastructure.cluster.x-k8s.io/v1beta1\""  "kind"={"Group":"infrastructure.cluster.x-k8s.io","Kind":"Metal3Remediation"}
E0710 15:55:14.019930       1 controller.go:210]  "msg"="Could not wait for Cache to sync" "error"="failed to wait for metal3remediation caches to sync: timed out waiting for cache to be synced" "controller"="metal3remediation" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="Metal3Remediation" 
I0710 15:55:14.020025       1 logr.go:252]  "msg"="Stopping and waiting for non leader election runnables"  
I0710 15:55:14.020054       1 logr.go:252]  "msg"="Stopping and waiting for leader election runnables"  
I0710 15:55:14.020095       1 controller.go:247]  "msg"="Shutdown signal received, waiting for all workers to finish" "controller"="machine-drain-controller" 
I0710 15:55:14.020147       1 controller.go:247]  "msg"="Shutdown signal received, waiting for all workers to finish" "controller"="machineset-controller" 
I0710 15:55:14.020169       1 controller.go:247]  "msg"="Shutdown signal received, waiting for all workers to finish" "controller"="machine-controller" 
I0710 15:55:14.020184       1 controller.go:249]  "msg"="All workers finished" "controller"="machineset-controller" 
I0710 15:55:14.020181       1 controller.go:249]  "msg"="All workers finished" "controller"="machine-drain-controller" 
I0710 15:55:14.020190       1 controller.go:249]  "msg"="All workers finished" "controller"="machine-controller" 
I0710 15:55:14.020209       1 logr.go:252]  "msg"="Stopping and waiting for caches"  
I0710 15:55:14.020323       1 logr.go:252]  "msg"="Stopping and waiting for webhooks"  
I0710 15:55:14.020327       1 reflector.go:225] Stopping reflector *v1alpha1.BareMetalHost (10h53m58.149951981s) from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:262
I0710 15:55:14.020393       1 reflector.go:225] Stopping reflector *v1beta1.Machine (9h40m22.116205595s) from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:262
I0710 15:55:14.020399       1 logr.go:252] controller-runtime/webhook "msg"="shutting down webhook server"  
I0710 15:55:14.020437       1 reflector.go:225] Stopping reflector *v1.Node (10h3m14.461941979s) from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:262
I0710 15:55:14.020466       1 logr.go:252]  "msg"="Wait completed, proceeding to shutdown the manager"  
I0710 15:55:14.020485       1 reflector.go:225] Stopping reflector *v1beta1.MachineSet (10h7m28.391827596s) from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:262
E0710 15:55:14.020500       1 main.go:218] baremetal-controller-manager/entrypoint "msg"="unable to run manager" "error"="failed to wait for metal3remediation caches to sync: timed out waiting for cache to be synced"  
E0710 15:55:14.020504       1 logr.go:270]  "msg"="error received after stop sequence was engaged" "error"="leader election lost" 

Our CI job logs can be seen here (RedHat SSO): https://www.distributed-ci.io/jobs/7da8ee48-8918-4a97-8e3c-f525d19583b8/files

Please review the following PR: https://github.com/openshift/cluster-kube-controller-manager-operator/pull/779

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Updating the k* version to v0.27.2 in cluster samples operator for OCP 4.14 release

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Please review the following PR: https://github.com/openshift/cluster-api-provider-baremetal/pull/208

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

documentationBaseURL still points to 4.14

Version-Release number of selected component (if applicable):

4.16 

How reproducible:

Always

Steps to Reproduce:

1.Check documentationBaseURL on 4.16 cluster: 
# oc get configmap console-config -n openshift-console -o yaml | grep documentationBaseURL
      documentationBaseURL: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/

2.
3.

Actual results:

1.documentationBaseURL is still pointing to 4.14

Expected results:

1.documentationBaseURL should point to 4.16

Additional info:

 

Please review the following PR: https://github.com/openshift/cluster-update-keys/pull/53

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/csi-external-resizer/pull/154

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

console-operator codebase contains a lot of inline manifests. Instead we should put those manifests into a `/bindata` folder, from which they will be read and then updated per purpose.

Description of problem:

We discovered that we are shipping unnecessary RBAC in https://coreos.slack.com/archives/CC3CZCQHM/p1667571136730989 .

This RBAC was only used in 4.2 and 4.3 for

  • making the switch from ConfigMaps to Leases in leader election

and we should remove it.

 

followup to https://issues.redhat.com/browse/OCPBUGS-3283 - the RBACs are not applied anymore, but we just need to remove the actual files from the repo. No behavioral change should occur with the file removal.

Version-Release number of selected component (if applicable):

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Sanitize OWNERS/OWNERS_ALIASES in all CSI driver and operator repos.

For driver repos:

1) OWNERS must set `component`:

component: "Storage / Kubernetes External Components"

2) OWNERS_ALIASES must list all members of the Storage team.

For operator repos, OWNERS must have (see the sketch below):

  • all members of the Storage team as `approvers`
  • component: "Storage / Operators"
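A hedged sketch of what such an operator-repo OWNERS file might end up looking like (the alias names are placeholders, not the actual Storage team roster):

# OWNERS (illustrative sketch only)
component: "Storage / Operators"
approvers:
  - storage-approvers        # hypothetical alias defined in OWNERS_ALIASES
reviewers:
  - storage-reviewers        # hypothetical alias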

Description of problem:


We added a line to increase debugging verbosity to aid in debugging WRKLDS-540

Version-Release number of selected component (if applicable):

13

How reproducible:

very

Steps to Reproduce:

1.just a revert
2.
3.

Actual results:

Extra debugging lines are present in the openshift-config-operator pod logs

Expected results:

Extra debugging lines no longer in the openshift-config-operator pod logs

Additional info:


Please review the following PR: https://github.com/openshift/cluster-api-provider-ovirt/pull/176

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The network-tools image stream is missing in the cluster samples. It is needed for CI tests.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

After enabling user-defined monitoring on an HyperShift hosted cluster, PrometheusOperatorRejectedResources starts firing.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Start an hypershift-hosted cluster with cluster-bot
2. Enable user-defined monitoring
3.

Actual results:

PrometheusOperatorRejectedResources alert becomes firing

Expected results:

No alert firing

Additional info:

Need to reach out to the HyperShift folks as the fix should probably be in their code base.

Samples operator in OKD refers to docker.io/openshift/wildfly, which are no longer available. Library sync should update samples to use quay.io links

Description of problem:

documentationBaseURL is still linking to 4.13

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-04-05-183601

How reproducible:

Always

Steps to Reproduce:

1. get documentationBaseURL in cm/console-config
$ oc get cm console-config -n openshift-console -o yaml | grep documentationBaseURL
      documentationBaseURL: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.13/
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-04-05-183601   True        False         68m     Cluster version is 4.14.0-0.nightly-2023-04-05-183601
2.
3.

Actual results:

documentationBaseURL: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.13/

Expected results:

documentationBaseURL should be  https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/

Additional info:

 

Description of problem:

TestEditUnmanagedPodDisruptionBudget flakes in the console-operator e2e

Version-Release number of selected component (if applicable):

4.12

How reproducible:

Flake

Steps to Reproduce:
1. Check https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_console-operator/665/pull-ci-openshift-console-operator-master-e2e-aws-operator/1562005782164148224
2.
3.

Actual results:

Expected results:

Additional info:

There is a chance that the PDB instance is not present, since prior to the Unmanaged* TCs the RemoveTest runs, which removes all the console resources (Pods, Services, PDBs, ...).

 

Description of problem:

When console with custom route is disabled before cluster upgrade, and re-enabled after cluster upgrade, console could not be accessed successfully.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-02-15-111607

How reproducible:

Always

Steps to Reproduce:

1. Launch a cluster with available update.
2. Create custom route for console in ingress configuration:
# oc edit ingresses.config.openshift.io cluster
spec:
  componentRoutes:
  - hostname: console-openshift-custom.apps.qe-413-0216.qe.devcluster.openshift.com
    name: console
    namespace: openshift-console
  - hostname: openshift-downloads-custom.apps.qe-413-0216.qe.devcluster.openshift.com
    name: downloads
    namespace: openshift-console
  domain: apps.qe-413-0216.qe.devcluster.openshift.com
3. After custom route is created, access console with custom route.
4. Remove console by setting managementState as Removed in console operator:
# oc edit consoles.operator.openshift.io cluster
spec:
  logLevel: Normal
  managementState: Removed
  operatorLogLevel: Normal
5. Upgrade cluster to a target version.
6. Enable console by setting managementState as Managed in console operator:
# oc edit consoles.operator.openshift.io cluster
spec:
  logLevel: Normal
  managementState: Managed
  operatorLogLevel: Normal
7. After console resources are created, access console url.

Actual results:

3. Console could be accessed through custom route.
4. Console resources are removed. And all cluster operators are in normal status
# oc get all -n openshift-console
No resources found in openshift-console namespace.

5. Upgrade succeeds, all cluster operators are in normal status
6. Console resources are created:

# oc get all -n openshift-console
NAME                             READY   STATUS    RESTARTS   AGE
pod/console-69d88985b-bvh46      1/1     Running   0          3m41s
pod/console-69d88985b-fwhjf      1/1     Running   0          3m41s
pod/downloads-6b6b555d8d-kn822   1/1     Running   0          3m49s
pod/downloads-6b6b555d8d-wp6zc   1/1     Running   0          3m49s

NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/console            ClusterIP   172.30.226.112   <none>        443/TCP    3m50s
service/console-redirect   ClusterIP   172.30.147.151   <none>        8444/TCP   3m50s
service/downloads          ClusterIP   172.30.251.248   <none>        80/TCP     3m50s

NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/console     2/2     2            2           3m47s
deployment.apps/downloads   2/2     2            2           3m50s

NAME                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/console-69d88985b      2         2         2       3m42s
replicaset.apps/console-6dbdd487d      0         0         0       3m47s
replicaset.apps/downloads-6b6b555d8d   2         2         2       3m50s

NAME                                        HOST/PORT                                                                   PATH   SERVICES           PORT                    TERMINATION          WILDCARD
route.route.openshift.io/console            console-openshift-console.apps.qe-413-0216.qe.devcluster.openshift.com            console-redirect   custom-route-redirect   edge/Redirect        None
route.route.openshift.io/console-custom     console-openshift-custom.apps.qe-413-0216.qe.devcluster.openshift.com             console            https                   reencrypt/Redirect   None
route.route.openshift.io/downloads          downloads-openshift-console.apps.qe-413-0216.qe.devcluster.openshift.com          downloads          http                    edge/Redirect        None
route.route.openshift.io/downloads-custom   openshift-downloads-custom.apps.qe-413-0216.qe.devcluster.openshift.com           downloads          http                    edge/Redirect        None

7. Could not open console url successfully. There is error info for the console operator:

# oc get co | grep console
console   4.13.0-0.nightly-2023-02-15-202607   False   False   False   42s   RouteHealthAvailable: route not yet available, https://console-openshift-custom.apps.qe-413-0216.qe.devcluster.openshift.com returns '503 Service Unavailable'

# oc get clusterversions.config.openshift.io
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-02-15-202607   True        False         4h48m   Error while reconciling 4.13.0-0.nightly-2023-02-15-202607: the cluster operator console is not available

Expected results:

7. Should be able to access console successfully.

Additional info:


Description of problem:

the service ca controller start func seems to return that error as soon as its context is cancelled (which seems to happen the moment the first signal is received): https://github.com/openshift/service-ca-operator/blob/42088528ef8a6a4b8c99b0f558246b8025584056/pkg/controller/starter.go#L24

that apparently triggers os.Exit(1) immediately https://github.com/openshift/service-ca-operator/blob/42088528ef8a6a4b8c99b0f55824[…]om/openshift/library-go/pkg/controller/controllercmd/builder.go

the lock release doesn't happen until the periodic renew tick breaks out https://github.com/openshift/service-ca-operator/blob/42088528ef8a6a4b8c99b0f55824[…]/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go

seems unlikely that you'd reach the call to le.release() before the call to os.Exit(1) in the other goroutine

Version-Release number of selected component (if applicable):

4.13.0

How reproducible:

~always

Steps to Reproduce:

1. oc delete -n openshift-service-ca pod <service-ca pod>

Actual results:

the old pod logs show:

W1103 09:59:14.370594       1 builder.go:106] graceful termination failed, controllers failed with error: stopped

and when a new pod comes up to replace it, it has to wait for a while before acquiring the leader lock

I1103 16:46:00.166173       1 leaderelection.go:248] attempting to acquire leader lease openshift-service-ca/service-ca-controller-lock...
 .... waiting ....
I1103 16:48:30.004187       1 leaderelection.go:258] successfully acquired lease openshift-service-ca/service-ca-controller-lock

Expected results:

new pod can acquire the leader lease without waiting for the old pod's lease to expire

Additional info:

 

Description of problem:

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Two payloads in a row: the first had more failures, the second had fewer but was still broken.

Both exhibit this status on the console operator:

  status:
    conditions:
    - lastTransitionTime: "2023-11-17T06:06:57Z"
      message: 'OAuthClientSyncDegraded: the server is currently unable to handle
        the request (get oauthclients.oauth.openshift.io console)'
      reason: OAuthClientSync_FailedRegister
      status: "True"
      type: Degraded

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hypershift-release-4.15-periodics-e2e-aws-ovn/1725383840110743552/artifacts/e2e-aws-ovn/run-e2e/artifacts/TestUpgradeControlPlane/hostedcluster-example-dcxq4/cluster-scoped-resources/config.openshift.io/clusteroperators.yaml

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hypershift-release-4.15-periodics-e2e-aws-ovn/1725305112194191360/artifacts/e2e-aws-ovn/run-e2e/artifacts/TestUpgradeControlPlane/hostedcluster-example-c7bz4/cluster-scoped-resources/config.openshift.io/clusteroperators.yaml

We are suspicious of this PR; however, this change landed before the payloads started failing, so perhaps the issue only surfaces on upgrades once the change was in an accepted payload: https://github.com/openshift/console-operator/pull/808

There is also a hypershift PR that was only present in the second failed payload, possibly a reaction to the problem that didn't fully fix it? There were fewer failures in the second payload than the first: https://github.com/openshift/hypershift/pull/3151 ? If so, this will complicate a revert.

Discussion: https://redhat-internal.slack.com/archives/C01C8502FMM/p1700226091335339

Description of problem:

The reconciler removes the overlappingrangeipreservations.whereabouts.cni.cncf.io resources whether the pod is alive or not. 

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Create pods and check the overlappingrangeipreservations.whereabouts.cni.cncf.io resources:

$ oc get overlappingrangeipreservations.whereabouts.cni.cncf.io -A
NAMESPACE          NAME                      AGE
openshift-multus   2001-1b70-820d-4b04--13   4m53s
openshift-multus   2001-1b70-820d-4b05--13   4m49s

2.  Verify that when the ip-reconciler cronjob removes the overlappingrangeipreservations.whereabouts.cni.cncf.io resources when run:

$ oc get cronjob -n openshift-multus
NAME            SCHEDULE       SUSPEND   ACTIVE   LAST SCHEDULE   AGE
ip-reconciler   */15 * * * *   False     0        14m             4d13h

$ oc get overlappingrangeipreservations.whereabouts.cni.cncf.io -A
No resources found

$ oc get cronjob -n openshift-multus
NAME            SCHEDULE       SUSPEND   ACTIVE   LAST SCHEDULE   AGE
ip-reconciler   */15 * * * *   False     0        5s              4d13h

 

Actual results:

The overlappingrangeipreservations.whereabouts.cni.cncf.io resources are removed for each created pod by the ip-reconciler cronjob.
The "overlapping ranges" are not used. 

Expected results:

The overlappingrangeipreservations.whereabouts.cni.cncf.io resources should not be removed while the pod that was allocated an IP in the overlapping ranges is still alive.

Additional info:

 

Due to the removal of the in-tree AWS provider (https://github.com/kubernetes/kubernetes/pull/115838) we need to ensure that KCM sets the --external-cloud-volume-plugin flag accordingly, especially since CSI migration was GA'd in 4.12/1.25.
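For reference, a minimal sketch of the relevant kube-controller-manager arguments on AWS (hedged; the exact flag set is rendered by the KCM operator and may differ):

--cloud-provider=external
--external-cloud-volume-plugin=aws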

Description of problem:

After customizing the routes for Console and Downloads, the `Downloads` route is not updated within `https://custom-console-route/command-line-tools` and still points to the old/default downloads route.

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Customize Console and Downloads routes.
2. Access the web-console using custom console route.
3. Go to Command-line-tools.
4. Try to access the downloads urls.

Actual results:

While accessing the downloads urls, it is pointing towards default/old downloads route

Expected results:

While accessing the downloads urls, it should be pointing towards custom downloads route

Additional info:

 

Description of problem:

NHC failed to watch Metal3 remediation template    

Version-Release number of selected component (if applicable):

OCP4.13 and higher

How reproducible:

    100%

Steps to Reproduce:

    1. Create Metal3RemediationTemplate
    2. Install NHCv.0.7.0
    3. Create NHC with Metal3RemediationTemplate     

Actual results:

E0131 14:07:51.603803 1 reflector.go:147] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: Failed to watch infrastructure.cluster.x-k8s.io/v1beta1, Kind=Metal3RemediationTemplate: failed to list infrastructure.cluster.x-k8s.io/v1beta1, Kind=Metal3RemediationTemplate: metal3remediationtemplates.infrastructure.cluster.x-k8s.io is forbidden: User "system:serviceaccount:openshift-workload-availability:node-healthcheck-controller-manager" cannot list resource "metal3remediationtemplates" in API group "infrastructure.cluster.x-k8s.io" at the cluster scope

E0131 14:07:59.912283 1 reflector.go:147] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: Failed to watch infrastructure.cluster.x-k8s.io/v1beta1, Kind=Metal3Remediation: unknown

W0131 14:08:24.831958 1 reflector.go:539] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: failed to list infrastructure.cluster.x-k8s.io/v1beta1, Kind=Metal3RemediationTemplate: metal3remediationtemplates.infrastructure.cluster.x-k8s.io is forbidden: User "system:serviceaccount:openshift-workload-availability:node-healthcheck-controller-manager" cannot list resource

Expected results:

    No errors

Additional info:

    

Description of problem:

documentationBaseURL still points to 4.10

Version-Release number of selected component (if applicable):

4.12.0-0.nightly-2022-08-31-101631

How reproducible:

Always

Steps to Reproduce:

1.Check documentationBaseURL on 4.12 cluster: 
# oc get configmap console-config -n openshift-console -o yaml | grep documentationBaseURL
      documentationBaseURL: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.11/

2.
3.

Actual results:

1.documentationBaseURL is still pointing to 4.11

Expected results:

1.documentationBaseURL should point to 4.12

Additional info:

 

Please review the following PR: https://github.com/openshift/cluster-kube-controller-manager-operator/pull/747

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

documentationBaseURL still points to 4.12 URL on a 4.13 cluster

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2022-12-07-193721

How reproducible:

Always

Steps to Reproduce:

1. check documentationBaseURL on a 4.13 cluster
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2022-12-07-193721   True        False         37m     Cluster version is 4.13.0-0.nightly-2022-12-07-193721 
$ oc get cm console-config -n openshift-console -o yaml | grep documentationBaseURL
      documentationBaseURL: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.12/
2.
3.

Actual results:

it still points to 4.12 

Expected results:

documentationBaseURL should be updated to  https://access.redhat.com/documentation/en-us/openshift_container_platform/4.13/

Additional info:

 

 

Description of problem:

Kubernetes and other associated dependencies need to be updated to protect against potential vulnerabilities.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

oc --context build02 get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-ec.1   True        False         45h     Error while reconciling 4.12.0-ec.1: the cluster operator kube-controller-manager is degraded

oc --context build02 get co kube-controller-manager
NAME                      VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
kube-controller-manager   4.12.0-ec.1   True        False         True       2y87d   GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp 172.30.153.28:9091: connect: cannot assign requested address

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

build02 is a build farm cluster in CI production.
I can provide credentials to access the cluster if needed.

Please review the following PR: https://github.com/openshift/cluster-openshift-controller-manager-operator/pull/321

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/whereabouts-cni/pull/242

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cluster-config-operator/pull/276

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Add a popover for pod status that shows additional details for unschedulable and failing pods.

  • If pod is Unschedulable, add a popover to Pending status that includes the reason the pod is unschedulable on the pod list and pod details pages.
  • If pod status is CrashLoopBackoff, add a popover to the CrashLoopBackoff status that includes the reason for the error on the pod list and pod details pages.
  • If pod status is ErrImagePull, add a popover to the ErrImagePull status that includes the reason for the error on the pod list and pod details pages.
  • If pod status is ImagePullBackOff, add a popover to the ImagePullBackOff status that includes the reason for the error on the pod list and pod details pages.

Description of problem:

The following changes are required for openshift/route-controller-manager#22 refactoring.

  • add POD_NAME to the route-controller-manager deployment (see the sketch below)
  • introduce route-controller-defaultconfig and customize the lease name openshift-route-controllers to override the default one supplied by library-go
  • add RBAC for infrastructures, which is used by library-go for configuring leader election
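A minimal sketch of the POD_NAME addition via the standard downward API pattern (hedged; the surrounding container spec is omitted):

env:
- name: POD_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name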

Description of problem:

ROSA is being branded via custom branding; as a result, the favicon disappears since we do not want any Red Hat/Openshift-specific branding to appear when custom branding is in use.  Since ROSA is a Red Hat product, it should get a branding option added to the console so all the correct branding including favicon appears.

Version-Release number of selected component (if applicable):

4.14.0, 4.13.z, 4.12.z, 4.11.z

How reproducible:

Always

Steps to Reproduce:

1.  View a ROSA cluster
2.  Note the absence of the OpenShift logo favicon

Description of problem:

Reviewing 4.15 install failures (install should succeed: overall), there are a number of variants impacted by recent install failures.

search.ci: Cluster operator console is not available

Jobs like periodic-ci-openshift-release-master-nightly-4.15-e2e-gcp-sdn-serial show installation failures due to console-operator that appear to start with 4.15.0-0.nightly-2023-12-07-225558:

ConsoleOperator reconciliation failed: Operation cannot be fulfilled on consoles.operator.openshift.io "cluster": the object has been modified; please apply your changes to the latest version and try again

 

 

4.15.0-0.nightly-2023-12-07-225558 contains console-operator/pull/814, noting in case it is related

 

 

Version-Release number of selected component (if applicable):

 4.15   

How reproducible:

    

Steps to Reproduce:

    1. Review link to install failures above
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:
periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-sdn
periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-sdn-upgrade
periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-ovn-upgrade

Refer to the CIS RedHat OpenShift Container Platform Benchmark PDF: https://drive.google.com/file/d/12o6O-M2lqz__BgmtBrfeJu1GA2SJ352c/view
1.1.7 Ensure that the etcd pod specification file permissions are set to 600 or more restrictive (Manual)
======================================================================================================
As per CIS v1.3 PDF permissions should be 600 with the following statement:
"The pod specification file is created on control plane nodes at /etc/kubernetes/manifests/etcd-member.yaml with permissions 644. Verify that the permissions are 600 or more restrictive."
But when I ran the following command it was showing 644 permissions

for i in $(oc get pods -n openshift-etcd -l app=etcd -o name | grep etcd )
do
echo "check pod $i"
oc rsh -n openshift-etcd $i \
stat -c %a /etc/kubernetes/manifests/etcd-pod.yaml
done

Allow users to turn on PodSecurity admission enforcement mode in 4.13 as TechPreviewNoUpgrade, in order to be able to test the feature with their workloads and see if there is anything that needs fixing.

Currently, to launch the Cypress test runner GUI, in frontend/package.json we have:

"test-cypress": "cd packages/integration-tests-cypress && cypress open ...",
 "test-cypress-devconsole": "cd packages/dev-console/integration-tests && cypress open...",
 "test-cypress-olm": "cd packages/operator-lifecycle-manager/integration-tests-cypress && cypress open ..."

We want one "test-cypress" command which takes a 'pkg' parameter, values of 'olm', 'devconsole', 'console'(default).

This will cd to correct dir and use any other config settings needed for olm cypress testing:

yarn run test-cypress olm

 

We also have cypress headless yarn scripts:

"test-cypress-headless": "yarn run test-cypress-console-headless && yarn run test-cypress-devconsole-headless && yarn run test-cypress-olm-headless",
 "test-cypress-console-headless": "cd packages/integration-tests-cypress && cypress run ...",
 "test-cypress-devconsole-headless": "cd packages/dev-console/integration-tests && cypress run --spec \"features/project-creation.feature\"
 "test-cypress-olm-headless": "cd packages/operator-lifecycle-manager/integration-tests-cypress && cypress run --config-file cypress-olm.json"

We want to extend the aforementioned `test-cypress` command to take a '--headless' parameter which will run the pkg in --headless mode:

"yarn run test-cypress olm --headless"

This would replace the individual "test-cypress-<pkg>-headless" yarn scripts (if possible).

 

test-cypress.sh will need to be updated accordingly!

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

We need to update the operator to be synced with the K8s API version used by OCP 4.13. We also need to sync our samples libraries with the latest available libraries. Any deprecated libraries should be removed as well.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

With the CSISnapshot capability disabled, all CSI driver operators are Degraded. For example, the AWS EBS CSI driver operator during installation:

18:12:16.895: Some cluster operators are not ready: storage (Degraded=True AWSEBSCSIDriverOperatorCR_AWSEBSDriverStaticResourcesController_SyncError: AWSEBSCSIDriverOperatorCRDegraded: AWSEBSDriverStaticResourcesControllerDegraded: "volumesnapshotclass.yaml" (string): the server could not find the requested resource
AWSEBSCSIDriverOperatorCRDegraded: AWSEBSDriverStaticResourcesControllerDegraded: )
Ginkgo exit error 1: exit with code 1}

Version-Release number of selected component (if applicable):
4.12.nightly

The reason is that cluster-csi-snapshot-controller-operator does not create VolumeSnapshotClass CRD, which AWS EBS CSI driver operator expects to exist.

CSI driver operators must skip VolumeSnapshotClass creation if the CRD does not exist.
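As a hedged verification sketch, one way to check whether the CRD is present on a given cluster before the class is created:

# returns NotFound when the CSISnapshot capability is disabled
oc get crd volumesnapshotclasses.snapshot.storage.k8s.io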

Description of problem:

We need to update the operator to be synced with the K8s API version used by OCP 4.13. We also need to sync our samples libraries with the latest available libraries. Any deprecated libraries should be removed as well.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Please review the following PR: https://github.com/openshift/cluster-kube-controller-manager-operator/pull/774

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

In hypershift context:
Operands managed by Operators running in the hosted control plane namespace in the management cluster do not honour affinity opinions https://hypershift-docs.netlify.app/how-to/distribute-hosted-cluster-workloads/
https://github.com/openshift/hypershift/blob/main/support/config/deployment.go#L263-L265

These operands running management side should honour the same affinity, tolerations, node selector and priority rules as the operator.
This could be done by looking at the operator deployment itself or at the HCP resource.

aws-ebs-csi-driver-controller
aws-ebs-csi-driver-operator
csi-snapshot-controller
csi-snapshot-webhook


Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Create a hypershift cluster.
2. Check affinity rules and node selector of the operands above.
3.

Actual results:

Operands missing affinity rules and node selector

Expected results:

Operands have the same affinity rules and node selector as the operator

Additional info:

 

Value Statement

Ensure the issue title clearly reflects the value of this user story to the
intended persona. (Explain the "WHY")

Implement a quick start guide to help onboard users with hosted cluster creation.

See this for more details: https://docs.google.com/document/d/1wPAtfW6vdd2fZhh2Lax8k6abLxPUgwpZvXIdWyiSyLY/edit#

Definition of Done for Engineering Story Owner (Checklist)

  • ...

Development Complete

  • The code is complete.
  • Functionality is working.
  • Any required downstream Docker file changes are made.

Tests Automated

  • [ ] Unit/function tests have been automated and incorporated into the
    build.
  • [ ] 100% automated unit/function test coverage for new or changed APIs.

Secure Design

  • [ ] Security has been assessed and incorporated into your threat model.

Multidisciplinary Teams Readiness

Support Readiness

  • [ ] The must-gather script has been updated.

Description

Seen in 4.15-related update CI:

$ curl -s 'https://search.ci.openshift.org/search?maxAge=48h&type=junit&name=4.15.*upgrade&context=0&search=clusteroperator/console.*condition/Available.*status/False' | jq -r 'to_entries[].value | to_entries[].value[].context[]' | sed 's|.*clusteroperator/\([^ ]*\) condition/Available reason/\([^ ]*\) status/False[^:]*: \(.*\)|\1 \2 \3|' | sed 's|[.]apps[.][^ /]*|.apps...|g' | sort | uniq -c | sort -n
      1 console RouteHealth_FailedGet failed to GET route (https://console-openshift-console.apps... Get "https://console-openshift-console.apps... dial tcp 52.158.160.194:443: connect: connection refused
      1 console RouteHealth_StatusError route not yet available, https://console-openshift-console.apps... returns '503 Service Unavailable'
      2 console RouteHealth_FailedGet failed to GET route (https://console-openshift-console.apps... Get "https://console-openshift-console.apps... dial tcp: lookup console-openshift-console.apps... on 172.30.0.10:53: no such host
      2 console RouteHealth_FailedGet failed to GET route (https://console-openshift-console.apps... Get "https://console-openshift-console.apps... EOF
      8 console RouteHealth_RouteNotAdmitted console route is not admitted
     16 console RouteHealth_FailedGet failed to GET route (https://console-openshift-console.apps... Get "https://console-openshift-console.apps... context deadline exceeded (Client.Timeout exceeded while awaiting headers)

For example this 4.14 to 4.15 run had:

: [bz-Management Console] clusteroperator/console should not change condition/Available 
Run #0: Failed 	1h25m23s
{  1 unexpected clusteroperator state transitions during e2e test run 

Nov 28 03:42:41.207 - 1s    E clusteroperator/console condition/Available reason/RouteHealth_FailedGet status/False RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.ci-op-d2qsp1gp-2a31d.aws-2.ci.openshift.org): Get "https://console-openshift-console.apps.ci-op-d2qsp1gp-2a31d.aws-2.ci.openshift.org": context deadline exceeded (Client.Timeout exceeded while awaiting headers)}

While a timeout for the console Route isn't fantastic, an issue that only persists for 1s is not long enough to warrant immediate admin intervention. Teaching the console operator to stay Available=True for this kind of brief hiccup, while still going Available=False for issues where at least part of the component is non-functional and immediate administrator intervention is required, would make it easier for admins and SREs operating clusters to identify when intervention is actually needed.

Version-Release number of selected component

At least 4.15. Possibly other versions; I haven't checked.

How reproducible

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=48h&type=junit&name=4.15.*upgrade&context=0&search=clusteroperator/console.*condition/Available.*status/False' | grep 'periodic.*failures match' | sort
periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-upgrade-azure-ovn-heterogeneous (all) - 12 runs, 17% failed, 50% of failures match = 8% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-ovn-remote-libvirt-ppc64le (all) - 5 runs, 20% failed, 100% of failures match = 20% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-ovn-remote-libvirt-s390x (all) - 4 runs, 100% failed, 25% of failures match = 25% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-aws-ovn-heterogeneous-upgrade (all) - 12 runs, 17% failed, 100% of failures match = 17% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-upgrade-azure-ovn-arm64 (all) - 7 runs, 29% failed, 50% of failures match = 14% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-upgrade-azure-ovn-heterogeneous (all) - 12 runs, 25% failed, 33% of failures match = 8% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-ovn-upgrade (all) - 80 runs, 23% failed, 28% of failures match = 6% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade (all) - 80 runs, 28% failed, 23% of failures match = 6% impact
periodic-ci-openshift-release-master-ci-4.16-upgrade-from-stable-4.15-e2e-aws-ovn-upgrade (all) - 63 runs, 38% failed, 8% of failures match = 3% impact
periodic-ci-openshift-release-master-ci-4.16-upgrade-from-stable-4.15-e2e-azure-sdn-upgrade (all) - 60 runs, 73% failed, 11% of failures match = 8% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-sdn-upgrade (all) - 70 runs, 7% failed, 20% of failures match = 1% impact

Seems like it's primarily minor-version updates that trip this, and in jobs with high run counts, the impact percentage is single-digits.

Steps to reproduce

There may be a way to reliably trigger these hiccups, but as a reproducer floor, running days of CI and checking whether impact percentages decrease would be a good way to test fixes post-merge.

Actual results

Lots of blips of the console ClusterOperator going Available=False in 4.15 update CI.

Expected results

Console goes Available=False if and only if immediate admin intervention is appropriate.

With the CSISnapshot capability disabled, the CSI driver operators are Degraded. For example:

18:12:16.895: Some cluster operators are not ready: storage (Degraded=True AWSEBSCSIDriverOperatorCR_AWSEBSDriverStaticResourcesController_SyncError: AWSEBSCSIDriverOperatorCRDegraded: AWSEBSDriverStaticResourcesControllerDegraded: "volumesnapshotclass.yaml" (string): the server could not find the requested resource
AWSEBSCSIDriverOperatorCRDegraded: AWSEBSDriverStaticResourcesControllerDegraded: )
Ginkgo exit error 1: exit with code 1}

Version-Release number of selected component (if applicable):
4.12.nightly

The reason is that cluster-csi-snapshot-controller-operator does not create VolumeSnapshotClass CRD, which AWS EBS CSI driver operator expects to exist.

CSI driver operators must skip VolumeSnapshotClass creation if the CRD does not exist.

Description of problem:

cluster-policy-controller has unnecessary permissions and is able to operate on all leases in the KCM namespace. This also applies to the namespace-security-allocation-controller, which was moved some time ago and does not need the lock mechanism.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

 
 
 

 

aws-ebs-csi-driver-controller-ca ServiceAccount does not include the HCP pull-secret in its imagePullSecrets. Thus, if a HostedCluster is created with a `pullSecret` that contains creds that the management cluster pull secret does not have, the image pull fails.
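For illustration, a hedged sketch of the expected shape (the pull secret name is a placeholder for the HCP pull secret, not the actual name used):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-ebs-csi-driver-controller-ca
imagePullSecrets:
- name: pull-secret   # hypothetical name of the HCP pull secret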

LatencySensitive has been functionally equivalent to "" (Default) for several years. Code has forgotten that the featureset must be handled, and it's more efficacious to remove the featureset (with migration code) than to try to plug all the holes.

To ensure this is working, update a cluster to use LatencySensitive and see that the FeatureSet value is reset after two minutes.
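A hedged sketch of that verification (the command form is illustrative; the migration behavior is what the card describes):

# set the deprecated feature set
oc patch featuregate cluster --type merge -p '{"spec":{"featureSet":"LatencySensitive"}}'
# after the migration code runs (about two minutes), the value should be back to the default
oc get featuregate cluster -o jsonpath='{.spec.featureSet}'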

Description of problem:

Prometheus fails to scrape metrics from the storage operator after some time.

Version-Release number of selected component (if applicable):

4.11

How reproducible:

Always

Steps to Reproduce:

1. Install storage operator.
2. Wait for 24h (time for the certificate to be recycled).
3.

Actual results:

Targets being down because Prometheus didn't reload the CA certificate.

Expected results:

Prometheus reloads its client TLS certificate and scrapes the target successfully.

Additional info:


discover-etcd-initial-cluster was written very early on in the cluster-etcd-operator life cycle. We have observed at least one bug in this code and in order to validate logical correctness it needs to be rewritten with unit tests.

PR: https://github.com/openshift/etcd/pull/73

Description of problem:

whereabouts reconciler is responsible for reclaiming dangling IPs, and freeing them to be available to allocate to new pods.
This is crucial for scenarios where the number of addresses is limited and dangling IPs prevent whereabouts from successfully allocating new IPs to new pods.

The reconciliation schedule is currently hard-coded to run once a day, without a user-friendly way to configure.

Version-Release number of selected component (if applicable):

    

How reproducible:

    Create a Whereabouts reconciler daemon set; the reconciler schedule cannot be configured.

Steps to Reproduce:

    1. Create a Whereabouts reconciler daemonset
       instructions: https://docs.openshift.com/container-platform/4.14/networking/multiple_networks/configuring-additional-network.html#nw-multus-creating-whereabouts-reconciler-daemon-set_configuring-additional-network

     2. Run `oc get pods -n openshift-multus | grep whereabouts-reconciler`

     3. Run `oc logs whereabouts-reconciler-xxxxx`      

Actual results:

    You can't configure the cron-schedule of the reconciler.

Expected results:

    Be able to modify the reconciler cron schedule.

Additional info:

    The fix for this bug is in two places: whereabouts, and cluster-network-operator.
    From this reason, in order to verify correctly we need to use both fixed components.
    Please read below for more details about how to apply the new configurations.

How to Verify:

    Create a whereabouts-config ConfigMap with a custom value, and check in the
    whereabouts-reconciler pods' logs that it is updated, and triggering the clean up.

Steps to Verify:

    1. Create a Whereabouts reconciler daemonset
    2. Wait for the whereabouts-reconciler pods to be running. (takes time for the daemonset to get created).
    3. See in logs: "[error] could not read file: <nil>, using expression from flatfile: 30 4 * * *"
       This means it uses the hardcoded default value. (Because no ConfigMap yet)
    4. Run: oc create configmap whereabouts-config -n openshift-multus --from-literal=reconciler_cron_expression="*/2 * * * *"
    5. Check in the logs for: "successfully updated CRON configuration" 
    6. Check that in the next 2 minutes the reconciler runs: "[verbose] starting reconciler run"

 

 

Please review the following PR: https://github.com/openshift/cluster-kube-controller-manager-operator/pull/726

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

While updating a cluster to 4.12.11, which contains the bug fix for OCPBUGS-7999 (https://issues.redhat.com/browse/OCPBUGS-7999), the 4.12.z backport of OCPBUGS-2783 (https://issues.redhat.com/browse/OCPBUGS-2783), it seems that the older {Custom|Default}RouteSync{Degraded|Progressing} conditions are not cleaned up as they should be, as per the OCPBUGS-2783 resolution, while the newer ones are added.

Due to this, on an upgrade to 4.12.11 (or higher, until this bug is fixed), it is possible to hit a problem very similar to the one that led to OCPBUGS-2783 in the first place, but while upgrading to 4.12.11.

So, we need to do a proper cleanup of the older conditions.

Version-Release number of selected component (if applicable):

4.12.11 and higher

How reproducible:

Always, as far as the wrong conditions are concerned. It only leads to issues if one of the wrong conditions was in an unhealthy state.

Steps to Reproduce:

1. Upgrade
2.
3.

Actual results:

Both new (and correct) conditions plus older (and wrong) conditions.

Expected results:

Both new (and correct) conditions only.

Additional info:

The problem seems to be that the stale conditions controller is created[1] with a list that says CustomRouteSync and DefaultRouteSync, while that list should be CustomRouteSyncDegraded, CustomRouteSyncProgressing, DefaultRouteSyncDegraded and DefaultRouteSyncProgressing. I read the source code of the controller a bit and it seems that it does not admit prefixes but performs a literal comparison.

[1] - https://github.com/openshift/console-operator/blob/0b54727/pkg/console/starter/starter.go#L403-L404

Other Incomplete

This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were not completed when this image was assembled

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    
  • flowcontrol v1beta3 is deprecated from 1.29, and will be removed in 1.32
  • update the OpenShift specific APF manifests to use v1

The flowcontrol manifests in the following operators (kas, oas, etcd, openshift controller manager, auth, and network) should use v1.
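A hedged sketch of what the apiVersion bump looks like for such a manifest (the FlowSchema shown is illustrative, not an actual OpenShift manifest):

apiVersion: flowcontrol.apiserver.k8s.io/v1   # was flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  name: example-flowschema                    # hypothetical name
spec:
  priorityLevelConfiguration:
    name: global-default
  matchingPrecedence: 1000
  distinguisherMethod:
    type: ByUser
  rules:
  - subjects:
    - kind: Group
      group:
        name: system:authenticated
    resourceRules:
    - verbs: ["*"]
      apiGroups: ["*"]
      resources: ["*"]
      clusterScope: true
      namespaces: ["*"]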

Please review the following PR: https://github.com/openshift/cluster-config-operator/pull/390

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Adam Kaplan will be assuming the role of "Staff Engineer" for core OpenShift for the team, taking over the role from Ben Parees. To expedite reviews and other OCP processes, Adam needs to be added back as an approver in the following repositories:

  • openshift/source-to-image
  • openshift/builder
  • openshift/openshift-controller-manager
  • openshift/cluster-openshift-controller-manager-operator

Adam may also need to be added to the OWNERS files in the following repos:

  • openshift/enhancements
  • openshift/api

Description of problem:

In a cluster updating from 4.5.11 through many intermediate versions to 4.14.17 and on to 4.15.3 (initiated 2024-03-18T07:33:11Z), multus pods are sad about api-int X.509:

$ tar -xOz inspect.local.5020316083985214391/namespaces/openshift-kube-apiserver/core/events.yaml <hivei01ue1.inspect.local.5020316083985214391.gz | yaml2json | jq -r '[.items[] | select(.reason == "FailedCreatePodSandBox")][0].message'
(combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_installer-928-ip-10-164-221-242.ec2.internal_openshift-kube-apiserver_9e87f20b-471a-447e-9679-edce26b4ef78_0(8322d383c477c29fe0221fdca5eaf5ca5b2f57f8a7077c7dd7d2861be0f5288c): error adding pod openshift-kube-apiserver_installer-928-ip-10-164-221-242.ec2.internal to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: '&{ContainerID:8322d383c477c29fe0221fdca5eaf5ca5b2f57f8a7077c7dd7d2861be0f5288c Netns:/var/run/netns/6e2b0b10-5006-4bf9-bd74-17333e0cdceb IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=openshift-kube-apiserver;K8S_POD_NAME=installer-928-ip-10-164-221-242.ec2.internal;K8S_POD_INFRA_CONTAINER_ID=8322d383c477c29fe0221fdca5eaf5ca5b2f57f8a7077c7dd7d2861be0f5288c;K8S_POD_UID=9e87f20b-471a-447e-9679-edce26b4ef78 Path: StdinData:[REDACTED]} ContainerID:"8322d383c477c29fe0221fdca5eaf5ca5b2f57f8a7077c7dd7d2861be0f5288c" Netns:"/var/run/netns/6e2b0b10-5006-4bf9-bd74-17333e0cdceb" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=openshift-kube-apiserver;K8S_POD_NAME=installer-928-ip-10-164-221-242.ec2.internal;K8S_POD_INFRA_CONTAINER_ID=8322d383c477c29fe0221fdca5eaf5ca5b2f57f8a7077c7dd7d2861be0f5288c;K8S_POD_UID=9e87f20b-471a-447e-9679-edce26b4ef78" Path:"" ERRORED: error configuring pod [openshift-kube-apiserver/installer-928-ip-10-164-221-242.ec2.internal] networking: Multus: [openshift-kube-apiserver/installer-928-ip-10-164-221-242.ec2.internal/9e87f20b-471a-447e-9679-edce26b4ef78]: error waiting for pod: Get "https://api-int.REDACTED:6443/api/v1/namespaces/openshift-kube-apiserver/pods/installer-928-ip-10-164-221-242.ec2.internal?timeout=1m0s": tls: failed to verify certificate: x509: certificate signed by unknown authority

Version-Release number of selected component (if applicable)

4.15.3, so we have 4.15.2's OCPBUGS-30304 but not 4.15.5's OCPBUGS-30237.

How reproducible

Seen in two clusters after updating from 4.14 to 4.15.3.

Steps to Reproduce

Unclear.

Actual results

Sad multus pods.

Expected results

Happy cluster.

Additional info

$ openssl s_client -showcerts -connect api-int.REDACTED:6443 < /dev/null
...
Certificate chain
 0 s:CN = api-int.REDACTED
   i:CN = openshift-kube-apiserver-operator_loadbalancer-serving-signer@1710747228
   a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256
   v:NotBefore: Mar 25 19:35:55 2024 GMT; NotAfter: Apr 24 19:35:56 2024 GMT
...
 1 s:CN = openshift-kube-apiserver-operator_loadbalancer-serving-signer@1710747228
   i:CN = openshift-kube-apiserver-operator_loadbalancer-serving-signer@1710747228
   a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256
   v:NotBefore: Mar 18 07:33:47 2024 GMT; NotAfter: Mar 16 07:33:48 2034 GMT
...

So that's created seconds after the update was initiated. We have inspect logs for some namespaces, but they don't go back quite that far, because the machine-config roll at the end of the update into 4.15.3 rolled all the pods:

$ tar -xOz inspect.local.5020316083985214391/namespaces/openshift-kube-apiserver-operator/pods/kube-apiserver-operator-6cbfdd467c-4ctq7/kube-apiserver-operator/kube-apiserver-operator/logs/current.log <hivei01ue1.inspect.local.5020316083985214391.gz | head -n2
2024-03-18T08:22:05.058253904Z I0318 08:22:05.056255       1 cmd.go:241] Using service-serving-cert provided certificates
2024-03-18T08:22:05.058253904Z I0318 08:22:05.056351       1 leaderelection.go:122] The leader election gives 4 retries and allows for 30s of clock skew. The kube-apiserver downtime tolerance is 78s. Worst non-graceful lease acquisition is 2m43s. Worst graceful lease acquisition is {26s}.

We were able to recover individual nodes via:

  1. oc config new-kubelet-bootstrap-kubeconfig > bootstrap.kubeconfig  from any machine with an admin kubeconfig
  2. copy to all nodes as /etc/kubernetes/kubeconfig
  3. on each node rm /var/lib/kubelet/kubeconfig
  4. restart each node
  5. approve each kubelet CSR
  6. delete the node's multus-* pod.

Since HyperShift / Hosted Control Planes have adopted include.release.openshift.io/ibm-cloud-managed, which tailors the resources of clusters running in the ROKS IBM environment, adding include.release.openshift.io/hypershift will allow HyperShift to express profile choices different from ROKS.
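A hedged sketch of how a payload manifest might carry both profile annotations (the resource kind and name are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-operator                                   # hypothetical
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/hypershift: "true"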

Description of problem:

In the tested HCP external OIDC env, when issuerCertificateAuthority is set, console pods are stuck in ContainerCreating status. The reason is the CA configmap is not propagated to openshift-console namespace by the console operator.

Version-Release number of selected component (if applicable):

Latest 4.16 and 4.15 nightly payloads

How reproducible:

Always

Steps to Reproduce:

1. Configure HCP external OIDC env with issuerCertificateAuthority set.
2. Check oc get pods -A

Actual results:

2. Before OCPBUGS-31319 is fixed, console pods are in CrashLoopBackOff status. After OCPBUGS-31319 is fixed, or after manually copying the CA configmap to the openshift-config namespace as a workaround, console pods are stuck in ContainerCreating status until the CA configmap is manually copied to the openshift-console namespace too. Console login is affected.

Expected results:

2. Console operator should be responsible to copy the CA to openshift-console namespace. And console login should succeed.

Additional info:

In https://redhat-internal.slack.com/archives/C060D1W96LB/p1711548626625499 , HyperShift Dev side Seth requested to create this separate console bug to unblock the PR merge and backport for OCPBUGS-31319 . So creating it

Please review the following PR: https://github.com/openshift/cluster-config-operator/pull/353

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

console-operator is updating the OIDC status without checking the feature gate    

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    Set up an OCP cluster without an external OIDC provider, using the default OAuth.
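
For reference, a quick sketch of how one might confirm the cluster is on the built-in OAuth stack rather than external OIDC (an empty or IntegratedOAuth type means the default OAuth is in use):

# Check the cluster authentication type; "" or "IntegratedOAuth" means built-in OAuth.
oc get authentication.config.openshift.io cluster -o jsonpath='{.spec.type}{"\n"}'
# See whether the ExternalOIDC feature gate is listed as enabled on this cluster.
oc get featuregate cluster -o yaml | grep -i externaloidc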

Steps to Reproduce:

    1. 
    2.
    3.
    

Actual results:

The OIDC-related conditions are being surfaced in the console-operator's config conditions.

Expected results:

    The OIDC-related conditions should not be surfaced in the console-operator's config conditions.

Additional info:

    

aws single-node jobs are failing starting with https://amd64.ocp.releases.ci.openshift.org/releasestream/4.16.0-0.nightly/release/4.16.0-0.nightly-2024-03-27-123853

periodic-ci-openshift-release-master-nightly-4.16-e2e-aws-ovn-single-node

periodic-ci-openshift-release-master-nightly-4.16-e2e-aws-ovn-single-node-serial

 

A number of operators are degraded. I noticed the condition below but am still investigating (a quick check for listing degraded operators is sketched after it):

 

    - lastTransitionTime: '2024-03-27T15:56:02Z'
      message: 'OAuthServerRouteEndpointAccessibleControllerAvailable: failed to retrieve
        route from cache: route.route.openshift.io "oauth-openshift" not found

        OAuthServerServiceEndpointAccessibleControllerAvailable: Get "https://172.30.201.206:443/healthz":
        dial tcp 172.30.201.206:443: connect: connection refused

        OAuthServerServiceEndpointsEndpointAccessibleControllerAvailable: endpoints
        "oauth-openshift" not found

        ReadyIngressNodesAvailable: Authentication requires functional ingress which
        requires at least one schedulable and ready node. Got 0 worker nodes, 1 master
        nodes, 0 custom target nodes (none are schedulable or ready for ingress pods).

        WellKnownAvailable: The well-known endpoint is not yet available: failed to
        get oauth metadata from openshift-config-managed/oauth-openshift ConfigMap:
        configmap "oauth-openshift" not found (check authentication operator, it is
        supposed to create this)'
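
A small sketch for surfacing which ClusterOperators report Degraded=True on such a cluster (assumes jq is available):

# List ClusterOperators whose Degraded condition is currently True.
oc get clusteroperators -o json \
  | jq -r '.items[] | select(any(.status.conditions[]?; .type == "Degraded" and .status == "True")) | .metadata.name'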

Please review the following PR: https://github.com/openshift/service-ca-operator/pull/227

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

The Service CA operator creates certificates and secrets, and injects cert info into configmaps, for resources that request them via annotation.

Those secrets and configmaps need to have ownership and description annotations to support cert ownership validation.
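
For illustration, the two request annotations look roughly like this (the namespace and resource names here are hypothetical):

# Ask the Service CA operator to generate a serving-cert secret for a service.
oc -n my-namespace annotate service my-service \
  service.beta.openshift.io/serving-cert-secret-name=my-service-tls
# Ask the Service CA operator to inject the CA bundle into a configmap.
oc -n my-namespace annotate configmap my-ca-bundle \
  service.beta.openshift.io/inject-cabundle=true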

Description of problem:

We should check whether `currentVersion` and `desiredVersion` are empty.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

console-config sets telemeterClientDisabled: true even though the telemeter client is NOT disabled

Version-Release number of selected component (if applicable):

a cluster launched by image built with cluster-bot: build 4.16-ci,openshift/console#13677,openshift/console-operator#877    

How reproducible:

Always    

Steps to Reproduce:

1. Check if telemeter client is enabled
$ oc -n openshift-monitoring get pod | grep telemeter-client
telemeter-client-7cc8bf56db-7wcs5                       3/3     Running   0          83m
$ oc get cm cluster-monitoring-config -n openshift-monitoring
Error from server (NotFound): configmaps "cluster-monitoring-config" not found

2. Check console-config settings
$ oc get cm console-config -n openshift-console -o yaml
apiVersion: v1
data:
  console-config.yaml: |
    apiVersion: console.openshift.io/v1
    auth:
      authType: openshift
      clientID: console
      clientSecretFile: /var/oauth-config/clientSecret
      oauthEndpointCAFile: /var/oauth-serving-cert/ca-bundle.crt
    clusterInfo:
      consoleBaseAddress: https://xxxxx
      controlPlaneTopology: HighlyAvailable
      masterPublicURL: https://xxxxx:6443
      nodeArchitectures:
      - amd64
      nodeOperatingSystems:
      - linux
      releaseVersion: 4.16.0-0.test-2024-03-18-024238-ci-ln-0q7bq2t-latest
    customization:
      branding: ocp
      documentationBaseURL: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.16/
    kind: ConsoleConfig
    monitoringInfo:
      alertmanagerTenancyHost: alertmanager-main.openshift-monitoring.svc:9092
      alertmanagerUserWorkloadHost: alertmanager-main.openshift-monitoring.svc:9094
    plugins:
      monitoring-plugin: https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/
    providers: {}
    servingInfo:
      bindAddress: https://[::]:8443
      certFile: /var/serving-cert/tls.crt
      keyFile: /var/serving-cert/tls.key
    session: {}
    telemetry:
      telemeterClientDisabled: "true"
kind: ConfigMap
metadata:
  creationTimestamp: "2024-03-19T01:20:23Z"
  labels:
    app: console
  name: console-config
  namespace: openshift-console
  resourceVersion: "27723"
  uid: 2f9282c3-1c4a-4400-9908-4e70025afc33    

 

Actual results:

In cm/console-config, telemeterClientDisabled is set to 'true'.

Expected results:

The telemeterClientDisabled property should reflect the real status of the telemeter client.

The telemeter client is not disabled, because:
1. the telemeter-client pod is running
2. the user did not disable the telemeter client manually, since the 'cluster-monitoring-config' configmap does not exist (see the cross-check sketched below)
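
A minimal shell cross-check of the two signals (assumes the default telemeter-client deployment name):

# Is the telemeter client actually running?
oc -n openshift-monitoring get deployment telemeter-client
# What does console-config claim?
oc -n openshift-console get configmap console-config \
  -o jsonpath='{.data.console-config\.yaml}' | grep telemeterClientDisabled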

Additional info:

    

Slack thread: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1705425516419799

A revision controller is spinning up too many revisions.

Goal: update the revision controller code to temporarily log config changes so we can validate that the newly created revisions are valid, or prove that some of the new revisions are unnecessary.
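
A hedged sketch for observing revision churn on a static-pod operator; kube-apiserver is used here purely as an example (the affected operand in the thread may differ), and the revision numbers are placeholders:

# How far have revisions advanced, and which revision is each node on?
oc get kubeapiserver cluster \
  -o jsonpath='{.status.latestAvailableRevision}{"\n"}{range .status.nodeStatuses[*]}{.nodeName}{" -> "}{.currentRevision}{"\n"}{end}'
# Diff two revisioned configs to see what actually changed between revisions
# (config-42 / config-43 are placeholder revision numbers).
diff <(oc -n openshift-kube-apiserver get configmap config-42 -o yaml) \
     <(oc -n openshift-kube-apiserver get configmap config-43 -o yaml)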