Note: this page shows the Feature-Based Change Log for a release
These features were completed when this image was assembled
Currently the "Get started with on-premise host inventory" quickstart is delivered in the core console. If we are going to keep it there, we need to add the MCE or ACM operator as a prerequisite; otherwise it's very confusing.
Goal:
As an administrator, I would like to deploy OpenShift 4 clusters to the AWS C2S region
Problem:
Customers were able to deploy to AWS C2S region in OCP 3.11, but our global configuration in OCP 4.1 doesn't support this.
Why is this important:
Lifecycle Information:
Previous Work:
Here are the relevant PRs from OCP 3.11. You can see that these endpoints are not part of the standard SDK (they use an entirely separate SDK); to support these regions, the endpoints had to be configured explicitly.
Seth Jennings has put together a highly customized POC.
Dependencies:
Prioritized epics + deliverables (in scope / not in scope):
Related : https://jira.coreos.com/browse/CORS-1271
Estimate (XS, S, M, L, XL, XXL): L
Customers: North America Public Sector and Government Agencies
Open Questions:
https://github.com/openshift/enhancements/pull/555
https://github.com/openshift/api/pull/827
The console operator will need to support single-node clusters.
We have a console deployment and a downloads deployment. Each will need to be updated so that there's only a single replica when high-availability mode is disabled in the Infrastructure config. We should also remove the anti-affinity rule in the console deployment that tries to spread console pods across nodes.
The downloads deployment is currently a static manifest; going forward, it likely needs to be created by the console operator instead.
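For reference, openshift/api#827 adds topology fields to the Infrastructure status that the operator can key off. A minimal sketch of the relevant fields:

```yaml
# Sketch: Infrastructure status fields (from openshift/api#827) the console
# operator consults to decide on a single replica
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  name: cluster
status:
  controlPlaneTopology: SingleReplica    # single-node control plane
  infrastructureTopology: SingleReplica  # scale console/downloads to 1 replica
```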
Acceptance Criteria:
Bump github.com/openshift/api to pick up changes from openshift/api#827
OCP/Telco Definition of Done
Feature Template descriptions and documentation.
Early customer feedback is that they see SNO as a great solution covering smaller-footprint deployments, but they are wondering what evolution story OpenShift is going to provide when more capacity or high availability is needed in the future.
While migration tooling (moving workload/config to a new cluster) could be a mid-term solution, customers desire not to involve extra hardware in this process.
For Telecommunications Providers at the Far Edge, the intent is to start small and then grow. Many of these operators will start with an SNO-based DU deployment as an initial investment. But as DUs evolve (different segments of the radio spectrum are added, various radio hardware is provisioned, and features are delivered to the Far Edge), the Telecommunications Providers want their Far Edge deployments to scale up from 1 node to 2 nodes to n nodes. On the opposite side of the spectrum from SNO is MMIMO, where there is a robust cluster and workloads use HPA.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Questions to be addressed:
This is a ticket meant to track all the OCP PRs that are involved in the implementation of the SNO + workers enhancement.
In the console-operator repo we need to add the `capability.openshift.io/console` annotation to all the manifests that the operator either contains or creates on the fly.
Manifests are currently present in /bindata and /manifest directories.
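A minimal sketch of what an annotated manifest might look like, using the annotation key named above; the value convention shown here is an assumption, not confirmed by this story:

```yaml
# Sketch: a console-operator manifest carrying the capability annotation;
# the value is illustrative
apiVersion: apps/v1
kind: Deployment
metadata:
  name: console
  namespace: openshift-console
  annotations:
    capability.openshift.io/console: "true"  # illustrative value
```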
Here is an example of the insights-operator change.
Here is the overall enhancement doc.
We need to continue to maintain specific areas within storage; this epic captures that effort and tracks it across releases.
Goals
Requirements
Requirement | Notes | isMvp? |
---|---|---|
Telemetry | No | |
Certification | No | |
API metrics | No | |
Out of Scope
n/a
Background, and strategic fit
With the expected scale of our customer base, we want to keep the load of customer tickets / BZs low.
Assumptions
Customer Considerations
Documentation Considerations
Notes
In progress:
High prio:
Unsorted
Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF). We are trying no-feature-freeze in 4.12: we will try to do as much as we can before FF, but we're quite sure something will slip past FF as usual.
Update all OCP and kubernetes libraries in storage operators to the appropriate version for OCP release.
This includes (but is not limited to):
Operators:
Update all CSI sidecars to the latest upstream release.
This includes updating the VolumeSnapshot CRDs in https://github.com/openshift/cluster-csi-snapshot-controller-operator/tree/master/assets
On new installations, we should make the StorageClass created by the CSI operator the default one.
However, we shouldn't do that in an upgrade scenario. The main reason is that users might have set a different quota on the CSI driver StorageClass.
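Concretely, "default" here means the standard Kubernetes annotation on the StorageClass. A minimal sketch, with illustrative class and driver names:

```yaml
# Sketch: marking the CSI-operator-created StorageClass as default on new installs
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-csi                 # illustrative name
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com    # illustrative driver
```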
Exit criteria:
OpenShift console supports new features and elevated experience for Operator Lifecycle Manager (OLM) Operators and Cluster Operators.
OCP Console improves the controls and visibility for managing vendor-provided software in customers' infrastructure and makes these solutions available to customers' internal users.
To achieve this, we want to make sure OLM's and Cluster Operators' new features are exposed in the console so admin console users can benefit from them.
Requirement | Notes | isMvp? |
---|---|---|
OCP console supports the latest OLM APIs and features | This is a requirement for ALL features. | YES |
OCP console improves visibility to Cluster Operators related resources and features. | This is a requirement for ALL features. | YES |
(Optional) Use Cases
* Main success scenarios - high-level user stories
* Alternate flow/scenarios - high-level user stories
* ...
Questions to answer...
How will the user interact with this feature?
Which users will use this and when will they use it?
Is this feature used as part of the current user interface?
Out of Scope
# List of non-requirements or things not included in this feature
# ...
Background, and strategic fit
What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Assumptions
* Are there assumptions being made regarding prerequisites and dependencies?
* Are there assumptions about hardware, software or people resources?
* ...
Customer Considerations
* Are there specific customer environments that need to be considered (such as working with existing h/w and software)?
...
Documentation Considerations
Questions to be addressed:
* What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
* Does this feature have doc impact?
* New Content, Updates to existing content, Release Note, or No Doc Impact
* If unsure and no Technical Writer is available, please contact Content Strategy.
* What concepts do customers need to understand to be successful in [action]?
* How do we expect customers will use the feature? For what purpose(s)?
* What reference material might a customer want/need to complete [action]?
* Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
* What is the doc impact (New Content, Updates to existing content, or Release Note)?
OpenShift console allows users (cluster admins) to change the state of the “default hub sources” for OperatorHub on the cluster from “enabled” to “disabled” and vice versa through “Global Configuration → OperatorHub” view from the “Cluster Settings” view.
Starting from OpenShift 4.4, the console and OLM provide richer configuration for 'CatalogSource' objects, enabling users to create their own curated sources for OperatorHub with a custom "Display Name", "URL of Image Registry", and "Polling Interval" for updating the custom OperatorHub source.
This epic is about reflecting/exposing the newer capabilities on the ‘OperatorHub’ (Cluster Config view) and the ‘CatalogSource’ list and details views.
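For reference, disabling a default hub source boils down to editing the cluster-scoped OperatorHub config; a minimal sketch:

```yaml
# Sketch: disabling one default hub source via the cluster OperatorHub config
apiVersion: config.openshift.io/v1
kind: OperatorHub
metadata:
  name: cluster
spec:
  sources:
  - name: community-operators
    disabled: true
```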
1. As an admin user of console, I'd like to:
easily disable/enable the predefined Operator sources for the OperatorHub
so that I can:
control the sources of the Operators my cluster users see on the OperatorHub view.
2. As an admin user of console, I'd like to:
easily understand how to add/edit/view/remove my custom Operator catalogSource for the OperatorHub
so that I can
more easily manage (add/edit/view/remove) my custom sources for the OperatorHub
3. As an admin user of console, I'd like to:
easily see the configurations and status of my custom Operator catalogSource for the OperatorHub
so that I can
more easily manage (review/edit) the configurations of my custom sources for the OperatorHub
Change the state of the default hub sources for OperatorHub on the cluster from enabled to disabled and vice versa. Add and manage your curated sources for OperatorHub on the Sources tab with custom Display Name, URL of Image Registry, and the Polling Interval for updating your custom OperatorHub source.
URL/k8s/cluster/config.openshift.io~v1~OperatorHub/cluster/sources
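A minimal sketch of the kind of custom CatalogSource these views surface; the name and image are illustrative:

```yaml
# Sketch: a curated catalog source with display name, registry image, and polling interval
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: my-catalog                                # illustrative
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  displayName: My Operators
  image: quay.io/example/my-catalog-index:latest  # illustrative
  updateStrategy:
    registryPoll:
      interval: 30m
```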
See current console screenshots in the attachments for reference:
The Sources tab list view now includes 2 new columns for the Catalog Sources:
Action menu changes:
Improve ‘CatalogSource’ details view (on the “Details” tab)
Feature Overview
PF4 has introduced a bevy of new components that enhance the user experience, specifically around list views. These new components make it easier than ever for customers to find the data they care about and to take action.
Goals
Requirements
Requirement | Notes | isMvp? |
---|---|---|
Improvements must be applied to all resource list view pages including the search page | This is a requirement for ALL features. | YES |
Search page will be the only page that needs "Saved Searches" | This is a requirement for ALL features. | NO |
All components updated must be PF4 supported | This is a requirement for ALL features. | YES |
Questions to answer...
How will the user interact with this feature?
Which users will use this and when will they use it?
Is this feature used as part of the current user interface?
Out of Scope
# List of non-requirements or things not included in this feature
# ...
Background, and strategic fit
What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Assumptions
* Are there assumptions being made regarding prerequisites and dependencies?
* Are there assumptions about hardware, software or people resources?
* ...
Customer Considerations
* Are there specific customer environments that need to be considered (such as working with existing h/w and software)?
...
Documentation Considerations
Questions to be addressed:
* What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
* Does this feature have doc impact?
* New Content, Updates to existing content, Release Note, or No Doc Impact
* If unsure and no Technical Writer is available, please contact Content Strategy.
* What concepts do customers need to understand to be successful in [action]?
* How do we expect customers will use the feature? For what purpose(s)?
* What reference material might a customer want/need to complete [action]?
* Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
* What is the doc impact (New Content, Updates to existing content, or Release Note)?
The inventory card needs a couple of visual tweaks:
Attaching the desired design from Michael
Feature Overview
This will be phase 1 of Internationalization of the OpenShift Console.
Phase 1 will include the following:
Phase 1 will not include:
Initial List of Languages to Support
---------- 4.7* ----------
*This will be based on the ability to get all the strings externalized; there is a good chance this gets pushed to 4.8.
---------- Post 4.7 ----------
POC
Goals
Internationalization has become table stakes. OpenShift Console needs to support different languages in each of the major markets. This is key functionality that will help unlock sales in different regions.
Requirements
Requirement | Notes | isMvp? |
---|---|---|
Language Selector | | YES |
Localized Date + Time | | YES |
Externalization and translation of all client side strings | | YES |
Translation for Chinese and Japanese | | YES |
Process, infra, and testing capabilities put into place | | YES |
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Out of Scope
Assumptions
Customer Considerations
We are rolling this feature out in phases; based on customer feedback, there may be no phase 2.
Documentation Considerations
I believe documentation already supports a large language set.
Externalize strings in the Monitoring nav section (Alerting, Metrics, Dashboards).
Externalize strings in the pages under the console Workloads nav section.
i18n for Horizontal Pod AutoScalers
Bump i18next dependencies to the latest versions.
Add zh-cn and ja-jp translations.
Externalize strings in the Compute nav section (Nodes, Machines, Machine Sets, Machine Autoscalers, Machine Health Checks, Machine Configs, Machine Config Pools).
We need to translate the following common components used in the list and details pages:
Includes list page component, filter toolbar, details page breadcrumbs, resource summary/details item, common details page headings, managed by operator link, and the conditions table.
Externalize strings in the User Management nav section (Users, Groups, Service Accounts, Roles, Role Bindings).
By default, we should use the user's browser preference for language, but we should give users a language selector to override. Here is a proposed design:
https://docs.google.com/document/d/17iIBDlEneu0DNhWi2TkQShRSobQZubWOhS0OQr_T3hE/edit#
Externalize strings in the pages under the console Home nav section.
Externalize strings for the Home / Search page.
Externalize strings in the Home / Explore pages.
Externalize strings for the Home / Events section.
Externalize strings for the Home / Project / Dashboard section.
Externalize strings for the Home / Namespace pages.
Externalize strings in the Builds nav section (Build Configs, Builds, Image Streams).
Update i18n files/scripts to support Korean in advance of translation work from Terry.
Specifically:
Check in translations from Sprint 192.
We have too many namespaces if we're loading them upfront. We should consolidate some of the files.
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
Rebase openshift-controller-manager to k8s 1.24
Assumption
Doc: https://docs.google.com/document/d/1sXCaRt3PE0iFmq7ei0Yb1svqzY9bygR5IprjgioRkjc/edit
Customers do not pay Red Hat more to run HyperShift control planes and supporting infrastructure than Standalone control planes and supporting infrastructure.
Assumption
Run cluster-storage-operator (CSO) + AWS EBS CSI driver operator + AWS EBS CSI driver control-plane Pods in the management cluster, run the driver DaemonSet in the hosted cluster.
More information here: https://docs.google.com/document/d/1sXCaRt3PE0iFmq7ei0Yb1svqzY9bygR5IprjgioRkjc/edit
As HyperShift Cluster Instance Admin, I want to run AWS EBS CSI driver operator + control plane of the CSI driver in the management cluster, so the guest cluster runs just my applications.
Exit criteria:
Cluster administrators need an in-product experience to discover and install new Red Hat offerings that can add high value to developer workflows.
Requirement | Notes | isMvp? |
---|---|---|
Discover new offerings in Home Dashboard | | Y |
Access details outlining value of offerings | | Y |
Access step-by-step guide to install offering | | N |
Allow developers to easily find and use newly installed offerings | | Y |
Support air-gapped clusters | | Y |
< What are we making, for who, and why/what problem are we solving?>
Discovering solutions that are not available for installation on cluster
No known dependencies
Background, and strategic fit
None
Quick Starts
Developers using Dev Console need to be made aware of the RH developer tooling available to them.
Provide awareness to developers using Dev Console of the RH developer tooling that is available to them, including:
Consider enhancing the +Add page and/or the Guided tour
Provide a Quick Start for installing the Cryostat Operator
To increase usage of our RH portfolio
This issue is to handle the PR comment - https://github.com/openshift/console-operator/pull/770#pullrequestreview-1501727662 for the issue https://issues.redhat.com/browse/ODC-7292
Feature Goal*
What is our purpose in implementing this? What new capability will be available to customers?
The goal of this feature is to provide a consistent, predictable, and deterministic approach to how the default storage class(es) are managed.
Why is this important? (mandatory)
The current default storage class implementation has corner cases which can result in PVCs staying Pending, because there is either no default storage class or multiple default storage classes are defined.
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
No default storage class
In some cases there is no default SC defined. This can happen during OCP deployment, where components such as the registry request a PV while the SCs have not been defined yet. It can also happen during a change of the default SC: there won't be any default between the moment the admin unsets the current one and sets the new one.
A user creates a PVC requesting a default SC by leaving pvc.spec.storageClassName=nil. The default SC does not exist at this point, therefore the admission plugin leaves the PVC untouched with pvc.spec.storageClassName=nil.
The admin marks SC2 as default.
The PV controller, when reconciling the PVC, updates pvc.spec.storageClassName from nil to the new SC2.
The PV controller uses the new SC2 when binding / provisioning the PVC.
The installer creates a default SC.
The PV controller, when reconciling the PVC, updates pvc.spec.storageClassName from nil to the new default SC.
The PV controller uses the new default SC when binding / provisioning the PVC.
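A minimal sketch of the PVC in this scenario; the claim deliberately omits spec.storageClassName so the controller can fill in the default retroactively:

```yaml
# Sketch: PVC created while no default SC exists; the PV controller later sets
# spec.storageClassName to the new default SC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data   # illustrative
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
  # storageClassName intentionally omitted (nil) to request the default SC
```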
Multiple Storage Classes
In some cases there are multiple default SCs. This can be an admin mistake (forgetting to unset the old one), or it can occur during the period when a new default SC is created but the old one is still present.
New behavior:
-> the PVC will get the default storage class with the newest CreationTimestamp (i.e. B) and no error should show.
-> admin will get an alert that there are multiple default storage classes and they should do something about it.
CSI that are shipped as part of OCP
The CSI drivers we ship as part of OCP are deployed and managed by RH operators. These operators automatically create a default storage class. Some customers don't like this approach and prefer to:
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
No external dependencies.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
This can confuse customers, as it changes the default behavior customers are used to. It needs to be carefully documented.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Create a GCP-specific spec.resourceTags entry in the infrastructure CRD. This should create and update tags (labels in GCP) on any OpenShift cloud resource that we create and manage. It should also tag existing resources that do not have the tags yet, and once the tags in the infrastructure CRD are changed, all the resources should be updated accordingly.
Tag deletion remains out of scope, as the customer can still have custom tags applied to the resources that we do not want to delete.
Due to the ongoing in-tree/out-of-tree split of the cloud and CSI providers, this should not apply to clusters with in-tree providers (!= "external").
Once we are confident all components are updated, we should introduce an end-to-end test that makes sure we never create untagged resources.
Goals
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
List any affected packages or components.
This epic covers the work to apply user-defined labels to GCP resources created for an OpenShift cluster, available as tech preview.
The user should be able to define GCP labels to be applied to the resources created during cluster creation by the installer and by the other operators that manage those resources. The user will define the required tags/labels in install-config.yaml while preparing the inputs for cluster creation. They will then be made available in the status sub-resource of the Infrastructure custom resource, which cannot be edited but is available for user reference and is used by the in-cluster operators for labeling when the resources are created.
Updating/deleting labels added during cluster creation, or adding new labels as a Day-2 operation, is out of scope for this epic.
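A minimal sketch of the install-config.yaml input described above; the userLabels field name and shape are an assumption here, not confirmed by this epic:

```yaml
# Sketch (hedged): user-defined GCP labels in install-config.yaml;
# the field name/shape is assumed
platform:
  gcp:
    projectID: my-project   # illustrative
    region: us-central1     # illustrative
    userLabels:
    - key: team
      value: storage
```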
List any affected packages or components.
Reference - https://issues.redhat.com/browse/RFE-2017
cluster-config-operator makes the Infrastructure CRD available for the installer. The CRD is included in its container image from the openshift/api package, which must be updated to pick up the latest CRD.
As our customers create more and more clusters, it will become vital for us to help them support their fleets. Currently, our users have to use a different interface (the ACM UI) to manage their fleet of clusters. Our goal is to provide a single interface for everything from managing a fleet of clusters to deep-diving into a single cluster. This means going to a single URL, your Hub, to interact with your OCP fleet.
The goal of this tech preview update is to improve the experience from the last round of tech preview. The following items will be improved:
Key Objective
Providing our customers with a single simplified User Experience (Hybrid Cloud Console) that is extensible, can run locally or in the cloud, and is capable of everything from managing the fleet to deep-diving into a single cluster.
Why do customers want this?
Why do we want this?
Phase 2 Goal: Productization of the unified Console
Description of problem:
There is a possible race condition in the console operator where the managed cluster config gets updated after the console deployment and doesn't trigger a rollout.
Version-Release number of selected component (if applicable):
4.10
How reproducible:
Rarely
Steps to Reproduce:
1. Enable multicluster tech preview by adding the TechPreviewNoUpgrade featureSet to the FeatureGate config. (NOTE: THIS ACTION IS IRREVERSIBLE AND WILL MAKE THE CLUSTER UNUPGRADEABLE AND UNSUPPORTED.)
2. Install ACM 2.5+.
3. Import a managed cluster using either the ACM console or the CLI.
4. Once that managed cluster is showing in the cluster dropdown, import a second managed cluster.
Actual results:
Sometimes the second managed cluster will never show up in the cluster dropdown
Expected results:
The second managed cluster eventually shows up in the cluster dropdown after a page refresh
Additional info:
Migrated from bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2055415
In order for hub cluster console OLM screens to behave as expected in a multicluster environment, we need to gather "copiedCSVsDisabled" flags from managed clusters so that the console backend/frontend can consume this information.
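The flag in question lives on each managed cluster's OLMConfig; a minimal sketch of what the console backend would gather:

```yaml
# Sketch: the per-cluster OLM setting the console backend needs to observe
apiVersion: operators.coreos.com/v1
kind: OLMConfig
metadata:
  name: cluster
spec:
  features:
    disableCopiedCSVs: true
```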
AC:
Allow configuring compute and control plane nodes across multiple subnets for on-premise IPI deployments. With nodes separated into subnets, also allow using an external load balancer instead of the built-in one (keepalived/haproxy) that the IPI workflow installs, so that the customer can configure their own load balancer with the ingress and API VIPs pointing to nodes in the separate subnets.
I want to install OpenShift with IPI on an on-premise platform (high priority for bare metal and vSphere) and I need to distribute my control plane and compute nodes across multiple subnets.
I want to use IPI automation but I will configure an external load balancer for the API and Ingress VIPs, instead of using the built-in keepalived/haproxy-based load balancer that come with the on-prem platforms.
Customers require using multiple logical availability zones to define their architecture and topology for their datacenter. OpenShift clusters are expected to fit in this architecture for the high availability and disaster recovery plans of their datacenters.
Customers want the benefits of IPI and automated installations (and avoid UPI) and at the same time when they expect high traffic in their workloads they will design their clusters with external load balancers that will have the VIPs of the OpenShift clusters.
Load balancers can distribute incoming traffic across multiple subnets, which is something our built-in load balancers aren't able to do and which represents a big limitation for the topologies customers are designing.
While this is possible with IPI AWS, this isn't available with on-premise platforms installed with IPI (for the control plane nodes specifically), and customers see this as a gap in OpenShift for on-premise platforms.
Epic | Control Plane with Multiple Subnets | Compute with Multiple Subnets | Doesn't need external LB | Built-in LB |
---|---|---|---|---|
NE-1069 (all-platforms) | ✓ | ✓ | ✓ | ✓ |
NE-905 (all-platforms) | ✓ | ✓ | ✓ | ✕ |
✓ | ✓ | ✓ | ✓ | |
✓ | ✓ | ✓ | ✓ | |
✓ | ✓ | ✓ | ||
✓ | ✓ | ✓ | ✕ | |
NE-905 (all platforms) | ✓ | ✓ | ✓ | ✕ |
✓ | ✓ | ✓ | ✓ | |
✕ | ✓ | ✓ | ✓ | |
✕ | ✓ | ✓ | ✓ | |
✕ | ✓ | ✓ | ✓ |
Workers on separate subnets with IPI documentation
We can already deploy compute nodes on separate subnets by preventing the built-in LBs from running on the compute nodes. This is documented for bare metal only for the Remote Worker Nodes use case: https://docs.openshift.com/container-platform/4.11/installing/installing_bare_metal_ipi/ipi-install-installation-workflow.html#configure-network-components-to-run-on-the-control-plane_ipi-install-installation-workflow
This procedure works on vSphere too, albeit with no QE CI and no documentation.
External load balancer with IPI documentation
As an OpenShift infrastructure owner I need to deploy OCP on OpenStack with the installer-provisioned infrastructure workflow and configure my own load balancers
Customers want to use their own load balancers, while IPI comes with built-in LBs based on keepalived and haproxy.
vSphere has already done this work via https://issues.redhat.com/browse/SPLAT-409
This is needed once the API patch for External LB has merged.
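For illustration, the vSphere work referenced above exposes the choice in install-config.yaml; a hedged sketch of the shape OpenStack would mirror (field names follow the vSphere precedent and are an assumption for OpenStack):

```yaml
# Sketch (hedged): opting out of the built-in keepalived/haproxy LB at install time;
# field names per the vSphere precedent, assumed for OpenStack
platform:
  openstack:
    loadBalancer:
      type: UserManaged   # default is OpenShiftManagedDefault
```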
Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF). We are trying no-feature-freeze in 4.12: we will try to do as much as we can before FF, but we're quite sure something will slip past FF as usual.
Update all CSI sidecars to the latest upstream release from https://github.com/orgs/kubernetes-csi/repositories
Corresponding downstream repos have `csi-` prefix, e.g. github.com/openshift/csi-external-attacher.
This includes updating the VolumeSnapshot CRDs in the cluster-csi-snapshot-controller-operator assets and the client API in go.mod, i.e. copy all snapshot CRDs from upstream into the operator assets and run `go get -u github.com/kubernetes-csi/external-snapshotter/client/v6` in the operator repo.
Update all OCP and kubernetes libraries in storage operators to the appropriate version for OCP release.
This includes (but is not limited to):
Operators:
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
This work will require updates to the core OpenShift API repository to add the new platform type, and then a distribution of this change to all components that use the platform type information. For components that partners might replace, per-component action will need to be taken, with the project team's guidance, to ensure that the component properly handles the "External" platform. These changes will look slightly different for each component.
To integrate these changes more easily into OpenShift, it is possible to take a multi-phase approach which could be spread over a release boundary (eg phase 1 is done in 4.X, phase 2 is done in 4.X+1).
OCPBU-5: Phase 1
OCPBU-510: Phase 2
OCPBU-329: Phase.Next
Phase 1
Phase 2
Phase 3
As a Red Hat Partner installing OpenShift using the External platform type, I would like to install my own Cloud Controller Manager(CCM). Having a field in the Infrastructure configuration object to signal that I will install my own CCM and that Kubernetes should be configured to expect an external CCM will allow me to run my own CCM on new OpenShift deployments.
This work has been defined in the External platform enhancement and had previously been part of openshift/api. The CCM API pieces were removed for the 4.13 release of OpenShift to ensure that we did not ship unused portions of the API.
In addition to the API changes, library-go will need an update to the IsCloudProviderExternal function to detect whether the External platform is selected and whether the CCM should be enabled for external mode.
We will also need to check the ObserveCloudVolumePlugin function to ensure that it is not affected by the external changes and that it continues to use the external volume plugin.
After updating openshift/library-go, it will need to be re-vendored into the MCO, KCMO, and CCCMO (although the last is not as critical as the other two).
Update ETCD datastore encryption to use AES-GCM instead of AES-CBC
2. What is the nature and description of the request?
The current ETCD datastore encryption solution uses the aes-cbc cipher. This cipher is now considered "weak" and is susceptible to a padding oracle attack. Upstream recommends using the AES-GCM cipher. AES-GCM will require automation to rotate keys every 200k writes.
The cipher used is hard coded.
3. Why is this needed? (List the business requirements here).
Security conscious customers will not accept the presence and use of weak ciphers in an OpenShift cluster. Continuing to use the AES-CBC cipher will create friction in sales and, for existing customers, may result in OpenShift being blocked from being deployed in production.
4. List any affected packages or components.
Epic Goal*
What is our purpose in implementing this? What new capability will be available to customers?
The Kube APIserver is used to set the encryption of data stored in etcd. See https://docs.openshift.com/container-platform/4.11/security/encrypting-etcd.html
Today with OpenShift 4.11 or earlier, only aescbc is allowed as the encryption field type.
RFE-3095 asks that aesgcm (an updated and more recent type) be supported. Furthermore, RFE-3338 asks for more customizability, which brings us to how we implemented cipher customization with tlsSecurityProfile. See https://docs.openshift.com/container-platform/4.11/security/tls-security-profiles.html
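A minimal sketch of selecting the newer cipher through the existing encryption field:

```yaml
# Sketch: requesting AES-GCM etcd encryption via the cluster APIServer config
apiVersion: config.openshift.io/v1
kind: APIServer
metadata:
  name: cluster
spec:
  encryption:
    type: aesgcm   # previously only aescbc was allowed
```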
Why is this important? (mandatory)
AES-CBC is considered a weak cipher.
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
The new aesgcm encryption provider was added in 4.13 as techpreview, but as part of https://issues.redhat.com/browse/API-1509, the feature needs to be GA in OCP 4.13.
The console operator should build up the set of OS types across the cluster nodes and supply it to the console, so that the console renders only operators that can be installed on the cluster.
This will be needed when we support different OS types on the cluster.
We need to scan through the compute nodes and build a set of supported OSes from them. Each node in the cluster has a label for its operating system, e.g. kubernetes.io/os=linux.
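A minimal sketch of the label in question on a compute node:

```yaml
# Sketch: the well-known OS label the operator would aggregate into a set
apiVersion: v1
kind: Node
metadata:
  name: worker-0            # illustrative
  labels:
    kubernetes.io/os: linux
```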
AC:
Pre-Work Objectives
Since some of our requirements from the ACM team will not be available for the 4.12 timeframe, the team should work on anything we can get done in the scope of the console repo so that when the required items are available in 4.13, we can be more nimble in delivering GA content for the Unified Console Epic.
Overall GA Key Objective
Providing our customers with a single simplified User Experience (Hybrid Cloud Console) that is extensible, can run locally or in the cloud, and is capable of everything from managing the fleet to deep-diving into a single cluster.
Why do customers want this?
Why do we want this?
Phase 2 Goal: Productization of the unified Console
As a developer I would like to disable clusters like *KS that we can't support for multi-cluster (for instance because we can't authenticate). The ManagedCluster resource has a vendor label that we can use to know if the cluster is supported.
cc Ali Mobrem, Sho Weimer, Jakub Hadvig
UPDATE (9/20/22): we want an allow-list with OpenShift, ROSA, ARO, ROKS, and OpenShiftDedicated.
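A minimal sketch of the label the allow-list would check on each ManagedCluster:

```yaml
# Sketch: the vendor label on a ManagedCluster; only allow-listed values
# (OpenShift, ROSA, ARO, ROKS, OpenShiftDedicated) would be shown
apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
  name: prod-cluster   # illustrative
  labels:
    vendor: OpenShift
```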
Acceptance criteria:
Key Objective
Providing our customers with a single simplified User Experience (Hybrid Cloud Console) that is extensible, can run locally or in the cloud, and is capable of everything from managing the fleet to deep-diving into a single cluster.
Why do customers want this?
Why do we want this?
Phase 2 Goal: Productization of the unified Console
We need a way to show metrics for workloads running on spoke clusters. This depends on ACM-876, which lets the console discover the monitoring endpoints.
Open Issues:
We will depend on ACM to create a route on each spoke cluster for the prometheus tenancy service, which is required for metrics for normal users.
The OpenShift console backend should proxy managed-cluster monitoring requests through the MCE cluster proxy addon to the Prometheus services on the managed cluster. This depends on https://issues.redhat.com/browse/ACM-1188
Description/Acceptance Criteria:
1. Proposed title of this feature request
BYOK encrypts root vols AND default storageclass
2. What is the nature and description of the request?
User story
As a customer spinning up managed OpenShift clusters, if I pass a custom AWS KMS key to the installer, I expect it (installer and cluster-storage-operator) to not only encrypt the root volumes for the nodes in the cluster, but also to apply the key to the first/default (gp2 in the current case) StorageClass, so that my assumptions around passing a custom key are met.
In the current state, if I pass a KMS key to the installer, only root volumes are encrypted with it, and the default AWS-managed key is used for the default StorageClass.
Perhaps this could be offered as an installer flag that controls whether the key is also passed to the storage class.
3. Why does the customer need this? (List the business requirements here)
Customers wish to encrypt the volumes they own with their selected key rather than, by accident, with the AWS default account key.
4. List any affected packages or components.
Note: this implementation should take effect on AWS, GCP and Azure (any cloud provider) equally.
As a cluster admin, I want OCP to provision new volumes with my custom encryption key that I specified during cluster installation in install-config.yaml so all OCP assets (PVs, VMs & their root disks) use the same encryption key.
Description of criteria:
Out of scope: re-encryption of existing PVs with a new key; only newly provisioned PVs will use the new key.
Enhancement (incl. TBD API with encryption key reference) will be provided as part of https://issues.redhat.com/browse/CORS-2080.
"Raw meat" of this story is translation of the key reference in TBD API to StorageClass.Parameters. AWS EBS CSi driver operator should update both the StorageClass it manages (managed-csi) with:
Parameters:
encrypted: "true"
kmsKeyId: "arn:aws:kms:us-east-1:012345678910:key/abcd1234-a123-456a-a12b-a123b4cd56ef"
Upstream docs: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/docs/parameters.md
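Putting it together, a minimal sketch of the updated managed StorageClass, per the upstream parameters doc linked above:

```yaml
# Sketch: managed-csi StorageClass updated with the customer's KMS key
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-csi
provisioner: ebs.csi.aws.com
parameters:
  encrypted: "true"
  kmsKeyId: "arn:aws:kms:us-east-1:012345678910:key/abcd1234-a123-456a-a12b-a123b4cd56ef"
```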
Enable sharing ConfigMap and Secret across namespaces
Requirement | Notes | isMvp? |
---|---|---|
Secrets and ConfigMaps can get shared across namespaces | | YES |
NA
NA
Consumption of RHEL entitlements has been a challenge on OCP 4 since it moved to a cluster-based entitlement model, compared to the node-based (RHEL subscription manager) entitlement model. In order to provide a sufficiently similar experience to OCP 3, the entitlement certificates that are made available on the cluster (OCPBU-93) should be shared across namespaces, to prevent the need for the cluster admin to copy these entitlements into each namespace, which leads to additional operational challenges for updating and refreshing them.
Questions to be addressed:
* What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
* Does this feature have doc impact?
* New Content, Updates to existing content, Release Note, or No Doc Impact
* If unsure and no Technical Writer is available, please contact Content Strategy.
* What concepts do customers need to understand to be successful in [action]?
* How do we expect customers will use the feature? For what purpose(s)?
* What reference material might a customer want/need to complete [action]?
* Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
* What is the doc impact (New Content, Updates to existing content, or Release Note)?
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
As an OpenShift cluster admin
I want to use the TechPreview feature set to try out CSI volumes in builds
So that I can test using CSI volumes in builds through tech preview features
Product documentation will not be required until BUILD-275 is complete. Documentation for CSI volumes in builds will need to note that the TechPreviewNoUpgrade feature set needs to be enabled on the cluster.
Additional training enablement materials may not be needed - product docs should be sufficient.
Full e2e testing may not be feasible until BUILD-275 is completed.
CI testing should verify that the appropriate configuration values were passed to the build controller.
We will likely need a new CI job that installs the cluster with tech preview enabled before we verify that the BuildCSIVolumes feature gate has been enabled.
OpenShift already has feature gates baked into the core platform via the FeatureGate API object. For this feature, we need to declare a feature gate that is added to the TechPreviewNoUpgrade feature set, which openshift-controller-manager-operator then reads and applies to the build controller.
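A minimal sketch of the cluster FeatureGate object that turns the tech preview set on:

```yaml
# Sketch: enabling the TechPreviewNoUpgrade feature set (which carries the
# BuildCSIVolumes gate); note this marks the cluster un-upgradeable
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: TechPreviewNoUpgrade
```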
Feature gate needs to be proposed to openshift/api (add to the TechPreviewNoUpgrade feature set).
An example PR on how to do this: https://github.com/openshift/api/pull/982.
Once approved, the updated tech preview feature set needs to be vendored into openshift/library-go.
Openshift-controller-manager-operator needs to read the feature gate and pass it on to the build controller.
The build controller has its own configuration "API" - this was a relic of the 3.x master configuration that is not exposed to admins in OCP 4.x: https://github.com/openshift/api/blob/master/openshiftcontrolplane/v1/types.go#L198-L207
A separate operator checks whether a *NoUpgrade feature set is enabled, and if so marks the cluster as unable to be upgraded to the next minor OCP release.
To test this in CI, we need a suite that runs with the TechPreviewNoUpgrade feature set enabled. The step registry has primitives which bring up a cluster with tech preview features enabled. We will need to update ocm-o's CI configuration to run our operator tests with tech preview enabled. Testing for this specific feature will need to have separate logic that verifies we are sending the right configuration to the build controller under normal and TechPreview mode.
Existing techpreview CI step registry setups (note the per-cloud elements, which make sense, since the existing CSI drivers are per cloud):
./ci-operator/step-registry/ipi/aws/pre/techpreview
./ci-operator/step-registry/ipi/azure/pre/techpreview
./ci-operator/step-registry/ipi/conf/aws/techpreview
./ci-operator/step-registry/ipi/conf/azure/techpreview
./ci-operator/step-registry/ipi/conf/techpreview
./ci-operator/step-registry/ipi/conf/openstack/techpreview
./ci-operator/step-registry/ipi/openstack/pre/techpreview
./ci-operator/step-registry/openshift/e2e/aws/techpreview
./ci-operator/step-registry/openshift/e2e/gcp/techpreview
./ci-operator/step-registry/openshift/e2e/azure/techpreview
./ci-operator/step-registry/openshift/e2e/openstack/techpreview
./ci-operator/step-registry/openshift/e2e/vsphere/techpreview
Given that shared resources span all clouds, does that mean we touch each of these, create a new one, or both?
More details at ARO managed identity scope and impact.
This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP requirement gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Questions to be addressed:
This document describes the expectations for promoting a feature that is behind a feature gate.
The criteria include:
Create an Azure-specific spec.resourceTags entry in the infrastructure CRD. This should create and update tags on any OpenShift cloud resource that we create and manage. It should also tag existing resources that do not have the tags yet, and once the tags in the infrastructure CRD are changed, all the resources should be updated accordingly.
Tag deletion remains out of scope, as the customer can still have custom tags applied to the resources that we do not want to delete.
Due to the ongoing in-tree/out-of-tree split of the cloud and CSI providers, this should not apply to clusters with in-tree providers (!= "external").
Once we are confident all components are updated, we should introduce an end-to-end test that makes sure we never create untagged resources.
Goals
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
List any affected packages or components.
This epic covers the work to apply user-defined tags to Azure resources created for an OpenShift cluster, available as tech preview.
The user should be able to define the Azure tags to be applied to the resources created during cluster creation by the installer and by the other operators that manage those resources. The user will define the required tags in install-config.yaml while preparing the inputs for cluster creation. They will then be made available in the status sub-resource of the Infrastructure custom resource, which cannot be edited but is available for user reference and is used by the in-cluster operators for tagging when the resources are created.
Updating/deleting tags added during cluster creation, or adding new tags as a Day-2 operation, is out of scope for this epic.
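A minimal sketch of where the tags surface for in-cluster operators, per the description above; the field names are an assumption here:

```yaml
# Sketch (hedged): user-defined tags exposed on the Infrastructure status
# sub-resource; field names are assumed
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  name: cluster
status:
  platformStatus:
    type: Azure
    azure:
      resourceTags:
      - key: team
        value: storage
```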
List any affected packages or components.
Reference - https://issues.redhat.com/browse/RFE-2017
cluster-config-operator makes the Infrastructure CRD available for the installer. The CRD is included in its container image from the openshift/api package, which must be updated to pick up the latest CRD.
Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF). We are trying no-feature-freeze in 4.12: we will try to do as much as we can before FF, but we're quite sure something will slip past FF as usual.
Update all OCP and kubernetes libraries in storage operators to the appropriate version for OCP release.
This includes (but is not limited to):
Operators:
EOL, do not upgrade:
Update the driver to the latest upstream release. Notify QE and docs of any new features and important bugfixes that need testing or documentation.
(Using separate cards for each driver because these updates can be more complicated)
Update all CSI sidecars to the latest upstream release from https://github.com/orgs/kubernetes-csi/repositories
Corresponding downstream repos have `csi-` prefix, e.g. github.com/openshift/csi-external-attacher.
This includes updating the VolumeSnapshot CRDs in the cluster-csi-snapshot-controller-operator assets and the client API in go.mod, i.e. copy all snapshot CRDs from upstream into the operator assets and run `go get -u github.com/kubernetes-csi/external-snapshotter/client/v6` in the operator repo.
Epic Goal*
What is our purpose in implementing this? What new capability will be available to customers?
Why is this important? (mandatory)
What are the benefits to the customer or Red Hat? Does it improve security, performance, supportability, etc.? Why is this work a priority?
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
An elevator pitch (value statement) that describes the Feature in a clear, concise way. Complete during New status.
The observable functionality that the user now has as a result of receiving this feature. Complete during New status.
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
For developers of serverless functions, we don't provide any samples.
Provide Serverless Function samples in the sample catalog. These would utilize the Builder Image capabilities.
As an operator author, I want to provide additional samples that are tied to an operator version, not an OpenShift release. For that, I want to create a resource to add new samples to the web console.
Extend the Workload Partitioning feature to support multi-node clusters.
Customers running RAN workloads on C-RAN Hubs (i.e. multi-node clusters) that want to maximize the cores available to the workloads (DU) should be able to utilize WP to isolate CP processes to reserved cores.
Requirements
A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP requirement gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.
Requirement | Notes | isMvp? |
---|---|---|
< How will the user interact with this feature? >
< Which users will use this and when will they use it? >
< Is this feature used as part of current user interface? >
< What does the person writing code, testing, documenting need to know? >
< Are there assumptions being made regarding prerequisites and dependencies?>
< Are there assumptions about hardware, software or people resources?>
< Are there specific customer environments that need to be considered (such as working with existing h/w and software)?>
< Are there Upgrade considerations that customers need to account for or that the feature should address on behalf of the customer?>
<Does the Feature introduce data that could be gathered and used for Insights purposes?>
< What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)? >
< What does success look like?>
< Does this feature have doc impact? Possible values are: New Content, Updates to existing content, Release Note, or No Doc Impact>
< If unsure and no Technical Writer is available, please contact Content Strategy. If yes, complete the following.>
< Which other products and versions in our portfolio does this feature impact?>
< What interoperability test scenarios should be factored by the layered product(s)?>
Question | Outcome |
When this image was assembled, these features were not yet completed. Therefore, only the Jira cards included here are part of this release.
For users who are using OpenShift but have not yet begun to explore multicluster and what we offer them.
I'm investigating where Learning paths are today and what is required.
As a user, I'd like to have a learning path for how to get started with multicluster:
Install MCE
Create multiple clusters
Use HyperShift
Provide access to cluster creation to devs via templates
Scale up to ACM/ACS (OPP?)
Status
https://github.com/patternfly/patternfly-quickstarts/issues/37#issuecomment-1199840223
Goal: Resources provided via the Dynamic Resource Allocation Kubernetes mechanism can be consumed by VMs.
Details: Dynamic Resource Allocation
Come up with a design of how resources provided by Dynamic Resource Allocation can be consumed by KubeVirt VMs.
The Dynamic Resource Allocation (DRA) feature is an alpha API in Kubernetes 1.26, which is the base for OpenShift 4.13.
This feature provides the ability to create ResourceClaim and ResourceClass objects to request access to resources. This is similar to the dynamic provisioning of a PersistentVolume via a PersistentVolumeClaim and a StorageClass.
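A minimal sketch of the two alpha objects, per Kubernetes 1.26's resource.k8s.io/v1alpha1 API; the driver name is illustrative:

```yaml
# Sketch: DRA alpha objects in Kubernetes 1.26
apiVersion: resource.k8s.io/v1alpha1
kind: ResourceClass
metadata:
  name: example-gpu
driverName: gpu.resource.example.com   # illustrative DRA driver
---
apiVersion: resource.k8s.io/v1alpha1
kind: ResourceClaim
metadata:
  name: gpu-claim
  namespace: default
spec:
  resourceClassName: example-gpu
```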
NVIDIA has been a lead contributor to the KEP and already has an initial implementation of a DRA driver and plugin, with a nice demo recording. NVIDIA expects to have this DRA driver available in CY23 Q3 or Q4, so likely in NVIDIA GPU Operator v23.9, around OpenShift 4.14.
When asked about the availability of MIG-backed vGPU for Kubernetes, NVIDIA said that the timeframe is not decided yet, because it will likely use DRA for the creation of MIG devices and their registration with the vGPU host driver. The MIG-backed vGPU feature for OpenShift Virtualization will then likely require DRA support to request vGPU resources for the VMs.
Not having MIG-backed vGPU is a risk for OpenShift Virtualization adoption in GPU use cases, such as virtual workstations for rendering with Windows-only software. Customers who want a mix of passthrough, time-based vGPU, and MIG-backed vGPU will prefer competitors who offer the full range of options. And the certification of NVIDIA solutions like NVIDIA Omniverse will be blocked, despite a great potential to increase OpenShift consumption, as it uses RTX/A40 GPUs for virtual workstations (not certified by NVIDIA on OpenShift Virtualization yet) and A100/H100 for physics simulation, both use cases probably leveraging vGPUs [7]. There are many necessary conditions for that to happen, and MIG-backed vGPU support is one of them.
Who | What | Reference |
---|---|---|
DEV | Upstream roadmap issue (or individual upstream PRs) | <link to GitHub Issue> |
DEV | Upstream documentation merged | <link to meaningful PR> |
DEV | gap doc updated | <name sheet and cell> |
DEV | Upgrade consideration | <link to upgrade-related test or design doc> |
DEV | CEE/PX summary presentation | label epic with cee-training and add a <link to your support-facing preso> |
QE | Test plans in Polarion | <link or reference to Polarion> |
QE | Automated tests merged | <link or reference to automated tests> |
DOC | Downstream documentation merged | <link to meaningful PR> |
Part of making https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation available for early adoption.
Plugin teams need a mechanism to extend the OCP console that is decoupled enough so they can deliver at the cadence of their projects and not be forced into the OCP Console release timelines.
The OCP Console Dynamic Plugin Framework will enable all our plugin teams to do the following:
Requirement | Notes | isMvp? |
---|---|---|
UI to enable and disable plugins | YES | |
Dynamic Plugin Framework in place | YES | |
Testing Infra up and running | YES | |
Docs and read me for creating and testing Plugins | YES | |
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Documentation Considerations
Questions to be addressed:
Console static plugins, maintained as part of the frontend monorepo, currently use static extension types (packages/console-plugin-sdk/src/typings) which directly reference various kinds of objects, including React components and arbitrary functions.
To ease the long-term transition from static to dynamic plugins, we should support a use case where an existing static plugin goes through the following stages:
Once a static plugin reaches the "use dynamic extensions only" stage, its maintainers can move it out of the Console monorepo - the plugin becomes dynamic, shipped via the corresponding operator and loaded by Console app at runtime.
Dynamic plugins will need changes to the console operator config to be enabled and disabled. We'll need either a new CRD or an annotation on CSVs for console to discover when plugins are available.
This story tracks any API updates needed to openshift/api and any operator updates needed to wire through dynamic plugins as console config.
See https://github.com/openshift/enhancements/pull/441 for design details.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Questions to be addressed:
The work on this story is dependent on following changes:
The console already supports custom routes on the operator config. The newly proposed CustomDomains API introduces a unified way to set custom domains for stock-installed routes, covering both the route names and the serving certs/keys that customers want to customize. From the console perspective those are:
The setup should be done on the Ingress config, where two new fields are introduced:
Console-operator will only consume the API and check for any changes. If a custom domain is set for either the `console` or `downloads` route in the `openshift-console` namespace, console-operator will read the setup and set a custom route accordingly. When a custom route is set up for any of the console's routes, the default route won't be deleted; instead it will be updated so it redirects to the custom one. This is done for two reasons:
Console-operator will still need to support the CustomDomain API that is available on its config.
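For illustration, a sketch of what such a setup could look like on the Ingress config, assuming the componentRoutes field shape of the shipped config.openshift.io/v1 API; the hostname and secret name are placeholders:

apiVersion: config.openshift.io/v1
kind: Ingress
metadata:
  name: cluster
spec:
  componentRoutes:
  - name: console
    namespace: openshift-console
    hostname: console.custom.example.com   # placeholder custom domain
    servingCertKeyPairSecret:
      name: custom-console-cert            # placeholder secret with serving cert/key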
Acceptance criteria:
Questions:
Implement console-operator changes to consume new CustomDomains API, based on the story details.
Bump the openshift/api dependency to pick up the new CustomDomain API for the Ingress config.
During master node upgrades, when nodes are getting drained, there's currently no protection against two or more operands going down. If your component is required to be available during upgrades or other voluntary disruptions, please consider deploying a PDB to protect your operands.
The effort is tracked in https://issues.redhat.com/browse/WRKLDS-293.
Example:
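A sketch of the kind of PDB the controller could create for the console deployment; the labels and values here are illustrative, not the final implementation:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: console
  namespace: openshift-console
spec:
  maxUnavailable: 1        # keep at least one console replica available during drains
  selector:
    matchLabels:
      app: console
      component: ui        # illustrative; actual console pod labels may differ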
Acceptance Criteria:
1. Create PDB controller in console-operator for both console and downloads pods
2. Add e2e tests for PDB in single node and multi node cluster
Note: We should consider backporting this to 4.10
When OCP is performing a cluster upgrade, users should be notified about this fact.
There are two possibilities for surfacing the cluster upgrade to the users:
AC:
Note: We need to decide if we want to distinguish this particular notification with a different color. cc'ing Ali Mobrem
Created from: https://issues.redhat.com/browse/RFE-3024
Quick Starts are a key tool for helping our customers discover and understand how to take advantage of services that run on top of the OpenShift Platform. This feature will focus on making the Quick Starts extensible. With this Console extension, our customers and partners will be able to add their own Quick Starts to help drive a great user experience.
Enhancement PR: https://github.com/openshift/enhancements/pull/360
Requirement | Notes | isMvp? |
---|---|---|
Define QuickStart CRD | YES | |
Console Operator: Out of the box support for installing Quick Starts for enabling Operators | YES | |
Process( Design\Review), Documentation + Template for providing out of the box quick starts | YES | |
Process, Docs, for enabling Operators to add Quick Starts | YES | |
Migrate existing UI to work with CRD | ||
Move Existing Quick Starts to new CRD | YES | |
Support Internationalization | NO | |
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Questions to be addressed:
Goal
Provide a dynamic extensible mechanism to add Guided Tours to the OCP Console.
User Stories/Scenarios
As a product owner, I need a mechanism to add guided tours that will help guide my users to enable my service.
As a product owner, I need a mechanism to add guided tours that will help guide my users to consume my service.
As an Operator, I need a mechanism to add guided tours that will help guide my users to consume my service.
Acceptance Criteria
Dependencies (internal and external)
Previous Work (Optional):
Open questions:
Done Checklist
As a user I would like to see a YAML sample for the new QuickStart CRD when I go and create my own QuickStarts.
We need a README about the submission process for adding quick starts to the console-operator repo.
Epic Goal*
Provide a long term solution to SELinux context labeling in OCP.
Why is this important? (mandatory)
As of today, when SELinux is enabled, a PV's files are relabeled when attaching the PV to a pod. This can cause timeouts when the PV contains many files, as well as overload the storage backend.
https://access.redhat.com/solutions/6221251 provides a few workarounds until the proper fix is implemented. Unfortunately these workarounds are not perfect, and we need a long-term, seamless, optimized solution.
This feature tracks the long-term solution, where the PV filesystem will be mounted with the right SELinux context, thus avoiding relabeling every file.
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
As we are relying on the mount context, there should not be any relabeling (chcon), because all files/folders will inherit the context from the mount context.
More on design & scenarios in the KEP and related epic STOR-1173
Dependencies (internal and external) (mandatory)
None for the core feature
However the driver will have to set SELinuxMountSupported to true in the CSIDriverSpec to enable this feature.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Support the upstream feature "SELinux relabeling using mount options (CSIDriver API change)" in OCP as Beta, i.e. test it and have docs for it (unless it's Alpha upstream).
Summary: If a Pod has a defined SELinux context (e.g. it uses the "restricted" SCC), uses a ReadWriteOncePod PVC, and the CSI driver responsible for the volume supports this feature, kubelet + the CSI driver will mount the volume directly with the correct SELinux labels. Therefore CRI-O does not need to recursively relabel the volume and pod startup can be significantly faster. We will need thorough documentation for this.
This upstream epic actually will be implemented by us!
As a cluster user, I want to use mounting with SELinux context without any configuration.
This means OCP ships CSIDriver objects with "SELinuxMount: true" for CSI drivers that support mounting with "-o context". I.e. all CSI drivers that are based on block volumes and use ext4/xfs should have this enabled.
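A sketch of such a shipped object, assuming the upstream CSIDriver field is named seLinuxMount; other spec fields are omitted here and the driver name is just an example:

apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: ebs.csi.aws.com    # example block-volume-based driver
spec:
  seLinuxMount: true       # driver supports mounting with -o context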
This Epic is to track upstream work in the Storage SIG community
This Epic is to track the SELinux specific work required. fsGroup work is not included here.
Goal:
Continue contributing to and help move along the upstream efforts to enable recursive permissions functionality.
Finish current SELinuxMountReadWriteOncePod feature upstream:
The feature is probably going to stay alpha upstream.
Problem:
Recursive permission changes take very long for fsGroup and SELinux. For volumes with many small files, Kubernetes currently does a chown of every file on the volume (due to fsGroup). Similarly, container runtimes (such as CRI-O) perform a chcon of every file on the volume due to the SCC's SELinux context. Data on the volume may already have the correct GID/SELinux context, so Kubernetes needs a way to detect this automatically to avoid the long delay.
Why is this important:
Dependencies (internal and external):
Prioritized epics + deliverables (in scope / not in scope):
Estimate (XS, S, M, L, XL, XXL):
Previous Work:
Customers:
Open questions:
Notes:
As an OCP developer (and as an OCP user in the future), I want all CSI drivers shipped as part of OCP to support mounting with -o context=XYZ, so I can test with CSIDriver.SELinuxMount: true (or so my pods run without CRI-O recursively relabeling my volume).
In detail:
Exit criteria:
The storage operators need to be automatically restarted after the certificates are renewed.
From OCP doc "The service CA certificate, which issues the service certificates, is valid for 26 months and is automatically rotated when there is less than 13 months validity left."
Since OCP now offers an 18-month lifecycle per release, the storage operator pods need to be automatically restarted after the certificates are renewed.
The storage operators will be transparently restarted. The customer benefit is transparency: it avoids manual restarts of the storage operators.
The administrator should not need to restart a storage operator when certificates are renewed.
This should apply to all relevant operators with a consistent experience.
As an administrator I want the storage operators to be automatically restarted when certificates are renewed.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
This feature request is triggered by the new extended OCP lifecycle. We are moving from 12 to 18 months support per release.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
No doc is required
This feature only covers storage, but the same behavior should be applied to every relevant component.
The pod `aws-ebs-csi-driver-controller` mounts the secret:
$ oc get po -n openshift-cluster-csi-drivers aws-ebs-csi-driver-controller-559f74d7cd-5tk4p -o yaml
...
    name: driver-kube-rbac-proxy
    name: provisioner-kube-rbac-proxy
    name: attacher-kube-rbac-proxy
    name: resizer-kube-rbac-proxy
    name: snapshotter-kube-rbac-proxy
    volumeMounts:
    - mountPath: /etc/tls/private
      name: metrics-serving-cert
  volumes:
  - name: metrics-serving-cert
    secret:
      defaultMode: 420
      secretName: aws-ebs-csi-driver-controller-metrics-serving-cert
Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted
Console enhancements based on customer RFEs that improve customer user experience.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Questions to be addressed:
Based on https://issues.redhat.com/browse/RFE-3775 we should be extending our proxy package timeout to match the browser's timeout, which is 5 minutes.
AC: Bump the 30-second timeout in the proxy pkg to 5 minutes
Description of problem:
Even though in 4.11 we introduced LegacyServiceAccountTokenNoAutoGeneration, to be compatible with upstream K8s and not generate secrets with tokens when service accounts are created, today OpenShift still creates secrets and tokens that are used for legacy usage of openshift-controller as well as for image pull secrets.
Customer issues:
Customers see auto-generated secrets for service accounts which is flagged as a security risk.
This Feature is to track the implementation for removing legacy usage and image-pull secret generation as well, so that NO secrets are auto-generated when a Service Account is created on an OpenShift cluster.
NO Secrets to be auto-generated when creating service accounts
The following secrets need to NOT be generated automatically with every service account creation:
Use Cases (Optional):
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
Concerns/Risks: Replacing functionality of one of the openshift-controller-manager controllers that's been in the code for a long time may impact behaviors that w
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Existing documentation needs to be clear on where we are today and why we are providing the above 2 credentials. Related Tracker: https://issues.redhat.com/browse/OCPBUGS-13226
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
As an Infrastructure Administrator, I want to deploy OpenShift on Nutanix distributing the control plane and compute nodes across multiple regions and zones, forming different failure domains.
As an Infrastructure Administrator, I want to configure an existing OpenShift cluster to distribute the nodes across regions and zones, forming different failure domains.
Install OpenShift on Nutanix using IPI / UPI in multiple regions and zones.
This implementation would follow the same idea that has been done for vSphere. The following are the main PRs for vSphere:
https://github.com/openshift/enhancements/blob/master/enhancements/installer/vsphere-ipi-zonal.md
Nutanix Zonal: Multiple regions and zones support for Nutanix IPI and Assisted Installer
Note
As a user, I want to be able to spread control plane nodes for an OCP cluster across Prism Elements (zones).
This feature will track upstream work from the OpenShift Control Plane teams - API, Auth, etcd, Workloads, and Storage.
To continue and develop meaningful contributions to the upstream community including feature delivery, bug fixes, and leadership contributions.
Note: The matchLabelKeys field is a beta-level field and enabled by default in 1.27. You can disable it by disabling the MatchLabelKeysInPodTopologySpread [feature gate](https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/). See the sketch below for how the field is used.
Removing this from the TP as the feature is enabled by default.
Just cleanup work.
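For reference, a sketch of a pod spec using the matchLabelKeys field mentioned above; the app label is illustrative:

spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: my-app            # illustrative label
    matchLabelKeys:
    - pod-template-hash        # spreads each ReplicaSet's pods independently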
Console enhancements based on customer RFEs that improve customer user experience.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Questions to be addressed:
According to security best practice, it's recommended to set readOnlyRootFilesystem: true for all containers running on Kubernetes. Given that openshift-console does not set that explicitly, it's requested that this be evaluated and, if possible, set to readOnlyRootFilesystem: true, or otherwise to readOnlyRootFilesystem: false with an explanation of why the filesystem needs to be writable.
3. Why does the customer need this? (List the business requirements here)
Extensive security audits are run on OpenShift Container Platform 4 and are highlighting that many vendor-specific containers are missing readOnlyRootFilesystem: true, or else a justification for why readOnlyRootFilesystem: false is set.
AC: Set the readOnlyRootFilesystem field on both the console and console-operator deployments' specs. Part of the work is to determine the value: true if the pod is not doing any writing to its filesystem, otherwise false.
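A sketch of the deployment spec fragment in question; the container name is illustrative, and the actual value of the field is what this story has to determine:

spec:
  template:
    spec:
      containers:
      - name: console
        securityContext:
          readOnlyRootFilesystem: true   # or false, with a documented justification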
Unify and update hosted control planes storage operators so that they have similar code patterns and can run properly in both standalone OCP and HyperShift's control plane.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
Epic Goal*
Our current design of the EBS driver operator's HyperShift support does not scale well to other drivers. The existing design will lead to more code duplication between driver operators and a greater possibility of errors.
Why is this important? (mandatory)
An improved design will allow more storage drivers and their operators to be added to hypershift without requiring significant changes in the code internals.
Scenarios (mandatory)
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Until the final structure is finalized, we should be able to build and test the aws-ebs image from the legacy folder.
This feature is the placeholder for all epics related to technical debt associated with the Console team.
Outcome Overview
Once all Features and/or Initiatives in this Outcome are complete, what tangible, incremental, and (ideally) measurable movement will be made toward the company's Strategic Goal(s)?
Success Criteria
What is the success criteria for this strategic outcome? Avoid listing Features or Initiatives and instead describe "what must be true" for the outcome to be considered delivered.
Expected Results (what, how, when)
What incremental impact do you expect to create toward the company's Strategic Goals by delivering this outcome? (possible examples: unblocking sales, shifts in product metrics, etc. + provide links to metrics that will be used post-completion for review & pivot decisions). For each expected result, list what you will measure and when you will measure it (ex. provide links to existing information or metrics that will be used post-completion for review and specify when you will review the measurement, such as 60 days after the work is complete).
Post Completion Review – Actual Results
After completing the work (as determined by the "when" in Expected Results above), list the actual results observed / measured during Post Completion review(s).
An epic we can duplicate for each release to ensure we have a place to catch things we ought to be doing regularly but can tend to fall by the wayside.
Added client certificates based on https://github.com/deads2k/openshift-enhancements/blob/master/enhancements/monitoring/client-cert-scraping.md
Upstream Kubernetes is following other SIGs by moving its in-tree cloud providers to an out-of-tree plugin format, Cloud Controller Manager, at some point in a future Kubernetes release. OpenShift needs to be ready to action this change.
GA of the cloud controller manager for the GCP platform
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
To make the CCM GA, we need to update the switch case in library-go to make sure the GCP CCM is always considered external.
We then need to update the vendor in KCMO, CCMO, KASO and MCO.
The OpenShift API needs to be updated to define VSphereFailureDomain. A draft PR is here: https://github.com/openshift/api/pull/1539
Also, ensure that the client-go and openshift-cluster-config-operator projects are bumped once the API changes merge.
Epic Goal*
What is our purpose in implementing this? What new capability will be available to customers?
Why is this important? (mandatory)
What are the benefits to the customer or Red Hat? Does it improve security, performance, supportability, etc? Why is work a priority?
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF).
Update all CSI sidecars to the latest upstream release from https://github.com/orgs/kubernetes-csi/repositories
Corresponding downstream repos have `csi-` prefix, e.g. github.com/openshift/csi-external-attacher.
This includes update of VolumeSnapshot CRDs in cluster-csi-snapshot-controller- operator assets and client API in go.mod. I.e. copy all snapshot CRDs from upstream to the operator assets + go get -u github.com/kubernetes-csi/external-snapshotter/client/v6 in the operator repo.
Cluster administrators need an in-product experience to discover and install new Red Hat offerings that can add high value to developer workflows.
Requirements | Notes | IS MVP |
---|---|---|
Discover new offerings in Home Dashboard | Y | |
Access details outlining value of offerings | Y | |
Access step-by-step guide to install offering | N | |
Allow developers to easily find and use newly installed offerings | Y | |
Support air-gapped clusters | Y |
< What are we making, for who, and why/what problem are we solving?>
Discovering solutions that are not available for installation on cluster
No known dependencies
Background and strategic fit
None
Quick Starts
Cluster admins need to be guided to install RHDH on the cluster.
Enable admins to discover RHDH, be guided to installing it on the cluster, and verify its configuration.
RHDH is a key multi-cluster offering for developers. This will enable customers to self-discover and install RHDH.
RHDH operator
The MCO should properly report its state in a way that's consistent and able to be understood by customers, troubleshooters, and maintainers alike.
Some customer cases have revealed scenarios where the MCO state reporting is misleading and therefore could be unreliable to base decisions and automation on.
In addition to correcting some incorrect states, the MCO will be enhanced for a more granular view of update rollouts across machines.
The MCO should properly report its state in a way that's consistent and able to be understood by customers, troubleshooters, and maintainers alike.
For this epic, "state" means "what is the MCO doing?" – so the goal here is to try to make sure that it's always known what the MCO is doing.
This includes:
While this probably crosses a little bit into the "status" portion of certain MCO objects, as some state is definitely recorded there, this probably shouldn't turn into a "better status reporting" epic. I'm interpreting "status" to mean "how is it going" so status is maybe a "detail attached to a state".
Exploration here: https://docs.google.com/document/d/1j6Qea98aVP12kzmPbR_3Y-3-meJQBf0_K6HxZOkzbNk/edit?usp=sharing
https://docs.google.com/document/d/17qYml7CETIaDmcEO-6OGQGNO0d7HtfyU7W4OMA6kTeM/edit?usp=sharing
Ensure that the pod exists but the functionality behind the pod is not exposed by default in the release version this work ships in.
This can be done by creating a new featuregate in openshift/api, vendoring that into the cluster config operator, and then checking for this featuregate in the state controller code of the MCO.
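This isn't the implementation, just a sketch of how a gate of this kind is typically enabled for testing via the cluster FeatureGate config; the gate name here is hypothetical:

apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: CustomNoUpgrade
  customNoUpgrade:
    enabled:
    - MachineConfigStateReporting   # hypothetical gate name for this work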
When the internal oauth-server and oauth-apiserver are removed and replaced with an external OIDC issuer (like azure AD), the console must work for human users of the external OIDC issuer.
An end user can use the openshift console without a notable difference in experience. This must eventually work on both hypershift and standalone, but hypershift is the first priority if it impacts delivery
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
DPDK applications require dedicated CPUs, isolated from any preemption (other processes, kernel threads, interrupts), and this can be achieved with the “static” policy of the CPU Manager: the container resources need to include an integer number of CPUs of equal value in “limits” and “requests”. For instance, to get six exclusive CPUs:
spec:
containers:
- name: CNF
image: myCNF
resources:
limits:
cpu: "6"
requests:
cpu: "6"
The six CPUs are dedicated to that container; however, non-trivial (i.e. real) DPDK applications do not use all of those CPUs, as there is always at least one CPU running a slow path, processing configuration, printing logs (among DPDK coding rules: no syscalls in PMD threads, or you are in trouble). Even the DPDK PMD drivers and core libraries include pthreads which are intended to sleep; they are infrastructure pthreads processing link change interrupts, for instance.
Can we envision going with two processes, one with the isolated cores and one with the slow-path ones, so we can have two containers? Unfortunately no: going with a multi-process design, where only dedicated pthreads would run in one process, is not an option, as DPDK multi-process is being deprecated upstream and has never picked up, as it never properly worked. Fixing it and changing the DPDK architecture to systematically have two processes is absolutely not possible within a year, and would require all DPDK applications to be rewritten. Knowing that the first and current multi-process implementation is a failure, nothing guarantees that a second one would be successful.
The slow-path CPUs consume only a fraction of a real CPU and can safely run on the “shared” CPU pool of the CPU Manager; however, container specifications do not allow requesting two kinds of CPUs, for instance:
spec:
containers:
- name: CNF
image: myCNF
resources:
limits:
cpu_dedicated: "4"
cpu_shared: "20m"
requests:
cpu_dedicated: "4"
cpu_shared: "20m"
Why do we care about allocating one extra CPU per container?
Let’s take a realistic example, based on a real RAN CNF: running 6 containers with dedicated CPUs on a worker node, with a slow path requiring 0.1 CPU, means that we waste 5 CPUs, i.e. 3 physical cores. With real-life numbers:
Intel's public CPU price per core is around 150 US$, not even taking into account the ecological aspect of the waste of (rare) materials and the electricity and cooling…
Requirement | Notes | isMvp? |
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This issue has been addressed lately by OpenStack.
N/A
Bump cluster-config-operator to pull the mixed-cpus feature-gate API.
Telecommunications providers continue to deploy OpenShift at the Far Edge. The acceleration of this adoption and the nature of existing Telecommunication infrastructure and processes drive the need to improve OpenShift provisioning speed at the Far Edge site and the simplicity of preparation and deployment of Far Edge clusters, at scale.
A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non MVP requirement slips, it does not shift the feature.
Requirement | Notes | isMvp? |
Telecommunications Service Provider Technicians will be rolling out OCP w/ a vDU configuration to new Far Edge sites, at scale. They will be working from a service depot where they will pre-install/pre-image a set of Far Edge servers to be deployed at a later date. When ready for deployment, a technician will take one of these generic-OCP servers to a Far Edge site, enter the site specific information, wait for confirmation that the vDU is in-service/online, and then move on to deploy another server to a different Far Edge site.
Retail employees in brick-and-mortar stores will install SNO servers and it needs to be as simple as possible. The servers will likely be shipped to the retail store, cabled and powered by a retail employee and the site-specific information needs to be provided to the system in the simplest way possible, ideally without any action from the retail employee.
Q: how challenging will it be to support multi-node clusters with this feature?
< What does the person writing code, testing, documenting need to know? >
< Are there assumptions being made regarding prerequisites and dependencies?>
< Are there assumptions about hardware, software or people resources?>
< Are there specific customer environments that need to be considered (such as working with existing h/w and software)?>
< Are there Upgrade considerations that customers need to account for or that the feature should address on behalf of the customer?>
<Does the Feature introduce data that could be gathered and used for Insights purposes?>
< What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)? >
< What does success look like?>
< Does this feature have doc impact? Possible values are: New Content, Updates to existing content, Release Note, or No Doc Impact>
< If unsure and no Technical Writer is available, please contact Content Strategy. If yes, complete the following.>
< Which other products and versions in our portfolio does this feature impact?>
< What interoperability test scenarios should be factored by the layered product(s)?>
Question | Outcome |
Description of problem:
The bootkube scripts spend ~1 minute failing to apply manifests while waiting for the openshift-config namespace to get created.
Version-Release number of selected component (if applicable):
4.12
How reproducible:
100%
Steps to Reproduce:
1. Run the POC using the makefile here: https://github.com/eranco74/bootstrap-in-place-poc
2. Observe the bootkube logs (pre-reboot)
Actual results:
Jan 12 17:37:09 master1 cluster-bootstrap[5156]: Failed to create "0000_00_cluster-version-operator_01_adminack_configmap.yaml" configmaps.v1./admin-acks -n openshift-config: namespaces "openshift-config" not found
....
Jan 12 17:38:27 master1 cluster-bootstrap[5156]: "secret-initial-kube-controller-manager-service-account-private-key.yaml": failed to create secrets.v1./initial-service-account-private-key -n openshift-config: namespaces "openshift-config" not found

Here are the logs from another installation, showing that it's not just 1 or 2 manifests that require this namespace to get created earlier:

Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-ca-bundle-configmap.yaml": failed to create configmaps.v1./etcd-ca-bundle -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-client-secret.yaml": failed to create secrets.v1./etcd-client -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-metric-client-secret.yaml": failed to create secrets.v1./etcd-metric-client -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-metric-serving-ca-configmap.yaml": failed to create configmaps.v1./etcd-metric-serving-ca -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-metric-signer-secret.yaml": failed to create secrets.v1./etcd-metric-signer -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-serving-ca-configmap.yaml": failed to create configmaps.v1./etcd-serving-ca -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-signer-secret.yaml": failed to create secrets.v1./etcd-signer -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "kube-apiserver-serving-ca-configmap.yaml": failed to create configmaps.v1./initial-kube-apiserver-server-ca -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "openshift-config-secret-pull-secret.yaml": failed to create secrets.v1./pull-secret -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "openshift-install-manifests.yaml": failed to create configmaps.v1./openshift-install-manifests -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "secret-initial-kube-controller-manager-service-account-private-key.yaml": failed to create secrets.v1./initial-service-account-private-key -n openshift-config: namespaces "openshift-config" not found
Expected results:
expected resources to get created successfully without having to wait for the namespace to get created.
Additional info:
Description of problem:
While trying to figure out why it takes so long to install single-node OpenShift, I noticed that the kube-controller-manager cluster operator is degraded for ~5 minutes due to:
GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp 172.30.119.108:9091: connect: connection refused
I don't understand how the Prometheus client is successfully initialized, yet we get a connection refused once we try to query the rules. Note that if the client initialization fails, the kube-controller-manager won't set GarbageCollectorDegraded to true.
Version-Release number of selected component (if applicable):
4.12
How reproducible:
100%
Steps to Reproduce:
1. Install SNO with bootstrap in place (https://github.com/eranco74/bootstrap-in-place-poc)
2. Monitor the cluster operators status
Actual results:
GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp 172.30.119.108:9091: connect: connection refused
Expected results:
Expected the GarbageCollectorDegraded status to be false
Additional info:
It seems that for the Prometheus client to be successfully initialized it needs to successfully create a connection, but we get connection refused once we make the query. Note that installing SNO with this patch (https://github.com/eranco74/cluster-kube-controller-manager-operator/commit/26e644503a8f04aa6d116ace6b9eb7b9b9f2f23f) reduces the installation time by 3 minutes.
The service-ca pod in the openshift-service-ca namespace takes a few minutes to start when installing SNO.
kubectl get events -n openshift-service-ca --sort-by='.metadata.creationTimestamp' -o custom-columns=FirstSeen:.firstTimestamp,LastSeen:.lastTimestamp,Count:.count,From:.source.component,Type:.type,Reason:.reason,Message:.message
FirstSeen LastSeen Count From Type Reason Message
2023-01-22T12:25:58Z 2023-01-22T12:25:58Z 1 deployment-controller Normal ScalingReplicaSet Scaled up replica set service-ca-6dc5c758d to 1
2023-01-22T12:26:12Z 2023-01-22T12:27:53Z 9 replicaset-controller Warning FailedCreate Error creating: pods "service-ca-6dc5c758d-" is forbidden: error fetching namespace "openshift-service-ca": unable to find annotation openshift.io/sa.scc.uid-range
2023-01-22T12:27:58Z 2023-01-22T12:27:58Z 1 replicaset-controller Normal SuccessfulCreate Created pod: service-ca-6dc5c758d-k7bsd
2023-01-22T12:27:58Z 2023-01-22T12:27:58Z 1 default-scheduler Normal Scheduled Successfully assigned openshift-service-ca/service-ca-6dc5c758d-k7bsd to master1
It seems that creating the service-ca namespace early allows it to get the openshift.io/sa.scc.uid-range annotation and start running earlier. The service-ca pod is required for other pods (CVO and all the control plane pods) to start, since it creates the serving certs.
Reduce the OpenShift platform and associated RH provided components to a single physical core on Intel Sapphire Rapids platform for vDU deployments on SingleNode OpenShift.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Provide a mechanism to tune the platform to use only one physical core. | Users need to be able to tune different platforms. | YES |
Allow for full zero touch provisioning of a node with the minimal core budget configuration. | Node provisioned with SNO Far Edge provisioning method - i.e. ZTP via RHACM, using DU Profile. | YES |
Platform meets all MVP KPIs | | YES |
Questions to be addressed:
This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were completed when this image was assembled
Monitoring needs to be reliable and is very useful when trying to debug clusters in an already degraded state. We want to ensure that metrics scraping can always work if the scraper can reach the target, even if the kube-apiserver is unavailable or unreachable. To do this, we will combine a local authorizer (already merged in many binaries and the rbac-proxy) and client-cert-based authentication to have a fully local authentication and authorization path for scraper targets.
If networking (or part of networking) is down and a scraper target cannot reach the kube-apiserver to verify a token and a SubjectAccessReview, then the metrics scraper can be rejected. The SubjectAccessReview (authorization) side is already largely addressed, but service account tokens are still used for scraping targets. Tokens require an external network call that we can avoid by using client certificates. Gathering metrics, especially client metrics, from partially functional clusters helps narrow the search area between kube-apiserver, etcd, kubelet, and SDN considerably.
In addition, this will significantly reduce the load on the kube-apiserver. We have observed in the CI cluster that token and subject access reviews are a significant percentage of all kube-apiserver traffic.
User story:
As cluster-policy-controller I automatically approve cert signing requests issued by monitoring.
DoD:
Implementation hints: leverage approving logic implemented in https://github.com/openshift/library-go/pull/1083.
As a cluster administrator of a disconnected OCP cluster,
I want a list of possible sample images to mirror
So that I can configure my image mirror prior to installing OCP in a disconnected environment.
It is too onerous to find a connected cluster in order to obtain the list of possible sample images to mirror using the current documented procedures.
I need a list made available to me in my disconnected cluster that I can reference after the initial install.
The Samples Operator's use of the reason field in its config object to track imagestream import completion has resulted in that singleton being a bottleneck and a source of update conflicts (we are talking 60 or 70 imagestreams potentially updating that one field concurrently).
See this hackday PR
for an alternative approach which uses a configmap per imagestream
Goal:
Notify users that the lastImageChangeTriggeredID field in the BuildConfig spec is deprecated, and will be ignored in a future release.
Problem:
BuildConfigs use the lastImageChangeTriggeredID field to record the SHA of the last image used to trigger a build. This information should only be managed by the Build controllers and be placed in the BuildConfig status. By keeping the data in spec, cluster admins and developers can easily alter or remove this information.
Why is this important?
Moving this information to status is a prerequisite for using GitOps to manage BuildConfigs with ImageChange triggers.
Dependencies
None
Stories and Deliverables
Estimate: (XS, S, M, L, XL, XXL) S
Previous Work:
https://bugzilla.redhat.com/show_bug.cgi?id=1876500
User Stories:
As an OpenShift engineer
I want image change triggers to record their information in the BuildConfig status
So that eventually changes to ImageChange triggers in the BuildConfig spec do not cause builds to launch
As a developer using OpenShift to build images
I want to trigger a build when the base image of my application changes
So that bug fixes and security patches are applied to my application.
As an OpenShift cluster admin
I want to be alerted if something cleared the LastImageTriggeredID in a BuildConfig
So that I am aware that cluster users are relying on deprecated behavior.
Success Criteria:
1. Image change triggers record lastImageTriggeredID in the BuildConfig's status
2. Customers are notified that lastImageTriggeredID is deprecated in BuildConfig's spec
3. Customers are alerted if their users are relying on deprecated behavior
Open Questions:
See https://docs.google.com/document/d/1thcVR31TElJBXBGmyaYXZ9jRejp5W4-ppNYn4g3_mgk/edit# for a fuller explanation on how to use this template.
As an OpenShift cluster admin
I want to be alerted if something cleared the LastImageTriggeredID in a BuildConfig
So that I am aware that cluster users are relying on deprecated behavior.
Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated
After review with the monitoring team, we agreed to use Warning events rather than alerts. To properly create an alert for this behavior, we would have needed to introduce a metric with unbound cardinality. Such metrics can lead to performance degradation in the cluster monitoring system.
When the LastImageChangeTriggerID is cleared in a BuildConfig object, a warning event is created which should appear in relevant contexts within the Web Console.
Goal:
Record the lastImageChangeTriggeredID field in the BuildConfig status, instead of spec.
Problem:
BuildConfigs use the lastImageChangeTriggeredID field to record the SHA of the last image used to trigger a build. This information should only be managed by the Build controllers and be placed in the BuildConfig status. By keeping the data in spec, cluster admins and developers can easily alter or remove this information.
Why is this important?
Moving this information to status is a prerequisite for using GitOps to manage BuildConfigs with ImageChange triggers.
Dependencies
None
Stories and Deliverables
Estimate: (XS, S, M, L, XL, XXL) S
Previous Work:
https://bugzilla.redhat.com/show_bug.cgi?id=1876500
User Stories:
As an OpenShift engineer
I want image change triggers to record their information in the BuildConfig status
So that eventually changes to ImageChange triggers in the BuildConfig spec do not cause builds to launch
As a developer using OpenShift to build images
I want to trigger a build when the base image of my application changes
So that bug fixes and security patches are applied to my application.
As an OpenShift cluster admin
I want to be alerted if something cleared the LastImageTriggeredID in a BuildConfig
So that I am aware that cluster users are relying on deprecated behavior.
Success Criteria:
1. Image change triggers record lastImageTriggeredID in the BuildConfig's status
2. Customers are notified that lastImageTriggeredID is deprecated in BuildConfig's spec
3. Customers are alerted if their users are relying on deprecated behavior
Open Questions:
See https://docs.google.com/document/d/1thcVR31TElJBXBGmyaYXZ9jRejp5W4-ppNYn4g3_mgk/edit# for a fuller explanation on how to use this template.
As an OpenShift engineer
I want image change triggers to record their information in the BuildConfig status
So that eventually changes to ImageChange triggers in the BuildConfig spec do not cause builds to launch
As a developer using OpenShift to build images
I want to trigger a build when the base image of my application changes
So that bug fixes and security patches are applied to my application.
Release note should inform customers that the lastImageChangeTriggeredID field in the BuildConfig spec is deprecated, and will be ignored in OCP 4.9. Users relying on this information should update their scripts and jobs to read the triggered image ID from BuildConfig status.
Please see https://issues.redhat.com/browse/RHDEVDOCS-2738 for more details.
Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated
If lastImageTriggeredID is set to the empty string or nil, today this will trigger a build. By moving the data to status, we can ignore the lastImageTriggeredID field in spec in a future release. To give users proper notice, we are not going to remove the current behavior until a later release.
A deprecation notice will need to be added to the release notes upon completion.
Changes will be needed in the following:
1. API
2. openshift-apiserver
3. openshift-controller-manager (build controller)
See https://bugzilla.redhat.com/show_bug.cgi?id=1876500.
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
We should expand the set of Cypress e2e tests we started in 4.6. This can either be new tests, or migrating some of our more flaky protractor tests.
We still have some tests remaining in the Protractor CRUD scenario:
https://github.com/openshift/console/blob/master/frontend/integration-tests/tests/crud.scenario.ts
We should migrate these to Cypress.
Acceptance Criteria
Convert labels tests to Cypress.
An epic we can duplicate for each release to ensure we have a place to catch things we ought to be doing regularly but can tend to fall by the wayside.
It has been a few releases since we've gone through and updated the various frontend dependencies for console. We should update major dependencies such as TypeScript, React, and webpack.
We should consider updating our TypeScript typings as well, which have gotten out of date.
We should update yarn to the latest version.
An epic we can duplicate for each release to ensure we have a place to catch things we ought to be doing regularly but can tend to fall by the wayside.
Console operator should swap from using monis.app to openshift/operator-boilerplate-legacy. This will allow switching to klog/v2, which the shared libs (api,client-go,library-go) have already done.
In the 4.11 release, a console.openshift.io/default-i18next-namespace annotation is being introduced. The annotation indicates whether the ConsolePlugin contains localization resources. If the annotation is set to "true", the localization resources from the i18n namespace named after the dynamic plugin (e.g. plugin__kubevirt) are loaded. If the annotation is set to any other value or is missing on the ConsolePlugin resource, localization resources are not loaded.
In case these resources are not present in the dynamic plugin, the initial console load will be slowed down. For more info check BZ#2015654
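Based on the description above, a sketch of a ConsolePlugin carrying the annotation; the plugin name follows the kubevirt example above:

apiVersion: console.openshift.io/v1alpha1
kind: ConsolePlugin
metadata:
  name: kubevirt
  annotations:
    console.openshift.io/default-i18next-namespace: "true"   # load the plugin__kubevirt i18n namespace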
AC:
Follow up of https://issues.redhat.com/browse/CONSOLE-3159
When defining two proxy endpoints,
apiVersion: console.openshift.io/v1alpha1
kind: ConsolePlugin
metadata:
...
name: forklift-console-plugin
spec:
displayName: Console Plugin Template
proxy:
service:
basePath: /
I get two proxy endpoints
/api/proxy/plugin/forklift-console-plugin/forklift-inventory
and
/api/proxy/plugin/forklift-console-plugin/forklift-must-gather-api
but both proxy to the `forklift-must-gather-api` service
e.g.
curl to:
[server url]/api/proxy/plugin/forklift-console-plugin/forklift-inventory
will point to the `forklift-must-gather-api` service, instead of the `forklift-inventory` service
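For reference, a sketch of what the intended two-endpoint proxy configuration looks like, where each alias should route to its own service. The namespace and port are illustrative, and the field names follow the v1alpha1 proxy spec as understood here:

spec:
  proxy:
  - type: Service
    alias: forklift-inventory
    service:
      name: forklift-inventory
      namespace: konveyor-forklift   # illustrative namespace
      port: 8443
  - type: Service
    alias: forklift-must-gather-api
    service:
      name: forklift-must-gather-api
      namespace: konveyor-forklift   # illustrative namespace
      port: 8443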
Currently the ConsolePlugins API version is v1alpha1. Since we are going GA with dynamic plugins we should be creating a v1 version.
This would require updates in following repositories:
AC:
NOTE: This story does not include the conversion webhook change which will be created as a follow on story
This enhancement introduces support for provisioning and upgrading heterogeneous architecture clusters in phases.
We need to scan through the compute nodes and build a set of supported architectures from those. Each node on the cluster has a label for architecture, e.g. kubernetes.io/arch=arm64, kubernetes.io/arch=amd64, etc. Based on the set of supported architectures, console will need to surface only those operators in the Operator Hub which are supported on our nodes.
AC:
@jpoulin is good to ask about heterogeneous clusters.
Goal: Support OCI images.
Problem: Buildah and podman use the OCI format by default, and the OpenShift Image Registry and ImageStream API don't understand it.
Why is this important: OCI images are supposed to replace Docker schema 2 images; OpenShift should be ready when OCI images become widely adopted.
Dependencies (internal and external):
Prioritized epics + deliverables (in scope / not in scope):
Estimate (XS, S, M, L, XL, XXL): XL
Previous Work:
Customers:
Open Questions:
As a user of OpenShift
I want to push OCI images to the registry
So that I can use buildah and podman with their defaults to push images
Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated
Add pertinent notes here:
As a user of OpenShift
I want to import OCI images
So that I can use images that are built by new tools
Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated
Add pertinent notes here:
As a user of OpenShift
I want the image pruner to be aware of OCI images
So that it doesn't delete their layers/configs
Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated
Add pertinent notes here:
https://github.com/openshift/origin/pull/25475/files marked our tests for ISI as Disruptive.
Tests should wait until operators become stable, otherwise other tests will be run on an unstable cluster and it'll cause flakes.
The integration tests for the image registry expect that OpenShift and the tests run on the same machine (i.e., OpenShift can connect to sockets that the tests listen on). This is not the case with e2e tests.
Goal: Rebase registry to Docker Distribution
Problem: The registry is currently based on an outdated version of the upstream docker/distribution project. The base does not even have a version associated with it - DevEx last rebased on an untagged commit.
Why is this important: Update the registry with improvements and bug fixes from the upstream community.
Dependencies (internal and external):
Prioritized epics + deliverables (in scope / not in scope):
Estimate (XS, S, M, L, XL, XXL): M
Previous Work:
Customers:
Open questions:
As a user of OpenShift
I want the image registry to be rebased on the latest docker/distribution release (v2.7.1)
So that the image registry has the latest upstream bugfixes and enhancements
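A minimal sketch of the version pin, assuming the rebase is expressed through Go modules; in practice the rebase may instead be done by merging the upstream v2.7.1 tag into the registry's fork:

# v2.7.1 predates Go modules, so the resolved version carries +incompatible
go get github.com/docker/distribution@v2.7.1
go mod tidy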
Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated
Add pertinent notes here:
This is an API change, and we will consider it a feature request.
Please check https://issues.redhat.com/browse/NE-799 for more details.
No
N/A
We need tests for the ovirt-csi-driver and the cluster-api-provider-ovirt. These tests help us to
Also, having dedicated tests on lower levels with a smaller scope (unit, integration, ...) has the following benefits:
Integration tests need to be implemented according to https://cluster-api.sigs.k8s.io/developer/testing.html#integration-tests using envtest.
Today in the Dev Perspective navigation, we have pre-pinned resources in the "third section" of the vertical navigation. Customers have asked if there is a way for admins to provide defaults for those pre-pinned resources for all users.
tbd
As an admin, I want to define the pre-pinned resources on the developer perspective navigation
Based on the https://issues.redhat.com/browse/ODC-7181 enhancement proposal, it is required to extend the console configuration CRD to enable the cluster admins to configure this data in the console resource
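A sketch of how the extended console configuration might look; field names follow the enhancement proposal and may differ from the final API, and the pinned resources listed are illustrative:

apiVersion: operator.openshift.io/v1
kind: Console
metadata:
  name: cluster
spec:
  customization:
    perspectives:
    - id: dev
      pinnedResources:                 # defaults for users who haven't customized their pins
      - group: ""                      # core API group
        version: v1
        resource: configmaps
      - group: apps
        version: v1
        resource: deployments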
Previous customization work:
Provide a form-driven experience to allow cluster admins to manage the perspectives to meet the ACs below.
We have heard the following requests from customers and developer advocates:
As an admin, I want to hide the admin perspective for non-privileged users or hide the developer perspective for all users
Based on the https://issues.redhat.com/browse/ODC-6730 enhancement proposal, it is required to extend the console configuration CRD to enable the cluster admins to configure this data in the console resource
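A sketch of what the extended console resource might look like under the ODC-6730 proposal; the state values and access-review rule are illustrative:

apiVersion: operator.openshift.io/v1
kind: Console
metadata:
  name: cluster
spec:
  customization:
    perspectives:
    - id: admin
      visibility:
        state: AccessReview            # show only to users who pass the review below
        accessReview:
          required:
          - resource: namespaces
            verb: list
    - id: dev
      visibility:
        state: Disabled                # hide the developer perspective for everyone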
Previous customization work:
Currently we are only able to get limited telemetry from the Dev Sandbox, and none from our managed clusters or on-prem clusters.
In order to properly analyze usage and improve the user experience, we need to be able to gather as much data as possible.
// JS type
telemetry?: Record<string, string>
./bin/bridge --telemetry SEGMENT_API_KEY=a-key-123-xzy
./bin/bridge --telemetry CONSOLE_LOG=debug
Customers don't want their users to have access to some/all of the items which are available in the Developer Catalog. The request is to change access for the cluster, not per user or persona.
Provide a form-driven experience to allow cluster admins to easily disable the Developer Catalog, or one or more of the sub-catalogs in the Developer Catalog.
Multiple customer requests.
We need to consider how this will work with sub-catalogs that are installed by operators: VMs, Event Sources, Event Catalogs, Managed Services, Cloud-based services
As an admin, I want to hide/disable access to specific sub-catalogs in the developer catalog or the complete dev catalog for all users across all namespaces.
Based on the https://issues.redhat.com/browse/ODC-6732 enhancement proposal, it is required to extend the console configuration CRD to enable the cluster admins to configure this data in the console resource
Extend the "customization" spec type definition for the CRD in the openshift/api project
Previous customization work:
This epic has 3 main goals
Currently we have no accurate telemetry of usage of the OpenShift Console across all clusters in the fleet. We should be able to utilize the auth and console telemetry to glean details which will allow us to get a picture of console usage by our customers.
Let's do a spike to validate; we may have to update this list after the spike:
Need to verify HOW we define a cluster admin -> listing all namespaces in a cluster? Installing operators? Make sure that we consider OSD cluster admins as well (this should be aligned with how we send people to the dev perspective, in my mind).
Capture additional information via console plugin ( and possibly the auth operator )
Understanding how to capture telemetry via the console operator
We have removed the following ACs for this release:
As Red Hat, we want to understand the usage of the (dev) console; for that, we want to add new Prometheus metrics (how many users a cluster has, etc.) and collect them later (as telemetry data) via cluster-monitoring-operator.
Either the console-operator or the cluster-monitoring-operator needs to apply a PrometheusRule to collect the right data and make it available later in Superset DataHat or Tableau.
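A minimal sketch of such a PrometheusRule; the rule and metric names are hypothetical:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: console-usage-rules            # hypothetical name
  namespace: openshift-console-operator
spec:
  groups:
  - name: console-usage.rules
    rules:
    # hypothetical recording rule aggregating a hypothetical console metric
    - record: cluster:console_users:sum
      expr: sum(console_active_users_total)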
This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were not completed when this image was assembled
An epic we can duplicate for each release to ensure we have a place to catch things we ought to be doing regularly but that tend to fall by the wayside.
Console-operator should switch from using bindata to using assets, similar to what cluster-kube-apiserver-operator and other operators are doing, so we don't need to regenerate the bindata when YAML files are changed.
There is also an issue with generating bindata on ARM and other architectures; switching to assets will make that problem obsolete.
Cluster-version operator (CVO) manifests declaring a runlevel are supposed to use 0000_<runlevel>_<dash-separated-component>_<manifest_filename>. But since api#598 added 0000_10-helm-chart-repository.crd.yaml and api#1084 added 0000_10-project-helm-chart-repository.crd.yaml, those have diverged from that pattern, so the cluster-version operator will fail to parse their runlevel. They are still getting sorted into a runlevel around 10 by this code, but unless there are reasons the CRDs need to be pushed early in an update, leaving the runlevel prefix off in the API repository gives the CVO the ability to parallelize their reconciliation with more sibling resources (after which these COPY lines would need a matching bump).
We need to enable the storage for the v1 version of our ConsolePlugin CRD in the API repository. ConsolePlugin v1 CRD was added in CONSOLE-3077.
AC: Enable the storage for the v1 version of ConsolePlugin CRD and disable the storage for v1alpha1 version
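A sketch of the relevant CRD stanza after the change; only the storage flags flip, both versions remain served:

# in the ConsolePlugin CustomResourceDefinition
versions:
- name: v1
  served: true
  storage: true        # v1 becomes the storage version
- name: v1alpha1
  served: true
  storage: false       # v1alpha1 is still served but no longer stored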
Remove code that was added through the ACM integration from all of the console's codebase repositories.
Since the decision was made to stop the ACM integration, we as a team decided that it would be better to remove the unused code in order to avoid any confusion or regressions.
Remove all multicluster-related code from the console operator repo.
AC:
Points to consider when implementing the switch to metrics-server:
Acceptance Criteria:
Vendor openshift/api into openshift/cluster-config-operator to bring in the featuregate changes.
https://github.com/openshift/api/pull/1615 is merged, but this change won't be available unless we vendor the change into openshift/cluster-config-operator.
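A minimal sketch of the vendoring step, assuming the repository uses standard Go modules with a vendor directory:

# run inside openshift/cluster-config-operator
go get github.com/openshift/api@master   # pick up openshift/api#1615
go mod tidy
go mod vendor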
We need to bump the version of Kubernetes and run a library sync for OCP 4.13. Two stories will be created, one for each activity.
As a Samples Operator developer, I would like to run the library sync process, so the new libraries can be pushed to OCP 4.15.
This is a runbook we need to execute on every release of OpenShift
NA
NA
NA
Follow instructions here: https://source.redhat.com/groups/public/appservices/wiki/cluster_samples_operator_release_activities
Library Repo
Library sync PR is merged in master
Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated
Unknown
Verified
Unsatisfied
We need to bump the Kubernetes version to the latest API version OCP is using.
This is what was done last time:
https://github.com/openshift/cluster-samples-operator/pull/409
Find latest stable version from here: https://github.com/kubernetes/api
This is described in wiki: https://source.redhat.com/groups/public/appservices/wiki/cluster_samples_operator_release_activities
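A minimal sketch of the bump, following the pattern of the PR above; the version shown is illustrative — use the latest stable tag from kubernetes/api:

# run inside openshift/cluster-samples-operator
go get k8s.io/api@v0.28.4 k8s.io/apimachinery@v0.28.4 k8s.io/client-go@v0.28.4
go mod tidy
go mod vendor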
This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were completed when this image was assembled
Description of problem:
ROSA is being branded via custom branding; as a result, the favicon disappears, since we do not want any Red Hat/OpenShift-specific branding to appear when custom branding is in use. Since ROSA is a Red Hat product, it should get a branding option added to the console so that all the correct branding, including the favicon, appears.
Version-Release number of selected component (if applicable):
4.14.0, 4.13.z, 4.12.z, 4.11.z
How reproducible:
Always
Steps to Reproduce:
1. View a ROSA cluster
2. Note the absence of the OpenShift logo favicon
Please review the following PR: https://github.com/openshift/console-operator/pull/737
The PR has been automatically opened by ART (#aos-art) team automation and indicates that the image(s) being used downstream for production builds are not consistent with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to be reopened automatically.
Refer to the CIS RedHat OpenShift Container Platform Benchmark PDF: https://drive.google.com/file/d/12o6O-M2lqz__BgmtBrfeJu1GA2SJ352c/view
1.1.7 Ensure that the etcd pod specification file permissions are set to 600 or more restrictive (Manual)
======================================================================================================
As per CIS v1.3 PDF permissions should be 600 with the following statement:
"The pod specification file is created on control plane nodes at /etc/kubernetes/manifests/etcd-member.yaml with permissions 644. Verify that the permissions are 600 or more restrictive."
But when I ran the following command, it showed 644 permissions:
for i in $(oc get pods -n openshift-etcd -l app=etcd -o name | grep etcd); do
  echo "check pod $i"
  oc rsh -n openshift-etcd $i \
    stat -c %a /etc/kubernetes/manifests/etcd-pod.yaml
done
Description of problem:
https://github.com/openshift/api/pull/1186 - https://issues.redhat.com/browse/CONSOLE-3069 promoted the ConsolePlugin CRD to v1. The PR also introduces a conversion webhook from v1alpha1 to v1. In the new CRD version, i18n (ConsolePluginI18n) is marked as optional. The conversion webhook will not set a default valid ("Lazy"/"Preload") value when writing the v1 object, and a v1 object completely omitting spec.i18n will be accepted with no valid default value as well. On the other side, at garbage collection time the object will be stuck forever due to the lack of a valid value for spec.i18n.loadType.

Example, create a v1 ConsolePlugin object:

cat <<EOF | oc apply -f -
apiVersion: console.openshift.io/v1
kind: ConsolePlugin
metadata:
  name: test472
spec:
  backend:
    service:
      basePath: /
      name: test472-service
      namespace: kubevirt-hyperconverged
      port: 9443
    type: Service
  displayName: Test 472 Plugin
EOF

Delete it in foreground mode:

stirabos@t14s:~$ oc delete consoleplugin test472 --timeout=30s --cascade='foreground' -v 7
I1011 18:20:03.255605   31610 loader.go:372] Config loaded from file: /home/stirabos/.kube/config
I1011 18:20:03.266567   31610 round_trippers.go:463] DELETE https://api.ci-ln-krdzphb-72292.gcp-2.ci.openshift.org:6443/apis/console.openshift.io/v1/consoleplugins/test472
I1011 18:20:03.266581   31610 round_trippers.go:469] Request Headers:
I1011 18:20:03.266588   31610 round_trippers.go:473]     Accept: application/json
I1011 18:20:03.266594   31610 round_trippers.go:473]     Content-Type: application/json
I1011 18:20:03.266600   31610 round_trippers.go:473]     User-Agent: oc/4.11.0 (linux/amd64) kubernetes/fcf512e
I1011 18:20:03.266606   31610 round_trippers.go:473]     Authorization: Bearer <masked>
I1011 18:20:03.688569   31610 round_trippers.go:574] Response Status: 200 OK in 421 milliseconds
consoleplugin.console.openshift.io "test472" deleted
I1011 18:20:03.688911   31610 round_trippers.go:463] GET https://api.ci-ln-krdzphb-72292.gcp-2.ci.openshift.org:6443/apis/console.openshift.io/v1/consoleplugins?fieldSelector=metadata.name%3Dtest472
I1011 18:20:03.688919   31610 round_trippers.go:469] Request Headers:
I1011 18:20:03.688928   31610 round_trippers.go:473]     Authorization: Bearer <masked>
I1011 18:20:03.688935   31610 round_trippers.go:473]     Accept: application/json
I1011 18:20:03.688941   31610 round_trippers.go:473]     User-Agent: oc/4.11.0 (linux/amd64) kubernetes/fcf512e
I1011 18:20:03.840103   31610 round_trippers.go:574] Response Status: 200 OK in 151 milliseconds
I1011 18:20:03.840825   31610 round_trippers.go:463] GET https://api.ci-ln-krdzphb-72292.gcp-2.ci.openshift.org:6443/apis/console.openshift.io/v1/consoleplugins?fieldSelector=metadata.name%3Dtest472&resourceVersion=175205&watch=true
I1011 18:20:03.840848   31610 round_trippers.go:469] Request Headers:
I1011 18:20:03.840884   31610 round_trippers.go:473]     Accept: application/json
I1011 18:20:03.840907   31610 round_trippers.go:473]     User-Agent: oc/4.11.0 (linux/amd64) kubernetes/fcf512e
I1011 18:20:03.840928   31610 round_trippers.go:473]     Authorization: Bearer <masked>
I1011 18:20:03.972219   31610 round_trippers.go:574] Response Status: 200 OK in 131 milliseconds
error: timed out waiting for the condition on consoleplugins/test472

and in kube-controller-manager logs we see:

2022-10-11T16:25:32.192864016Z I1011 16:25:32.192788       1 garbagecollector.go:501] "Processing object" object="test472" objectUID=0cc46a01-113b-4bbe-9c7a-829a97d6867c kind="ConsolePlugin" virtual=false
2022-10-11T16:25:32.282303274Z I1011 16:25:32.282161       1 garbagecollector.go:623] remove DeleteDependents finalizer for item [console.openshift.io/v1/ConsolePlugin, namespace: , name: test472, uid: 0cc46a01-113b-4bbe-9c7a-829a97d6867c]
2022-10-11T16:25:32.304835330Z E1011 16:25:32.304730       1 garbagecollector.go:379] error syncing item &garbagecollector.node{identity:garbagecollector.objectReference{OwnerReference:v1.OwnerReference{APIVersion:"console.openshift.io/v1", Kind:"ConsolePlugin", Name:"test472", UID:"0cc46a01-113b-4bbe-9c7a-829a97d6867c", Controller:(*bool)(nil), BlockOwnerDeletion:(*bool)(nil)}, Namespace:""}, dependentsLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:1, readerWait:0}, dependents:map[*garbagecollector.node]struct {}{}, deletingDependents:true, deletingDependentsLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, beingDeleted:true, beingDeletedLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, virtual:false, virtualLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, owners:[]v1.OwnerReference(nil)}: ConsolePlugin.console.openshift.io "test472" is invalid: spec.i18n.loadType: Unsupported value: "": supported values: "Preload", "Lazy"
Version-Release number of selected component (if applicable):
OCP 4.12.0 ec4
How reproducible:
100%
Steps to Reproduce:
1. cat <<EOF | oc apply -f -
apiVersion: console.openshift.io/v1
kind: ConsolePlugin
metadata:
  name: test472
spec:
  backend:
    service:
      basePath: /
      name: test472-service
      namespace: kubevirt-hyperconverged
      port: 9443
    type: Service
  displayName: Test 472 Plugin
EOF
2. oc delete consoleplugin test472 --timeout=30s --cascade='foreground' -v 7
Actual results:
2022-10-11T16:25:32.192864016Z I1011 16:25:32.192788       1 garbagecollector.go:501] "Processing object" object="test472" objectUID=0cc46a01-113b-4bbe-9c7a-829a97d6867c kind="ConsolePlugin" virtual=false
2022-10-11T16:25:32.282303274Z I1011 16:25:32.282161       1 garbagecollector.go:623] remove DeleteDependents finalizer for item [console.openshift.io/v1/ConsolePlugin, namespace: , name: test472, uid: 0cc46a01-113b-4bbe-9c7a-829a97d6867c]
2022-10-11T16:25:32.304835330Z E1011 16:25:32.304730       1 garbagecollector.go:379] error syncing item &garbagecollector.node{identity:garbagecollector.objectReference{OwnerReference:v1.OwnerReference{APIVersion:"console.openshift.io/v1", Kind:"ConsolePlugin", Name:"test472", UID:"0cc46a01-113b-4bbe-9c7a-829a97d6867c", Controller:(*bool)(nil), BlockOwnerDeletion:(*bool)(nil)}, Namespace:""}, dependentsLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:1, readerWait:0}, dependents:map[*garbagecollector.node]struct {}{}, deletingDependents:true, deletingDependentsLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, beingDeleted:true, beingDeletedLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, virtual:false, virtualLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, owners:[]v1.OwnerReference(nil)}: ConsolePlugin.console.openshift.io "test472" is invalid: spec.i18n.loadType: Unsupported value: "": supported values: "Preload", "Lazy"
Expected results:
Object correctly deleted
Additional info:
The issue doesn't happen with --cascade='background', which is the default on the CLI client.
Poking around in 4.12.0-ec.3 CI:
$ curl -s https://gcsweb-ci.apps.ci