Note: this page shows the Feature-Based Change Log for a release
These features were completed when this image was assembled
This outcome tracks the overall CoreOS Layering story as well as the technical items needed to converge CoreOS with RHEL image mode. This will provide operational consistency across the platforms.
ROADMAP for this Outcome: https://docs.google.com/document/d/1K5uwO1NWX_iS_la_fLAFJs_UtyERG32tdt-hLQM8Ow8/edit?usp=sharing
This work describes the tech preview state of On Cluster Builds. Major interfaces should be agreed upon at the end of this state.
In its current state, the BuildController unit test suite sometimes fails unexpectedly. This erodes confidence in the MCO unit test suite and can block PRs from merging, even when the changes a PR introduces are unrelated to BuildController. I suspect there is a race condition within the test suite which, combined with the test suite being aggressively parallel, causes it to fail unexpectedly.
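As a minimal, self-contained sketch (not actual BuildController code) of how an aggressively parallel suite plus shared state fails intermittently, consider:

```go
package example

import "testing"

// Subtests run in parallel but share an unsynchronized fixture. The
// concurrent map writes are a data race: the test passes or fails
// depending on goroutine scheduling.
func TestBuilds(t *testing.T) {
	shared := map[string]string{}
	for _, name := range []string{"a", "b", "c"} {
		name := name // capture loop variable for the parallel closure
		t.Run(name, func(t *testing.T) {
			t.Parallel()
			shared[name] = "built" // racy write to shared state
		})
	}
}
```

Running the suite under the race detector (`go test -race`) is usually the quickest way to confirm or rule out this hypothesis.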
Done When:
Description of problem:
In an on-cluster build pool, when we create a MC to update the sshkeys, we can't find the new keys in the nodes after the configuration is built and applied.
Version-Release number of selected component (if applicable):
```
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-08-30-191617   True        False         7h52m   Cluster version is 4.14.0-0.nightly-2023-08-30-191617
```
How reproducible:
Always
Steps to Reproduce:
1. Enable the on-cluster build functionality in the "worker" pool.
2. Check the value of the current keys:

```
$ oc debug node/$(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}") -- chroot /host cat /home/core/.ssh/authorized_keys.d/ignition
Warning: metadata.name: this is used in the Pod's hostname, which can result in surprising behavior; a DNS label is recommended: [must be no more than 63 characters]
Starting pod/sregidor-sr3-bfxxj-worker-a-h5b5jcopenshift-qeinternal-debug-ljxgx ...
To use host binaries, run `chroot /host`
ssh-rsa AAAA..................................................................................................................................................................qe@redhat.com

Removing debug pod ...
```

3. Create a new MC to configure the "core" user's sshkeys. We add 2 extra keys:

```
$ oc get mc -o yaml tc-59426-add-ssh-key-9tv2owyp
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  creationTimestamp: "2023-09-01T10:57:14Z"
  generation: 1
  labels:
    machineconfiguration.openshift.io/role: worker
  name: tc-59426-add-ssh-key-9tv2owyp
  resourceVersion: "135885"
  uid: 3cf31fbb-7a4e-472d-8430-0c0eb49420fc
spec:
  config:
    ignition:
      version: 3.2.0
    passwd:
      users:
      - name: core
        sshAuthorizedKeys:
        - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDPmGf/sfIYog...... mco_test@redhat.com
        - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDf....... mco_test2@redhat.com
```

4. Verify that the new rendered MC contains the 3 keys:

```
$ oc get mcp worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-02d04d7c47cd3e08f8f305541cf85000   True      False      False      2              2                   2                     0                      8h

$ oc get mc -o yaml rendered-worker-02d04d7c47cd3e08f8f305541cf85000 | grep users -A9
    users:
    - name: core
      sshAuthorizedKeys:
      - ssh-rsa AAAAB...............................qe@redhat.com
      - ssh-rsa AAAAB...............................mco_test@redhat.com
      - ssh-rsa AAAAB...............................mco_test2@redhat.com
  storage:
```
Actual results:
Only the initial key is present in the node:

```
$ oc debug node/$(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}") -- chroot /host cat /home/core/.ssh/authorized_keys.d/ignition
Warning: metadata.name: this is used in the Pod's hostname, which can result in surprising behavior; a DNS label is recommended: [must be no more than 63 characters]
Starting pod/sregidor-sr3-bfxxj-worker-a-h5b5jcopenshift-qeinternal-debug-ljxgx ...
To use host binaries, run `chroot /host`
ssh-rsa AAAA.........qe@redhat.com

Removing debug pod ...
```
Expected results:
The added ssh keys should be configured in the /home/core/.ssh/authorized_keys.d/ignition file as well.
Additional info:
Description of problem:
In pools with On-Cluster Build enabled, when a config drift happens because a file's content has been manually changed, the MCP goes degraded (this is expected):

```
- lastTransitionTime: "2023-08-31T11:34:33Z"
  message: 'Node sregidor-sr2-2gb5z-worker-a-7tpjd.c.openshift-qe.internal is reporting:
    "unexpected on-disk state validating against quay.io/xxx/xxx@sha256:........................:
    content mismatch for file \"/etc/mco-test-file\""'
  reason: 1 nodes are reporting degraded status on sync
  status: "True"
  type: NodeDegraded
```

If we fix this drift and restore the original file's content, the MCP remains degraded, now with this message:

```
- lastTransitionTime: "2023-08-31T12:24:47Z"
  message: 'Node sregidor-sr2-2gb5z-worker-a-q7wcb.c.openshift-qe.internal is reporting:
    "failed to update OS to quay.io/xxx/xxx@sha256:.......: error running rpm-ostree
    rebase --experimental ostree-unverified-registry:quay.io/xxx/xxx@sha256:........:
    error: Old and new refs are equal: ostree-unverified-registry:quay.io/xxx/xxx@sha256:..............\n:
    exit status 1"'
  reason: 1 nodes are reporting degraded status on sync
  status: "True"
  type: NodeDegraded
```
Version-Release number of selected component (if applicable):
```
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-08-30-191617   True        False         4h18m   Error while reconciling 4.14.0-0.nightly-2023-08-30-191617: the cluster operator monitoring is not available
```
How reproducible:
Always
Steps to Reproduce:
1. Enable the OCB functionality for the worker pool (create the necessary cms and secrets for the OCB functionality to work), then wait until the new image is created and the nodes are updated:

```
$ oc label mcp/worker machineconfiguration.openshift.io/layering-enabled=
```

2. Create a MC to deploy a new file, and wait until the new MC is deployed:

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: mco-drift-test-file
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - contents:
          source: data:,MCO%20test%20file%0A
        path: /etc/mco-test-file
```

3. Modify the content of the file /etc/mco-test-file, making a backup first:

```
$ oc debug node/$(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}")
Warning: metadata.name: this is used in the Pod's hostname, which can result in surprising behavior; a DNS label is recommended: [must be no more than 63 characters]
Starting pod/sregidor-sr2-2gb5z-worker-a-q7wcbcopenshift-qeinternal-debug-sv85v ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.9
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-5.1# cd /etc
sh-5.1# cat mco-test-file
MCO test file
sh-5.1# cp mco-test-file mco-test-file-back
sh-5.1# echo -n "1" >> mco-test-file
```

4. Wait until the MCP reports the config drift issue:

```
$ oc get mcp worker -o yaml
....
- lastTransitionTime: "2023-08-31T11:34:33Z"
  message: 'Node sregidor-sr2-2gb5z-worker-a-7tpjd.c.openshift-qe.internal is reporting:
    "unexpected on-disk state validating against quay.io/xxx/xxx@sha256:........................:
    content mismatch for file \"/etc/mco-test-file\""'
  reason: 1 nodes are reporting degraded status on sync
  status: "True"
  type: NodeDegraded
```

5. Restore the backup that we made in step 3:

```
sh-5.1# cp mco-test-file-back mco-test-file
```
Actual results:
The worker pool is degraded with this message:

```
- lastTransitionTime: "2023-08-31T12:24:47Z"
  message: 'Node sregidor-sr2-2gb5z-worker-a-q7wcb.c.openshift-qe.internal is reporting:
    "failed to update OS to quay.io/xxx/xxx@sha256:.......: error running rpm-ostree
    rebase --experimental ostree-unverified-registry:quay.io/xxx/xxx@sha256:........:
    error: Old and new refs are equal: ostree-unverified-registry:quay.io/xxx/xxx@sha256:..............\n:
    exit status 1"'
  reason: 1 nodes are reporting degraded status on sync
  status: "True"
  type: NodeDegraded
```
Expected results:
The node pool should stop being degraded.
Additional info:
There is a link to the must-gather file in the first comment of this issue.
Description of problem:
In OCB pools, when we create a MC to configure a password for the "core" user, the password is not configured.
Version-Release number of selected component (if applicable):
```
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-08-30-191617   True        False         5h38m   Cluster version is 4.14.0-0.nightly-2023-08-30-191617
```
How reproducible:
Always
Steps to Reproduce:
1. Enable on-cluster build on the "worker" pool.
2. Create a MC to configure the "core" user password:

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  creationTimestamp: "2023-09-01T09:51:14Z"
  generation: 1
  labels:
    machineconfiguration.openshift.io/role: worker
  name: tc-59417-test-core-passwd-tx2ndvcd
  resourceVersion: "105610"
  uid: 1f7a4de1-6222-4153-a46c-d1a17e5f89b1
spec:
  config:
    ignition:
      version: 3.2.0
    passwd:
      users:
      - name: core
        passwordHash: $6$uim4LuKWqiko1l5K$QJUwg.4lAyU4egsM7FNaNlSbuI6JfQCRufb99QuF082BpbqFoHP3WsWdZ5jCypS0veXWN1HDqO.bxUpE9aWYI1 # password coretest
```

3. Wait for the configuration to be built and applied.
Actual results:
The password is not configured for the core user. In a worker node, we can't log in using the new password:

```
$ oc debug node/sregidor-sr3-bfxxj-worker-a-h5b5j.c.openshift-qe.internal
Warning: metadata.name: this is used in the Pod's hostname, which can result in surprising behavior; a DNS label is recommended: [must be no more than 63 characters]
Starting pod/sregidor-sr3-bfxxj-worker-a-h5b5jcopenshift-qeinternal-debug-cb2gh ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.2
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-5.1# su core
[core@sregidor-sr3-bfxxj-worker-a-h5b5j /]$ su core
Password:
su: Authentication failure
```

The password is not configured:

```
sh-5.1# cat /etc/shadow | grep core
systemd-coredump:!!:::::::
core:*:19597:0:99999:7:::
```
Expected results:
The password should be configured, and we should be able to log in to the nodes using the user "core" and the configured password.
Additional info:
In the initial delivery of CoreOS Layering, it is required that administrators provide their own build environment to customize RHCOS images. That could be a traditional RHEL environment or potentially an enterprising administrator with some knowledge of OCP Builds could set theirs up on-cluster.
The primary virtue of an on-cluster build path is to continue using the cluster to manage the cluster. No external dependency, batteries-included.
On-cluster, automated RHCOS Layering builds are important for multiple reasons:
The goal of this effort is to leverage OVN Kubernetes SDN to satisfy networking requirements of both traditional and modern virtualization. This Feature describes the envisioned outcome and tracks its implementation.
In its current state, OpenShift Virtualization provides a flexible toolset allowing customers to connect VMs to the physical network. It also has limited secondary overlay network capabilities and Pod network support.
It suffers from several gaps: the topology of the default pod network is not suitable for typical VM workloads, so we miss out on many of the advanced capabilities of OpenShift networking, and we also don't have a good solution for public cloud. Another problem is that while we provide plenty of tools to build a network solution, we are not very good at guiding cluster administrators in configuring their network, making them rely on their account team.
Provide:
... while maintaining networking expectations of a typical VM workload:
Additionally, make our networking configuration more accessible to newcomers by providing a finite list of user stories mapped to recommended solutions.
You can find more info about this effort in https://docs.google.com/document/d/1jNr0E0YMIHsHu-aJ4uB2YjNY00L9TpzZJNWf3LxRsKY/edit
Allow administrators to create new Network Attachment Definitions for OVN Kubernetes secondary localnet networks.
```
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: <name>
  namespace: <namespace>
spec:
  config: |2
    {
      "cniVersion": "0.4.0",
      "name": "<bridge mapping>",
      "type": "ovn-k8s-cni-overlay",
      "topology": "localnet",
      "vlanID": <VLAN>,  # set only if passed from the user
      "mtu": <MTU>,      # set only if passed from the user
      "netAttachDefName": "<namespace>/<name>"
    }
```
Through discussion in this issue, https://issues.redhat.com/browse/OCPBUGS-13966, we have decided that port 80 can't be supported in conjunction with 443 at this time for the default route ingress.
This needs to be documented for 4.14
As a result of Hashicorp's license change to BSL, Red Hat OpenShift needs to remove the use of Hashicorp's Terraform from the installer – specifically for OpenStack deployments which currently use Terraform for setting up the infrastructure.
To avoid an increased support overhead once the license changes at the end of the year, we want to provision OpenStack infrastructure without the use of Terraform.
Move CAPO (cluster-api-provider-openstack) to a stable API.
Currently OpenShift on OpenStack is using MAPO. This uses objects from the upstream CAPO project under the hood but not the APIs. We would like to start using CAPO and declare MAPO as deprecated and frozen, but before we do that upstream CAPO's own API needs to be declared stable.
Upstream CAPO's API is currently at v1alpha6. There are a number of incompatible changes already planned for the API which have prevented us from declaring it v1beta1. We should make those changes and move towards a stable API.
The changes need to be accompanied by an improvement in test coverage of API versions.
Upstream issues targeted for v1beta1 should be tracked in the v0.7 milestone: https://github.com/kubernetes-sigs/cluster-api-provider-openstack/issues?q=is%3Aopen+is%3Aissue+milestone%3Av0.7
Another option is to switch to cluster-capi-operator if it graduates, which would mean only a single API would be maintained.
N/A. This is purely upstream work for now. We will directly benefit from this work once we switch to CAPO in a future release.
Upstream CAPO provides a v1beta1 API
Upstream CAPO includes e2e tests using envtest (https://book.kubebuilder.io/reference/envtest.html) which will allow us to avoid breaks in API compatibility
None.
N/A
As part of moving forward, we need to bump the CAPO version vendored into MAPO from v1alpha6 to v1alpha7.
The customer has escalated the following issues where ports don't have TLS support. This Feature request lists all the component ports raised by the customer.
Details here https://docs.google.com/document/d/1zB9vUGB83xlQnoM-ToLUEBtEGszQrC7u-hmhCnrhuXM/edit
Currently, we are serving the metrics as HTTP on port 9537; we need to upgrade to use TLS.
Related to https://docs.google.com/document/d/1zB9vUGB83xlQnoM-ToLUEBtEGszQrC7u-hmhCnrhuXM/edit
As a result of Hashicorp's license change to BSL, Red Hat OpenShift needs to remove the use of Hashicorp's Terraform from the installer – specifically for IPI deployments which currently use Terraform for setting up the infrastructure.
To avoid an increased support overhead once the license changes at the end of the year, we want to provision OpenShift on the existing supported providers' infrastructure without the use of Terraform.
This feature will be used to track all the CAPI preparation work that is common for all the supported providers
PoC & design for running CAPI control plane using binaries.
As a customer, I would like to deploy OpenShift on OpenStack using the IPI workflow, where my control plane would have 3 machines and each machine would use a root volume (a Cinder volume attached to the Nova server) plus an attached ephemeral disk using local storage that would only be used by etcd.
As this feature will be TechPreview in 4.15, this will only be implemented as a day 2 operation for now. This might or might not change in the future.
We know that etcd requires storage with strong performance capabilities, and currently a root volume backed by Ceph has difficulty providing these capabilities.
Attaching local storage to the machine and mounting it for etcd would solve the performance issues that we saw when customers were using Ceph as the backend for the control plane disks.
Gophercloud already supports creating a server with multiple ephemeral disks:
We need to figure out how we want to address that in CAPO, probably involving a new API that would later be used in OpenShift (MAPO, and probably the installer).
We'll also have to update the OpenStack Failure Domain in CPMS.
ARO (Azure) has conducted some benchmarks and is now recommending putting etcd on a separate data disk:
https://docs.google.com/document/d/1O_k6_CUyiGAB_30LuJFI6Hl93oEoKQ07q1Y7N2cBJHE/edit
Also interesting thread: https://groups.google.com/u/0/a/redhat.com/g/aos-devel/c/CztJzGWdsSM/m/jsPKZHSRAwAJ
Once we have defined an API for data volumes, we'll need to add support for this new API in MAPO so the user can update their Machines on day 2 to be redeployed with etcd on local disk.
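As a sketch of what the day-2 MachineSet change could look like once the API exists (the `additionalBlockDevices` field name and shape below are assumptions based on the upstream CAPO proposal, not a finalized API):

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
spec:
  template:
    spec:
      providerSpec:
        value:
          kind: OpenstackProviderSpec
          # Assumed field mirroring the CAPO proposal: an extra ephemeral
          # disk backed by local storage, dedicated to etcd.
          additionalBlockDevices:
          - name: etcd
            sizeGiB: 10
            storage:
              type: Local
```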
We only allow usage of controlPlanePort as a TechPreview feature. We should move it to GA.
Open questions:
Since the Kuryr removal we don't need to generate the trunk names anymore. They can be removed.
Kuryr is no longer supported in 4.15 and there cannot be a 4.15 cluster with Kuryr, either a new one or upgraded. Therefore we want to remove Kuryr from must-gather.
The Installer should no longer accept Kuryr as NetworkType. If a user chooses it, the Installer should show a clear error stating that Kuryr is no longer supported.
As a cluster-admin I wish to see the status of an update and the progress of the update on each component.
Background
A common update improvement requested from customer interactions on the update experience is a status command.
```
oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0    True        True          16s     Working towards 4.12.4: 9 of 829 done (1% complete)
```
Update docs for UX and CLI changes
Epic Goal*
Add a new `oc adm upgrade status` command which is backed by an API. Please find the mock output of the command attached in this card.
Add control plane update status to "oc adm upgrade status" command output.
Note: In the future we want to add "Est Time Remaining: 35m" to this output, but that needs a separate card.
Sample output:

```
= Control Plane =
Assessment:      Progressing - Healthy
Completion:      45%
Duration:        23m
Operator Status: 33 Total, 33 Available, 4 Progressing, 0 Degraded
```
Todo:
Hypershift-provisioned clusters, regardless of the cloud provider, support the proposed integration for OLM-managed operators outlined in OCPBU-559 and OCPBU-560.
There is no degradation in capability or coverage for OLM-managed operators that support short-lived token authentication on clusters that are lifecycled via Hypershift.
Currently, Hypershift lacks support for CCO.
Currently, Hypershift will be limited to deploying clusters in which the cluster core operators are leveraging short-lived token authentication exclusively.
If we are successful, no special documentation should be needed for this.
Outcome Overview
Operators on guest clusters can take advantage of the new tokenized authentication workflow that depends on CCO.
Success Criteria
CCO is included in HyperShift and its footprint is minimal while meeting the above outcome.
CCO logic for managing webhooks is a) entirely separate from the core functionality of the CCO and b) requires a lot of extra RBAC. In deployment topologies like HyperShift, we don't want this additional functionality and would like to be able to cleanly turn it off and remove the excess RBAC.
This is a clone of issue CCO-388. The following is the description of the original issue:
—
Every guest cluster should have a running CCO pod with its kubeconfig attached to it.
Enhancement doc: https://github.com/openshift/enhancements/blob/master/enhancements/cloud-integration/tokenized-auth-enablement-operators-on-cloud.md
As a cluster admin I would like to configure machinesets to allocate instances from a pre-existing Capacity Reservation in Azure.
I want to create a pool of reserved resources that can be shared between clusters of different teams based on their priorities. I want this pool of resources to remain available for my company and not get allocated to another Azure customer.
Additional background on the feature for considering additional use cases
Machine API support for Azure Capacity Reservation Groups
The customer would like to configure machinesets to allocate instances from pre-existing Capacity Reservation Groups, see Azure docs below
This would allow the customer to create a pool of reserved resources which can be shared between clusters of different priorities. Imagine a test and prod cluster where the demands of the prod cluster suddenly grow. The test cluster is scaled down freeing resources and the prod cluster is scaled up with assurances that those resources remain available, not allocated to another Azure customer.
MAPI/CAPI Azure
In this use case, there's no immediate need for install time support to designate reserved capacity group for control plane resources, however we should consider whether that's desirable from a completeness standpoint. We should also consider whether or not this should be added as an attribute for the installconfig compute machinepool or whether altering generated MachineSet manifests is sufficient, this appears to be a relatively new Azure feature which may or may not see wider customer demand. This customer's primary use case is centered around scaling up and down existing clusters, however others may have different uses for this feature.
Additional background on the feature for considering additional use cases
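For illustration only, a MachineSet consuming a pre-existing reservation group might look like the sketch below; the `capacityReservationGroupID` field name follows the webhook story later in this section, and all resource IDs are placeholders:

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: prod-worker-eastus-1
  namespace: openshift-machine-api
spec:
  template:
    spec:
      providerSpec:
        value:
          kind: AzureMachineProviderSpec
          vmSize: Standard_D4s_v3
          # Placeholder ID of the pre-existing Capacity Reservation Group.
          capacityReservationGroupID: /subscriptions/<subscription>/resourceGroups/<rg>/providers/Microsoft.Compute/capacityReservationGroups/<group>
```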
As a developer I want to add support for capacity reservation groups in openshift/machine-api-provider-azure so that Azure VMs can be associated with a capacity reservation group during VM creation.
CFE-1036 adds support for Capacity Reservation in upstream CAPZ (PR). The same support needs to be added downstream as well. Please refer to the upstream PR when adding the support downstream.
As a developer I want to add the webhook validation for the "CapacityReservationGroupID" field of "AzureMachineProviderSpec" in openshift/machine-api-operator so that Azure capacity reservation can be supported.
CFE-1036 adds support for Capacity Reservation in upstream CAPZ (PR). The same support needs to be added downstream as well. Please refer to the upstream PR when adding the support downstream.
Slack discussion regarding the same: https://redhat-internal.slack.com/archives/CBZHF4DHC/p1713249202780119?thread_ts=1712582367.529309&cid=CBZHF4DHC
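A minimal sketch of what such a check could look like (illustrative only, not the actual machine-api-operator code; the helper name is hypothetical):

```go
package webhooks

import (
	"fmt"
	"strings"
)

// validateCapacityReservationGroupID is a hypothetical helper: the field is
// optional, and when set it should be a full Azure resource ID pointing at
// a capacityReservationGroups resource.
func validateCapacityReservationGroupID(id string) error {
	if id == "" {
		return nil // optional field
	}
	if !strings.HasPrefix(id, "/subscriptions/") ||
		!strings.Contains(id, "/providers/Microsoft.Compute/capacityReservationGroups/") {
		return fmt.Errorf("capacityReservationGroupID %q is not a capacity reservation group resource ID", id)
	}
	return nil
}
```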
Update the vendored dependencies in the cluster-control-plane-machine-set-operator repository for the capacity reservation changes.
This feature is about reducing the complexity of the CAPI install system architecture, which is needed for using the upstream Cluster API (CAPI) in place of the current implementation of the Machine API for standalone OpenShift.
Goals (prerequisite work)
Complete the design of the Cluster API (CAPI) architecture and build the core operator logic needed for Phase-1, incorporating the assets from different repositories to simplify asset management.
There must be no negative effect on customers/users of the MAPI; this API must continue to be accessible to them, though how it is implemented "under the covers", and whether that implementation leverages CAPI, is open.
As an OpenShift engineer I want to be able to install the new manifest generation tool as a standalone tool in my CAPI Infra Provider repo to generate the CAPI Provider transport ConfigMap(s)
Rename the CAPI asset/manifest generator from assets (generator) to manifest-gen, as it won't need to generate Go-embeddable assets anymore, only manifests that will be referenced and applied by the CVO.
In order to reduce the complexity in the system we are proposing to get rid of the upstream cluster-api operator (kubernetes-sigs/cluster-api-operator). We plan to replace the responsibility of this component, which at the moment is responsible for reading, fetching and installing the desired providers in cluster, by implementing them directly in the downstream openshift/cluster-capi-operator.
As an OpenShift engineer I want the CAPI Providers repositories to use the new generator tool so that they can independently generate CAPI Provider transport ConfigMaps
Once the new CAPI manifests generator tool is ready, we want to make use of that directly from the CAPI Providers repositories so we can avoid storing the generated configuration centrally and independently apply that based on the running platform.
Epic Goal*
Provide a long term solution to SELinux context labeling in OCP.
Why is this important? (mandatory)
As of today, when SELinux is enabled, the PV's files are relabeled when attaching the PV to the pod. This can cause timeouts when the PV contains a lot of files, as well as overload the storage backend.
https://access.redhat.com/solutions/6221251 provides a few workarounds until the proper fix is implemented. Unfortunately these workarounds are not perfect and we need a long-term, seamless, optimized solution.
This feature tracks the long-term solution, where the PV filesystem will be mounted with the right SELinux context, thus avoiding relabeling every file.
Scenarios (mandatory)
As we are relying on mount context there should not be any relabeling (chcon) because all files / folders will inherit the context from the mount context
More on design & scenarios in the KEP and related epic STOR-1173
Dependencies (internal and external) (mandatory)
None for the core feature
However the driver will have to set SELinuxMountSupported to true in the CSIDriverSpec to enable this feature.
Contributing Teams(and contacts) (mandatory)
Done - Checklist (mandatory)
Support the upstream feature "SELinux relabeling using mount options (CSIDriver API change)" in OCP as Beta, i.e. test it and have docs for it (unless it's Alpha upstream).
Summary: If a Pod has a defined SELinux context (e.g. it uses the "restricted" SCC) and it uses a ReadWriteOncePod PVC, and the CSI driver responsible for the volume supports this feature, kubelet + the CSI driver will mount the volume directly with the correct SELinux labels. Therefore CRI-O does not need to recursively relabel the volume and pod startup can be significantly faster. We will need thorough documentation for this.
This upstream epic actually will be implemented by us!
Add metrics described in the upstream KEP to telemetry, so we know how many clusters / Pods would be affected when we expose SELinux mount to all volume types.
We want:
As a cluster user, I want to use mounting with SELinux context without any configuration.
This means OCP ships CSIDriver objects with "SELinuxMount: true" for CSI drivers that support mounting with "-o context". I.e. all CSI drivers that are based on block volumes and use ext4/xfs should have this enabled.
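A sketch of what such a shipped object could look like (the driver name is an example; `seLinuxMount` is the relevant field in the upstream CSIDriver API):

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: disk.csi.example.com  # example block-volume CSI driver
spec:
  attachRequired: true
  podInfoOnMount: false
  # Declares that the driver accepts "-o context=..." mount options, so
  # kubelet can mount with the pod's SELinux label instead of relabeling.
  seLinuxMount: true
```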
Test that the metrics described in the KEP provide useful data. I.e. check that volume_manager_selinux_volume_context_mismatch_warnings_total increases when there are two Pods that have two different SELinux contexts and use the same volume and different subpath of it.
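A sketch of a manifest pair that should trigger the counter (PVC name and SELinux levels are arbitrary):

```yaml
# Two pods share the same PVC but request different SELinux levels; with
# SELinux mount support enabled this should increment
# volume_manager_selinux_volume_context_mismatch_warnings_total.
apiVersion: v1
kind: Pod
metadata:
  name: selinux-pod-a
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c10,c11"
  containers:
  - name: app
    image: registry.access.redhat.com/ubi9/ubi
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: shared-pvc
---
apiVersion: v1
kind: Pod
metadata:
  name: selinux-pod-b
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c20,c21"
  containers:
  - name: app
    image: registry.access.redhat.com/ubi9/ubi
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: shared-pvc
```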
Enable Hosted Control Planes guest clusters to support up to 500 worker nodes. This enables customers to have clusters with a large number of worker nodes.
Max cluster size 250+ worker nodes (mainly about control plane). See XCMSTRAT-371 for additional information.
Service components should not be overwhelmed by additional customer workloads; they should use larger cloud instances when needed, and smaller cloud instances when the number of worker nodes is below the threshold.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | Managed |
Classic (standalone cluster) | N/A |
Hosted control planes | Yes |
Multi node, Compact (three node), or Single node (SNO), or all | N/A |
Connected / Restricted Network | Connected |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | x86_64 ARM |
Operator compatibility | N/A |
Backport needed (list applicable versions) | N/A |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | N/A |
Other (please specify) |
Check with OCM and CAPI requirements to expose larger worker node count.
When a HostedCluster is given a size label, it needs to ensure that request serving nodes exist for that size label and when they do, reschedule request serving pods to the appropriate nodes.
Description of criteria:
This does not require a design proposal.
This does not require a feature gate.
As a service provider, I want to be able to:
so that I can achieve
Description of criteria:
This requires/does not require a design proposal.
This requires/does not require a feature gate.
Move to using the upstream Cluster API (CAPI) in place of the current implementation of the Machine API for standalone OpenShift.
Goals (prerequisite work completed in OCPSTRAT-1122)
Complete the design of the Cluster API (CAPI) architecture and build the core operator logic needed for Phase-1, incorporating the assets from different repositories to simplify asset management.
Phase 1 & 2 covers implementing base functionality for CAPI.
There must be no negative effect on customers/users of the MAPI; this API must continue to be accessible to them, though how it is implemented "under the covers", and whether that implementation leverages CAPI, is open.
Sets up the CAPI ecosystem for vSphere.
So far we haven't tested this provider at all. We have to run it and spot any issues with it.
Steps:
Outcome:
Enable selective management of HostedCluster resources via annotations, allowing hypershift operators to operate concurrently on a management cluster without interfering with each other. This feature facilitates testing new operator versions or configurations in a controlled manner, ensuring that production workloads remain unaffected.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | Applicable |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
Current hypershift operator functionality does not allow for selective management of HostedClusters, limiting the ability to test new operator versions or configurations in a live environment without affecting production workloads.
Upstream only for now
As a mgmt cluster admin, I want to be able to run multiple hypershift-operators that operate on a disjoint set of HostedClusters.
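Purely as an illustration of the intended pairing (the annotation key below is hypothetical; the actual interface is what this epic decides):

```yaml
apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: canary-cluster
  annotations:
    # Hypothetical selector annotation: only the hypershift-operator
    # instance started with a matching identifier reconciles this cluster.
    hypershift.openshift.io/managed-by: ho-canary
```

The canary operator instance would then be launched with a matching filter, while the production instance ignores annotated HostedClusters.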
Elaborate more dashboards (monitoring dashboards, accessible from the menu Observe > Dashboards; admin perspective) related to networking.
Start with just a couple of areas:
More info/discussion in this work doc: https://docs.google.com/document/d/1ByNIJiOzd6w5csFYpC27NdOydnBg8Tx45uL4-7v-aCM/edit
Martin Kennelly is our contact point from the SDN team
Create a dashboard from the CNO
Current metrics documentation:
Include metrics for:
Make limited live migration available to unmanaged clusters in 4.15.
CNO shall allow users to trigger live migration on an unmanaged cluster.
The Agent Based installer is a clean and simple way to install new instances of OpenShift in disconnected environments, guiding the user through the questions and information needed to successfully install an OpenShift cluster. We need to bring this highly useful feature to the IBM Power and IBM zSystem architectures
Agent based installer on Power and zSystems should reflect what is available for x86 today.
Able to use the agent based installer to create OpenShift clusters on Power and zSystem architectures in disconnected environments
Enable openshift-install to create an agent-based install ISO for Power.
As the multi-arch engineer, I would like to build an environment and deploy using Agent Based installer, so that I can confirm if the feature works per spec.
Acceptance Criteria
The OpenShift Assisted Installer is a user-friendly OpenShift installation solution for the various platforms, but focused on bare metal. This very useful functionality should be made available for the IBM zSystem platform.
Use of the OpenShift Assisted Installer to install OpenShift on an IBM zSystem
Using the OpenShift Assisted Installer to install OpenShift on an IBM zSystem
As a multi-arch development engineer, I would like to ensure that the Assisted Installer workflow is fully functional and supported for z/VM deployments.
Acceptance Criteria
Description of the problem:
Besides the API, there is no way to provide additional kernel arguments to the CoreOS installer.
In the case of a zVM installation, additional kernel arguments are required to correctly enable network or storage devices. If they are not provided, the nodes end up in an emergency shell when the node is rebooted after the CoreOS installation.
This is an example of an API call to install a zVM node. This call needs to be executed per node, after the node has been discovered:
```
curl https://api.openshift.com/api/assisted-install/v2/infra-envs/${INFRA_ENV_ID}/hosts/$1/installer-args \
  -X PATCH \
  -H "Authorization: Bearer ${API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "args": [
      "--append-karg", "rd.neednet=1",
      "--append-karg", "ip=10.14.6.4::10.14.6.1:255.255.255.0:master-1.boea3e06.lnxero1.boe:encbdd0:none",
      "--append-karg", "nameserver=10.14.6.1",
      "--append-karg", "ip=[fd00::4]::[fd00::1]:64::encbdd0:none",
      "--append-karg", "nameserver=[fd00::1]",
      "--append-karg", "zfcp.allow_lun_scan=0",
      "--append-karg", "rd.znet=qeth,0.0.bdd0,0.0.bdd1,0.0.bdd2,layer2=1",
      "--append-karg", "rd.zfcp=0.0.8003,0x500507630400d1e3,0x4000404700000000",
      "--append-karg", "rd.zfcp=0.0.8103,0x500507630400d1e3,0x4000404700000000"
    ]
  }' | jq
```
The kernel arguments might differ between discovered nodes.
This applies to zVM only. On KVM (s390x) these additional kernel arguments are not needed.
How reproducible:
Configure a zVM node (in the case of SNO) or at least 3 zVM nodes, create a new cluster by choosing a cluster option (SNO or failover), and discover the nodes.
In the UI there is no option to enter the required kernel arguments for the CoreOS installer.
After validation, start the installation. The installation fails because the nodes cannot be rebooted.
Steps to reproduce:
1. Configure at least one zVM node (for SNO) or three for failover accordingly.
2. Discover these nodes
3. Start installation after all validation steps are passed.
Actual results:
Installation fails because the kernel command line does not contain the necessary kernel arguments and the first reboot fails (the nodes boot into an emergency shell).
There is no option to enter additional kernel arguments in the UI.
Expected results:
In the case of zVM and the Assisted Installer UI, the user is able to specify the necessary kernel arguments for each discovered zVM node. These kernel arguments are passed to the CoreOS installer and the installation succeeds.
The storage operators need to be automatically restarted after the certificates are renewed.
From OCP doc "The service CA certificate, which issues the service certificates, is valid for 26 months and is automatically rotated when there is less than 13 months validity left."
Since OCP now offers an 18-month lifecycle per release, the storage operator pods need to be automatically restarted after the certificates are renewed.
The storage operators will be transparently restarted. The customer benefit is that this is transparent: it avoids manual restarts of the storage operators.
The administrator should not need to restart the storage operators when certificates are renewed.
This should apply to all relevant operators with a consistent experience.
As an administrator I want the storage operators to be automatically restarted when certificates are renewed.
This feature request is triggered by the new extended OCP lifecycle. We are moving from 12 to 18 months support per release.
No doc is required
This feature only covers storage, but the same behavior should be applied to every relevant component.
The pod `vsphere-problem-detector-operator` mounts the secret:
```
$ cat assets/vsphere_problem_detector/07_deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vsphere-problem-detector-operator
spec:
  containers:
    volumeMounts:
    - mountPath: /var/run/secrets/serving-cert
      name: vsphere-problem-detector-serving-cert
  volumes:
  - name: vsphere-problem-detector-serving-cert
    secret:
      secretName: vsphere-problem-detector-serving-cert
```
Hence, if the secret is updated (e.g. as a result of a CA cert update), the Pod must be restarted.
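One possible pattern (a sketch of a restart-on-change approach, not necessarily the mechanism this epic lands on) is for the process to watch the mounted directory and exit, letting the kubelet restart the container with the renewed certificate:

```go
package main

import (
	"log"
	"os"

	"github.com/fsnotify/fsnotify"
)

func main() {
	watchServingCert("/var/run/secrets/serving-cert")
}

// watchServingCert blocks until the mounted serving cert changes, then
// exits. Secret volumes are updated via atomic symlink swaps, which
// surface as filesystem events on the watched directory.
func watchServingCert(dir string) {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		log.Fatal(err)
	}
	defer watcher.Close()
	if err := watcher.Add(dir); err != nil {
		log.Fatal(err)
	}
	for event := range watcher.Events {
		log.Printf("serving cert changed (%s); exiting so kubelet restarts us", event.Name)
		os.Exit(0)
	}
}
```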
The pod `csi-snapshot-webhook` mounts the secret:
```
$ cat assets/webhook/deployment.yaml
kind: Deployment
metadata:
  name: csi-snapshot-webhook
...
spec:
  template:
    spec:
      containers:
        volumeMounts:
        - name: certs
          mountPath: /etc/snapshot-validation-webhook/certs
      volumes:
      - name: certs
        secret:
          secretName: csi-snapshot-webhook-secret
```
Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted.
Description of problem:
Even though in 4.11 we introduced LegacyServiceAccountTokenNoAutoGeneration to be compatible with upstream Kubernetes and no longer generate token secrets when service accounts are created, OpenShift today still creates secrets and tokens that are used for legacy usage of openshift-controller as well as the image-pull secrets.
Customer issues:
Customers see auto-generated secrets for service accounts which is flagged as a security risk.
This Feature tracks the implementation for removing the legacy usage and image-pull secret generation as well, so that NO secrets are auto-generated when a Service Account is created on an OpenShift cluster.
NO Secrets to be auto-generated when creating service accounts
The following secrets need to NOT be generated automatically with every service account creation:
Use Cases (Optional):
Concerns/Risks: Replacing the functionality of one of the openshift-controller controllers that has been in the code for a long time may impact existing behaviors.
Existing documentation needs to be clear on where we are today and why we are providing the above 2 credentials. Related Tracker: https://issues.redhat.com/browse/OCPBUGS-13226
Openshift-controller-manager has a controller that automatically creates "managed" service accounts that support OpenShift features in every namespace. In OCP 4, this was effectively hard-coded to create the "builder" and "deployer" service accounts.
This controller should be refactored so we have separate instances for the builder and deployer service account, respectively. This will let us disable said controllers via the Capabilities API and ocm-o.
Create custom roles for GCP with minimal set of required permissions.
Enable customers to better scope credential permissions and create custom roles on GCP that only include the minimum subset of what is needed for OpenShift.
Some of the service accounts that CCO creates, e.g. a service account with the role roles/iam.serviceAccountUser, provide elevated permissions that are not required/used by the requesting OpenShift components. This is because we use predefined GCP roles that come with a bunch of additional permissions. The goal is to create custom roles with only the required permissions.
TBD
These are phase 2 items from CCO-188
Moving items from other teams that need to be committed to in 4.13 for this work to complete.
Evaluate if any of the GCP predefined roles in the credentials request manifest of Cluster Storage Operator give elevated permissions. Remove any such predefined role from spec.predefinedRoles field and replace it with required permissions in the new spec.permissions field.
The new GCP provider spec for credentials request CR is as follows:
```
type GCPProviderSpec struct {
	metav1.TypeMeta `json:",inline"`
	// PredefinedRoles is the list of GCP pre-defined roles
	// that the CredentialsRequest requires.
	PredefinedRoles []string `json:"predefinedRoles"`
	// Permissions is the list of GCP permissions required to
	// create a more fine-grained custom role to satisfy the
	// CredentialsRequest.
	// When both Permissions and PredefinedRoles are specified
	// service account will have union of permissions from
	// both the fields
	Permissions []string `json:"permissions"`
	// SkipServiceCheck can be set to true to skip the check whether the requested roles or permissions
	// have the necessary services enabled
	// +optional
	SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
}
```
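For illustration, a CredentialsRequest combining both fields might look like this (names and values are placeholders):

```yaml
apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: example-component
  namespace: openshift-cloud-credential-operator
spec:
  secretRef:
    name: example-component-creds
    namespace: openshift-example
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: GCPProviderSpec
    # The resulting service account gets the union of both fields.
    predefinedRoles:
    - roles/iam.roleViewer
    permissions:
    - compute.instances.get
    - compute.instances.list
```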
we can use the following command to check permissions associated with a GCP predefined role
gcloud iam roles describe <role_name>
The sample output for the roleViewer role is as follows. The permissions are listed in the "includedPermissions" field.
[akhilrane@localhost cloud-credential-operator]$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA
title: Role Viewer
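For comparison, a custom role carrying only those permissions could be created like this (role ID and project are placeholders):

```
$ gcloud iam roles create openshift_role_viewer --project=<project-id> \
    --title="OpenShift Role Viewer" \
    --permissions=iam.roles.get,iam.roles.list,resourcemanager.projects.get,resourcemanager.projects.getIamPolicy \
    --stage=GA
```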
Rather than create custom roles per-cluster, as is currently implemented for GCP, ccoctl should create custom roles per-project due to custom role deletion policies. When a custom role is deleted in GCP it continues to exist and contributes to quota for 7 days. Custom roles are not permanently deleted for up to 14 days after deletion ref: https://cloud.google.com/iam/docs/creating-custom-roles#deleting-custom-role.
Deletion should ignore these per-project custom roles by default and provide an optional flag to delete them.
Since the custom roles must be created per-project, deltas in permissions must be additive. We can't remove permissions with these restrictions since previous versions may rely on those custom role permissions.
Post a warning/info message regarding the permission delta so that users are aware that there are extra permissions and they can clean them up possibly if they're sure they aren't being utilized.
Evaluate if any of the GCP predefined roles in the credentials request manifest of cloud credentials operator give elevated permissions. Remove any such predefined role from spec.predefinedRoles field and replace it with required permissions in the new spec.permissions field.
The new GCP provider spec for credentials request CR is as follows:
```
type GCPProviderSpec struct {
	metav1.TypeMeta `json:",inline"`
	// PredefinedRoles is the list of GCP pre-defined roles
	// that the CredentialsRequest requires.
	PredefinedRoles []string `json:"predefinedRoles"`
	// Permissions is the list of GCP permissions required to
	// create a more fine-grained custom role to satisfy the
	// CredentialsRequest.
	// When both Permissions and PredefinedRoles are specified
	// service account will have union of permissions from
	// both the fields
	Permissions []string `json:"permissions"`
	// SkipServiceCheck can be set to true to skip the check whether the requested roles or permissions
	// have the necessary services enabled
	// +optional
	SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
}
```
we can use the following command to check permissions associated with a GCP predefined role
gcloud iam roles describe <role_name>
The sample output for the roleViewer role is as follows. The permissions are listed in the "includedPermissions" field.
[akhilrane@localhost cloud-credential-operator]$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA
title: Role Viewer
Evaluate if any of the GCP predefined roles in the credentials request manifest of Cluster Image Registry Operator give elevated permissions. Remove any such predefined role from spec.predefinedRoles field and replace it with required permissions in the new spec.permissions field.
The new GCP provider spec for credentials request CR is as follows:
```
type GCPProviderSpec struct {
	metav1.TypeMeta `json:",inline"`
	// PredefinedRoles is the list of GCP pre-defined roles
	// that the CredentialsRequest requires.
	PredefinedRoles []string `json:"predefinedRoles"`
	// Permissions is the list of GCP permissions required to
	// create a more fine-grained custom role to satisfy the
	// CredentialsRequest.
	// When both Permissions and PredefinedRoles are specified
	// service account will have union of permissions from
	// both the fields
	Permissions []string `json:"permissions"`
	// SkipServiceCheck can be set to true to skip the check whether the requested roles or permissions
	// have the necessary services enabled
	// +optional
	SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
}
```
we can use the following command to check permissions associated with a GCP predefined role
gcloud iam roles describe <role_name>
The sample output for the roleViewer role is as follows. The permissions are listed in the "includedPermissions" field.
[akhilrane@localhost cloud-credential-operator]$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA
Evaluate if any of the GCP predefined roles in the credentials request manifests of OpenShift cluster operators give elevated permissions. Remove any such predefined role from spec.predefinedRoles field and replace it with required permissions in the new spec.permissions field.
The new GCP provider spec for credentials request CR is as follows:
```
type GCPProviderSpec struct {
	metav1.TypeMeta `json:",inline"`
	// PredefinedRoles is the list of GCP pre-defined roles
	// that the CredentialsRequest requires.
	PredefinedRoles []string `json:"predefinedRoles"`
	// Permissions is the list of GCP permissions required to
	// create a more fine-grained custom role to satisfy the
	// CredentialsRequest.
	// When both Permissions and PredefinedRoles are specified
	// service account will have union of permissions from
	// both the fields
	Permissions []string `json:"permissions"`
	// SkipServiceCheck can be set to true to skip the check whether the requested roles or permissions
	// have the necessary services enabled
	// +optional
	SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
}
```
we can use the following command to check permissions associated with a GCP predefined role
gcloud iam roles describe <role_name>
The sample output for the roleViewer role is as follows. The permissions are listed in the "includedPermissions" field.
[akhilrane@localhost cloud-credential-operator]$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA
title: Role Viewer
Evaluate whether any of the GCP predefined roles in the credentials request manifest of the Cloud Controller Manager Operator grant elevated permissions. Remove any such predefined role from the spec.predefinedRoles field and replace it with the required permissions in the new spec.permissions field.
The new GCP provider spec for the credentials request CR is as follows:
type GCPProviderSpec struct {
	metav1.TypeMeta `json:",inline"`
	// PredefinedRoles is the list of GCP pre-defined roles
	// that the CredentialsRequest requires.
	PredefinedRoles []string `json:"predefinedRoles"`
	// Permissions is the list of GCP permissions required to
	// create a more fine-grained custom role to satisfy the
	// CredentialsRequest.
	// When both Permissions and PredefinedRoles are specified,
	// the service account will have the union of permissions from
	// both fields.
	Permissions []string `json:"permissions"`
	// SkipServiceCheck can be set to true to skip the check whether the requested roles or permissions
	// have the necessary services enabled
	// +optional
	SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
}
We can use the following command to check the permissions associated with a GCP predefined role:
gcloud iam roles describe <role_name>
The sample output for the role roles/iam.roleViewer is as follows. The permissions are listed in the "includedPermissions" field.
[akhilrane@localhost cloud-credential-operator]$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA
title: Role Viewer
Evaluate whether any of the GCP predefined roles in the credentials request manifest of the Machine API Operator grant elevated permissions. Remove any such predefined role from the spec.predefinedRoles field and replace it with the required permissions in the new spec.permissions field.
The new GCP provider spec for the credentials request CR is as follows:
type GCPProviderSpec struct {
	metav1.TypeMeta `json:",inline"`
	// PredefinedRoles is the list of GCP pre-defined roles
	// that the CredentialsRequest requires.
	PredefinedRoles []string `json:"predefinedRoles"`
	// Permissions is the list of GCP permissions required to
	// create a more fine-grained custom role to satisfy the
	// CredentialsRequest.
	// When both Permissions and PredefinedRoles are specified,
	// the service account will have the union of permissions from
	// both fields.
	Permissions []string `json:"permissions"`
	// SkipServiceCheck can be set to true to skip the check whether the requested roles or permissions
	// have the necessary services enabled
	// +optional
	SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
}
We can use the following command to check the permissions associated with a GCP predefined role:
gcloud iam roles describe <role_name>
The sample output for the role roles/iam.roleViewer is as follows. The permissions are listed in the "includedPermissions" field.
[akhilrane@localhost cloud-credential-operator]$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA
title: Role Viewer
Evaluate whether any of the GCP predefined roles in the credentials request manifest of the Cluster CAPI Operator grant elevated permissions. Remove any such predefined role from the spec.predefinedRoles field and replace it with the required permissions in the new spec.permissions field.
The new GCP provider spec for the credentials request CR is as follows:
type GCPProviderSpec struct {
	metav1.TypeMeta `json:",inline"`
	// PredefinedRoles is the list of GCP pre-defined roles
	// that the CredentialsRequest requires.
	PredefinedRoles []string `json:"predefinedRoles"`
	// Permissions is the list of GCP permissions required to
	// create a more fine-grained custom role to satisfy the
	// CredentialsRequest.
	// When both Permissions and PredefinedRoles are specified,
	// the service account will have the union of permissions from
	// both fields.
	Permissions []string `json:"permissions"`
	// SkipServiceCheck can be set to true to skip the check whether the requested roles or permissions
	// have the necessary services enabled
	// +optional
	SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
}
We can use the following command to check the permissions associated with a GCP predefined role:
gcloud iam roles describe <role_name>
The sample output for the role roles/iam.roleViewer is as follows. The permissions are listed in the "includedPermissions" field.
[akhilrane@localhost cloud-credential-operator]$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA
title: Role Viewer
Support serving OpenShift release signatures via Cincinnati. This mostly serves the disconnected use case.
Currently, for disconnected OCP image mirroring, we need to create and configure a ConfigMap as mentioned here.
Connected/disconnected Cincinnati can mirror signatures from their upstream locations, without creating the ConfigMap using the oc-mirror command.
Also, load signatures from a graph-data container image, for the restricted/disconnected-network case.
In the process of mirroring images for a disconnected installation using the oc-mirror command, the signature files located in the release-signatures folder are currently missing and must be manually applied to the "openshift-config-managed" namespace. Without this manual step, any cluster trying to upgrade fails with an error because the versions cannot be verified as signed.
Serving OpenShift release signatures via Cincinnati would allow us to have a single service for update related metadata, namely a Cincinnati deployment on the local network, which the CVO will be configured to poll. This would make restricted/disconnected-network updates easier, by reducing the amount of pre-update cluster adjustments (no more need to create signature ConfigMaps in each cluster being updated).
Connected Cincinnati can mirror signatures from their upstream locations.
Cincinnati can also be taught to load signatures from a graph-data container image, for the restricted/disconnected-network case.
Update documentation to remove the need for configmaps
This impacts oc-mirror. There are two ways to mirror images, as mentioned here.
Epic Goal*
Serving OpenShift release signatures via Cincinnati. This is a follow-up to the OTA-908 epic: OTA-908 focuses on the disconnected use case, while this epic focuses on getting the same feature working in the hosted OSUS.
Why is this important? (mandatory)
Serving OpenShift release signatures via Cincinnati would allow us to have a single service for update related metadata, namely a Cincinnati deployment on the local network, which the CVO will be configured to poll. This would make restricted/disconnected-network updates easier, by reducing the amount of pre-update cluster adjustments (no more need to create signature ConfigMaps in each cluster being updated).
Scenarios (mandatory)
Current customer workflows like signature ConfigMap creation and ConfigMap application would no longer be required. Instead, cluster-version operators in restricted/disconnected-networks could fetch the signature data from the local OpenShift Update Service (Cincinnati).
Dependencies (internal and external) (mandatory)
The oc-mirror team will need to review/approve oc-mirror changes to take advantage of the new functionality for customers using oc-mirror workflows.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Bringing in and implementing the new spec property from OTA-916 (or, if OTA-916 ends up settling on "magically construct a signature store from upstream", implementing that).
This might be helpful context for constructing composite signature stores. The existing CVO-side integration is here, and it would get a bit more complicated with the need to dynamically manage spec-configured signature stores.
As an Infrastructure Administrator, I want to deploy OpenShift on Nutanix distributing the control plane and compute nodes across multiple regions and zones, forming different failure domains.
As an Infrastructure Administrator, I want to configure an existing OpenShift cluster to distribute the nodes across regions and zones, forming different failure domains.
Install OpenShift on Nutanix using IPI / UPI in multiple regions and zones.
This implementation would follow the same idea that has been done for vSphere. The following are the main PRs for vSphere:
https://github.com/openshift/enhancements/blob/master/enhancements/installer/vsphere-ipi-zonal.md
Nutanix Zonal: Multiple regions and zones support for Nutanix IPI and Assisted Installer
Note
As a user, I want to be able to spread control plane nodes for an OCP clusters across Prism Elements (zones).
This is a followup to https://issues.redhat.com/browse/OPNET-13. In that epic we implemented limited support for dual stack on VSphere, but due to limitations in upstream Kubernetes we were not able to support all of the use cases we do on baremetal. This epic is to track our work up and downstream to finish the dual stack implementation.
This work will require updates to the core OpenShift API repository to add the new platform type, and then a distribution of this change to all components that use the platform type information. For components that partners might replace, per-component action will need to be taken, with the project team's guidance, to ensure that the component properly handles the "External" platform. These changes will look slightly different for each component.
To integrate these changes more easily into OpenShift, it is possible to take a multi-phase approach which could be spread over a release boundary (eg phase 1 is done in 4.X, phase 2 is done in 4.X+1).
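For example, components that read the Infrastructure config would need to tolerate the new platform type. A minimal sketch in Go, assuming configv1 is github.com/openshift/api/config/v1 (the helper name and the None-like treatment are illustrative assumptions, not the prescribed per-component behavior):
import configv1 "github.com/openshift/api/config/v1"

// usesBuiltInCloudIntegration is a hypothetical helper: components would
// treat "External" like "None" unless they have explicit external-platform
// support wired in.
func usesBuiltInCloudIntegration(platform configv1.PlatformType) bool {
	switch platform {
	case configv1.ExternalPlatformType, configv1.NonePlatformType:
		return false
	default:
		return true
	}
}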
OCPBU-5: Phase 1
OCPBU-510: Phase 2
OCPBU-329: Phase.Next
Phase 1
Phase 2
Phase 3
In the initial delivery of CoreOS Layering, it is required that administrators provide their own build environment to customize RHCOS images. That could be a traditional RHEL environment or potentially an enterprising administrator with some knowledge of OCP Builds could set theirs up on-cluster.
The primary virtue of an on-cluster build path is to continue using the cluster to manage the cluster. No external dependency, batteries-included.
In the context of the Machine Config Operator (MCO) in Red Hat OpenShift, on-cluster builds refer to the process of building an OS image directly on the OpenShift cluster, rather than building them outside the cluster (such as on a local machine or continuous integration (CI) pipeline) and then making a configuration change so that the cluster uses them. By doing this, we enable cluster administrators to have more control over the contents and configuration of their clusters’ OS image through a familiar interface (MachineConfigs and in the future, Dockerfiles).
The goal of this initiative is to help boost adoption of OpenShift on ppc64le. This can be further broken down into several key objectives.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
Epic Goal
Running doc to describe terminologies and concepts which are specific to Power VS - https://docs.google.com/document/d/1Kgezv21VsixDyYcbfvxZxKNwszRK6GYKBiTTpEUubqw/edit?usp=sharing
The flag powervs-provider-id-fmt is being deprecated and removed upstream via PR https://github.com/kubernetes-sigs/cluster-api-provider-ibmcloud/pull/1404.
We need to make the necessary changes to use the provider-id-fmt flag instead, as sketched below.
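A sketch of the change on the controller command line (the flag names come from this ticket; the value shown is an illustrative assumption):
# before (deprecated upstream)
--powervs-provider-id-fmt=v2
# after
--provider-id-fmt=v2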
Support Platform external to allow installing with agent on OCI, with focus on https://www.oracle.com/cloud/cloud-at-customer/dedicated-region/faq/ for disconnected, on-prem.
OCPSTRAT-510 OpenShift on Oracle Cloud Infrastructure (OCI) with VMs
Support Platform external to allow installing with agent on OCI, with focus on https://www.oracle.com/cloud/cloud-at-customer/dedicated-region/faq/ for disconnected, on-prem
As a user, I want to be able to:
so that I can achieve
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
Review the OVN Interconnect proposal, figure out the work that needs to be done in ovn-kubernetes to be able to move to this new OVN architecture.
OVN IC will be the model used in Hypershift.
This is nice but requires a lot of effort from the customer to create such a plugin. It therefore would be nice to provide templates for such dynamic plugins in order to help customers get started. The templates could cover certain common use cases, such as a page that allows viewing customer-specific metric dashboards.
Similar to what is available in the OpenShift 4 Console under Observe with the dashboards, but with the ability to specify queries based on customer-specific metrics and then define the type of the graph.
With the given template, it would be easier for customers to start adopting that functionality and continue building additional functionality if and where needed.
Dynamic Plugins is great but requires some effort to get started if somebody is not super familiar with the SDK made available. Therefore providing templates for certain use cases would be helpful for customers and even partners to increase knowledge faster and get started with the dynamic plugin functionality.
Currently there are three cases where the metrics are being displayed in the console:
For each of them we should have an example in the Metrics Template.
This document contains all the knowledge on the metrics use-cases and their requirements.
Visualising custom metrics is a common ask and therefore a potential good example to create such a template.
Since `QueryBrowser` component was exposed in the console-dynamic-plugin-sdk we should showcase its usage.
One of the use cases should be to render a chart, using the `QueryBrowser` component, in a dashboard card. For that we would use the console.dashboard/card console extension.
AC:
As a cluster-admin I want to see conditional update path for releases on HCP. I want HCP to consume update recommendations from OpenShift Update Service (OSUS) similar to self-managed OCP clusters. And CVO of a hosted cluster should be able to evaluate conditional updates similar to self-managed OCP clusters.
Background:
At a high level, HCP updates should be handled by the CVO, and features like OSUS and conditional updates should also be available in HCP. This Feature is a continuation of the work the OTA team is doing around CVO and HCP.
Follow-up from OTA-791.
Epic Goal*
The Goal of this Epic is to:
As part of the first phase, evaluation of conditional updates containing PromQL risks is implemented for self-managed HyperShift deployed on an OpenShift management cluster.
Why is this important? (mandatory)
What are the benefits to the customer or Red Hat? Does it improve security, performance, supportability, etc? Why is work a priority?
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Once the CVO has configurable knobs for its PromQL engine (OTA-854), teach the hosted-control-plane operator's CVO controller to set those knobs to point them at the management-cluster's Thanos.
Definition of Done:
* teach the hosted-control-plane operator's CVO controller to set those knobs to point them at the management-cluster's Thanos.
HyperShift is considering allowing hosted clusters to avoid having a monitoring stack (ADR-30, OBSDA-242). Platform metrics for the hosted clusters contain the data we've used so far for conditional update risks, and those will be scraped by Prometheus on the management cluster, retained for >=24h, and remote-written to a Red Hat Observatorium cluster. Current conditional update risks have never gone beyond max_over_time(...[1h]) to smear over pod restarts and such, so local Prometheus/Thanos should have plenty of data for us.
This ticket is about adding knobs to the CVO so HyperShift can point it at that management-cluster Thanos.
Follow-up work will teach HyperShift to tune those knobs when creating the CVO deployment.
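For context, a hedged sketch of the kind of conditional update risk the CVO would evaluate against that management-cluster Thanos, shaped after the cincinnati-graph-data risk format (the risk name, URL, message, and query below are illustrative):
risks:
- name: SampleAzureRisk
  url: https://access.redhat.com/solutions/example
  message: Updates may be impacted on this platform.
  matchingRules:
  - type: PromQL
    promql:
      promql: |
        cluster_infrastructure_provider{type="Azure"}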
During the migration, a node will start as an SDN node (a hybrid overlay node from the OVN-K perspective), then become an OVN-K node. So OVN-K needs to support such dynamic role switching.
We need to enhance the cluster network operator to automate the whole SDN live migration.
Task to track post-merge testing for this epic
Placeholder epic to capture all Azure tickets.
TODO: review.
Description of problem:
The cloud network config controller never initializes on Azure HostedClusters. This behaviors exists on both Arm and x86 Azure mgmt clusters.
Version-Release number of selected component (if applicable):
How reproducible:
Every time
Steps to Reproduce:
1. Create either an Arm or x86 Azure mgmt cluster
2. Install HO
3. Create a HostedCluster
4. Observe CNCC pod doesn't initialize
Actual results:
CNCC pod doesn't initialize
Expected results:
CNCC pod initializes
Additional info:
It looks like a secret isn't being reconciled to the CPO.
% oc describe pod/cloud-network-config-controller-96567b45f-7jkl5 -n clusters-brcox-hypershift-arm
Name: cloud-network-config-controller-96567b45f-7jkl5
...
Events:
  Type     Reason       Age                   From               Message
  ----     ------       ----                  ----               -------
  Normal   Scheduled    5m46s                 default-scheduler  Successfully assigned clusters-brcox-hypershift-arm/cloud-network-config-controller-96567b45f-7jkl5 to ci-ln-vmb5w8k-1d09d-jr6m6-worker-centralus3-45dg5
  Warning  FailedMount  101s (x2 over 3m44s)  kubelet            Unable to attach or mount volumes: unmounted volumes=[cloud-provider-secret], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition
  Warning  FailedMount  97s (x10 over 5m47s)  kubelet            MountVolume.SetUp failed for volume "cloud-provider-secret" : secret "cloud-network-config-controller-creds" not found
More details at ARO managed identity scope and impact.
This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP requirement gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Questions to be addressed:
This document describes the expectations for promoting a feature that is behind a feature gate.
The criteria includes:
Enable installation and lifecycle support of OpenShift 4 on Oracle Cloud Infrastructure (OCI) with VMs
Currently, we don't yet support OpenShift 4 on Oracle Cloud Infrastructure (OCI), and we know from initial attempts that installing OpenShift on OCI requires the use of a qcow image (the OpenStack qcow seems to work fine) plus networking and routing changes, and surfaces storage issues, potential MTU and registry issues, etc.
TBD based on customer demand.
Why is this important?
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
RFEs:
This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Questions to be addressed:
Other
Allow users to upload multi-document YAML files.
Users must be able to upload multi-document YAML files through the UI and API.
Yes, since users might not be aware they can use this method in order to upload many manifests at once.
The BE API allows multi-document YAML files to be uploaded as part of custom manifests, but it looks like the OpenShift installer doesn't handle multi-document YAML.
A solution could be to split the multi-document YAML file into several YAML files, as sketched below.
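A minimal sketch of the splitting approach in Go, assuming a naive split on the document separator (a real implementation should use a YAML decoder, since "---" can legally appear inside block scalars):
package main

import (
	"fmt"
	"strings"
)

// splitYAMLDocs naively splits a multi-document YAML stream into
// individual documents, dropping empty ones.
func splitYAMLDocs(content string) []string {
	var docs []string
	for _, d := range strings.Split(content, "\n---") {
		if d = strings.TrimSpace(d); d != "" {
			docs = append(docs, d)
		}
	}
	return docs
}

func main() {
	manifest := "kind: ConfigMap\n---\nkind: Secret\n"
	for i, d := range splitYAMLDocs(manifest) {
		fmt.Printf("manifest-%d.yaml:\n%s\n\n", i, d)
	}
}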
Based on the work to enable OpenShift core platform components to support Azure Identity (captured in OCPBU-8), a standardized flow exists for OLM-managed operators to interact with the cluster in a specific way to leverage Azure Identity for authorization when using Azure APIs, using short-lived tokens as opposed to insecure static, long-lived credentials. OLM-managed operators can implement integration with the CloudCredentialOperator in a well-defined way to support this flow.
Enable customers to easily leverage OpenShift's capabilities around Azure Identity for short-lived authentication tokens with layered products, for an increased security posture. Enable OLM-managed operators to implement support for this in a well-defined pattern.
See Operators & STS slide deck. It refers to AWS STS as an example, but conceptually the same use case and workflow applies to Azure identity.
The CloudCredentialsOperator already provides a powerful API for OpenShift's cluster core operators to request credentials and acquire them via short-lived tokens. This capability should be expanded to OLM-managed operators, specifically to Red Hat layered products that interact with Azure APIs. The process today is cumbersome to non-existent, depending on the operator in question, and is seen as an adoption blocker of OpenShift on Azure.
This is particularly important for ARO customers. Customers are expected to pre-create the required IAM roles outside of OpenShift, which is deemed acceptable.
See overview, goals, requirements, use cases, scope and background, etc as per https://issues.redhat.com/browse/OCPBU-560
Plan is to backport these changes to 4.14 so that they can be used on HCP / ROSA
Summary:
Similar to work done for AWS STS (https://issues.redhat.com/browse/CCO-286), enable in CCO a new workflow (see the EP PR here: https://github.com/openshift/enhancements/pull/1339) to detect when temporary authentication tokens (TAT) (workload identity credentials) are in use on a cluster.
Important details:
Detection that workload identity credentials (TAT) are in use will mean CCO is in Manual mode and the Service Account has a non-empty Service Account Issuer field.
This workflow will be triggered by additions to the CredentialsRequest for an operator (a hedged example follows the list):
spec.cloudTokenString
spec.cloudTokenPath
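A hedged sketch of what an annotated CredentialsRequest could look like with the new field (the operator name, secret names, and token path are hypothetical; the exact schema is whatever the enhancement settles on):
apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: my-operator                              # hypothetical
  namespace: openshift-cloud-credential-operator
spec:
  secretRef:
    name: my-operator-azure-credentials          # hypothetical
    namespace: my-operator-ns
  cloudTokenPath: /var/run/secrets/openshift/serviceaccount/token
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AzureProviderSpec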
Acceptance Criteria:
When OCP is running on an Azure platform with temporary authentication tokens enabled, CCO will detect this and, on the presence of a properly annotated CredentialsRequest, create a Secret that allows Azure SDK calls for Azure resources to succeed.
Console enhancements based on customer RFEs that improve customer user experience.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Questions to be addressed:
Show node uptime information in the Openshift Console.
When the user logs into the OpenShift web console and goes to the Nodes section, it doesn't display the uptime information of each node. Currently, it only shows the date when the node was created.
The customer wants additional info related to how long a node has been up, i.e. since when the node is up, which can be useful for tracking node restarts or failures.
According to security best practice, it's recommended to set readOnlyRootFilesystem: true for all containers running on Kubernetes. Given that openshift-console does not set this explicitly, it's requested that this be evaluated and, if possible, set to readOnlyRootFilesystem: true, or otherwise to readOnlyRootFilesystem: false with an explanation of why the filesystem needs to be writable.
3. Why does the customer need this? (List the business requirements here)
Extensive security audits are run on OpenShift Container Platform 4 and are highlighting that many vendor-specific containers fail to set readOnlyRootFilesystem: true or to justify why readOnlyRootFilesystem: false is set.
AC: Set up the readOnlyRootFilesystem field on both the console and console-operator deployments' specs. Part of the work is to determine the value: true if the pod is not doing any writing to its filesystem, otherwise false. See the sketch below.
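A minimal sketch of the resulting deployment snippet, assuming the console container turns out not to write to its filesystem (container name and placement are illustrative):
spec:
  containers:
  - name: console
    securityContext:
      readOnlyRootFilesystem: true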
1. Proposed title of this feature request
Add option to enable/disable tailing to log viewer
2. What is the nature and description of the request?
See https://issues.redhat.com/browse/OCPBUGS-362
3. Why does the customer need this?
See https://issues.redhat.com/browse/OCPBUGS-362
4. List any affected packages or components.
Management Console
AC: Add functionality for tailing logs based on UX input. This should be done for both Pod logs views. Add an integration test.
UX draft - https://docs.google.com/document/u/2/d/1C9lO4JvUesAIn9U5m7Q98Tx4zw77sQGtzCJ4Wh6FAK0/edit?usp=sharing
UX contact - Tal Tobias
Unify and update hosted control planes storage operators so that they have similar code patterns and can run properly in both standalone OCP and HyperShift's control plane.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
Epic Goal*
Our current design of the EBS driver operator to support HyperShift does not scale well to other drivers. The existing design will lead to more code duplication between driver operators and the possibility of errors.
Why is this important? (mandatory)
An improved design will allow more storage drivers and their operators to be added to hypershift without requiring significant changes in the code internals.
Scenarios (mandatory)
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Finally switch both CI and ART to the refactored aws-ebs-csi-driver-operator.
The functionality and behavior should be the same as the existing operator, however, the code is completely new. There could be some rough edges. See https://github.com/openshift/enhancements/blob/master/enhancements/storage/csi-driver-operator-merge.md
CI should catch the most obvious errors; however, we need to test features that we do not have in CI, like:
Our CSI driver YAML files are mostly copy-paste from the initial CSI driver (AWS EBS?).
As an OCP engineer, I want the YAML files to be generated, so we can keep consistency among the CSI drivers easily and make them less error-prone.
It should have no visible impact on the resulting operator behavior.
Goal
Guided installation user experience that interacts via prompts for necessary inputs, informs of erroneous/invalid inputs, and provides status and feedback throughout the installation workflow with very few steps, and that works for disconnected, on-premises environments.
Installation is performed from a bootable image that doesn't contain cluster details or user details, since these details will be collected during the installation flow after booting the image in the target nodes.
This means that the image is generic and can be used to install an OpenShift cluster in any supported environment.
Why is this important?
Customers/partners desire a guided installation experience to deploy OpenShift with a UI that includes support for disconnected, on-premises environments, and which is as flexible in terms of configuration as UPI.
We have partners that need to provide an installation image that can be used to install new clusters on any location and for any users, since their business is to sell the hardware along with OpenShift, where OpenShift needs to be installable in the destination premises.
Acceptance Criteria
This experience should provide an experience closely matching the current hosted service (Assisted Installer), with the exception that it is limited to a single cluster because the host running the service will reboot and become a node in the cluster as part of the deployment process.
Dependencies
Modify the cluster registration code in the assisted-service client (used by create-cluster-and-infraenv.service) to allow creating the cluster given only the following config manifests:
If the following manifests are present, data from them should be used:
Other manifests (ClusterDeployment, AgentClusterInstall) will not be present in an interactive install, and the information therein will be entered via the GUI instead.
A CLI flag or environment variable can be used to select the interactive mode.
create-cluster-and-infraenv.service will be split into agent-register-cluster.service and agent-register-infraenv.service.
Any existing systemd service dependency on create-cluster-and-infraenv.service should be moved to agent-register-infraenv.service.
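A hedged sketch of the ordering between the two new units (only the service names come from this story; the unit contents are illustrative assumptions):
# agent-register-infraenv.service
[Unit]
Requires=agent-register-cluster.service
After=agent-register-cluster.service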
Acceptance Criteria:
Upstream Kubernetes is following other SIGs by moving its in-tree cloud providers to an out-of-tree plugin format, Cloud Controller Manager, at some point in a future Kubernetes release. OpenShift needs to be ready to action this change.
GA of the cloud controller manager for the GCP platform
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
To make the CCM GA, we need to update the switch case in library-go to make sure the GCP CCM is always considered external, as sketched below.
We then need to update the vendor in KCMO, CCMO, KASO and MCO.
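A minimal sketch of the library-go change, assuming the existing cloud-provider helper keeps roughly this shape (the signature and the handling of other platforms are illustrative; configv1 is github.com/openshift/api/config/v1):
func IsCloudProviderExternal(platform configv1.PlatformType) (bool, error) {
	switch platform {
	case configv1.GCPPlatformType:
		// No longer gated behind a feature flag: always external.
		return true, nil
	default:
		// ... other platforms elided in this sketch ...
		return false, nil
	}
}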
Track goals/requirements for self-managed GA of Hosted control planes on BM using the agent provider. Mainly make sure:
This Section:
Customers are looking at HyperShift to deploy self-managed clusters on baremetal. We have positioned the Agent flow as the way to get BM clusters due to its ease of use (it automates many of the rather mundane tasks required to set up BM clusters), and it's planned for GA with MCE 2.3 (in the OCP 4.13 timeframe).
Questions to be addressed:
To run a HyperShift management cluster in disconnected mode we need to document which images need to be mirrored and potentially modify the images we use for OLM catalogs.
ICSP mapping only happens for image references with a digest, not a regular tag. We need to address this for images we reference by tag:
CAPI, CAPI provider, OLM catalogs
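For reference, the digest-only nature of ICSP is visible in its schema: a mapping like the one below only takes effect for pulls by digest (@sha256:...), which is why tag-referenced images need other handling (the mirror registry and policy name here are illustrative):
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: hypershift-mirrors        # illustrative
spec:
  repositoryDigestMirrors:
  - source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
    mirrors:
    - registry.example.com/ocp-release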
Group all tasks for CAPI-provider-agent GA readiness
no
Feature origin (who asked for this feature?)
Enable support to bring your own encryption key (BYOK) for OpenShift on IBM Cloud VPC.
As a user I want to be able to provide my own encryption key when deploying OpenShift on IBM Cloud VPC so the cluster infrastructure objects, VM instances and storage objects, can use that user-managed key to encrypt the information.
The Installer will provide a mechanism to specify a user-managed key that will be used to encrypt the data on the virtual machines that are part of the OpenShift cluster as well as any other persistent storage managed by the platform via Storage Classes.
This feature is a required component for IBM's OpenShift replatforming effort.
The feature will be documented as usual to guide the user while using their own key to encrypt the data on the OpenShift cluster running on IBM Cloud VPC
As a user, I am able to provide my own key for boot volume encryption when deploying OpenShift on IBM Cloud VPC.
Provide encryption key to infrastructure for provisioning of control plane nodes.
Epic Goal*
Review and support the IBM engineering team while enabling BYOK support for OpenShift on IBM Cloud VPC
Why is this important? (mandatory)
As part of the replatform work IBM is doing for their OpenShift managed service, this feature is key to that work.
Scenarios (mandatory)
All the cluster storage objects (VM storage and storage managed by StorageClass objects defined in the platform) will be encrypted using the user-managed key provided in the installation manifest.
Dependencies (internal and external) (mandatory)
https://issues.redhat.com/browse/CORS-2694
Contributing Teams(and contacts) (mandatory)
The IBM development team will be responsible for developing this feature, and the RH engineering team will review and support their work.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Phase 2 Goal:
for Phase-1, incorporating the assets from different repositories to simplify asset management.
Overarching Goal
Move to using the upstream Cluster API (CAPI) in place of the current implementation of the Machine API for standalone OpenShift.
Phase 1 & 2 covers implementing base functionality for CAPI.
Phase 2 also covers migrating MAPI resources to CAPI.
There must be no negative effect on customers/users of the MAPI; this API must continue to be accessible to them, though how it is implemented "under the covers", and whether that implementation leverages CAPI, is open.
Description of problem:
I would like to use OpenShift clusters as management clusters to create other guest clusters. This is not currently possible because of restrictions imposed by the validating webhook for the Cluster object, which prevents Cluster objects from being deleted. The webhook _should_ only apply to the `openshift-cluster-api` namespace, but presently applies to all namespaces.
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Create a Cluster object in a namespace that isn't openshift-cluster-api
2. Attempt to delete the Cluster object you just created
Actual results:
Error, cannot delete Cluster
Expected results:
Cluster object deleted successfully
Additional info:
Description of problem:
Cluster API expects the user data secret created by the installer to contain a `format: ignition` value; this is used by the various providers to identify that they should treat the user data as Ignition. We do not currently set this value, but we should (see the sketch below). This may cause issues with certain providers that expect ignition to be uploaded to blob storage, so we should identify the behaviour of the providers we care about (AWS, Azure, GCP, vSphere) and ensure that they behave well and continue to work with this change. (This may mean upstream changes to only upload large ignitions to the storage/make the storage optional.)
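A minimal sketch of the secret shape providers expect, assuming the Cluster API bootstrap-secret convention of "value" plus "format" keys (the secret name and namespace here are illustrative):
userDataSecret := &corev1.Secret{
	ObjectMeta: metav1.ObjectMeta{
		Name:      "worker-user-data", // illustrative
		Namespace: "openshift-cluster-api",
	},
	Data: map[string][]byte{
		"value":  ignitionConfig,     // the Ignition payload
		"format": []byte("ignition"), // the key providers check
	},
}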
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Add support to the OpenShift Installer to specify a custom MTU to be used for the cluster network
As a user, I need to be able to provide a custom MTU to be consumed by the OpenShift Installer so I can change the cluster network MTU on day-0
The OpenShift Installer will accept another parameter through the install-config manifest that will be used as an option to change the default MTU value for the cluster network
Customers who are running OpenShift on environments where the network traffic between the cluster and external endpoints is limited to a certain MTU. Some examples are OpenShift clusters running on public clouds like AWS, Azure or GCP where these clusters are connected to external services running on the customer premises via direct links, VPNs, etc... and limited to a certain MTU
Additional background can be found in RFE-4009
As a new option added to the Installer manifest, this will need to be documented like any other.
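A hedged sketch of what the install-config option could look like, assuming the parameter lands under the networking section as clusterNetworkMTU (the field name and value are assumptions until the design is final):
networking:
  networkType: OVNKubernetes
  clusterNetworkMTU: 1300   # example value for a VPN-constrained path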
As an OpenShift on vSphere administrator, I want to specify static IP assignments to my VMs.
As an OpenShift on vSphere administrator, I want to completely avoid using a DHCP server for the VMs of my OpenShift cluster.
Customers want the convenience of IPI deployments for vSphere without having to use DHCP. As in bare metal, where METAL-1 added this capability, some of the reasons are the security implications of DHCP (customers report that for example depending on configuration they allow any device to get in the network). At the same time IPI deployments only require to our OpenShift installation software, while with UPI they would need automation software that in secure environments they would have to certify along with OpenShift.
Bare metal related work:
CoreOS Afterburn:
https://github.com/coreos/afterburn/blob/main/src/providers/vmware/amd64.rs#L28
https://github.com/openshift/installer/blob/master/upi/vsphere/vm/main.tf#L34
USER STORY:
As a cluster admin, the status field relating to IPAddressClaimed is a bit confusing and should be improved to make it easier to understand.
DESCRIPTION:
Currently the machine object has the following status:
providerStatus:
  conditions:
  - lastTransitionTime: "2023-09-04T17:50:34Z"
    message: All IP address claims are bound
    reason: WaitingForIPAddress
    status: "False"
    type: IPAddressClaimed
The reason, status and message are a bit confusing when IP address claim is bound. The above is an example of what it says when it is finished.
ACCEPTANCE CRITERIA:
The status should look something like the following when IP is claimed:
conditions:
- lastTransitionTime: "2023-09-06T13:52:51Z"
  message: All IP address claims are bound
  reason: IPAddressesClaimed
  status: "True"
  type: IPAddressClaimed
The reason text may change to match the formatting of other condition fields.
ENGINEERING DETAILS:
This most likely will involve updating a few different projects:
Configure vSphere integration with credentials specified in the install-config.yaml file used by the agent install ISO image, so that the platform integration is configured on day 1 with the agent-based installer.
Stop ignoring any (optional) vSphere credential values provided in the install-config, and pass them instead to the cluster. This allows users to configure the credentials at day 1 if they want to, though it should remain optional. It also means that an install-config usable for IPI (where the credentials are required) should result in an equivalent cluster when the agent installer is used.
Currently there are warning messages logged about these values being ignored when provided. These warnings should be removed when the values are no longer ignored.
In the absence of direct API support for this in assisted, we should be able to use install-config overrides (which the agent installer is already able to make use of internally).
The vSphere credentials need to be passed through to assisted-service. The AgentClusterInstall ZTP manifest has an annotation where the install-config override can be set.
Acceptance Criteria:
The JSON attribute is named "user", but because the yaml.Marshal function is used, the marshaller fails to interpret that the "user" defined in the json struct tag is meant to represent vSphere.vcenters.Username.
The fix is to change yaml.Marshal to json.Marshal in internal/installconfig/builder GetInstallConfig, as sketched below.
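A minimal sketch of the described fix (JSON output is also valid YAML, so the generated install-config remains parseable while the `json:"user"` tags are honored; the surrounding return statements are illustrative):
// Before: data, err := yaml.Marshal(installConfig)  // ignores json tags
data, err := json.Marshal(installConfig)
if err != nil {
	return "", err
}
return string(data), nil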
time="2023-09-25T21:29:07Z" level=error msg="error running openshift-install create manifests, stdout: level=warning msg=failed to parse first occurrence of unknown field: failed to unmarshal install-config.yaml: error unmarshaling JSON: while decoding JSON: json: unknown field \"username\"\nlevel=info msg=Attempting to unmarshal while ignoring unknown keys because strict unmarshaling failed\nlevel=error msg=failed to fetch Master Machines: failed to load asset \"Install Config\": failed to create install config: invalid \"install-config.yaml\" file: [platform.vsphere.vcenters.username: Required value: must specify the username, platform.vsphere.failureDomains.server: Invalid value: \"vcenterplaceholder\": server does not exist in vcenters]\n" func="github.com/openshift/assisted-service/internal/ignition.(*installerGenerator).runCreateCommand" file="/src/internal/ignition/ignition.go:1688" cluster_id=aad05e8e-a9fe-4f60-b580-0b2f6c4fdf10 error="exit status 3" go-id=1569 request_id=
time="2023-09-25T21:29:07Z" level=error msg="failed generating install config for cluster aad05e8e-a9fe-4f60-b580-0b2f6c4fdf10" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).generateClusterInstallConfig" file="/src/internal/bminventory/inventory.go:1755" cluster_id=aad05e8e-a9fe-4f60-b580-0b2f6c4fdf10 error="error running openshift-install manifests, level=warning msg=failed to parse first occurrence of unknown field: failed to unmarshal install-config.yaml: error unmarshaling JSON: while decoding JSON: json: unknown field \"username\"\nlevel=info msg=Attempting to unmarshal while ignoring unknown keys because strict unmarshaling failed\nlevel=error msg=failed to fetch Master Machines: failed to load asset \"Install Config\": failed to create install config: invalid \"install-config.yaml\" file: [platform.vsphere.vcenters.username: Required value: m <TRUNCATED>: exit status 3" go-id=1569 pkg=Inventory request_id=
Use any values provided in the install-config (e.g. root device hints, network config, BMC details) as defaults for the agent-config for the baremetal platform when installing with the agent-based installer.
If the agent-config file specifies host-specific settings then these should override the install-config. This enables users to use the same config for both agent-based and IPI installation.
If the user has an IPI install-config complete with BMC credentials, pass them through to the cluster so that it will end up with BareMetalHosts that can be managed by MAPI just as they would after an IPI install, instead of then having to add the credentials again on day 2.
BMC credentials must remain optional in the install-config though.
We allow the user to specify per-host settings (e.g. root device hints, network config) in the agent-config file, on any platform.
However, if the platform is baremetal, there are also fields for this data in the platform section of the install-config. Currently any data specified here is ignored (with a warning about this logged).
We should use any values provided in the install-config as defaults for the agent-config. If the agent-config specifies host-specific settings then these should override the install-config. This enables users to use the same config for both agent-based and IPI installation.
As part of this work, the logs warning of unused values should be removed.
If the install-config.yaml contains baremetal host configuration fields, and no host fields are defined in agent-config.yaml, use the install-config settings to define the hosts. The following fields will be copied over from the baremetal host definition:
https://github.com/openshift/installer/blob/master/pkg/types/baremetal/platform.go#L37
The install-config struct does not have an interfaces field that matches agent-config, instead the bootMacAddress in install-config host will be used to create the interfaces array.
The hardwareProfile and bootMode fields will not be copied.
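A hedged sketch of the mapping with hypothetical Go types, showing bootMACAddress turning into a single-entry interfaces array (the type and field names below are illustrative, not the actual installer structs):
agentHost := agenttypes.Host{ // hypothetical type
	Hostname:        bmHost.Name,
	Role:            bmHost.Role,
	RootDeviceHints: bmHost.RootDeviceHints,
	Interfaces: []agenttypes.Interface{{
		Name:       "eth0", // assumed default interface name
		MacAddress: bmHost.BootMACAddress,
	}},
}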
The OpenShift API needs to be updated to define VSphereFailureDomain. A draft PR is here: https://github.com/openshift/api/pull/1539
Also, ensure that the client-go and openshift-cluster-config-operator projects are bumped once the API changes merge.
Nutanix is performing work in parallel, and we need to pull out the common bits into a non-vSphere-specific PR.
Add the name of the template to the infrastructure failure domain generated by the installer.
To align with the 4.15 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.15 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository
Epic Goal*
What is our purpose in implementing this? What new capability will be available to customers?
Why is this important? (mandatory)
What are the benefits to the customer or Red Hat? Does it improve security, performance, supportability, etc? Why is work a priority?
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF).
Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.
This includes ibm-vpc-node-label-updater!
(Using separate cards for each driver because these updates can be more complicated)
Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.
(Using separate cards for each driver because these updates can be more complicated)
Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.
(Using separate cards for each driver because these updates can be more complicated)
Update all CSI sidecars to the latest upstream release from https://github.com/orgs/kubernetes-csi/repositories
Corresponding downstream repos have `csi-` prefix, e.g. github.com/openshift/csi-external-attacher.
This includes the update of VolumeSnapshot CRDs in cluster-csi-snapshot-controller-operator assets and the client API in go.mod, i.e., copy all snapshot CRDs from upstream to the operator assets and run go get -u github.com/kubernetes-csi/external-snapshotter/client/v6 in the operator repo.
Update all OCP and kubernetes libraries in storage operators to the appropriate version for OCP release.
This includes (but is not limited to):
Operators:
EOL, do not upgrade:
Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.
(Using separate cards for each driver because these updates can be more complicated)
Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.
(Using separate cards for each driver because these updates can be more complicated)
Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.
(Using separate cards for each driver because these updates can be more complicated)
Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.
(Using separate cards for each driver because these updates can be more complicated)
Support OpenShift deployments on Azure to configure public and private exposure for OpenShift API and OpenShift Ingress separately at installation time
To reconcile the difference between the publishing strategy the Installer provides on Azure and what ARO offers, upstream the capability to set public and private exposure separately for the API server and the Ingress components.
The user should be able to provide public or private publish configuration at installation time for the API and Ingress components.
The initial use case will be for ARO customers as explained in ARO-2803
ARO currently offers the ability to specify APIserver and ingress visibility at install time. You can set either to public or private, and they can differ (i.e. public ingress, private apiserver).
OpenShift currently does not have this feature natively; it must be done either day 2 or via some other mechanism. Based on what you set the value of "publish" to in your install config (Internal | External), the components will or will not be internet accessible.
This will require user facing documentation as any other option we currently document for OpenShift Installer
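For illustration, a minimal install-config sketch of split exposure; the publish value and the operatorPublishingStrategy stanza below are assumptions about what the final API could look like, not confirmed fields:

apiVersion: v1
baseDomain: example.com
publish: Mixed                     # assumed value enabling per-component exposure
operatorPublishingStrategy:        # assumed stanza name
  apiserver: Internal              # private API server
  ingress: External                # public Ingress
platform:
  azure:
    region: eastus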
As a developer, I want to be able to:
so that I can
Description of criteria:
This requires/does not require a design proposal.
This requires/does not require a feature gate.
Add support for the Installer to encrypt the Storage Account used for OpenShift on Azure at installation time.
As a user I can instruct the Installer to encrypt the Storage Account created while deploying OpenShift on Azure for increased security.
The user is able to provide a Storage Account encryption ID to be used when the Installer creates the Storage Account for OpenShift on Azure.
Related work on disks encryption for Azure was delivered as part of OCPSTRAT-308 feature. Now we are extending this to the Storage Account.
Usual documentation will be required to explain how to use this option in the Installer
Terraform is used for storage account creation
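A hedged sketch of how this might surface in install-config.yaml; the customerManagedKey stanza and its subfields are assumptions for illustration, modeled on the related disk-encryption work, not confirmed fields:

platform:
  azure:
    region: eastus
    customerManagedKey:            # assumed stanza name
      keyVault:                    # assumed: key vault holding the encryption key
        name: example-kv
        keyName: example-key
        resourceGroup: example-rg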
As a user, I want to be able to:
so that
Description of criteria:
Not encrypting the storage account used for bootstrap
This requires/does not require a design proposal.
This requires/does not require a feature gate.
Goal
Hardware RAID support on Dell, Supermicro and HPE with Metal3.
Why is this important
Setting up RAID devices is a common operation in the hardware for OpenShift nodes. While there's been work at Fujitsu for configuring RAID in Fujitsu servers with Metal3, we don't support any generic interface with Redfish to extend this support and set it up in Metal3.
Dell, Supermicro and HPE, which are the most common hardware platforms we find in our customers' environments, are the main targets.
CoreOS Layering allows users to derive customized disk images for cluster nodes and update their nodes to these custom images. However, OpenShift still uses pristine images for the first boot of any machine. Customers should be allowed to use customized firstboot RHCOS images for machine installation.
This is critical to the needs of many OEMs and those working with cutting edge hardware with rapidly developing drivers.
Investigate how we can enable customers to create their own boot images from a derived container.
One potential path is to work together with Image Builder team on what technology can be shared between our pipeline(s) and their building service.
Phase 1 focuses on building a RAW/QCOW2 FCOS/RHCOS disk image from a container image using osbuild and integrating with COSA / our pipeline.
In COSA when we create a disk image we stamp in a .coreos-aleph-version.json file that we then use for various other things later in time. We need an equivalent component when building disk images using OSBuild. Today the file is created using this code:
cat > $rootfs/.coreos-aleph-version.json << EOF
{
    "build": "${buildid}",
    "ref": "${ref}",
    "ostree-commit": "${commit}",
    "imgid": "${imgid}"
}
EOF
We can feed some of this information into OSBuild via the manifest file but it might be really nice to be able to detect it dynamically from the installed tree so that we can reduce the requirements on future users of this having to know the values.
Another thing: since the `imgid` part here depends on what kind of image we are creating, we might need to create a separate pipeline (i.e. `tree-qemu`) that adds in the correct imgid part, which is then fed into the qcow2 assembler. We probably also need to do something similar for `ignition.platform.id=qemu` in the future.
GitHub Internal PR: https://github.com/dustymabe/osbuild/pull/10
Upstream PR to OSBuild: https://github.com/osbuild/osbuild/pull/1475
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
This is part of the overall multi-release Composable OpenShift effort (OCPPLAN-9638), which is being delivered in multiple phases:
Phase 1 (OpenShift 4.11): OCPPLAN-7589 Provide a way with CVO to allow disabling and enabling of operators
Phase 2 (OpenShift 4.12): OCPPLAN-7589 Provide a way with CVO to allow disabling and enabling of operators
Phase 3 (OpenShift 4.13): OCPBU-117
Phase 4 (OpenShift 4.14): OCPSTRAT-36 (formerly OCPBU-236)
Phase 5 (OpenShift 4.15): OCPSTRAT-421 (formerly OCPBU-519)
Phase 6 (OpenShift 4.16): OCPSTRAT-731
Phase 7 (OpenShift 4.17): OCPSTRAT-1308
Questions to be addressed:
Support AWS Wavelength Zones as a target infrastructure where to deploy OpenShift compute nodes.
As a user, I want to deploy OpenShift compute nodes on AWS Wavelength Zones at install time so I can leverage this infrastructure to deploy edge computing applications.
As a user, I want to extend an existing OpenShift cluster on AWS deploying compute nodes on AWS Wavelength Zones so I can leverage this infrastructure to deploy edge computing applications.
The Installer will be able to deploy OpenShift on the public region into an existing VPC with compute nodes on AWS Wavelength Zones into an existing subnet.
The Installer will be able to deploy OpenShift on the public region with compute nodes on AWS Wavelength Zones automating the VPC creation in the public region and the subnet creation in the AWS Wavelength Zone
An existing OpenShift cluster on AWS public region can be extended by adding additional compute nodes (that can be automatically scaled) into AWS Wavelength Zones.
Build media and entertainment applications.
Accelerate ML inference at the edge.
Develop connected vehicle applications.
There is extensive demand for running specific workloads at edge locations on cloud providers. We have added support for AWS Outposts and AWS Local Zones. AWS Wavelength Zones is an in-demand target infrastructure that customers, including ROSA customers, are asking for.
Usual documentation will be required to instruct the user on how to use this feature
The Installer will be able to deploy OpenShift on the public region into an existing VPC with compute nodes on AWS Wavelength Zones into an existing subnet.
As a user, I want to deploy OpenShift compute nodes on AWS Wavelength Zones at install time so I can leverage this infrastructure to deploy edge computing applications.
The Installer will be able to deploy OpenShift on the public region with compute nodes on AWS Wavelength Zones automating the VPC creation in the public region and the subnet creation in the AWS Wavelength Zone
USER STORY:
DESCRIPTION:
AWS Wavelength Zones are infrastructures running in RAN (Radio Access Network) owned by a Carrier, outside the region. AWS provides a few services, including computing, network ingress traffic in the carrier network, and private network connectivity with the VPC in the region.
The Installer must be able to deploy OpenShift worker nodes on the public region with compute nodes on AWS Wavelength Zones, automating the VPC creation in the public region and the subnet creation in the AWS Wavelength Zone.
The installer must create the private and public subnets, and the worker nodes must use the private subnet.
In the traditional deployment of OpenShift on AWS, the private subnet egresses traffic to the internet using a NAT Gateway. NAT Gateway is not currently supported in AWS Wavelength Zones. To remediate that, the deployment must follow the same strategy of associating the private subnets in Wavelength Zones to a route table in the region, preferably the route table for the WLZ's parent zone*.
*Every "edge zone" (Local and Wavelength Zone) is associated with one zone in the region, named the parent zone.
To ingress traffic to Wavelength Zone nodes, a public subnet must be created and associated with a route table which has a carrier gateway as its default route. The carrier gateway is similar to an internet gateway.
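For illustration, a minimal install-config sketch of such a deployment, assuming the edge compute pool convention introduced for Local Zones carries over to Wavelength Zones (zone and names are illustrative):

apiVersion: v1
baseDomain: example.com
metadata:
  name: wavelength-demo
compute:
- name: edge                       # edge compute pool, as used for Local Zones
  platform:
    aws:
      zones:
      - us-east-1-wl1-bos-wlz-1    # illustrative Wavelength Zone
platform:
  aws:
    region: us-east-1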
Required:
Nice to have:
...
ACCEPTANCE CRITERIA:
<!--
Describe the goals that need to be achieved so that this story can be
considered complete. Note this will also help QE to write their acceptance
tests.
-->
ENGINEERING DETAILS:
<!--
Any additional information that might be useful for engineers: related
repositories or pull requests, related email threads, GitHub issues or
other online discussions, how to set up any required accounts and/or
environments if applicable, and so on.
-->
As a user, I want to deploy OpenShift compute nodes on AWS Wavelength Zones at install time in public subnets so I can leverage this infrastructure to deploy edge computing applications.
As a user, I want to deploy OpenShift compute nodes on AWS Wavelength Zones in public subnets in existing clusters installed with edge nodes so I can leverage this infrastructure to deploy edge computing applications.
USER STORY:
Goal:
Wavelength Zones operate in the Carrier Network. To ingress traffic to instances running in those zones, a Carrier IP address must be assigned. The Carrier IP address is assigned to the instance when the network interface flag AssociateCarrierIpAddress is set when provisioning the instance.
PublicIP is the existing flag available in the MachineSet to assign a public IP address to a node running in a regular zone. The goal of this card is to teach the MAPI AWS provider to look at the zone type for the subnet and, when the value is 'wavelength-zone', set the flag AssociateCarrierIpAddress to true instead of the default AssociatePublicIpAddress, allowing the EC2 service to assign a public IP address in the carrier network.
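For illustration, a MachineSet providerSpec fragment for a Wavelength Zone subnet; the subnet name is illustrative, and the comment describes the proposed provider behavior rather than current code:

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
spec:
  template:
    spec:
      providerSpec:
        value:
          apiVersion: machine.openshift.io/v1beta1
          kind: AWSMachineProviderConfig
          publicIp: true  # for a 'wavelength-zone' subnet the provider would set
                          # AssociateCarrierIpAddress instead of AssociatePublicIpAddress
          subnet:
            filters:
            - name: tag:Name
              values:
              - mycluster-public-us-east-1-wl1-bos-wlz-1  # illustrative subnet name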
Required:
ACCEPTANCE CRITERIA:
ENGINEERING DETAILS:
An elevator pitch (value statement) that describes the Feature in a clear, concise way. Complete during New status.
Some OEM partners building RHDE platforms using MicroShift want to provide OLM to their customers with specially curated catalogs to allow end-user applications to depend on and install operators.
The observable functionality that the user now has as a result of receiving this feature. Complete during New status.
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
Extend OpenShift on IBM Cloud integration with additional features to pair the capabilities offered for this provider integration to the ones available in other cloud platforms.
Extend the existing features while deploying OpenShift on IBM Cloud.
This top level feature is going to be used as a placeholder for the IBM team who is working on new features for this integration in an effort to keep in sync their existing internal backlog with the corresponding Features/Epics in Red Hat's Jira.
We have a blocking issue that is fixed and included in IBM TF Provider release 1.60.0. We need this pulled into OCP 4.15.
IBM PowerVS has been notified.
A user currently is not able to create a Disconnected cluster, using IPI, on IBM Cloud.
Currently, support for BYON and Private clusters does exist on IBM Cloud, but support to override IBM Cloud Service endpoints does not exist, which is required to allow for Disconnected support to function (reach IBM Cloud private endpoints).
IBM dependent components of OCP will need to add support to use a set of endpoint override values in order to reach IBM Cloud Services in Disconnected environments.
The MAPI component will need to allow all API calls to IBM Cloud Services to be directed to these endpoint values, in order to communicate in environments where the public or default IBM Cloud Service endpoint is not available.
The endpoint overrides are available via the infrastructure/cluster (.status.platformStatus.ibmcloud.serviceEndpoints) resource, which is how a majority of components consume cluster-specific configuration (Ingress, MAPI, etc.). It will be structured as such:
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  creationTimestamp: "2023-10-04T22:02:15Z"
  generation: 1
  name: cluster
  resourceVersion: "430"
  uid: b923c3de-81fc-4a0e-9fdb-8c4c337fba08
spec:
  cloudConfig:
    key: config
    name: cloud-provider-config
  platformSpec:
    type: IBMCloud
status:
  apiServerInternalURI: https://api-int.us-east-disconnect-21.ipi-cjschaef-dns.com:6443
  apiServerURL: https://api.us-east-disconnect-21.ipi-cjschaef-dns.com:6443
  controlPlaneTopology: HighlyAvailable
  cpuPartitioning: None
  etcdDiscoveryDomain: ""
  infrastructureName: us-east-disconnect-21-gtbwd
  infrastructureTopology: HighlyAvailable
  platform: IBMCloud
  platformStatus:
    ibmcloud:
      dnsInstanceCRN: 'crn:v1:bluemix:public:dns-svcs:global:a/fa4fd9fa0695c007d1fdcb69a982868c:f00ac00e-75c2-4774-a5da-44b2183e31f7::'
      location: us-east
      providerType: VPC
      resourceGroupName: us-east-disconnect-21-gtbwd
      serviceEndpoints:
      - name: iam
        url: https://private.us-east.iam.cloud.ibm.com
      - name: vpc
        url: https://us-east.private.iaas.cloud.ibm.com/v1
      - name: resourcecontroller
        url: https://private.us-east.resource-controller.cloud.ibm.com
      - name: resourcemanager
        url: https://private.us-east.resource-controller.cloud.ibm.com
      - name: cis
        url: https://api.private.cis.cloud.ibm.com
      - name: dnsservices
        url: https://api.private.dns-svcs.cloud.ibm.com/v1
      - name: cis
        url: https://s3.direct.us-east.cloud-object-storage.appdomain.cloud
    type: IBMCloud
The CCM currently relies on updates to the openshift-cloud-controller-manager/cloud-conf configmap in order to override its required IBM Cloud Service endpoints, such as:
data:
  config: |+
    [global]
    version = 1.1.0
    [kubernetes]
    config-file = ""
    [provider]
    accountID = ...
    clusterID = temp-disconnect-7m6rw
    cluster-default-provider = g2
    region = eu-de
    g2Credentials = /etc/vpc/ibmcloud_api_key
    g2ResourceGroupName = temp-disconnect-7m6rw
    g2VpcName = temp-disconnect-7m6rw-vpc
    g2workerServiceAccountID = ...
    g2VpcSubnetNames = temp-disconnect-7m6rw-subnet-compute-eu-de-1,temp-disconnect-7m6rw-subnet-compute-eu-de-2,temp-disconnect-7m6rw-subnet-compute-eu-de-3,temp-disconnect-7m6rw-subnet-control-plane-eu-de-1,temp-disconnect-7m6rw-subnet-control-plane-eu-de-2,temp-disconnect-7m6rw-subnet-control-plane-eu-de-3
    iamEndpointOverride = https://private.iam.cloud.ibm.com
    g2EndpointOverride = https://eu-de.private.iaas.cloud.ibm.com
    rmEndpointOverride = https://private.resource-controller.cloud.ibm.com
These changes have already landed in the release-1.28 branch (target OCP release-4.15 branch), but we need to make sure they get pulled into the github.com/openshift/cloud-provider-ibm branch and built into a 4.15 image.
Installer validates and injects user provided endpoint overrides into cluster deployment process and the MAPI components use specified endpoints and start up properly.
A user currently is not able to create a Disconnected cluster, using IPI, on IBM Cloud.
Currently, support for BYON and Private clusters does exist on IBM Cloud, but support to override IBM Cloud Service endpoints does not exist, which is required to allow for Disconnected support to function (reach IBM Cloud private endpoints).
IBM dependent components of OCP will need to add support to use a set of endpoint override values in order to reach IBM Cloud Services in Disconnected environments.
The Ingress Operator components will need to allow all API calls to IBM Cloud Services to be directed to these endpoint values, in order to communicate in environments where the public or default IBM Cloud Service endpoint is not available.
The endpoint overrides are available via the infrastructure/cluster (.status.platformStatus.ibmcloud.serviceEndpoints) resource; see the Infrastructure and cloud-conf configmap examples above.
Installer validates and injects user provided endpoint overrides into cluster deployment process and the Ingress Operator components use specified endpoints and start up properly.
A user currently is not able to create a Disconnected cluster, using IPI, on IBM Cloud.
Currently, support for BYON and Private clusters does exist on IBM Cloud, but support to override IBM Cloud Service endpoints does not exist, which is required to allow for Disconnected support to function (reach IBM Cloud private endpoints).
IBM dependent components of OCP will need to add support to use a set of endpoint override values in order to reach IBM Cloud Services in Disconnected environments.
The Storage components will need to allow all API calls to IBM Cloud Services to be directed to these endpoint values, in order to communicate in environments where the public or default IBM Cloud Service endpoint is not available.
The endpoint overrides are available via the infrastructure/cluster (.status.platformStatus.ibmcloud.serviceEndpoints) resource; see the Infrastructure and cloud-conf configmap examples above.
The Storage component is reliant on the CCM cloud-conf configmap, but only the IAM, ResourceManager, and VPC endpoints are supplied there, since that is all the CCM uses. If additional IBM Cloud Services are used (e.g., COS), they will not be available in the CCM cloud-conf, but will always be in the infrastructure/cluster resource.
Installer validates and injects user provided endpoint overrides into cluster deployment process and the storage components use specified endpoints and start up properly.
A user currently is not able to create a Disconnected cluster, using IPI, on IBM Cloud.
Currently, support for BYON and Private clusters does exist on IBM Cloud, but support to override IBM Cloud Service endpoints does not exist, which is required to allow for Disconnected support to function (reach IBM Cloud private endpoints).
IBM Cloud VPC (x86_64) currently does not support Disconnected cluster installation via IPI.
In order to add this support, the override of certain IBM Cloud Services (e.g., IAM, IaaS), must be configurable and made available in the cluster infrastructure resource.
Implementation is dependent on API changes merging first (https://issues.redhat.com/browse/SPLAT-1097)
Installer validates and injects user provided endpoint overrides into cluster deployment process.
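For illustration, a sketch of what user-provided overrides could look like in install-config.yaml, assuming a serviceEndpoints list under platform.ibmcloud that mirrors the infrastructure resource (stanza name, endpoint names, and URLs are assumptions):

platform:
  ibmcloud:
    region: us-east
    serviceEndpoints:              # assumed install-config stanza
    - name: IAM
      url: https://private.us-east.iam.cloud.ibm.com
    - name: VPC
      url: https://us-east.private.iaas.cloud.ibm.com/v1
    - name: ResourceController
      url: https://private.us-east.resource-controller.cloud.ibm.com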
In the initial delivery of CoreOS Layering, it is required that administrators provide their own build environment to customize RHCOS images. That could be a traditional RHEL environment or potentially an enterprising administrator with some knowledge of OCP Builds could set theirs up on-cluster.
The primary virtue of an on-cluster build path is to continue using the cluster to manage the cluster. No external dependency, batteries-included.
On-cluster, automated RHCOS Layering builds are important for multiple reasons:
Add support for il-central-1 in AWS
As a user I'm able to deploy OpenShift in il-central-1 in AWS and this region is fully supported
A user can deploy OpenShift in AWS il-central-1 using all the supported installation tools for self-managed customers.
The support of this region is backported to the previous OpenShift EUS release.
The corresponding RHCOS image needs to be available in the new region so the Installer can list the region.
AWS has added support for a new region in their public cloud offering and this region needs to be supported for OpenShift deployments as other regions.
The information about the new region needs to be added to the documentation so this is supported.
As a (user persona), I want to be able to:
so that I can achieve
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
MVP aims at refactoring MirrorToDisk and DiskToMirror for OCP releases
As an MVP, this epic covers the work for RFE-3800 (includes RFE-3393 and RFE-3733) for mirroring releases.
The full description / overview of the enclave support is best described here
The design document can be found here
Upcoming epics, such as CFE-942 will complete the RFE work with mirroring operators, additionalImages, etc.
Architecture Overview (diagram)
As an oc-mirror user, I want the tar generated by the mirror-to-disk process to be as small as possible so that its transfer to the enclaves is as quick as possible.
Background:
After the first demo done by the team, the initial solution that consisted of archiving the whole cache and sending it through the one-way diode to the enclaves was refused by the stakeholders.
CFE-966 studied a solution to include in the tar:
This story is about implementing the studied solution.
Acceptance criteria:
I need an implementation (and interface) that can be used at the beginning of the diskToMirror process (important: before starting the local cache) and that would extract each part of the tar.gz to its location:
Definitions:
Requirement
AS the admin of several clusters of my company, with several enclaves involved,
WHEN doing MirrorToDisk for several enclaves
I would like to be able to reuse the cache-dir of my main environment (enterprise level) for all enclaves,
AND use a separate working-dir for each enclave
SO THAT I can gain in storage volume and in performance, while preserving a separate context for each enclave
I need an implementation (and interface) that constructs a tar.gz from:
Tar contents:
Diff logic:
Logs from the local storage registry of oc-mirror need to be redirected to a separate log file.
oc-mirror v1 and v2 can coexist.
By default, the v1 code is called.
oc-mirror switches to v2 when a certain flag is added.
When graph: true is specified for releases in the imageSetConfig, I'd like oc-mirror to create and mirror a graph image for OCP releases.
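For illustration, a minimal imageSetConfig sketch of the trigger, assuming the v1-style schema (the v2 apiVersion may differ):

apiVersion: mirror.openshift.io/v1alpha2
kind: ImageSetConfiguration
mirror:
  platform:
    graph: true                    # create and mirror the graph image for OCP releases
    channels:
    - name: stable-4.15
      type: ocp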
In order to keep the same behavior as v1, we need to have the mirroring at the namespace level
Check how to enable signature verification using the skopeo mod (copy method)
As an OpenShift admin, I want to prevent must-gather from filling up the disk. Must-gather runs on a master node, so if it fills up the disk it can cause problems on that node and affect the stability of my OCP environment.
The observable functionality that the user now has as a result of receiving this feature. Complete during New status.
Define a configurable default limit for the emptyDir volume in the must-gather pod.
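For illustration, the standard Kubernetes emptyDir sizeLimit could cap the must-gather output volume; the volume name and limit below are illustrative:

volumes:
- name: must-gather-output         # illustrative volume name
  emptyDir:
    sizeLimit: 30Gi                # kubelet evicts the pod if usage exceeds this limit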
Customers can trust the metadata in our operators catalogs to reason about infrastructure compatibility and interoperability. Similar to OCPPLAN-7983 the requirement is that this data is present for every layered product and Red Hat-release operator and ideally also ISV operators.
Today it is hard to validate the presence of this data due to the metadata format. This features tracks introducing a new format, implementing the appropriate validation and enforcement of presence as well as defining a grace period in which both formats are acceptable.
Customers can rely on the operator metadata as the single source of truth for capability and interoperability information instead of having to look up product-specific documentation. They can use this data to filter in on-cluster and public catalog displays as well as in their pipelines or custom workflows.
Red Hat Operators are required to provide this data and we aim for near 100% coverage in our catalogs.
Absence of this data can reliably be detected and will subsequently lead to gating in the release process.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Networking Definition of Planned
Epic Template descriptions and documentation
With ovn-ic we have multiple actors (zones) setting status on some CRs. We need to make sure individual zone statuses are reported and then optionally merged into a single status.
Without that change, zones will overwrite each other's statuses.
Additional information on each of the above items can be found here: Networking Definition of Planned
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
This card is about:
The MCO merges MachineConfigs in alphanumeric order. Because all custom pools are also workers, this effectively means that all "worker" configs will take precedence over custom pools that come earlier in alphabetical order. This is counter to most expectations, where the reason for creating the custom pool is to be different from the worker pool.
Custom pool configs will "win" over worker configs. This is in line with what most customers expect.
[original description related to a specific scenario that this more general feature will help facilitate. It's been moved to comments]
As an OpenShift admin with custom pools,
I would like my custom pool configuration, especially those generated by the MCO (kubelet, containerruntime, node config, etc.) to take priority over base worker pool configs
So I can have custom configs for custom pool take effect.
This is a behaviour change and should also be noted as such in docs/release notes.
Follow up cards were moved to https://issues.redhat.com/browse/MCO-773
Due to alphanumeric ordering of the MCs, MCC-generated configs with pool names will get ordered such that worker configuration will generally take priority over custom pools, unless the custom pool name starts with x, y, or z.
This is mostly a problem for kubelet and containerruntime configs. If the user wanted to create a kubeletconfig for base workers and then a special one for the custom pool, they are unable to easily do so.
We should assume the user wants custom configs to take priority, since all base worker configuration is otherwise inherited. Thus our MC merging logic should be updated to handle this.
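For illustration, a KubeletConfig targeting a hypothetical "infra" custom pool; under the updated merging logic, its settings would take priority over a worker-wide KubeletConfig:

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: infra-kubelet              # hypothetical custom-pool config
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/infra: ""
  kubeletConfig:
    maxPods: 500                   # example override that should win over the worker value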
OLM users can stay on the supported path for an installed operator package by
High-level list of items that are out of scope. Initial completion during Refinement status.
We have customers (e.g., Citigroup) who have dozens of operators installed in hundreds of clusters. They need to conduct audits periodically to tell if all the installed operators are still within the support boundary. Currently, there’s no way for them to tell if a particular operator package, or an update channel being subscribed to, or a current running operator version is supported or not.
Extending OLM's support for content deprecation management will enable operator package maintainers to better curate this type of information, benefiting the admin users of our product.
As per #1146
GRPC API (pkg/api/*; pkg/registry/types.go; utests)
Upstream Github issue: https://github.com/operator-framework/operator-registry/issues/1154
An SRE/Cluster Admin will be able to use the multi payload in the same way as a single-arch payload for a single-arch cluster when installing with the agent installer. This is most useful when using the agent installer in disconnected environments.
The agent installer will work for installs involving the multi-arch payload
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
You will not be able to install multi-architecture clusters; nodes of a different architecture will need to be added as a day-2 operation.
Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
As an SRE/Cluster Admin I expect that the multi payload can be used the same way as a single-arch payload for a single-arch cluster when installing with the agent installer.
AC:
non-goal:
The current agent code uses `oc` to extract files from the release payload. Need to add `--filter-by-os` to the appropriate `oc` templates to ensure a user can create an agent.<arch>.iso with the multi payload on any supported client arch.
As a cluster-admin I want to get accurate information about the status of operators. The cluster should not report that portions of the cluster are Degraded or Unavailable when they're actually not. I need to see reduced false-positive messaging in the status message of the CVO.
Background:
This Feature is a continuation of https://issues.redhat.com/browse/OCPSTRAT-180.
Customers are asking for improvements to the upgrade experience (both over-the-air and disconnected). This is a feature tracking epics required to get that work done.
These are alarming conditions which may frighten customers, and we don't want to see them in our own, controlled, repeatable update CI. This example job had logs like:
Feb 18 21:11:25.799 E clusteroperator/openshift-apiserver changed Degraded to True: APIServerDeployment_UnavailablePod: APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver ()
And the job failed, but none of the failures were "something made openshift-apiserver mad enough to go Degraded".
Cluster administrators need an in-product experience to discover and install new Red Hat offerings that can add high value to developer workflows.
Requirements | Notes | IS MVP |
---|---|---|
Discover new offerings in Home Dashboard | Y | |
Access details outlining value of offerings | Y | |
Access step-by-step guide to install offering | N | |
Allow developers to easily find and use newly installed offerings | Y | |
Support air-gapped clusters | Y |
< What are we making, for who, and why/what problem are we solving?>
Discovering solutions that are not available for installation on cluster
No known dependencies
Background, and strategic fit
None
Quick Starts
Cluster admins need to be guided to install RHDH on the cluster.
Enable admins to discover RHDH, be guided to installing it on the cluster, and verifying its configuration.
RHDH is a key multi-cluster offering for developers. This will enable customers to self-discover and install RHDH.
RHDH operator
As a cluster admin, I want to see and learn how to install Janus IDP / Red Hat Developer Hub (RHDH)
This is a clone of issue OCPBUGS-29331. The following is the description of the original issue:
—
Description of problem:
The OpenShift Console QuickStarts that promotes RHDH was written in generic terms and doesn't include some information on how to use the CRD-based installation.
We have removed this specific information because the operator wasn't ready at that time. As soon as the RHDH operator is available in the OperatorHub we should update the QuickStarts with some more detailed information.
With a simple CR example and some info on how to customize the base URL or colors.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
Just navigate to Quick starts and select the "Install Red Hat Developer Hub (RHDH) with an Operator" quick start.
Actual results:
The RHDH Operator Quick start exists but is written in a generic way.
Expected results:
The RHDH Operator Quick start should contain some more specific information.
Additional info:
Initial PR: https://github.com/openshift/console-operator/pull/806
Description of problem:
The OpenShift Console QuickStarts promotes RHDH but also includes Janus IDP information.
The Janus IDP quick starts should be removed and all information about Janus IDP should be removed.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
Just navigate to Quick starts and select the "Install Red Hat Developer Hub (RHDH) with an Operator" quick start.
Actual results:
Expected results:
Additional info:
Initial PR: https://github.com/openshift/console-operator/pull/806
Provide better insights into performance and frequency of OpenShift pipeline runs
Show historical and real-time pipeline run data in a unified UI panel, with drill down capabilities.
Provide a visual dashboard that is competitive with leading pipeline solutions
Provide access to logs of running and historical pipeline runs
Enable in-context links to manage pipeline definitions.
Apply RBAC policies to data access.
Provide a Prow dashboard similar to this: https://prow.k8s.io/
Additionally, the goal is that this work will be the beginning of the dynamic plugin for OpenShift Pipelines, which would be installed by the OpenShift Pipelines operator.
If you delete a PipelineRun, it will be deleted in the k8s cluster, but the data will still be available in the tekton-results database, so the pipeline run list view will show the deleted PipelineRuns.
OpenShift Pipelines operator should be installed
tektonResults should be installed and working in the cluster (follow this: https://gist.github.com/vikram-raj/257d672a38eb2159b0368eaed8f8970a)
It endlessly shows the pipeline run and mentions `Resource is being deleted` in the kebab menu.
Gracefully handle it, e.g., by adding a hint to the user that the Delete action will only delete it from etcd storage, and disable the Delete action for results-based PipelineRuns.
Always
Slack thread: https://redhat-internal.slack.com/archives/CHG0KRB7G/p1700140684929449
Add a new option to the "internet proxy" to allow insecure communication that ignores CORS with the Tekton Results API.
This must not be used in the final implementation! It's a workaround to start the UI development.
As a user, I want to see the PipelineRuns from the Tekton Results data source.
Doc to install the Results on cluster https://docs.openshift-pipelines.org/operator/install-result.html
Tekton Results API swagger https://petstore.swagger.io/?url=https://raw.githubusercontent.com/avinal/tektoncd-results/openapi-fixes/docs/api/openapi.yaml
As a user, I want to see on the details page the info about where the TaskRun is loaded from
NOTE: Events are not available in the Tekton Results API. This is only available starting with OSP 1.12+
Doc to install the Results on cluster https://docs.openshift-pipelines.org/operator/install-result.html
Tekton Results API swagger https://petstore.swagger.io/?url=https://raw.githubusercontent.com/avinal/tektoncd-results/openapi-fixes/docs/api/openapi.yaml
As a user, I want to see data from the k8s API and Tekton results API in the same list page
As a user, I want to see on the details page the info about where the PipelineRun is loaded from
NOTE: Events are not available in the Tekton Results API. This is only available starting with OSP 1.12+
Doc to install the Results on cluster https://docs.openshift-pipelines.org/operator/install-result.html
Tekton Results API swagger https://petstore.swagger.io/?url=https://raw.githubusercontent.com/avinal/tektoncd-results/openapi-fixes/docs/api/openapi.yaml
As a user, I want to see the TaskRuns from the Tekton Results data source.
Doc to install the Results on cluster https://docs.openshift-pipelines.org/operator/install-result.html
Tekton Results API swagger https://petstore.swagger.io/?url=https://raw.githubusercontent.com/avinal/tektoncd-results/openapi-fixes/docs/api/openapi.yaml
The MCO should properly report its state in a way that's consistent and able to be understood by customers, troubleshooters, and maintainers alike.
Some customer cases have revealed scenarios where the MCO state reporting is misleading and therefore could be unreliable to base decisions and automation on.
In addition to correcting some incorrect states, the MCO will be enhanced for a more granular view of update rollouts across machines.
The MCO should properly report its state in a way that's consistent and able to be understood by customers, troubleshooters, and maintainers alike.
For this epic, "state" means "what is the MCO doing?" – so the goal here is to try to make sure that it's always known what the MCO is doing.
This includes:
While this probably crosses a little bit into the "status" portion of certain MCO objects, as some state is definitely recorded there, this probably shouldn't turn into a "better status reporting" epic. I'm interpreting "status" to mean "how is it going" so status is maybe a "detail attached to a state".
Exploration here: https://docs.google.com/document/d/1j6Qea98aVP12kzmPbR_3Y-3-meJQBf0_K6HxZOkzbNk/edit?usp=sharing
https://docs.google.com/document/d/17qYml7CETIaDmcEO-6OGQGNO0d7HtfyU7W4OMA6kTeM/edit?usp=sharing
Implementing and merging the new Upgrade Monitoring mechanism are to be separated, as I am working on the functionality while also wrestling with review, API migrations, and the featuregate.
This card is meant to track merging the various components of the machineconfignode in the MCO.
Ensure that the pod exists but the functionality behind the pod is not exposed by default in the release version this work ships in.
This can be done by creating a new featuregate in openshift/api, vendoring that into the cluster config operator, and then checking for this featuregate in the state controller code of the MCO.
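For illustration, enabling such a gate on a cluster would then look something like this; the gate name MachineConfigNodes is an assumption:

apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: CustomNoUpgrade
  customNoUpgrade:
    enabled:
    - MachineConfigNodes           # assumed featuregate name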
We need to ensure we have parity with OCP and support heterogeneous clusters
https://github.com/openshift/enhancements/pull/1014
Using a multi-arch NodePool requires the HC to be multi-arch as well. This is a good recipe for letting users shoot themselves in the foot. We need to automate the required input via CLI for multi-arch NodePools to work, e.g., on HC creation enabling a multi-arch flag which sets the right release image.
Acceptance Criteria:
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
As a user of HyperShift, I would like the UX around the `arch` flag validation improved so that it results in a smoother experience. The problem today is that we default Arch to `amd64`, but then throw an invalid status message on the NodePool CRD if it's not blank and the platform is not AWS.
DEFINE once path forward decided; SEE Engineering Details for more details.
Description of criteria:
Detail about what is specifically not being delivered in the story
At a minimum we should remove the empty `Arch` flag check here.
What about modifying the section to something like this:
// Validate modifying CPU arch support for platform
if (nodePool.Spec.Arch != "amd64") && (nodePool.Spec.Platform.Type != hyperv1.AWSPlatform) {
    SetStatusCondition(&nodePool.Status.Conditions, hyperv1.NodePoolCondition{
        Type:               hyperv1.NodePoolValidArchPlatform,
        Status:             corev1.ConditionFalse,
        Reason:             hyperv1.NodePoolInvalidArchPlatform,
        Message:            fmt.Sprintf("modifying CPU arch from 'amd64' not supported for platform: %s", nodePool.Spec.Platform.Type),
        ObservedGeneration: nodePool.Generation,
    })
}
This requires/does not require a design proposal.
This requires/does not require a feature gate.
When the cluster does not have v1 builds, console needs to either provide different ways to build applications or prevent erroneous actions.
Identify the build system in place and prompt user accordingly when building applications.
Console will have to hide any workflows that rely solely on BuildConfigs when Pipelines is not installed.
ODC Jira - https://issues.redhat.com/browse/ODC-7352
When the cluster does not have v1 builds, console needs to either provide different ways to build applications or prevent erroneous actions.
Identify the build system in place and prompt user accordingly when building applications.
Without this enhancement, users will encounter issues when trying to create applications on clusters that do not have the default s2i setup.
Console will have to hide any workflows that rely solely on BuildConfigs when Pipelines is not installed.
If we detect Shipwright, then we can call that API instead of buildconfigs. We need to understand the timelines for the latter part, and create a separate work item for it.
If both buildconfigs and Shipwright are available, then we should default to Shipwright. This will be part of the separate work item needed to support Shipwright.
Rob Gormley to confirm timelines for when customers will have the option to remove buildconfigs from their clusters. That will determine whether we take on this work in 4.15 or 4.16.
As a user, I would like to use the Import from Git form even if I don't have BC installed in my cluster, but I have installed the Pipelines operator.
As a user, I don't want to see the option of "DeploymentConfigs" in any form I am filling when I have not installed it in the cluster.
Description of problem:
Hide the Builds NavItem if BuildConfig is not installed in the cluster
As a user, I want to use the Import from Git form without any errors, to create the Pipeline for my Git Application if I have disabled Builds and installed Pipelines in the cluster.
(During the implementation, we are also trying to keep in mind the changes that have to be made later while adding SW into this form)
This is an initial prototype. This needs to be presented to the PMs for their feedback and updated accordingly.
The final UI must have acknowledgement from the PMs, and only after that can it be merged.
Change API designator from alpha to beta for v1 Shipwright builds.
To maintain currency with API specs
ODC Jira - https://issues.redhat.com/browse/ODC-7353
Change API designator from alpha to beta for v1 Shipwright builds.
To maintain currency with API specs
As a user, I want to be able to:
so that I can achieve
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
Consolidated Enhancement of HyperShift/KubeVirt Provider Post GA
This feature aims to provide a comprehensive enhancement to the HyperShift/KubeVirt provider integration post its GA release.
By consolidating CSI plugin improvements, core improvements, and networking enhancements, we aim to offer a more robust, efficient, and user-friendly experience.
Post GA quality of life improvements for the HyperShift/KubeVirt core
Who | What | Reference |
---|---|---|
DEV | Upstream roadmap issue (or individual upstream PRs) | <link to GitHub Issue> |
DEV | Upstream documentation merged | <link to meaningful PR> |
DEV | gap doc updated | <name sheet and cell> |
DEV | Upgrade consideration | <link to upgrade-related test or design doc> |
DEV | CEE/PX summary presentation | label epic with cee-training and add a <link to your support-facing preso> |
QE | Test plans in Polarion | <link or reference to Polarion> |
QE | Automated tests merged | <link or reference to automated tests> |
DOC | Downstream documentation merged | <link to meaningful PR> |
In PR https://github.com/openshift/hypershift/pull/2576/ we had to disable the nodepool upgrade test. This is because there are no previous releases which have the new kubevirt rhcos variant available... so there's no release to upgrade from
We need to re-enable this test once we have a stable previous release in CI to test against (post 4.14 feature freeze and after 4.15 is branched).
CNV QE, field engineers, and developers often need to test hypershift kubevirt in a way that isn't officially supported yet, and this often involves needing to modify the kubevirt VM's spec to enable some sort of feature, add an interface/volume, or something else along those lines.
We need to design a mechanism that works as an escape hatch to allow these sorts of unsupported modifications to be experimented with easily. This mechanism should not be a part of the official Hypershift APIs, but instead something that people can influence via an annotation or similar means.
It's likely this feature will serve as a way for us to grant temporary support exceptions to customers as well.
This can be achieved using an annotation with a json patch in it. Below is an example of how such a json patch might be placed on a NodePool to influence the VMs generated by the NodePool to have a secondary interface.
apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  annotations:
    hypershift.openshift.io/kubevirt-vm-jsonpatch: |-
      [
        {
          "op": "add",
          "path": "/spec/template/spec/networks",
          "value": {"name": "secondary", "multus": {"networkName": "mynetwork"}}
        },
        {
          "op": "add",
          "path": "/spec/template/spec/domain/devices/interfaces",
          "value": {"name": "secondary", "bridge": {}}
        }
      ]
kubevirt node pools currently only set requests for cpu/mem. This doesn't guarantee that the kubevirt VMs will have access to dedicated resources, which is something some customers may desire.
To resolve this, we should create a toggle on the nodepool under the kubevirt platform section to enable dedicated resources, which will give each VM guaranteed dedicated access to cpus and memory.
We need to make sure to document that multiqueue should only be used with MTU >= 9000 on the infra cluster. Smaller MTU sizes (like 1500 for example) actually displayed degraded results compared to not having multiqueue enabled at all.
Currently there is no option to influence the placement of the VMs of a hosted cluster with the KubeVirt provider. The existing NodeSelector in HostedCluster influences only the pods in the hosted control plane namespace.
The goal is to introduce a new field in the .spec.platform.kubevirt stanza in NodePool for a node selector, propagate it to the VirtualMachineSpecTemplate, and expose this in the hypershift and hcp CLIs.
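For illustration, a hedged sketch of the proposed NodePool shape; the nodeSelector field under the kubevirt stanza is the proposal, not a shipped API:

apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
spec:
  platform:
    type: KubeVirt
    kubevirt:
      nodeSelector:                # proposed field, propagated to the VM template
        kubevirt.io/workload: hosted-workers   # illustrative label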
The goal of this epic is to introduce a validating webhook for the KubeVirt platform that executes the HostedCluster and NodePool validation at admission time.
One important note here is that the core Hypershift team has a requirement that all validation logic must be within the controller loop. This does not exclude the usage of a validation webhook, it merely means that if we introduce a validating webhook that it cannot replace the controller validation.
That means this task will involve abstracting our validation logic in a way that both the controller and our validating webhook share the same logic.
Who | What | Reference |
---|---|---|
DEV | Upstream roadmap issue (or individual upstream PRs) | <link to GitHub Issue> |
DEV | Upstream documentation merged | <link to meaningful PR> |
DEV | gap doc updated | <name sheet and cell> |
DEV | Upgrade consideration | <link to upgrade-related test or design doc> |
DEV | CEE/PX summary presentation | label epic with cee-training and add a <link to your support-facing preso> |
QE | Test plans in Polarion | <link or reference to Polarion> |
QE | Automated tests merged | <link or reference to automated tests> |
DOC | Downstream documentation merged | <link to meaningful PR> |
The underlying infra cluster hosting HCP KubeVirt worker VMs must meet some versioning requirements (CNV >= 4.14, OCP >= 4.14). There is a validation check that enforces this on the backend today. We'd like to move this validation to a webhook so users get early feedback if the validation will fail during creation.
The hypershift operator only supports specific versions of release payloads. We'd like to give users early feedback by validating that the release payload they have picked falls within the backend operator's supported window.
The backend controller loop performs this check here. We'd like the same check to be introduced into an optional validating webhook.
Currently, Hypershift's HostedCluster and NodePool APIs are difficult to use directly. The "hcp" CLI alleviates this complexity to some degree, but at the cost of requiring a CLI tool rather than creating the resources directly.
The goal of this epic is to reduce the complexity of the HostedCluster and NodePool APIs to the point that users only need to specify a small set of values in these APIs at create time; during admission, a mutating webhook then fills in the remaining details using the defaults that the "hcp" CLI currently uses.
Essentially, the goal here is to move the "magic" defaulting that is so convenient to users out of the "hcp" tool and into the hypershift operator backend using a mutating webhook.
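For illustration, the end state might let a user create something as small as the following, with the mutating webhook defaulting everything else (names and the release image here are placeholders, not the final minimal set):

apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: example
  namespace: clusters
spec:
  pullSecret:
    name: example-pull-secret  # placeholder
  release:
    image: quay.io/openshift-release-dev/ocp-release:4.14.0-x86_64  # placeholder
  platform:
    type: KubeVirt
    kubevirt: {}
  # services, networking, sshKey, etcd encryption, etc. filled in at admission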
Who | What | Reference |
---|---|---|
DEV | Upstream roadmap issue (or individual upstream PRs) | <link to GitHub Issue> |
DEV | Upstream documentation merged | <link to meaningful PR> |
DEV | gap doc updated | <name sheet and cell> |
DEV | Upgrade consideration | <link to upgrade-related test or design doc> |
DEV | CEE/PX summary presentation | label epic with cee-training and add a <link to your support-facing preso> |
QE | Test plans in Polarion | <link or reference to Polarion> |
QE | Automated tests merged | <link or reference to automated tests> |
DOC | Downstream documentation merged | <link to meaningful PR> |
Today the hcp CLI renders an etcd encryption secret for each cluster it creates. We'd like the backend to perform this logic so users can more easily use the HostedCluster API directly without needing the CLI tool.
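For reference, the CLI today effectively generates a 32-byte AES key and wires it into the HostedCluster roughly like this (the secret name and data key below are illustrative, not the exact names the CLI uses):

$ openssl rand 32 > key
$ oc create secret generic example-etcd-encryption-key -n clusters --from-file=key

apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
spec:
  secretEncryption:
    type: aescbc
    aescbc:
      activeKey:
        name: example-etcd-encryption-key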
All HC and NP API defaulting within the hcp cli that impacts the KubeVirt platform should be moved to CRD defaulting and mutating webhooks
The goal of this epic is to provide a solution for tying HyperShift/KubeVirt vm worker nodes into networks outside of the default pod network.
An example scenario for this Epic is a user who wishes to run their KubeVirt worker node VMs on a network they have configured within their datacenter. The user already has IPAM on their network (likely through DHCP) and wishes the KubeVirt VMs for their HCP to be tied to this externally provisioned network rather than the default pod network provided by OVNKubernetes.
What is required of us in this scenario is to provide a way to configure usage of this user-provided network on the NodePool, and to ensure that the CAPI ecosystem components (capk, cloud-provider-kubevirt) work as expected with this VM configuration.
Role | Contact |
---|---|
PM | Peter Lauterbach |
Documentation Owner | TBD |
Delivery Owner | (See assignee) |
Quality Engineer | (See QE Assignee) |
Who | What | Reference |
---|---|---|
DEV | Upstream code and tests merged | https://github.com/openshift/hypershift/pull/3066 |
DEV | Upstream documentation merged | https://github.com/openshift/hypershift/pull/3464 |
DEV | gap doc updated | N/A |
DEV | Upgrade consideration | None |
DEV | CEE/PX summary presentation | N/A |
QE | Test plans in Polarion | N/A |
QE | Automated tests merged | https://github.com/openshift/hypershift/pull/3449 |
DOC | Downstream documentation merged | https://github.com/openshift/hypershift/pull/3464 |
We need the ability to configure a KubeVirt platform NodePool to use a custom network interface (not the default pod network) when creating the VMs.
Since cloud-provider-kubevirt will not be able to mirror LBs when a NodePool is not on an OVNKubernetes-defined network, we'll need to make sure cloud-provider-kubevirt's LB mirroring behavior is disabled when custom networks are in use.
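Per the PRs linked above, the NodePool shape for this looks roughly like the following (treat the exact field names as a sketch and defer to the merged API):

apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
spec:
  platform:
    type: KubeVirt
    kubevirt:
      attachDefaultNetwork: false  # opt out of the default pod network; LB mirroring must then be disabled
      additionalNetworks:
      - name: default/net1  # namespace/name of a NetworkAttachmentDefinition
      - name: default/net2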
The scenarios that we need to cover are the ones extracted from the notes doc https://docs.google.com/document/d/1zzyHxUEPyEM4hgRh_jww4gRIKJhRTxYeKeSYhxw-pDc/edit
Scenarios
Description of problem:
The hypershift kubevirt provider is missing the openshift mechanism to select which interface/IP address kubelet is going to use to register. The nodeip-configuration.service should be activated by the MCO for the kubevirt platform.
Version-Release number of selected component (if applicable):
4.15.0
How reproducible:
Always
Steps to Reproduce:
Depends on hypershift kubevirt multinet feature https://github.com/openshift/hypershift/pull/3066
1. Create an openshift libvirt/baremetal cluster with metallb, cnv, odf, local-storage and kubernetes-nmstate, with a pair of extra nics at the nodes.
2. Populate the following network attachment definitions and NNCP to connect those extra nics:
---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: net1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: bridge.network.kubevirt.io/net1
spec:
  config: >
    {
      "cniVersion": "0.3.1",
      "name": "net1",
      "plugins": [{
        "type": "cnv-bridge",
        "bridge": "net1",
        "ipam": {}
      }]
    }
---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: net2
  annotations:
    k8s.v1.cni.cncf.io/resourceName: bridge.network.kubevirt.io/net2
spec:
  config: >
    {
      "cniVersion": "0.3.1",
      "name": "net2",
      "plugins": [{
        "type": "cnv-bridge",
        "bridge": "net2",
        "ipam": {}
      }]
    }
---
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: multi-net
spec:
  desiredState:
    interfaces:
    - name: net1
      type: linux-bridge
      state: up
      ipv4:
        enabled: false
      ipv6:
        enabled: false
      bridge:
        options:
          stp:
            enabled: true
        port:
        - name: ens4
    - name: net2
      type: linux-bridge
      state: up
      ipv4:
        enabled: false
      ipv6:
        enabled: false
      bridge:
        options:
          stp:
            enabled: true
        port:
        - name: ens5
3. Create a kubevirt hosted cluster using those nics with the following command options:
--additional-network=name:default/net1 --additional-network=name:default/net2 --attach-default-network=false
Actual results:
kubelet ends up exposing the IP from net2, but ovn-k uses net1
Expected results:
kubelet and ovn-k should use net1
Additional info:
4.15 will only support secondary networks alongside the default network. The productized CLI needs the `attach-default-network` option removed.
This CLI option needs to be removed from the main, 4.16, and 4.15 branches. We'll add it back once standalone secondary networks are supported.
Note, this is not a change to the API, only the CLI.
In a multi-tenant cluster, both ICSP and IDMS objects should be functional at the same time.
Enable both ICSP and IDMS objects to exist on the cluster at the same time and roll out both configurations.
ICSP and IDMS objects should be functional
Provide an ICSP-to-IDMS migration path that does not require a node reboot, which can lead to disruption.
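For reference, an equivalent pair of objects that must be able to coexist and both be rolled out (source/mirror values are illustrative):

apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: example-icsp
spec:
  repositoryDigestMirrors:
  - source: registry.example.com/ubi8
    mirrors:
    - mirror.example.net/ubi8
---
apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
  name: example-idms
spec:
  imageDigestMirrors:
  - source: registry.example.com/ubi9
    mirrors:
    - mirror.example.net/ubi9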
The MCO watches both ICSP and IDMS objects. As an openshift developer, I want it to process content from both CRD kinds into the underlying configuration.
Do not error out if both ICSP and IDMS resources exist.
More details at ARO managed identity scope and impact.
This Section: A list of specific needs or objectives that a Feature must deliver to be considered complete. Some requirements will be flagged as MVP. If an MVP requirement gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Questions to be addressed:
Epic to capture the items not blocking for OCPSTRAT-506 (OCPBU-8)
Evaluate if any of the ARO predefined roles in the credentials request manifests of OpenShift cluster operators give elevated permissions. Remove any such predefined role from the spec.predefinedRoles field and replace it with the required permissions in the new spec.permissions field.
As a result of Hashicorp's license change to BSL, Red Hat OpenShift needs to remove the use of Hashicorp's Terraform from the installer – specifically for IPI deployments which currently use Terraform for setting up the infrastructure.
Remove use of Terraform in the IPI Installer from the top providers: AWS, vSphere, Metal, and Azure.
The IPI Installer no longer contains or uses Terraform.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
Two major parts:
I want to be able to produce an openshift-install binary for installing on AWS (other platforms will not be supported) that is free of Terraform.
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
Encapsulate Terraform in its own package (removing dependencies from pkg/asset and cmd) so that Terraform can be included or removed based on build tags.
Interface provides a way of substituting other infrastructure providers.
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
Create an installer image to be promoted to the release payload that contains the openshift altinfra binary (produced with build tags).
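A sketch of how such a build might be invoked once the build tags are in place (the tag name is an assumption derived from the binary name above, not a confirmed flag):

$ go build -tags altinfra -o openshift-install-altinfra ./cmd/openshift-install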
The installer should support feature gate validation so that new providers can be enabled via featuregates.
A guest cluster can use an external OIDC token issuer. This will allow machine-to-machine authentication workflows
A guest cluster can configure OIDC providers to support the current capability: https://kubernetes.io/docs/reference/access-authn-authz/authentication/#openid-connect-tokens and the future capability: https://github.com/kubernetes/kubernetes/blob/2b5d2cf910fd376a42ba9de5e4b52a53b58f9397/staging/src/k8s.io/apiserver/pkg/apis/apiserver/types.go#L164 with an API that
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
As a (user persona), I want to be able to:
so that I can achieve
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
As a (user persona), I want to be able to:
so that I can achieve
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
Description of problem:
Updating oidcProviders does not take effect. See details below.
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-02-26-155043
How reproducible:
Always
Steps to Reproduce:
1. Install fresh HCP env and configure external OIDC as steps 1 ~ 4 of https://issues.redhat.com/browse/OCPBUGS-29154 (to avoid repeating those steps, only referencing them as-is here).
2. Pods renewed:
$ oc get po -n clusters-$HC_NAME --kubeconfig $MGMT_KUBECONFIG --sort-by metadata.creationTimestamp
...
network-node-identity-68b7b8dd48-4pvvq 3/3 Running 0 170m
oauth-openshift-57cbd9c797-6hgzx 2/2 Running 0 170m
kube-controller-manager-66f68c8bd8-tknvc 1/1 Running 0 164m
kube-controller-manager-66f68c8bd8-wb2x9 1/1 Running 0 164m
kube-controller-manager-66f68c8bd8-kwxxj 1/1 Running 0 163m
kube-apiserver-596dcb97f-n5nqn 5/5 Running 0 29m
kube-apiserver-596dcb97f-7cn9f 5/5 Running 0 27m
kube-apiserver-596dcb97f-2rskz 5/5 Running 0 25m
openshift-apiserver-c9455455c-t7prz 3/3 Running 0 22m
openshift-apiserver-c9455455c-jrwdf 3/3 Running 0 22m
openshift-apiserver-c9455455c-npvn5 3/3 Running 0 21m
konnectivity-agent-7bfc7cb9db-bgrsv 1/1 Running 0 20m
cluster-version-operator-675745c9d6-5mv8m 1/1 Running 0 20m
hosted-cluster-config-operator-559644d45b-4vpkq 1/1 Running 0 20m
konnectivity-agent-7bfc7cb9db-hjqlf 1/1 Running 0 20m
konnectivity-agent-7bfc7cb9db-gl9b7 1/1 Running 0 20m
3. oc login can succeed:
$ oc login --exec-plugin=oc-oidc --client-id=$CLIENT_ID --client-secret=$CLIENT_SECRET_VALUE --extra-scopes=email --callback-port=8080
Please visit the following URL in your browser: http://localhost:8080
Logged into "https://a4af9764....elb.ap-southeast-1.amazonaws.com:6443" as "oidc-user-test:xxia@redhat.com" from an external oidc issuer.
You don't have any projects. Contact your system administrator to request a project.
4. Update HC by changing claim: email to claim: sub:
$ oc edit hc $HC_NAME -n clusters --kubeconfig $MGMT_KUBECONFIG
...
        username:
          claim: sub
...
Update is picked up:
$ oc get authentication.config cluster -o yaml
...
spec:
  oauthMetadata:
    name: tested-oauth-meta
  oidcProviders:
  - claimMappings:
      groups:
        claim: groups
        prefix: 'oidc-groups-test:'
      username:
        claim: sub
        prefix:
          prefixString: 'oidc-user-test:'
        prefixPolicy: Prefix
    issuer:
      audiences:
      - 76863fb1-xxxxxx
      issuerCertificateAuthority:
        name: ""
      issuerURL: https://login.microsoftonline.com/xxxxxxxx/v2.0
    name: microsoft-entra-id
    oidcClients:
    - clientID: 76863fb1-xxxxxx
      clientSecret:
        name: console-secret
      componentName: console
      componentNamespace: openshift-console
  serviceAccountIssuer: https://xxxxxx.s3.us-east-2.amazonaws.com/hypershift-ci-267402
  type: OIDC
status:
  oidcClients:
  - componentName: console
    componentNamespace: openshift-console
    conditions:
    - lastTransitionTime: "2024-02-28T10:51:17Z"
      message: ""
      reason: OIDCConfigAvailable
      status: "False"
      type: Degraded
    - lastTransitionTime: "2024-02-28T10:51:17Z"
      message: ""
      reason: OIDCConfigAvailable
      status: "False"
      type: Progressing
    - lastTransitionTime: "2024-02-28T10:51:17Z"
      message: ""
      reason: OIDCConfigAvailable
      status: "True"
      type: Available
    currentOIDCClients:
    - clientID: 76863fb1-xxxxxx
      issuerURL: https://login.microsoftonline.com/xxxxxxxx/v2.0
      oidcProviderName: microsoft-entra-id
5. Check pods again:
$ oc get po -n clusters-$HC_NAME --kubeconfig $MGMT_KUBECONFIG --sort-by metadata.creationTimestamp
...
kube-apiserver-596dcb97f-n5nqn 5/5 Running 0 108m
kube-apiserver-596dcb97f-7cn9f 5/5 Running 0 106m
kube-apiserver-596dcb97f-2rskz 5/5 Running 0 104m
openshift-apiserver-c9455455c-t7prz 3/3 Running 0 102m
openshift-apiserver-c9455455c-jrwdf 3/3 Running 0 101m
openshift-apiserver-c9455455c-npvn5 3/3 Running 0 100m
konnectivity-agent-7bfc7cb9db-bgrsv 1/1 Running 0 100m
cluster-version-operator-675745c9d6-5mv8m 1/1 Running 0 100m
hosted-cluster-config-operator-559644d45b-4vpkq 1/1 Running 0 100m
konnectivity-agent-7bfc7cb9db-hjqlf 1/1 Running 0 99m
konnectivity-agent-7bfc7cb9db-gl9b7 1/1 Running 0 99m
No new pods renewed.
6. Check login again; it does not use "sub", it still uses "email":
$ rm -rf ~/.kube/cache/
$ oc login --exec-plugin=oc-oidc --client-id=$CLIENT_ID --client-secret=$CLIENT_SECRET_VALUE --extra-scopes=email --callback-port=8080
Please visit the following URL in your browser: http://localhost:8080
Logged into "https://xxxxxxx.elb.ap-southeast-1.amazonaws.com:6443" as "oidc-user-test:xxia@redhat.com" from an external oidc issuer.
You don't have any projects. Contact your system administrator to request a project.
$ cat ~/.kube/cache/oc/* | jq -r '.id_token' | jq -R 'split(".") | .[] | @base64d | fromjson'
...
{
  ...
  "email": "xxia@redhat.com",
  "groups": [ ... ],
  ...
  "sub": "EEFGfgPXr0YFw_ZbMphFz6UvCwkdFS20MUjDDLdTZ_M",
  ...
Actual results:
Steps 4 ~ 6: after editing the HC field value from "claim: email" to "claim: sub", even though `oc get authentication cluster -o yaml` shows the edited change is propagated:
1> The pods like kube-apiserver are not renewed.
2> After cleaning up ~/.kube/cache, `oc login ...` still prints 'Logged into "https://xxxxxxx.elb.ap-southeast-1.amazonaws.com:6443" as "oidc-user-test:xxia@redhat.com" from an external oidc issuer', i.e. it still uses the old claim "email" as the user name instead of the new claim "sub".
Expected results:
Steps 4 ~ 6: Pods like kube-apiserver should renew after an HC edit that changes the user claim. The login should print that the new claim is used as the user name.
Additional info:
Description of problem:
In a 4.15 cluster, after configuring external OIDC, the kube-apiserver pod crashes with: run.go:74] "command failed" err="strict decoding error: unknown field \"jwt[0].claimMappings.uid\"". In a 4.16 cluster, this issue is not reproduced.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2024-02-28-013638
How reproducible:
Unsure for now.
Steps to Reproduce:
1. Install 4.15 fresh HCP env and configure external OIDC:
$ HC_NAME=hypershift-ci-267403
MGMT_KUBECONFIG=/home/xxia/my/env/xxia-hs415-267403-4.15/kubeconfig
HOSTED_KUBECONFIG=/home/xxia/my/env/xxia-hs415-267403-4.15/hypershift-ci-267403.kubeconfig
AUDIENCE=76863fb1-xxxxxx
ISSUER_URL=https://login.microsoftonline.com/xxxxxxxx/v2.0
CLIENT_ID=76863fb1-xxxxxx
CLIENT_SECRET_VALUE="xxxxxxxx"
CLIENT_SECRET_NAME=console-secret
$ curl -sS "$ISSUER_URL/.well-known/openid-configuration" > microsoft-entra-id-oauthMetadata
$ export KUBECONFIG=$HOSTED_KUBECONFIG
$ oc create configmap tested-oauth-meta --from-file=oauthMetadata=microsoft-entra-id-oauthMetadata -n clusters --kubeconfig $MGMT_KUBECONFIG
configmap/tested-oauth-meta created
$ oc patch hc $HC_NAME -n clusters --kubeconfig $MGMT_KUBECONFIG --type=merge -p="
spec:
  configuration:
    authentication:
      oauthMetadata:
        name: tested-oauth-meta
      oidcProviders:
      - claimMappings:
          groups:
            claim: groups
            prefix: 'oidc-groups-test:'
          username:
            claim: email
            prefixPolicy: Prefix
            prefix:
              prefixString: 'oidc-user-test:'
        issuer:
          audiences:
          - $AUDIENCE
          issuerURL: $ISSUER_URL
        name: microsoft-entra-id
        oidcClients:
        - clientID: $CLIENT_ID
          clientSecret:
            name: $CLIENT_SECRET_NAME
          componentName: console
          componentNamespace: openshift-console
      type: OIDC
"
hostedcluster.hypershift.openshift.io/hypershift-ci-267403 patched
$ oc create secret generic console-secret -n clusters --from-literal=clientSecret=$CLIENT_SECRET_VALUE --kubeconfig $MGMT_KUBECONFIG
secret/console-secret created
$ oc get authentication.config cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Authentication
metadata:
...
spec:
  oauthMetadata:
    name: tested-oauth-meta
  oidcProviders:
  - claimMappings:
      groups:
        claim: groups
        prefix: 'oidc-groups-test:'
      username:
        claim: email
        prefix:
          prefixString: 'oidc-user-test:'
        prefixPolicy: Prefix
    issuer:
      audiences:
      - 76863fb1-xxxxxx
      issuerCertificateAuthority:
        name: ""
      issuerURL: https://login.microsoftonline.com/xxxxxxxx/v2.0
    name: microsoft-entra-id
    oidcClients:
    - clientID: 76863fb1-xxxxxx
      clientSecret:
        name: console-secret
      componentName: console
      componentNamespace: openshift-console
  serviceAccountIssuer: https://xxxxxxxx.s3.us-east-2.amazonaws.com/hypershift-ci-267403
  type: OIDC
status:
  oidcClients:
  - componentName: console
    componentNamespace: openshift-console
    conditions:
    - lastTransitionTime: "2024-02-28T09:20:08Z"
      message: ""
      reason: OIDCConfigAvailable
      status: "False"
      type: Degraded
    - lastTransitionTime: "2024-02-28T09:20:08Z"
      message: ""
      reason: OIDCConfigAvailable
      status: "False"
      type: Progressing
    - lastTransitionTime: "2024-02-28T09:20:08Z"
      message: ""
      reason: OIDCConfigAvailable
      status: "True"
      type: Available
    currentOIDCClients:
    - clientID: 76863fb1-xxxxxx
      issuerURL: https://login.microsoftonline.com/xxxxxxxx/v2.0
      oidcProviderName: microsoft-entra-id
$ oc get po -n clusters-$HC_NAME --kubeconfig $MGMT_KUBECONFIG --sort-by metadata.creationTimestamp
NAME READY STATUS RESTARTS AGE
...
openshift-apiserver-d665bdc58-7cfdg 3/3 Running 0 154m
kube-controller-manager-577cf4566f-sgxz2 1/1 Running 0 154m
openshift-apiserver-d665bdc58-52w9m 3/3 Running 0 154m
kube-apiserver-74f569dfb5-7tnmn 4/5 CrashLoopBackOff 7 (2m47s ago) 15m
$ oc logs --timestamps -n clusters-$HC_NAME --kubeconfig $MGMT_KUBECONFIG -c kube-apiserver kube-apiserver-74f569dfb5-7tnmn > ~/my/logs/kube-apiserver-74f569dfb5-7tnmn-CrashLoopBackOff-hcp415.log
$ oc get cm auth-config -n clusters-$HC_NAME --kubeconfig $MGMT_KUBECONFIG -o jsonpath='{.data.auth\.json}'
{"kind":"AuthenticationConfiguration","apiVersion":"apiserver.config.k8s.io/v1alpha1","jwt":[{"issuer":{"url":"https://login.microsoftonline.com/xxxxxxxx/v2.0","audiences":["76863fb1-xxxxxx"],"audienceMatchPolicy":"MatchAny"},"claimMappings":{"username":{"claim":"email","prefix":"oidc-user-test:"},"groups":{"claim":"groups","prefix":"oidc-groups-test:"},"uid":{}}}]}
$ oc get cm auth-config -n clusters-$HC_NAME --kubeconfig $MGMT_KUBECONFIG -o jsonpath='{.data.auth\.json}' | jq | ~/auto/json2yaml.sh
---
kind: AuthenticationConfiguration
apiVersion: apiserver.config.k8s.io/v1alpha1
jwt:
- issuer:
    url: https://login.microsoftonline.com/xxxxxxxx/v2.0
    audiences:
    - 76863fb1-xxxxxx
    audienceMatchPolicy: MatchAny
  claimMappings:
    username:
      claim: email
      prefix: 'oidc-user-test:'
    groups:
      claim: groups
      prefix: 'oidc-groups-test:'
    uid: {}
$ vi ~/my/logs/kube-apiserver-74f569dfb5-7tnmn-CrashLoopBackOff-hcp415.log
...
2024-02-28T09:32:06.077307893Z I0228 09:32:06.077298 1 options.go:220] external host was not specified, using 172.20.0.1
2024-02-28T09:32:06.077977888Z I0228 09:32:06.077952 1 server.go:189] Version: v1.28.6+6216ea1
2024-02-28T09:32:06.077977888Z I0228 09:32:06.077971 1 server.go:191] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
2024-02-28T09:32:06.078556862Z I0228 09:32:06.078543 1 dynamic_serving_content.go:113] "Loaded a new cert/key pair" name="serving-cert::/etc/kubernetes/certs/server/tls.crt::/etc/kubernetes/certs/server/tls.key"
2024-02-28T09:32:06.408308750Z I0228 09:32:06.408274 1 dynamic_cafile_content.go:119] "Loaded a new CA Bundle and Verifier" name="client-ca-bundle::/etc/kubernetes/certs/client-ca/ca.crt"
2024-02-28T09:32:06.408487434Z E0228 09:32:06.408467 1 run.go:74] "command failed" err="strict decoding error: unknown field \"jwt[0].claimMappings.uid\""
Actual results:
As shown in above "Description of problem"
Expected results:
4.16 does not have the issue. 4.15 should have no such problem.
Additional info:
We need the updates to the Authentication API to detect the authentication type for the cluster, and to deploy or not deploy the oauth components based on the set type.
A guest cluster can use an external OIDC token issuer. This will allow machine-to-machine authentication workflows
A guest cluster can configure OIDC providers to support the current capability: https://kubernetes.io/docs/reference/access-authn-authz/authentication/#openid-connect-tokens and the future capability: https://github.com/kubernetes/kubernetes/blob/2b5d2cf910fd376a42ba9de5e4b52a53b58f9397/staging/src/k8s.io/apiserver/pkg/apis/apiserver/types.go#L164 with an API that
oc, the openshift CLI, needs to get as close to feature parity as we can without the built-in oauth server and its associated user and group management. This will enable scripts, documentation, blog posts, and knowledge base articles to function across all form factors, and on the same form factor with different configurations.
CLI users and scripts should be usable in a consistent way regardless of the token issuer configuration.
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
Epic Goal*
What is our purpose in implementing this? What new capability will be available to customers?
oc login needs to work without the embedded oauth server
Why is this important? (mandatory)
We are removing the embedded oauth-server, and we utilize a special oauthclient in order to make our login flows functional.
This allows documentation, scripts, etc to be functional and consistent with the last 10 years of our product.
This may require vendoring entire CLI plugins. It may require new kubeconfig shapes.
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Manual cherry-pick work to backport the changes merged in 4.16
Epic Goal*
What is our purpose in implementing this? What new capability will be available to customers?
`oc whoami` must work without the oauth-apiserver. There is an endpoint recently added to kube that provides equivalent functionality.
Why is this important? (mandatory)
The oauth-apiserver does not control IdP information when external OIDC is used. This means the oauth-apiserver is no longer deployed, which causes `oc whoami` to fail.
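For illustration, recent kubectl (and oc builds that vendor it) expose the new endpoint as `auth whoami`, which is the likely replacement path (the output shown is illustrative):

$ oc auth whoami
ATTRIBUTE   VALUE
Username    oidc-user-test:user@example.com
Groups      [oidc-groups-test:admins system:authenticated]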
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
When the internal oauth-server and oauth-apiserver are removed and replaced with an external OIDC issuer (like azure AD), the console must work for human users of the external OIDC issuer.
An end user can use the openshift console without a notable difference in experience. This must eventually work on both hypershift and standalone, but hypershift is the first priority if it impacts delivery.
Console needs to be able to authenticate against an external OIDC IdP. For that, the console-operator needs to configure it accordingly.
AC:
When the console is using an external OIDC token, the users and groups sections of the UI are no longer relevant, and we need not render them.
Acceptance criteria:
The web console should behave like a generic OIDC client when requesting tokens from an OIDC provider.
User API may not always be available. K8S now has a stable API to query for user information - https://github.com/kubernetes/enhancements/tree/master/keps/sig-auth/3325-self-subject-attributes-review-api. See if it can be used and replace all `user/~` calls with it.
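A minimal sketch of that API: the console backend would POST an empty SelfSubjectReview and read the authenticated user from the returned status instead of calling `user/~` (the values shown are illustrative):

apiVersion: authentication.k8s.io/v1
kind: SelfSubjectReview
# POST /apis/authentication.k8s.io/v1/selfsubjectreviews with an empty spec;
# the server fills in status:
status:
  userInfo:
    username: oidc-user-test:user@example.com
    groups:
    - oidc-groups-test:admins
    - system:authenticated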
Console server code needs refactoring in order to move forward with backend changes for introducing auth against external OIDC.
AC: Move auth config to its own module
Enable a "Break Glass Mechanism" in ROSA (Red Hat OpenShift Service on AWS) and other OpenShift cloud-services in the future (e.g., ARO and OSD) to provide customers with an alternative method of cluster access via short-lived certificate-based kubeconfig when the primary IDP (Identity Provider) is unavailable.
Create a new short-lived signer CA that signs a cluster-admin kubeconfig we provide to the customer upon request.
The CA must be trusted by the KAS and included in the CA bundle along with the CA that will sign longer lived cert-based creds like those used by SRE.
TBD: should we create the signer at the point of kubeconfig request from the customer? Or should we always have the signer active through periodic rotation?
On-demand signer:
Always valid signer with rotation:
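Either way, the credential handed to the customer would look roughly like a short-lived client certificate signed by the break-glass signer, e.g. (an openssl sketch; the subject, file names, and validity period are illustrative):

$ openssl req -new -newkey rsa:4096 -nodes -keyout client.key \
    -subj "/O=system:masters/CN=customer-break-glass" -out client.csr
$ openssl x509 -req -in client.csr -CA break-glass-ca.crt -CAkey break-glass-ca.key \
    -CAcreateserial -days 1 -out client.crt
# client.crt/client.key are then embedded into the kubeconfig given to the customer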
As a (user persona), I want to be able to:
so that I can achieve
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
Enrich the OpenShift Pipelines experience for DevSecOps and Software Supply Chain Security use cases such as CVEs, SBOMs and signatures.
Improving application developer experience when using OpenShift Pipelines by increasing awareness of important SSCS elements. An OpenShift Pipelines PipelineRun's Task can emit CVEs, SBOMs, policy reporting as well as identify signing status.
Enrich the OpenShift Pipelines experience for DevSecOps and Software Supply Chain Security use cases such as CVEs, SBOMs and signatures.
Improving application developer experience when using OpenShift Pipelines by increasing awareness of important SSCS elements. An OpenShift Pipelines PipelineRun's Task can emit CVEs, SBOMs, policy reporting as well as identify signing status.
Miro link - here
Project GUI enhancements doc
As a user, I want to see the SBOM link in the pipelinerun details page and if the pipelinerun is signed by chains then a signed badge should appear next to the pipelinerun name.
Tekton results annotation to be used - https://docs.google.com/document/d/1_1YXFx0ymzjl4b9M_LDjmmGrEYey5mDrfTdNn_56hpM/edit#heading=h.u0j4yw1zdczm
As a user, I want to see the vulnerabilities in the OCP console, so that I can identify and fix the issue as early as possible.
Tekton results naming conventions - doc
Batch the tekton results API requests to avoid performance issues, and use pagination to fetch the vulnerabilities when a user scrolls down in the list page.
Note: a pipelinerun can have multiple SCAN_OUTPUT results.
As a user, I would like to see all the pipelinerun results in a new Output tab.
Slack thread - https://redhat-internal.slack.com/archives/C060FCC5KU1/p1699442229040389?thread_ts=1699441759.578729&cid=C060FCC5KU1
Address technical debt around self-managed HCP deployments, including but not limited to
The CLI cannot create dual-stack clusters with the default values. We need to create the proper flags to enable the HostedCluster to be a dual-stack one using the default values.
By default the Agent provider creates clusters by delegating to the CLI. This is not bad, but if you don't define the UpgradeType as a CLI argument it will default to Replace, which is primarily aimed at cloud providers. We need to default the UpgradeType for the Agent provider to InPlace, but also respect the option set from the CLI. We also need to check with the Kubevirt team what the desired default is.
As a (user persona), I want to be able to:
so that I can achieve
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
This can be based on the existing CAPI agent provider workflow, which already has an env var flag for disconnected.
The customer has escalated the following issues where ports don't have TLS support. This Feature request lists all the component ports raised by the customer.
Details here https://docs.google.com/document/d/1zB9vUGB83xlQnoM-ToLUEBtEGszQrC7u-hmhCnrhuXM/edit
Currently, we are serving the metrics as HTTP on 9258; we need to upgrade to use TLS.
To solve this, use kube-rbac-proxy as a sidecar container to proxy the metrics and provide an authentication/authorization layer for them.
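A sketch of the usual sidecar pattern (the external port, image reference, and cert paths are illustrative; the upstream port follows the 9258 example above):

- name: kube-rbac-proxy
  image: quay.io/openshift/origin-kube-rbac-proxy:latest  # illustrative image ref
  args:
  - --secure-listen-address=0.0.0.0:9259
  - --upstream=http://127.0.0.1:9258/
  - --tls-cert-file=/etc/tls/private/tls.crt
  - --tls-private-key-file=/etc/tls/private/tls.key
  volumeMounts:
  - name: metrics-tls
    mountPath: /etc/tls/private

With this in place, the operand binds its plain-HTTP metrics listener to 127.0.0.1 only, and the proxy serves TLS (with RBAC-based authn/authz) externally.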
Related to https://issues.redhat.com/browse/RFE-4665
CCMO metrics are currently exposed on a non-TLS server.
We should only expose the metrics via a TLS server.
Use kube-rbac-proxy (as inspired by other components, e.g. the MAO) to expose metrics via TLS, keeping non-TLS connections only on localhost.
Currently, we are serving the metrics as http on 9191, and via TLS on 9192.
We need to make sure the metrics are only available on 9192 via TLS.
Related to https://issues.redhat.com/browse/RFE-4665
CMA currently exposes metrics on two ports via the 0.0.0.0 all hosts binding. We need to make sure that only the TLS port is accessible from outside localhost.
Follow the rebase doc[1] and update the spreadsheet[2] that tracks the required commits to be cherry-picked. Rebase the o/k repo with the "merge=ours" strategy as mentioned in the rebase doc.
Save the last commit id in the spreadsheet for future references.
Update the rebase doc if required.
[1] https://github.com/openshift/kubernetes/blob/master/REBASE.openshift.md
[2] https://docs.google.com/spreadsheets/d/10KYptJkDB1z8_RYCQVBYDjdTlRfyoXILMa0Fg8tnNlY/edit#gid=1957024452
Prev. Ref:
https://github.com/openshift/kubernetes/pull/1646
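For reference, the core of that flow looks like this (the upstream tag is an example; pick the release being rebased onto):

$ git remote add upstream https://github.com/kubernetes/kubernetes.git
$ git fetch upstream --tags
$ git merge -s ours v1.29.0  # take the upstream tag while keeping our tree ("merge=ours")
# then cherry-pick the carry commits tracked in the spreadsheet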
This feature is dedicated to enhancing data security and implementing encryption best practices across control planes, etcd, and nodes for HyperShift with Azure. The objective is to ensure that all sensitive data, including secrets, is encrypted, thereby safeguarding against unauthorized access and ensuring compliance with data protection regulations.
Expose and propagate input for KMS secret encryption, similar to what we do for AWS.
See related discussion:
https://redhat-internal.slack.com/archives/CCV9YF9PD/p1696950850685729
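On AWS this surfaces as spec.secretEncryption on the HostedCluster; an Azure equivalent would plausibly follow the same shape (the azure stanza below is an assumption about the eventual API, not a confirmed schema):

spec:
  secretEncryption:
    type: kms
    kms:
      provider: Azure
      azure:
        activeKey:
          keyVaultName: example-vault  # assumed field names
          keyName: example-key
          keyVersion: "1"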
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
This feature focuses on the optimization of resource allocation and image management within NodePools. This will include enabling users to specify resource groups at NodePool creation, integrating external DNS support, ensuring Cluster API (CAPI) and other images are sourced from the payload, and utilizing Image Galleries for Azure VM creation.
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
As a user of HyperShift, I want the cluster API Azure (CAPZ) image to come from the OCP release image rather than being hardcoded in the HyperShift code so that I can always use the latest CAPZ image related to the OCP release image.
The CAPZ image comes from the OCP release image.
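One way to verify this once implemented (the payload component name here is an assumption for illustration):

$ oc adm release info "$RELEASE_IMAGE" --image-for=cluster-api-provider-azure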
N/A
This requires/does not require a design proposal.
This requires/does not require a feature gate.
As a user of HyperShift on Azure, I would like the reconciliation process for the cloud controller manager to run the Azure external provider like we do for AWS and other platforms so that the NodePool nodes will join the HostedCluster.
Any other issues with Azure HostedClusters discovered during development.
This requires/does not require a design proposal.
This requires/does not require a feature gate.
Epic Goal*
There was an epic / enhancement to create a cluster-wide TLS config that applies to all OpenShift components:
https://issues.redhat.com/browse/OCPPLAN-4379
https://github.com/openshift/enhancements/blob/master/enhancements/kube-apiserver/tls-config.md
For example, this is how KCM sets --tls-cipher-suites and --tls-min-version based on the observed config:
https://issues.redhat.com/browse/WRKLDS-252
https://github.com/openshift/cluster-kube-controller-manager-operator/pull/506/files
The cluster admin can change the config based on their risk profile, but if they don't change anything, there is a reasonable default.
We should update all CSI driver operators to use this config. Right now we have a hard-coded cipher list in library-go. See OCPBUGS-2083 and OCPBUGS-4347 for background context.
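For reference, the cluster-wide knob the CSI driver operators should observe is the standard APIServer config, shown here with a custom profile:

apiVersion: config.openshift.io/v1
kind: APIServer
metadata:
  name: cluster
spec:
  tlsSecurityProfile:
    type: Custom
    custom:
      minTLSVersion: VersionTLS12
      ciphers:
      - ECDHE-ECDSA-AES128-GCM-SHA256
      - ECDHE-RSA-AES128-GCM-SHA256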
Why is this important? (mandatory)
This will keep the cipher list consistent across many OpenShift components. If the default list is changed, we get that change "for free".
It will reduce support calls from customers and backport requests when the recommended defaults change.
It will provide flexibility to the customer, since they can set their own TLS profile settings without requiring code change for each component.
Scenarios (mandatory)
As a cluster admin, I want to use TLSSecurityProfile to control the cipher list and minimum TLS version for all CSI driver operator sidecars, so that I can adjust the settings based on my own risk assessment.
Dependencies (internal and external) (mandatory)
None, the changes we depend on were already implemented.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Goal:
As an administrator, I would like to use my own managed DNS solution instead of only specific openshift-install supported DNS services (such as AWS Route53, Google Cloud DNS, etc...) for my OpenShift deployment.
Problem:
While cloud-based DNS services provide convenient hostname management, there are a number of regulatory (ITAR) and operational constraints customers face that prohibit the use of those DNS hosting services on public cloud providers.
Why is this important:
Dependencies (internal and external):
Prioritized epics + deliverables (in scope / not in scope):
Estimate (XS, S, M, L, XL, XXL):
Previous Work:
Open questions:
Link to Epic: https://docs.google.com/document/d/1OBrfC4x81PHhpPrC5SEjixzg4eBnnxCZDr-5h3yF2QI/edit?usp=sharing
As a (user persona), I want to be able to:
so that I can achieve
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
Update the GCP PlatformStatus field of the Infrastructure API https://github.com/openshift/api/blob/master/config/v1/types_infrastructure.go#L605 to include the LoadBalancer (LB) IP addresses for the API, API-Int and Ingress LBs when the custom-dns feature is enabled.
Also update the Installer to set the dnsType to `ClusterHosted` within the Infra CR when `userProvisionedDNS` is enabled via install-config.
Use the API, API-Int and Ingress LB IPs within GCPPlatformStatus instead of the `lb-config` ConfigMap to generate the in cluster CoreDNS pods.
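A sketch of the expected status shape when the feature is enabled (field spellings follow the cloudLoadBalancerConfig API addition; treat them as an assumption, and the IPs are placeholders):

status:
  platformStatus:
    type: GCP
    gcp:
      cloudLoadBalancerConfig:
        dnsType: ClusterHosted
        clusterHosted:
          apiLoadBalancerIPs:
          - 10.0.0.10
          apiIntLoadBalancerIPs:
          - 10.0.0.11
          ingressLoadBalancerIPs:
          - 10.0.0.12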
As a (user persona), I want to be able to:
so that I can achieve
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
As a (user persona), I want to be able to:
so that I can achieve
Description of criteria:
If a zone has not been granted permission to be shared across projects (if in different projects), then the install will fail.
As a (user persona), I want to be able to:
so that I can achieve
Description of criteria:
If a zone has not been granted permission to be shared across projects (if in different projects), then the install will fail.
Stop generating long-lived service account tokens. Long-lived service account tokens are currently generated in order to then create an image pull secret for the internal image registry. This feature calls for using the TokenRequest API to generate a bound service account token for use in the image pull secret.
Use TokenRequest API to create image pull secrets.
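For illustration, a bound token with a deliberately short lifetime can be requested per service account and wrapped into a pull secret (the secret name and duration are illustrative):

$ TOKEN=$(oc create token builder -n my-namespace --duration=1h)
$ oc create secret docker-registry builder-pull-secret -n my-namespace \
    --docker-server=image-registry.openshift-image-registry.svc:5000 \
    --docker-username=builder --docker-password="$TOKEN"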
Performance benefits:
One less secret created per service account. This will result in at least three fewer secrets generated per namespace.
Security benefits:
Long-lived tokens are no longer recommended, as they present a possible security risk.
Requirements (aka. Acceptance Criteria):
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
Support ARO and customers aspiring to follow Microsoft Azure security recommendations by allowing the Azure storage account hosting the object storage bucket for the integrated registry to be configured as "private" vs. the default public.
OpenShift installations on Azure can be configured so that they no longer trigger Azure Security Advisor with regard to the use of a public endpoint for the Azure storage account created by the integrated registry operator.
Several users noticed warnings in Azure Security advisor reporting the potentially dangerous exposure of the storage endpoint used by the integrated registry configured by its operator. There is no real security threat here because despite the endpoint being public, access to it is strictly locked down to a single set of credentials used by the internal registry only.
Still customers need to be able to deploy cluster that out of the box do not violate Microsoft security recommendations.
This feature sets the foundation for OCPSTRAT-997 to be delivered.
Customers updating to the version of OpenShift that delivers this feature shall not have their integrated registry configuration updated automatically.
We require documentation in the section for the integrated registry operator that describes how to manually configure the vnet and subnet that shall be used for the private storage endpoint, in case the customer wants to leverage a network resource group different from the cluster's.
We also require documentation that describes the single tunable for the integrated registry operator that is required to be set to "internal" to automate the detection of existing vnet and subnets in the network resource group of the cluster as opposed to manual specification of a user-defined vnet/subnet pair.
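The single tunable mentioned above would look like this on the registry operator config (the optional internal stanza covers the user-defined vnet/subnet case; treat the exact field names as a sketch):

apiVersion: imageregistry.operator.openshift.io/v1
kind: Config
metadata:
  name: cluster
spec:
  storage:
    azure:
      networkAccess:
        type: Internal
        # optional, only for an explicitly user-defined network:
        internal:
          networkResourceGroupName: example-network-rg
          vnetName: example-vnet
          subnetName: example-subnet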
Story: As an image registry developer, I want to be able to programmatically create a private endpoint in Azure without having to worry about explicitly creating the supporting objects, so that I can easily enable support for private storage accounts in Azure.
ACCEPTANCE CRITERIA
When the registry is in private mode:
DOCUMENTATION
The installer currently does not tag subnets and vnet pre-created by users (and by itself?) on Azure.
Background
When "Publish: Internal", the image registry operator needs to discover subnets and vnet the cluster was provisioned with, so that it can provision a private storage account for the registry to use.
Story: As a user, I want to be able to configure the registry operator to use Azure Private Endpoints so that I can deploy the registry on Azure without a public facing endpoint.
ACCEPTANCE CRITERIA
DOCUMENTATION
Telecommunications providers look to displace Physical Network Functions (PNFs) with modern Virtual Network Functions (VNFs) at the Far Edge. Single Node OpenShift, as a the CaaS layer in the vRAN vDU architecture, must achieve a higher standard in regards to OpenShift upgrade speed and efficiency, as in comparison to PNFs.
Telecommunications providers currently deploy Firmware-based Physical Network Functions (PNFs) in their RAN solutions. These PNFs can be upgraded quickly due to their monolithic nature and image-based download-and-reboot upgrades. Furthermore they often have the ability to retry upgrades and to rollback to the previous image if the new image fails. These Telcos are looking to displace PNFs with virtual solutions, but will not do so unless the virtual solutions have comparable operational KPIs to the PNFs.
Service (vDU) Downtime is the time when the CNF is not operational and therefore no traffic is passing through the vDU. This has a significant impact as it degrades the customer’s service (5G->4G) or there’s an outright service outage. These disruptions are scheduled into Maintenance Windows (MW), but the Telecommunications Operators primary goal is to keep service running, so getting vRAN solutions with OpenShift to near PNF-like Service Downtime is and always will be a primary requirement.
Upgrading OpenShift is only one of many operations that occur during a Maintenance Window. Reducing the CaaS upgrade duration is meaningful to many teams within a Telecommunications Operators organization as this duration fits into a larger set of activities that put pressure on the duration time for Red Hat software. OpenShift must reduce the upgrade duration time significantly to compete with existing PNF solutions.
As mentioned above, the Service Downtime disruption duration must be as small as possible, this includes when there are failures. Hardware failures fall into a category called Break+Fix and are covered by TELCOSTRAT-165. In the case of software failures must be detected and remediation must occur.
Detection includes monitoring the upgrade for stalls and failures and remediation would require the ability to rollback to the previously well-known-working version, prior to the failed upgrade.
The OpenShift product support terms are too short for Telco use cases, in particular vRAN deployments. The risk of Service Downtime drives Telecommunications Operators to a certify-deploy-and-then-don’t-touch model. One specific request from our largest Telco Edge customer is for 4 years of support.
These longer support needs drive a misalignment with the EUS->EUS upgrade path, and drive the requirement that the Single Node OpenShift deployment can be upgraded from OCP X.y.z to any future [X+1].[y+1].[z+1], where [X+1] and [y+1] are decided by the Telecommunications Operator depending on timing and the desired feature set, and [z+1] is determined through Red Hat, vDU vendor, and customer maintenance and engineering validation.
Red Hat is challenged with improving multiple OpenShift Operational KPIs by our telecommunications partners and customers. Improved Break+Fix is tracked in TELCOSTRAT-165 and improved Installation is tracked in TELCOSTRAT-38.
Whatever methodology achieves the above requirements must ensure that the customer has a pleasant experience via RHACM and Red Hat GitOps. Red Hat’s current install and upgrade methodology is via RHACM and any new technologies used to improve Operational KPIs must retain the seamless experience from the cluster management solution. For example, after a cluster is upgraded it must look the same to a RHACM Operator.
Whatever methodology achieves the above requirements must ensure that a technician troubleshooting a Single Node OpenShift deployment has a pleasant experience. All commands issued on the node must return output as it would before performing an upgrade.
A systemd service that runs on a golden image's first boot and configures the following (a minimal sketch of such a unit follows the list):
1. networking (the internal IP address requires special attention)
2. Update the hostname (MGMT-15775)
3. Execute recert (regenerate certs, cluster name and base domain MGMT-15533)
4. Start kubelet
5. Apply the personalization info:
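A minimal sketch of such a first-boot unit (paths and the script name are illustrative, not the shipped implementation):

[Unit]
Description=Reconfigure a golden-image SNO on first boot
ConditionPathExists=/var/lib/first-boot-pending
Before=kubelet.service

[Service]
Type=oneshot
# the script would apply network config, set the hostname, run recert, start kubelet, and apply personalization
ExecStart=/usr/local/bin/first-boot-reconfigure.sh
ExecStartPost=/usr/bin/rm -f /var/lib/first-boot-pending

[Install]
WantedBy=multi-user.target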
If the answer is "yes", please make sure to check the corresponding option.
The following features depend on this functionality:
As we want to support IBU with a single IP, we should change the dnsmasq and forced-DNS configurations for SNO in order to support an IP change.
In IBI and IBU flows we need a way to change the nodeip-configuration hint file without a reboot, and before the MCO even starts. In order for the MCO to be happy we need to remove this file from its management; to make that happen we will stop using a machine config and move to ignition.
Reduce the OpenShift platform and associated RH provided components to a single physical core on Intel Sapphire Rapids platform for vDU deployments on SingleNode OpenShift.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Provide a mechanism to tune the platform to use only one physical core. | Users need to be able to tune different platforms. | YES |
Allow for full zero touch provisioning of a node with the minimal core budget configuration. | Node provisioned with SNO Far Edge provisioning method - i.e. ZTP via RHACM, using DU Profile. | YES |
Platform meets all MVP KPIs | | YES |
Questions to be addressed:
When this image was assembled, these features were not yet completed. Therefore, only the Jira Cards included here are part of this release.
As a stakeholder aiming to adopt KubeSaw as a Namespace-as-a-Service solution, I want the project to provide streamlined tooling and a clear code-base, ensuring seamless adoption and integration into my clusters.
Efficient adoption of KubeSaw, especially as a Namespace-as-a-Service solution, relies on intuitive tooling and a transparent codebase. Improving these aspects will empower stakeholders to effortlessly integrate KubeSaw into their Kubernetes clusters, ensuring a smooth transition to enhanced namespace management.
As a Stakeholder, I want a streamlined setup of the KubeSaw project and a fully automated way of upgrading this setup along with the updates of the installation.
The expected outcome within the market is both growth and retention. The improved tooling and codebase will attract new stakeholders (growth) and enhance the experience for existing users (retention) by providing a straightforward path to adopting KubeSaw's Namespace-as-a-Service features in their clusters.
This epic is to track all the unplanned work related to security incidents, fixing flaky e2e tests, and other urgent and unplanned efforts that may arise during the sprint.
Some of the code under the web console's /frontend/public/components/monitoring/ dir is no longer used, so it can be removed.
There may also be some code in the redux actions and reducers that is no longer used and can also be removed.
Enhance Dynamic plugin with similar capabilities as Static page. Add new control and security related enhancements to Static page.
The Dynamic plugin should list pipelines similar to the current static page.
The Static page should allow users to override task and sidecar task parameters.
The Static page should allow users to control tasks that are set up for manual approval.
The TSSC security and compliance policies should be visible in Dynamic plugin.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
With the OpenShift Pipelines operator 1.2x we added support for a dynamic console plugin to the operator. In the first version it is only responsible for the Dashboard and Pipeline/Repository Metrics tab. We want to move more and more code to the dynamic plugin and then remove it from the console repository.
Enable the OCP Console to send back user analytics to our existing endpoints in console.redhat.com. Please refer to doc for details of what we want to capture in the future:
Collect desired telemetry of user actions within OpenShift console to improve knowledge of user behavior.
OpenShift console should be able to send telemetry to a pre-configured Red Hat proxy that can be forwarded to 3rd party services for analysis.
User analytics should respect the existing telemetry mechanism used to disable data being sent back.
Need to update existing documentation with what user data we track from the OCP Console: https://docs.openshift.com/container-platform/4.14/support/remote_health_monitoring/about-remote-health-monitoring.html
Capture and send desired user analytics from OpenShift console to Red Hat proxy
Red Hat proxy to forward telemetry events to appropriate Segment workspace and Amplitude destination
Use existing setting to opt out of sending telemetry: https://docs.openshift.com/container-platform/4.14/support/remote_health_monitoring/opting-out-of-remote-health-reporting.html#opting-out-remote-health-reporting
Also, allow disabling just user analytics without affecting the rest of telemetry: add an annotation to the Console to disable just user analytics.
Update docs to show this method as well.
We will require a mechanism to store all the segment values
We need to be able to pass back orgID that we receive from the OCM subscription API call
Sending telemetry from OpenShift cluster nodes
Console already has support for sending analytics to segment.io in Dev Sandbox and OSD environments. We should reuse this existing capability, but default to http://console.redhat.com/connections/api for analytics and http://console.redhat.com/connections/cdn to load the JavaScript in other environments. We must continue to allow Dev Sandbox and OSD clusters a way to configure their own segment key, whether telemetry is enabled, segment API host, and other options currently set as annotations on the console operator configuration resource.
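For illustration only, such annotation-driven configuration on the console operator config could look roughly like the following; the annotation keys shown here are assumptions for illustration, not confirmed API:

apiVersion: operator.openshift.io/v1
kind: Console
metadata:
  name: cluster
  annotations:
    # hypothetical keys, shown only to illustrate annotation-based telemetry configuration
    telemetry.console.openshift.io/SEGMENT_API_KEY: "<segment-write-key>"
    telemetry.console.openshift.io/DISABLED: "false"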
Console will need a way to determine the org-id to send with telemetry events. Likely the console operator will need to read this from the cluster pull secret.
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
The console telemetry plugin needs to send data to a new Red Hat ingress point that will then forward it to Segment for analysis.
Goal:
Update console telemetry plugin to send data to the appropriate ingress point.
Ingress point created for console.redhat.com
This is a clone of issue OCPBUGS-25722. The following is the description of the original issue:
—
The console telemetry plugin needs to send data to a new Red Hat ingress point that will then forward it to Segment for analysis.
For that the telemetry-console-plugin must have options to configure where it loads the analytics.js and where to send the API calls (analytics events).
Enable installation and lifecycle support of OpenShift 4 on Oracle Cloud Infrastructure (OCI) Bare metal
Use scenarios
Why is this important
Requirement | Notes |
---|---|
OCI Bare Metal Shapes must be certified with RHEL | It must also work with RHCOS (see iSCSI boot notes), as OCI BM standard shapes require RHCOS iSCSI to boot (certified shapes: https://catalog.redhat.com/cloud/detail/249287) |
Successfully passing the OpenShift Provider conformance testing – this should be fairly similar to the results from the OCI VM test results. | Oracle will do these tests. |
Updating Oracle Terraform files | |
Making the Assisted Installer modifications needed to address the CCM changes and surface the necessary configurations. | Support Oracle Cloud in Assisted-Installer CI: |
RFEs:
Any bare metal Shape to be supported with OCP has to be certified with RHEL.
From the certified Shapes, those that have local disks will be supported. This is due to the current lack of support in RHCOS for the iSCSI boot feature. OCPSTRAT-749 is tracking adding this support and removing this restriction in the future.
As of Aug 2023 this excludes at least all the Standard shapes, BM.GPU2.2 and BM.GPU3.8, from the published list at: https://docs.oracle.com/en-us/iaas/Content/Compute/References/computeshapes.htm#baremetalshapes
Please describe what this feature is going to do.
Please describe what conditions must be met in order to mark this feature as "done".
If the answer is "yes", please make sure to check the corresponding option.
https://github.com/openshift/installer/pull/7457 introduces a change of behavior: the Cloud Controller Manager will be disabled by default.
We need to explicitly enable it when deploying on the OCI platform.
During 4.15, the OCP team is working on allowing booting from iSCSI. Today that's disabled by the assisted installer. The goal is to enable it for OCP versions >= 4.15 when using the OCI external platform.
iSCSI boot is enabled for OCP versions >= 4.15 both in the UI and the backend.
When booting from iscsi, we need to make sure to add the `rd.iscsi.firmware=1 ip=ibft` kargs during install to enable iSCSI booting.
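As a rough sketch of the kargs in question, this is how they could be expressed through the assisted-service InfraEnv kernelArguments API (this field applies to the discovery ISO; wiring the same args into the installed system is the work this card tracks):

apiVersion: agent-install.openshift.io/v1beta1
kind: InfraEnv
metadata:
  name: example-infraenv
spec:
  kernelArguments:
  - operation: append   # append to the kernel command line
    value: rd.iscsi.firmware=1
  - operation: append
    value: ip=ibft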
yes
When the Assisted agent boots, it should connect to iBFT iSCSI targets
Assisted Service should allow booting from iSCSI for x86_64 OpenShift versions at least 4.15.0.
Multipath is not supported at this time.
Continue scale testing and performance improvements for ovn-kubernetes
Networking Definition of Planned
Epic Template descriptions and documentation
Manage Openshift Virtual Machines IP addresses from within the SDN solution provided by OVN-Kubernetes.
Customers want to offload IPAM from their custom solutions (e.g. custom DHCP server running on their cluster network) to SDN.
Additional information on each of the above items can be found here: Networking Definition of Planned
Using a source port group instead of an address set will decrease the number of OVS flows per node.
Needs to be backported to 4.14
Theme: Ensure 4.12 SD is as stable as 4.13 SD. Identify everything that is present in 4.14/4.13 but missing in 4.12 from the OVN-Kubernetes point of view.
We need to come up with a KCS article for 4.12/4.13 around network policies issues. Some things it should cover are:
Check the existing network policies used by SD MCs and review them to see whether they are efficient
Explain how the new OVN 23.06 will fix the except-block issue, and whether we need to backport the port range fixes as well
Goal: End result should be a document and backports if needed outside of the OVN bump planned as part of https://issues.redhat.com/browse/OCPBUGS-22091
Generally speaking, customers and partners should not be installing packages client-side, i.e. `rpm-ostree install $pkg` directly on the nodes. It's not officially supported outside of troubleshooting situations, but the documentation is not very explicit on this point and we have anecdotal data that customers and partners do in fact install packages directly on hosts.
Adding some telemetry to help understand how common this is among data-reporting clusters. Hopefully such data will help us understand how important it is to preserve this ability in the bootc-world. While it's not a pattern we want to encourage, we should be careful about dropping it without considering how to avoid breaking users' clusters in unexpected ways.
Understand what % of machines (or a proxy thereof) have locally layered packages which aren't CoreOS extensions.
This needs to be backported to 4.14 so we have a better sense of the fleet as it is.
4.12 might be useful as well, but is optional.
Why not simply block upgrades if there are locally layered packages?
That is indeed an option. This card is only about gathering data.
Some customers are known to layer packages locally but it's worse if the issue is a third party integration. In such a case, if the add-on breaks, the customer will call the 3rd party first because that's what appears to be broken. It may be a long, undelightful trip to get to a satisfying resolution. If they are blocked on upgrade due to that 3rd party integration they may not be able to upgrade the OCP y-version. That could be a lengthy delay.
Description copied from attached feature card: https://issues.redhat.com/browse/OCPSTRAT-1521
The OpenShift IPsec implementation will be enhanced for a growing set of enterprise use cases, and for larger scale deployments.
The OpenShift IPsec implementation was originally built for purpose-driven use cases from telco NEPs, but also proved useful for a specific set of other customer use cases outside of that context. As customer adoption grew and it was adopted by some of the largest (by number of cluster nodes) deployments in the field, it became obvious that some redesign is necessary in order to continue to deliver enterprise-grade IPsec, for both East-West and North-South traffic, and for some of our most-demanding customer deployments.
Key enhancements include observability and blocked traffic across paths if IPsec encryption is not functioning properly.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
The OpenShift IPsec feature is fundamental to customer deployments for ensuring that all traffic between cluster nodes (East-West) and between cluster nodes and external-to-the-cluster entities that also are configured for IPsec (North-South) is encrypted by default. This encryption must scale to the largest of deployments.
Questions to be addressed:
Feature Overview
This is a TechDebt and doesn't impact OpenShift Users.
As the autoscaler has become a key feature of OpenShift, there is the requirement to continue to expand its use, bringing all of its features to all the cloud platforms and contributing to the community upstream. This feature is to track the initiatives associated with the Autoscaler in OpenShift.
Goals
Requirements
Requirement | Notes | isMvp? |
---|---|---|
vSphere autoscaling from zero | No | |
Upstream E2E testing | No | |
Upstream adapt scale from zero replicas | No | |
Out of Scope
n/a
Background, and strategic fit
Autoscaling is a key benefit of the Machine API and should be made available on all providers
Assumptions
Customer Considerations
Documentation Considerations
please note, the changes described by this epic will happen in OpenShift controllers and as such there is no "upstream" relationship in the same sense as the Kubernetes-based controllers.
As a user I want to ensure that scale from zero cluster autoscaling works well when using the upstream scaling hint annotations so that I can follow the community best practices. Having the cluster autoscaler operator monitor the scale from zero annotations, and correct them when incorrect, will confirm the correct behavior.
As part of migrating the OpenShift scale from zero annotations to use the upstream annotations keys, the cluster autoscaler operator should be updated to look for these annotations on MachineSets that it is monitoring.
Currently, we use annotations with the prefix "machine.openshift.io", while upstream the prefix is "capacity.cluster-autoscaler.kubernetes.io". The CAO should be updated to recognize when a MachineSet has either set of annotations, and then ensure that both sets exist.
Adding both sets of annotations will help us during the transition to using the upstream set, and will also ensure backward compatibility with our published API.
Please note that care must be taken with the suffixes as well. Some of the OpenShift suffixes are different from upstream, and in specific the memory suffix uses a different type of calculation. As we convert our autoscaler implementation to use the upstream annotations we must make sure that any conversions will conform to upstream.
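A minimal sketch of a MachineSet carrying both annotation sets, as the CAO would ensure after reconciliation (annotation values are illustrative; note the differing memory units, which is the suffix/calculation difference called out above):

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: worker-us-east-1a
  annotations:
    # OpenShift-specific scale-from-zero capacity hints (existing behavior)
    machine.openshift.io/vCPU: "4"
    machine.openshift.io/memoryMb: "16384"
    # upstream cluster-autoscaler capacity hints, kept in sync by the CAO
    capacity.cluster-autoscaler.kubernetes.io/cpu: "4"
    capacity.cluster-autoscaler.kubernetes.io/memory: "16384Mi"
spec:
  replicas: 0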
As a developer I want to have a consistent way to apply the scale from zero annotations so that it is easier to update the various provider machineset actuators. Having a utility module in the MAO will make this easier by providing a single place for all the MachineSet actuators to share.
Currently the individual provider MachineSet actuators each contain string variables and independent implementations of the scale from zero annotations. This configuration is more brittle than having a central module which could be utilized by all the providers.
Upstream K8s deprecated PodSecurityPolicy and replaced it with a new built-in admission controller that enforces the Pod Security Standards (see here for the motivations for deprecation). There is an OpenShift-specific dedicated pod admission system called Security Context Constraints. Our aim is to keep the Security Context Constraints pod admission system while also allowing users to have access to the Kubernetes Pod Security Admission.
With OpenShift 4.11, we turned on Pod Security Admission with global "privileged" enforcement. Additionally, we set the "restricted" profile for warnings and audit. This configuration made it possible for users to opt their namespaces in to Pod Security Admission via the per-namespace labels. We also introduced a new mechanism that automatically synchronizes the Pod Security Admission "warn" and "audit" labels.
With OpenShift 4.19, we intend to move the global configuration to enforce the "restricted" pod security profile globally. With this change, the label synchronization mechanism will also switch into a mode where it synchronizes the "enforce" Pod Security Admission label rather than the "audit" and "warn".
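For reference, these are the standard per-namespace Pod Security Admission labels involved; after this change the synchronizer would manage the "enforce" label rather than "warn" and "audit":

apiVersion: v1
kind: Namespace
metadata:
  name: example
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted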
Epic Goal
Get Pod Security admission to run in "restricted" mode globally by default, alongside SCC admission.
To be broken into one feature epic and a spike:
The MCO today has multiple layers of errors. There are generally speaking 4 locations where an error message can appear, from highest to lowest:
The error propagation is generally speaking not 1-to-1. The operator status will generally capture the pool status, but the full error from Controller/Daemon does not fully bubble up to pool/operator, and the journal logs with error generally don’t get bubbled up at all. This is very confusing for customers/admins working with the MCO without full understanding of the MCO’s internal mechanics:
Using “unexpected on-disk state” as an example, this can be caused by any amount of the following:
Etc. etc.
Since error use cases are wide and varied, there are many improvements we can perform for each individual error state. This epic aims to propose targeted improvements to error messaging and propagation specifically. The goals being:
With a side objective of observability, including reporting all the way to the operator status items such as:
Approaches can include:
It became clear over time that we need to enhance most of the MCO metrics that we have, as well as add more related to the MCC. The MCC is tasked with watching what's going on with pools, and it makes sense to add more metrics and alerting there especially. We have been going through various hiccups with metrics and still are. This epic aims at addressing those and starting to add more useful metrics/alerting to the MCO. Another aim for this epic would be (but we can split it out) to provide more data to help us proactively debug clusters when things go wrong.
After spiking, the work for metric enhancement is split in the following way:
Add the following types of metrics in the proper places in the MCO:
This will involve registering new metrics and making sure that they are updated when key events occur
Description of problem:
When querying up for registered metrics in Console/Observe/Metrics, certain metrics are not showing up (return "No datapoints found"). These metrics include mcc_drain_err, mcc_state, etc
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. start up a cluster 2. run query e.g. mcc_drain_err in Console/Observe/Metrics
Actual results:
No datapoints found
Expected results:
Numerical result
Additional info:
This only happens to metrics that are defined/registered with the type GaugeVec. It was discovered that any vec metric (GaugeVec, CounterVec) needs initialization; otherwise, it will not show up until it is updated: https://github.com/thought-machine/please-servers/pull/258
Currently, the existing procedure for full rotation of all cluster CAs/certs/keys is not suitable for Hypershift. Several oc helper commands added for this flow are not functional in Hypershift. Therefore, a separate and tailored procedure is required specifically for Hypershift post its General Availability (GA) stage.
Most of the rotation procedure can be performed on the management side, given the decoupling between the control-plane and workers in the HyperShift architecture.
That said, it is important to ensure and assess the potential impacts on customers and guests during the rotation process, especially on how they affect SLOs and disruption budgets.
This does not require a design proposal.
This does not require a feature gate.
As an engineer I would like to customize the self-signed certificates expiration used in the HCP components using an annotation over the HostedCluster object.
As an engineer I would like to customize the self-signed certificates rotation used in the HCP components using an annotation over the HostedCluster object.
Add support for NAT Gateways in Azure while deploying OpenShift on this cloud to manage the outbound network traffic, and make this the default option for new deployments
While deploying OpenShift on Azure, the Installer will configure NAT Gateways as the default method to handle outbound network traffic, so we can prevent the existing SNAT port exhaustion issues related to the default configured outboundType.
The installer will use the NAT Gateway object from Azure to manage the outbound traffic from OpenShift.
The installer will create a NAT Gateway object per AZ in Azure so the solution is HA.
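A sketch of how this could surface in install-config.yaml; the outboundType value shown is an assumption for illustration, not a confirmed installer API:

apiVersion: v1
baseDomain: example.com
metadata:
  name: mycluster
platform:
  azure:
    region: eastus
    outboundType: NatGateway   # assumed value; the current default is Loadbalancer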
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
Using NAT Gateway for egress traffic is the recommended approach from Microsoft
This is also a common ask from different enterprise customers, as with the current solution used by OpenShift for outbound traffic management in Azure they are hitting SNAT port exhaustion issues.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
This work depends on the work done in CORS-2564
In the MCO today, we always reboot during the initial node bootstrap pivot. This is because the machine-config CRD also manages non-ignition fields like OSImageURL, kernelArguments, extensions, etc. Any update to these fields would require a node reboot.
Most of the time the cluster OSImageURL is different from the boot image, and hence it results in a node reboot.
However, there are certain use cases (like bare metal) where we would like to skip this reboot to bring up the node faster. The cluster admin would boot the node with a boot image matching the cluster OSImageURL, but it will still lead to a reboot.
For this effort, two areas have been identified where the MCO can possibly be improved to skip the reboot during the initial pivot:
Related RFE - https://issues.redhat.com/browse/RFE-3621
Note: With additional findings from Assisted installer team, scope of work has been re-framed to meet the requirement of assisted installer workflow.
The systemd service ovs-configuration.service is skipped if the file /etc/ignition-machine-config-encapsulated.json exists, based on the assumption that a reboot will happen whenever the file exists.
When we want to skip the reboot, we need to make sure the service is not skipped. Therefore, the service should retry configuring until the file no longer exists.
After https://github.com/openshift/machine-config-operator/pull/3814 merges, it will be possible to use the kernelArgs functionality that has been introduced in Ignition. We can use this to sync up kernelArgs supplied through a MachineConfig into the Ignition field. As a result, MachineConfig-supplied kargs can be available when the node boots up, and we don't require a reboot.
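For context, a minimal example of the kind of MachineConfig-supplied kernel arguments this applies to; with the Ignition kernelArguments sync in place, such kargs would already be applied on first boot instead of requiring a reboot:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-kargs
spec:
  kernelArguments:
  - mitigations=off   # illustrative kernel argument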
Acceptance Criteria:
Related story- https://issues.redhat.com/browse/MCO-217
Related RFE- https://issues.redhat.com/browse/RFE-3621
After PR https://github.com/openshift/os/pull/657 lands, RHCOS nodes booting from a boot image will have a digested pull spec available in the `container-image-reference-digest` field of rpm-ostree status.
With this, we can teach the MCD to look for container-image-reference-digest for comparison when OSImageURL is not available (this is the case when a node boots from a boot image). When it matches, we can say that the boot image and the OCP cluster have the same OS content, and we can safely skip the node reboot during the initial pivot.
Note: Scope of this work has been reduced to PR https://github.com/openshift/machine-config-operator/pull/3857 as this is sufficient for Assisted installer use case today.
Provide a unified view of all options to create Serverless Functions so developers can get quickly started with their preferred option.
A single UI panel in ODC that shows in-product and off-product choices to create Serverless functions, and provides an in-product creation experience for applicable choices.
An inline editor is out of scope.
Background: We need to provide the dev console experience for Serverless Function Experience.
Miro Board for inspiration and vision:
https://miro.com/app/board/uXjVPNtkJCI=/
Acceptance Criteria is in the parent feature.
Update quick start name and document links for getting started section based on information provided in document https://docs.google.com/document/d/1xy9GwGR5m4p9W_RJ8Wt_164LpCP57x-lNJlObGfBJ9M/edit#heading=h.beapmz2o0lv7
Description of problem:
Styling issue in functions list page after PatternFly upgrade
Version-Release number of selected component (if applicable):
4.15.0
How reproducible:
Always
Steps to Reproduce:
1.Install serverless operator 2.Create knative-serving instance 3.Go to Functions menu in Dev perspective
Actual results:
https://github.com/openshift/console/pull/13348#issuecomment-1822469023
Expected results:
Styling should be proper
Additional info:
Create getting started content in functions list page and add CLI, IDE extensions, samples links to it (Design is yet to be provided, so start with exploration and with dummy data)
When the user navigates to the Functions page and clicks on any function, then along with the existing Details and YAML tabs, we should add Revisions, Routes, and Pods tabs showing the resources associated with that function.
AC: Add serverless function icon in Add page group header(Attached image for reference)
Slack thread - https://redhat-internal.slack.com/archives/C05MDC1T35J/p1700149065933539
In the developer perspective, in the left side navigation menu, add a Functions tab inside the Resources section, which will list all the serverless functions created for the specific namespace; on click of a function, open the Service details tab.
As a cluster service provider / consumer I want the hosted control plane endpoints to be resolvable through a known dns zone
Acceptance Criteria:
DPDK applications require dedicated CPUs, isolated from any preemption (other processes, kernel threads, interrupts), and this can be achieved with the “static” policy of the CPU Manager: the container resources need to include an integer number of CPUs of equal value in “limits” and “requests”. For instance, to get six exclusive CPUs:
spec:
  containers:
  - name: CNF
    image: myCNF
    resources:
      limits:
        cpu: "6"
      requests:
        cpu: "6"
The six CPUs are dedicated to that container. However, non-trivial (i.e., real) DPDK applications do not use all of those CPUs, as there is always at least one CPU running the slow path, processing configuration, printing logs (among the DPDK coding rules: no syscall in PMD threads, or you are in trouble). Even the DPDK PMD drivers and core libraries include pthreads which are intended to sleep; they are infrastructure pthreads processing link change interrupts, for instance.
Can we envision going with two processes, one with the isolated cores and one with the slow-path ones, so we can have two containers? Unfortunately no: going with a multi-process design, where only dedicated pthreads would run in one process, is not an option, as DPDK multi-process is being deprecated upstream and never picked up, since it never properly worked. Fixing it and changing the DPDK architecture to systematically have two processes is absolutely not possible within a year, and would require all DPDK applications to be re-written. Knowing that the first and current multi-process implementation is a failure, nothing guarantees that a second one would be successful.
The slow-path CPUs consume only a fraction of a real CPU and can safely run on the “shared” CPU pool of the CPU Manager; however, container specifications do not allow requesting two kinds of CPUs, for instance:
spec:
  containers:
  - name: CNF
    image: myCNF
    resources:
      limits:
        cpu_dedicated: "4"
        cpu_shared: "20m"
      requests:
        cpu_dedicated: "4"
        cpu_shared: "20m"
Why do we care about allocating one extra CPU per container?
Let’s take a realistic example, based on a real RAN CNF: running 6 containers with dedicated CPUs on a worker node, with a slow path requiring 0.1 CPU each, means that we waste 5 CPUs, i.e. 3 physical cores. With real life numbers:
Intel public CPU price per core is around 150 US$, not even taking into account the ecological aspect of the waste of (rare) materials and the electricity and cooling…
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This issue has been addressed lately by OpenStack.
N/A
We need to extend the node admission plugin to support the shared cpus.
The admission should provide the following functionalities:
1. In case a user specifies more than a single `openshift.io/enabled-shared-cpus` resource, it rejects the pod request with an error explaining to the user how to fix the pod spec.
2. It adds an annotation `cpu-shared.crio.io` that will be used to tell the runtime that shared cpus were requested.
For every container requested for shared cpus, it adds an annotation with the following scheme:
`cpu-shared.crio.io/<container name>`
Example of how it's done for core pinning: https://github.com/openshift/kubernetes/commit/04ff5090bae1cb181a2464696adde8709cdd0a93
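A sketch of the intended shape, using the resource and annotation names quoted above (illustrative only, not a confirmed final API):

apiVersion: v1
kind: Pod
metadata:
  name: cnf-pod
  annotations:
    # added by the admission plugin, one per container requesting shared cpus
    cpu-shared.crio.io/CNF: ""
spec:
  containers:
  - name: CNF
    image: myCNF
    resources:
      limits:
        cpu: "6"
        openshift.io/enabled-shared-cpus: "1"   # at most one per container
      requests:
        cpu: "6"
        openshift.io/enabled-shared-cpus: "1"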
Bump cluster-config-operator to pull the mixed-cpus feature-gate API
In order to protect the operator (and the cluster in general) from
Tech preview (TP) features, we should add feature gates support under NTO.
The feature enablement is done through the performance profile.
We should follow what is described in the EP (https://github.com/openshift/enhancements/pull/1396) and add all the bits and bytes that are needed in NTO for the feature activation.
We need to add support to Kubelet to advertise the shared-cpu as `openshift.io/enabled-shared-cpus` through extended resources
This should be off by default and only activated when a configuration file is being supplied.
To give Telco Far Edge customers as much of the product support lifespan as possible, we need to ensure that OCP releases are "telco ready" when the OCP release is GA.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
No documentation required
Implement the rteval test in the openshift-test binary under the openshift/nodes/realtime test suite
Feature Overview
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Questions to be addressed:
Goal
Hardware RAID support on Dell with Metal3.
Why is this important
Setting up RAID devices is a common operation in the hardware for OpenShift nodes. While there's been work at Fujitsu for configuring RAID in Fujitsu servers with Metal3, we don't support any generic interface with Redfish to extend this support and set it up in Metal3 for Dell, which is the most common hardware platform we find in our customers' environments.
Before implementing generic support, we need to understand the implications of enabling an interface in Metal3 to allow it on multiple hardware types.
Scope questions
This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were completed when this image was assembled
The goal of this epic is to guarantee that all pods running within the ACM (Advanced Cluster Management) cluster adhere to Kubernetes Security Context Constraints (SCC). The implementation of a comprehensive SCC compliance checking system will proactively maintain a secure and compliant environment, mitigating security risks.
Ensuring SCC compliance is critical for the security and stability of a Kubernetes cluster.
A customer who is responsible for overseeing the operations of their cluster, faces the challenge of maintaining a secure and compliant Kubernetes environment. The organization relies on the ACM cluster to run a variety of critical workloads across multiple namespaces. Security and compliance are top priorities, especially considering the sensitive nature of the data and applications hosted in the cluster.
As an ACM admin, I want to add Kubernetes Security Context Constraints (SCC) V2 options to the component's resource YAML configuration to ensure that the Pod runs with the 'readonlyrootfilesystem' and 'privileged' settings, in order to enhance the security and functionality of our application.
In the resource config YAML, we need to add the following context:
securityContext:
  privileged: false
  readOnlyRootFilesystem: true
Affected resources:
OKD users should be able to use the agent-based install method to install OKD FCOS clusters.
This should mostly work already, but there are some known differences between RHCOS and FCOS that we rely on (as evidenced by people trying to use the FCOS ISO by accident and it failing). Specifically, I vaguely recall that the semodule command used by the selinux.service may be missing from FCOS.
Ultimately we need CI testing of OKD. This may help alert us to upcoming issues in RHCOS before we encounter them in OCP.
Some of the scripts generated by the create ignition-configs command for OKD may need to distinguish between the cases where SNO is being managed and where the agent-based installer is currently used
This epic tracks any part of our codebase / solutions we implemented taking shortcuts.
Whenever a shortcut is taken, we should add a story here so we don't forget to improve it in a safer and more maintainable way.
Maintainability and debuggability, and in general fighting technical debt, are critical to keeping velocity and ensuring overall high quality
https://issues.redhat.com/browse/CNF-796
https://issues.redhat.com/browse/CNF-1479
https://issues.redhat.com/browse/CNF-2134
https://issues.redhat.com/browse/CNF-6745
https://issues.redhat.com/browse/CNF-8036
An epic we can duplicate for each release to ensure we have a place to catch things we ought to be doing regularly but can tend to fall by the wayside.
With the migration to Cypress 10+, Cypress' config file changed. When migrating to the new config file format, Cypress automatically retained the existing plugins setup. We should update the config to import the plugins directly in to the config file and convert the file to TypeScript. See https://docs.cypress.io/guides/tooling/plugins-guide
In retrospect, we don't actually want to do this because https://github.com/openshift/console/blob/master/frontend/packages/integration-tests-cypress/plugins/index.js is shared with OLM, so directly importing will mean duplication. Additionally, there is no real benefit to converting to TypeScript, so I am reducing this story to simply removing the comments here and here
AC: Update the Cypress config for
dev-console and knative-plugin static packages have Protractor tests that utilize the views in packages/operator-lifecycle-manager/integration-tests. Once the dev-console and knative-plugin Protractor tests have been removed or migrated, remove packages/operator-lifecycle-manager/integration-tests.
AC:
Cluster-version operator (CVO) manifests declaring a runlevel are supposed to use 0000_<runlevel>_<dash-separated-component>_<manifest_filename>, but since api#598 added 0000_10-helm-chart-repository.crd.yaml and api#1084 added 0000_10-project-helm-chart-repository.crd.yaml, those have diverged from that pattern, so the cluster-version operator will fail to parse their runlevel. They're still getting sorted into a runlevel around 10 by this code, but unless there are reasons that the CRD needs to be pushed early in an update, it gives the CVO the ability to parallelize the reconciliation with more sibling resources if you leave the runlevel prefix off in the API repository (after which these COPY lines would need a matching bump).
As part of the spike to determine outdated plugins, the i18next-parser dev dependency is out of date and needs to be updated.
Acceptance criteria:
As a developer, I want to go through the e2e tests and remove PatternFly classnames being used as selectors so that future PatternFly upgrades do not break integration tests.
AC:
PatternFly is releasing the new PF5 version. Due to the new version we should bump the version in console to identify any issues related to the new version, particularly:
Acceptance criteria (needs refinement)
As a developer, I want to switch from using Chrome to Electron browser in e2e tests in the console repo. There are multiple advantages - Faster testing and native support from Cypress.
Search repos: https://github.com/search?q=org%3Aopenshift+--browser+%24%7BBRIDGE_E2E_BROWSER_NAME%3A%3Dchrome%7D&type=code
AC:
We are currently running Cypress v8.5.0, which is over two years old, so we should upgrade to a newer version to stay current. Recommended upgrade is v12.17.4.
Robb created a WIP PR to vet this upgrade and diagnose and fix any issues in console and OLM tests.
A few outstanding issues with that PR:
AC:
packages/console-shared/src/test-utils includes Protractor code that is shared among static packages. Once all static packages have removed or migrated their Protractor tests, remove packages/console-shared/src/test-utils.
AC:
Remove
once all other Protractor tests have been removed or migrated
Due to the matured state of the console, it would be good to determine which packages need to be updated so we don't get to a point where a package update must happen ASAP due to a CVE.
For that we should determine a list of top packages for console operator and create a story for each.
AC:
As part of the spike to determine outdated plugins, the husky dev dependency is out of date and needs to be updated.
Acceptance criteria:
Note: Migration guide from v4 to v8 - https://typicode.github.io/husky/migrating-from-v4.html
As part of the spike to determine outdated plugins, the null-loader dev dependency is out of date and needs to be updated.
Acceptance criteria:
console-plugin-sdk includes https://github.com/openshift/console/tree/master/frontend/packages/console-plugin-sdk#integration-tests. We need to remove this once all Protractor tests have been migrated to Cypress or removed.
Update the console frontend builder image from Node.js version 14 to 18. This is necessary since some other dependencies require a higher version of Node.js.
Acceptance criteria:
Findings:
Updating the Node.js version from 14 to 18 in the console UI builder image, with a clean run of pre-merge tests, requires a workaround for the breaking change caused by the change in the hashing algorithm from Node.js v17 upward. To fix the issue, we’ll have to override the hashing algorithm to “md4”, which is the default in previous versions of Node.js.
Reference to previous PRs to update tectonic-console-builder image.
https://github.com/openshift/console/pull/12828
https://github.com/openshift/release/pull/39443/files
Remove or migrate Protractor tests to Cypress
AC:
`frontend/packages/ceph-storage-plugin` has been migrated to a dynamic plugin in an external repo, so all the code there is orphaned. Included in that orphaned code are Protractor tests that need to be removed in order to complete the removal of Protractor.
AC:
The existing console.page/resource/details extension point results in an entirely blank page where everything comprising the details page has to be duplicated. Rather than needing to duplicate everything, one option would be to add an extension point to the existing DefaultDetailsPage ResourceSummary, so that there is no need to recreate the entire details page and additional ResourceSummary items could be added to the existing default. Another option would be to create a new extension.
AC:
We need to enable the storage for the v1 version of our ConsolePlugin CRD in the API repository. ConsolePlugin v1 CRD was added in CONSOLE-3077.
AC: Enable the storage for the v1 version of ConsolePlugin CRD and disable the storage for v1alpha1 version
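Enabling storage for v1 amounts to flipping the storage flags in the CRD's versions stanza, along these lines:

# excerpt of the ConsolePlugin CRD versions stanza
versions:
- name: v1
  served: true
  storage: true      # v1 becomes the storage version
- name: v1alpha1
  served: true
  storage: false     # storage disabled for v1alpha1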
Remove code that was added through the ACM integration from all of the console's codebase repositories
Since the decision was made to stop the ACM integration, we as a team decided that it would be better to remove the unused code in order to avoid any confusion or regressions.
Remove all multicluster components, hooks, and helpers from the frontend codebase.
Remove all multicluster-related code from the console operator repo.
AC:
Remove all multicluster-related code from the console backend
AC:
This story describes Phase 1 of using OpenShift Dynamic Plugin SDK in Console.
This story is focused on plugin build-time infrastructure.
Generated @openshift-console/dynamic-plugin-sdk-webpack package should be updated as follows:
Both Console and OpenShift Dynamic Plugin SDK should be updated to address any discrepancies.
AC:
Building CI Images has recently increased in duration, sometimes hitting 2 hours, which causes multiple problems:
More importantly, the build times have gotten to a point where OSBS is failing to build the installer due to timeouts, which is making it impossible for ART to deliver the product or critical fixes.
The installer-artifacts image depends on the installer image. It seems unnecessary to copy the x86 installer binary into both images. See if we can decouple.
Previous art: https://github.com/openshift/release/pull/38975
Acceptance criteria:
This is a clone of issue CORS-2870. The following is the description of the original issue:
—
Create a new repo for the providers, build them into a container image and import the image in the installer container image.
Hopefully this will save resources and decrease build times for CI jobs in Installer PRs.
We're currently on etcd 3.5.9, with new CVEs and features implemented we want to rebase to the latest release.
Golang 1.20 update
In 3.5.9 we have spent significant time to update all active releases to use go 1.19. With 3.5.10 the default version will be 1.20 (ETCD-481). We need to figure out whether it makes sense to bump the image again or rely on the go-toolbox team to give us a patched 1.19 release for the time being.
There have not been any code changes that require us to use 1.20.
Rebase openshift/etcd to latest 3.5.10 upstream release.
Rebase openshift/etcd to latest 3.5.11 upstream release.
Rebase openshift/etcd to latest 3.5.12 upstream release.
Rebase openshift/etcd to latest 3.5.13 upstream release.
Also remove support for DOCKER_REGISTRY_SERVICE_HOST and DOCKER_REGISTRY_SERVICE_PORT.
This story covers bumps of k8s and openshift packages on both operator and operand.
Most of the core OpenShift sub-projects' APIs live today in the https://github.com/openshift/api/ repo.
Description:
As an MCO developer, it is easier to update and maintain APIs and CRDs when they are co-located with those of other core operators.
Most of the core OpenShift sub-project APIs and CRDs live today in the centralized location https://github.com/openshift/api/. This was done as part of the enhancement https://github.com/openshift/enhancements/blob/master/enhancements/api-review/centralized-manifest-openapi-generation.md and makes sense for the MCO as well.
Acceptance Criteria:
After Layering and Hypershift GAs in 4.12, we need to remove the code and builds that are no longer associated with mainline OpenShift.
This describes non-customer-facing work.
Removing code from the MCO where we may be referencing machine-os-content, such as the legacy OS update path.
More context:
I recorded 2 failures while preparing for installation (with @itsoiref's help):
Failure 1 - we made the image availability validation fail and started the installation.
Result: you get an informative notification. failure-1.mp4
Failure 2 - We made an install config override that makes the installer fail while generating the ignition.
Result: No notification (as described in the bug...) failure-2.mp4
The new short URLs, once supported by the image service, need to be generated by assisted-service and served as part of its API.
We would like to check the possibility of avoiding the reboot after first boot.
Make changes to avoid reboot by MCO after first boot during installation.
Either provide a solution that avoids reboot, or find why it cannot be done.
Make all the necessary changes for implementation and check all edge cases.
No
Feature origin (who asked for this feature?)
If skip-MCO-reboot is enabled (passed as an argument), the skip-MCO-reboot logic is applied. This logic consists of:
Pass a flag through the install command to indicate whether the MCO-reboot-avoidance logic should be applied by the assisted installer
The epic MGMT-9915 added support for dual-stack VIPs, while keeping backward compatibility with the singular API and Ingress VIP fields, as specified in enhancements/dual-stack-vips.md.
With MGMT-12678, which is a part of the above-mentioned epic, API and Ingress VIP got marked as deprecated, and this Epic is about removing them from the API completely while keeping the plural VIPs in place.
Both api_vip and ingress_vip are no longer a part of the API or the DB.
Yes.
In MGMT-12678, api_vip and ingress_vip got deprecated.
With the api_vips and ingress_vips already merged and operational, the singular VIP fields should be removed.
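After the removal, a cluster definition would carry only the plural fields, e.g. (illustrative values):

api_vips:
- ip: 192.0.2.100
ingress_vips:
- ip: 192.0.2.101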
When enabling the infrastructure operator, automatically import the local cluster and enable users to add nodes to this self cluster via the infrastructure operator
Yes, it's a new functionality that will need to be documented
In MGMT-15704 we added a local-cluster-import feature to add the local-cluster to the infrastructure operator.
We now require an "end to end" test that will make sure that a node may be added to the cluster without issues.
Manage the effort for adding jobs for release-ocm-2.9 on assisted installer
https://docs.google.com/document/d/1WXRr_-HZkVrwbXBFo4gGhHUDhSO4-VgOPHKod1RMKng
Merge order:
Update the BUNDLE_CHANNELS in the Makefile in assisted-service and run bundle generation.
The external platform was created to allow cloud providers to supply their own integration components (cloud controller manager, etc.) without prior integration into openshift release artifacts. We need to support this new platform in assisted-installer in order to provide a user friendly way to enable such clusters, and to enable new-to-openshift cloud providers to quickly establish an installation process that is robust and will guide them toward success.
Please describe what conditions must be met in order to mark this feature as "done".
If the answer is "yes", please make sure to check the corresponding option.
The new API should look like this (it should match the install-config definition):
platform: {
  type: external
  external: {
    platformName: oci
    CloudControllerManager: External # in the future
  }
}
When platformName is set to oci, the service must behave like "platform.type: oci".
CloudControllerManager is dependent on https://github.com/openshift/installer/pull/7457/files
Please describe what this feature is going to do.
Please describe what conditions must be met in order to mark this feature as "done".
If the answer is "yes", please make sure to check the corresponding option.
Ensure we install 4.14 redhat-operators catalog to install LSO when OCP == 4.15: https://github.com/openshift/assisted-service/pull/5323/files
I created a SNO cluster through the SaaS. A minor issue prevented one of the ClusterOperators from reporting "Available = true", and after one hour, the install process hit a timeout and was marked as failed. When I found the failed install, I was able to easily resolve the issue and get both the ClusterOperator and ClusterVersion to report "Available = true", but the SaaS no longer cared; as far as it was concerned, the installation failed and was permanently marked as such. It also would not give me the kubeadmin password, which is an important feature of the install experience and tricky to obtain otherwise. A hard timeout can cause more harm than good, especially when applied to a system (openshift) that continuously tries to get to desired state without an absolute concept of an operation succeeding or failing; desired state just hasn't been achieved yet. We should consider softening this timeout to be a warning that installation hasn't achieved completion as quickly as expected, without actively preventing a successful outcome.
Late binding scenario (kube-api):
A user tries to install a cluster with the late binding feature enabled (deleting the cluster will return the hosts to the InfraEnv); the installation times out and the cluster goes into an error state; the user connects to the cluster and fixes the issue.
AI will still think that there is an error in the cluster. If the user tries to perform day-2 operations on a cluster in the error state, it will fail; the only option is to delete the cluster and create another one that is marked as installed, but that will cause the hosts to boot from the discovery ISO.
The following points should be considered when implementing the switch to metrics-server:
Acceptance Criteria:
Add checks for whether the feature gate is enabled or not
Based on review https://github.com/openshift/cluster-monitoring-operator/pull/2022#discussion_r1401746992, move this role to cmo's jsonnet so that we can avoid duplication in metrics-server jsonnet
Vendor openshift/api to openshift/cluster-config-operator to bring featuregate changes.
https://github.com/openshift/api/pull/1615 is merged, but this change won't be available unless we vendor the change into openshift/cluster-config-operator
The make run-local command in CMO is failing with the following error after the feature gate support was introduced in #2151:
81243 reflector.go:325] Listing and watching *v1.FeatureGate from github.com/openshift/client-go/config/informers/externalversions/factory.go:116
I1124 11:53:35.992069 81243 reflector.go:325] Listing and watching *v1.ClusterVersion from github.com/openshift/client-go/config/informers/externalversions/factory.go:116
E1124 11:53:36.336227 81243 simple_featuregate_reader.go:290] cluster failed with : unable to determine features: missing desired version "" in featuregates.config.openshift.io/cluster
E1124 11:53:36.336247 81243 simple_featuregate_reader.go:290] cluster failed with : unable to determine features: missing desired version "" in featuregates.config.openshift.io/cluster
E1124 11:53:36.341778 81243
Update API docs to note the TechPreview feature in MetricsServerConfig
There are a number of initial setup and information-gathering steps required before the ART team can work on building a new image.
The image repository is https://github.com/openshift/kubernetes-metrics-server
Documentation
The ART team are able to start work on adding the image for kubernetes-metrics-server to the automatic build process.
Proposed title of this feature request
Add scrape time jitter tolerations to prometheus
What is the nature and description of the request?
Change the configuration of the OpenShift Prometheus instances to tolerate scrape time jitters.
Why does the customer need this? (List the business requirements)
Prometheus chunk compression relies on scrape times being accurately aligned to the scrape interval. Due to the nature of delta of delta encoding, a small delay from the configured scrape interval can cause tsdb data to occupy significantly more space.
We have observed a 50% difference in on disk tsdb storage for a replicated HA pair.
The downside is a reduction in sample accuracy and potential impact to derivatives of the time series. Allowing a jitter toleration will trade off improved chunk compression for reduced accuracy of derived data like the running average of a time series.
List any affected packages or components.
Prometheus
1. Proposed title of this feature request
UserWorkLoad monitoring pods having any problem should not degrade ClusterMonitoringOperator(CMO)
2. What is the nature and description of the request?
Currently, when a customer has enabled User Workload Monitoring (UWM), if any pod under the UWM project goes down, that in turn degrades CMO, which should not happen, as core monitoring is working totally fine.
3. Why does the customer need this? (List the business requirements here)
If the pods under UWM are having a problem, CMO degrades. This creates panic for the customer, as one of the core cluster operators is degraded, and the customer raises a high-severity case, which is not at all needed, as cluster monitoring is totally fine.
4. List any affected packages or components.
Cluster Monitoring Operator
—
*This was the first idea, but it isn't what was implemented in the end; check the comments below and the document linked in https://issues.redhat.com/browse/MON-3421.*
Implement the changes presented during https://issues.redhat.com/browse/MON-3375 here https://docs.google.com/document/d/1mPpGbrR4Pv1AkBX_qRGLQPLRIx87KPv9-y91Xq1UPMg/edit?usp=sharing
.spec.remoteWrite[].sendExemplars
.spec.enableFeatures
[1]Prometheus [monitoring.coreos.com/v1]
https://docs.openshift.com/container-platform/4.13/rest_api/monitoring_apis/prometheus-monitoring-coreos-com-v1.html
However "PrometheusRestrictedConfig" doesn't support both options.
Enable sending exemplars over remote-write in UWM. This may additionally require setting enableFeatures, and some other defaults (MON-3501).
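A sketch of what the UWM configuration could look like once both options are supported; since extending PrometheusRestrictedConfig is exactly the work here, treat the keys as illustrative rather than confirmed:

apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      enableFeatures:
      - exemplar-storage
      remoteWrite:
      - url: "https://remote-write.example.com/api/v1/write"
        sendExemplars: true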
https://github.com/rhobs/handbook/pull/59/files
https://github.com/openshift/cluster-monitoring-operator/pull/1631
https://github.com/openshift/origin/pull/27031
https://github.com/openshift/cluster-monitoring-operator/pull/1580
https://github.com/openshift/cluster-monitoring-operator/pull/1552
Require read-only access to Alertmanager in developer view.
https://issues.redhat.com/browse/RFE-4125
Common user should not see alerts in UWM.
https://issues.redhat.com/browse/OCPBUGS-17850
Related ServiceAccounts.
Interconnection diagram in monitoring stack.
https://docs.google.com/drawings/d/16TOFOZZLuawXMQkWl3T9uV2cDT6btqcaAwtp51dtS9A/edit?usp=sharing
None.
In CMO, Thanos Querier pods have an Oauth-proxy on port 9091 for web access on all paths.
We are going to replace it with kube-rbac-proxy.
The current behavior is to allow access to the Thanos Querier web server for any user having "get" access to "namespace" resources. We do not have to keep the same logic, but we have to make sure no regression happens.
We use the subresource "prometheus/api" to authorize both "post" and "get" HTTP requests to kube-rbac-proxy.
We update the cluster role "cluster-monitoring-view" with new access privileges and prepare a new role for API access.
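A sketch of the kube-rbac-proxy static authorization config this implies, assuming the "prometheus/api" subresource split described above (the resource naming is illustrative of the approach, not a confirmed manifest):

# kube-rbac-proxy config for the Thanos Querier web port
authorization:
  resourceAttributes:
    apiGroup: monitoring.coreos.com
    resource: prometheuses
    subresource: api
    namespace: openshift-monitoring
    name: k8s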
In https://issues.redhat.com/browse/MON-1949 we added a feature to deploy an additional kubelet service monitor that adds cAdvisor metrics with the exposed timestamps. This improves consistency of some container metrics, at the cost of delayed staleness awareness.
With https://github.com/prometheus/prometheus/pull/13060/ we can finally ask Prometheus to ingest the explicit timestamps but also use staleness markers for low-latency stale detection. I.e., we can deprecate and remove the dedicated ServiceMonitor feature when this Prometheus feature ships in OCP.
By setting honorTimestamps: true and trackTimestampsStaleness: true, we can now get the correct cAdvisor timestamps with the default staleness handling.
This should be done for both cAdvisor SMs in assets/control-plane/service-monitor-kubelet.yaml and assets/control-plane/minimal-service-monitor-kubelet.yaml.
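For illustration, the cAdvisor endpoint in those two ServiceMonitors would gain the two fields (abbreviated; the port and path shown are the ones commonly used for the kubelet monitor and may differ):
```
spec:
  endpoints:
  - port: https-metrics
    path: /metrics/cadvisor
    honorTimestamps: true
    trackTimestampsStaleness: true
```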
1. Proposed title of this feature request
Support NodeClockNotSynchronising when NTP service is disabled but OCP cluster is using PTP-Operator for time sync
2. What is the nature and description of the request?
Currently, when installing an SNO OCP Telco cluster that takes its time sync reference from the ptp-operator, the alertmanager service continuously fires an alert entitled `NodeClockNotSynchronising`, because the alerting rule only checks whether NTP sync exists on the system, as described in here.
3. Why does the customer need this? (List the business requirements here)
Since we are providing two time synchronization systems, this should cover the use of both systems so that no false-positive alerts are fired. In the situation described above, the systems are time-synced using the ptp-operator, BUT the system is sending alerts for not being synchronised.
4. List any affected packages or components.
When the PTP operator is installed, it brings its own alerting rule to detect clock drift which is more reliable than the out-of-the-box NodeClockNotSynchronising and NodeClockSkewDetected alerts:
The NodeClockNotSynchronising PromQL expression should be adjusted to "mute" itself when the PTP operator is installed.
expr: |
  (
    min_over_time(node_timex_sync_status{job="node-exporter"}[5m]) == 0
    and
    node_timex_maxerror_seconds{job="node-exporter"} >= 16
  )
  # addition to the upstream expression
  and on() absent(up{job="ptp-monitor-service"})
If this Epic is an RFE, please complete the following questions to the best of your ability:
Q1: Proposed title of this RFE
Q2: What is the nature and description of the RFE?
openshift-install currently takes an install-config.yaml parameter for `hyperthreading`. On Power it is not a boolean. Threading can be set to 1/2/4/8.
Q3: Why does the customer need this? (List the business requirements here)
Some workloads do not scale well with the default level of threads on Power (8). If the workload works better with SMT=4, the customer would prefer to set this parameter during install.
Q4: List any affected packages or components
openshift-install
hyperthreading parameter in install-config.yaml
Research development required for support of setting SMT levels on Power hardware in Openshift
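A hypothetical install-config.yaml sketch of what this could look like; the `smtLevel` field name and its placement are assumptions for illustration, not a shipped API:
```
apiVersion: v1
compute:
- name: worker
  replicas: 3
  platform:
    powervs:
      smtLevel: "4"   # hypothetical: 1/2/4/8 instead of a boolean
```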
Currently, various serviceIDs are created by the create infra command for use by various cluster operators; among them, the storage operator's serviceID is exposed in the guest cluster, which should not happen. We need to reduce the scope of all the serviceIDs to just the infra resources created for that cluster.
With respect to this comment https://github.com/openshift/cluster-api-provider-ibmcloud/blob/main/main.go#L168C2-L175 rename --powervs-provider-id-fmt param to --provider-id-fmt in capi deployment.
Epic Goal
Running doc to describe terminologies and concepts which are specific to Power VS - https://docs.google.com/document/d/1Kgezv21VsixDyYcbfvxZxKNwszRK6GYKBiTTpEUubqw/edit?usp=sharing
Currently all regions have available sysTypes of s922 or e980
However with mad02, a sysType of s1022 will be the only option.
Because of the short window of 4.15 I'm suggesting that we hardcode the values for available system types into powervs_regions.go however I am open to a solution that queries the region for sysType if there is an easy way to do it. Depending on the complexity that may have to wait for 4.16.
For now, I think we should (see the sketch after this list):
1) Add sysType array to powervs_regions.go
2) Update the sysType validation to ensure that sysType is in the array (rather than checking the static map we have today)
3) Ensure that this value will be passed into terraform. I believe it is today, but please confirm.
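A sketch of steps 1 and 2 above, with illustrative names and region data:
```
package powervs

// regionSysTypes sketches the hardcoded sysType data proposed for
// powervs_regions.go; entries here are illustrative, not exhaustive.
var regionSysTypes = map[string][]string{
	"dal": {"s922", "e980"},
	"mad": {"s1022"},
}

// ValidateSysType reports whether sysType is offered in the given region,
// replacing the static-map check described above.
func ValidateSysType(region, sysType string) bool {
	for _, st := range regionSysTypes[region] {
		if st == sysType {
			return true
		}
	}
	return false
}
```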
https://github.com/IBM-Cloud/power-go-client/blob/master/clients/instance/ibm-pi-datacenters.go and https://github.com/IBM-Cloud/power-go-client/blob/master/clients/instance/ibm-pi-workspaces.go were recently added to power-go-client
1. Upgrade power-go-client in the installer to a version where these are available
2. Add a check that uses these to query whether PER is available, failing if not (sketched below).
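A rough sketch of that check; the datacenters client constructor, its signature, and the "power-edge-router" capability key are assumptions from a quick read of power-go-client and may differ from the real API:
```
package powervs

import (
	"context"
	"fmt"

	"github.com/IBM-Cloud/power-go-client/clients/instance"
	"github.com/IBM-Cloud/power-go-client/ibmpisession"
)

// ensurePERAvailable fails validation when the target zone's datacenter
// does not advertise Power Edge Router support.
func ensurePERAvailable(ctx context.Context, session *ibmpisession.IBMPISession, zone string) error {
	client := instance.NewIBMPIDatacenterClient(ctx, session, "") // assumed constructor
	dc, err := client.Get(zone)
	if err != nil {
		return fmt.Errorf("failed to look up datacenter %q: %w", zone, err)
	}
	if enabled, ok := dc.Capabilities["power-edge-router"]; !ok || !enabled { // assumed capability key
		return fmt.Errorf("datacenter %q does not support PER", zone)
	}
	return nil
}
```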
mad02, mad04, and wdc06 will all GA with PER by end of the year.
In 4.15 we don't want to support cloud connection zones. This will initially reduce the number of zones we can use, but we will add zones as they come online.
Epic Goal
Here is our overall tech debt backlog: ODC-6711
See included tickets, we want to clean up in 4.15.
Protractor reached end-of-life in September 2023. As a result, we need to remove Protractor from the console. To do so, we either need to remove any existing Protractor tests or migrate them to Cypress.
As a developer, I want to replace the current topology graphic with the new designs created by the UX team. The info for these graphics is located here:
Source files for the new OpenShift artwork (the "O").
Guidelines on usage can be found here.
As a user, I want to use the latest version of Cypress.
Cypress has been upgraded in PR https://github.com/openshift/console/pull/13070, and the e2e tests have been disabled for all the packages owned by ODC, as they would otherwise break CI.
Improve CI coverage for E2E tests and test stabilization for better product health.
Improving the health of CI and devconsole will improve PR review effectiveness.
Description of problem:
Run topology package against CI and fix any discrepancy with tests
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Here is our overall tech debt backlog: ODC-6711
See included tickets, we want to clean up in 4.16.
Service Binding Operator will reach its end of life, and we want customers to be informed about that change. The first step for this is showing deprecation messages when customers use SBO.
This only affects Service Bindings in our UI and not the general Operator based bindings.
This is a clone of issue OCPBUGS-32222. The following is the description of the original issue:
—
Description of problem:
Since the Service Binding Operator is deprecated and will be removed with the OpenShift Container Platform 4.16 release, users should be notified about this in the console on the pages below:
1. Add page / flow
2. Creating a SB yaml
3. SB list page
4. Topology when creating a SB / bind a component
5. Topology if we found any SB in the current namespace?
Note:
Confirm the warning text with UX.
Additional info:
https://docs.openshift.com/container-platform/4.15/release_notes/ocp-4-15-release-notes.html#ocp-4-15-deprecation-sbo https://docs.google.com/document/d/1_L05xy7ZSK2xCLiqrrJBPwDoahmi78Ox6mw-l-IIT9M/edit#heading=h.jcsa7gh4tupt
This is a clone of issue OCPBUGS-44219. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-43878. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-43752. The following is the description of the original issue:
—
Description of problem:
Add a disallowed flag to hide the pipelines-plugin pipeline builder route, add action, and catalog provider extension, as these have migrated to the Pipelines console-plugin, so that there are no duplicate actions in the console.
https://access.redhat.com/articles/6955985
As a 4.15 cluster admin, I am required to acknowledge API removals in Kubernetes 1.29 before upgrading to 4.16, so that I'm fully apprised of the API removals associated with upgrading to 4.16.
Note, rather than last time where we drafted documentation and published it via KCS article we expect that we will be able to draft this documentation and publish it as part of 4.16 release notes in the deprecations and removals section.
Note: Code changes for this epic can only be performed after 4.16 Release candidate is built.
We need to have admin-ack in 4.15 so that admins can check the deprecated APIs and approve when they move to 4.16. As planned we want to add the admin-ack shortly after 4.16 forks off, which just happened last week.
The guard should land in 4.15 because it's the 4.15 CVO warning about possible compat issues resulting from updates into 4.16.
Every time.
1. Install a cluster in 4.15.
2. Run an application which uses the deprecated API. See kcs#7031404 for more information.
3. Upgrade to 4.16.
The upgrade happens without asking the admin to confirm that the workloads do not use the deprecated APIs.
Upgrade should wait for the admin-ack.
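For context, the ack itself is a ConfigMap patch; a sketch of the expected flow, where the exact gate key is an assumption modeled on previous releases (see kcs#7031404):
```
$ oc -n openshift-config patch configmap admin-acks --type=merge \
    --patch '{"data":{"ack-4.15-kube-1.29-api-removals-in-4.16":"true"}}'
```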
When the console dashboards plugin is present, the metrics tab does not respect a custom datasource.
console dashboards plugin: 0.1.0
Openshift: 4.16
Description of problem:
The CU cluster of the Mavenir deployment has cluster-node-tuning-operator in a CrashLoopBackOff state and does not apply the performance profile.
Version-Release number of selected component (if applicable):
4.14rc0 and 4.14rc1
How reproducible:
100%
Steps to Reproduce:
1. Deploy the CU cluster with the ZTP GitOps method
2. Wait for Policies to be compliant
3. Check worker nodes and cluster-node-tuning-operator status
Actual results:
Nodes do not have the performance profile applied; cluster-node-tuning-operator is crashing with the following in its logs:
E0920 12:16:57.820680       1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*runtime._type)(nil), concrete:(*runtime._type)(nil), asserted:(*runtime._type)(0x1e68ec0), missingMethod:""} (interface conversion: interface is nil, not v1.Object)
goroutine 615 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1c98c20?, 0xc0006b7a70})
	/go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000d49500?})
	/go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75
panic({0x1c98c20, 0xc0006b7a70})
	/usr/lib/golang/src/runtime/panic.go:884 +0x213
github.com/openshift/cluster-node-tuning-operator/pkg/util.ObjectInfo({0x0?, 0x0})
	/go/src/github.com/openshift/cluster-node-tuning-operator/pkg/util/objectinfo.go:10 +0x39
github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*ProfileCalculator).machineConfigLabelsMatch(0xc000a23ca0?, 0xc000445620, {0xc0001b38e0, 0x1, 0xc0010bd480?})
	/go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/profilecalculator.go:374 +0xc7
github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*ProfileCalculator).calculateProfile(0xc000607290, {0xc000a40900, 0x33})
	/go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/profilecalculator.go:208 +0x2b9
github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).syncProfile(0xc000195b00, 0x0?, {0xc000a40900, 0x33})
	/go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:664 +0x6fd
github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).sync(0xc000195b00, {{0x1f48661, 0x7}, {0xc000000fc0, 0x26}, {0xc000a40900, 0x33}, {0x0, 0x0}})
	/go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:371 +0x1571
github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).eventProcessor.func1(0xc000195b00, {0x1dd49c0?, 0xc000d49500?})
	/go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:193 +0x1de
github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).eventProcessor(0xc000195b00)
	/go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:212 +0x65
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
	/go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x224ee20, 0xc000c48ab0}, 0x1, 0xc00087ade0)
	/go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0xc0004e6710?)
	/go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0xc0004e67d0?, 0x91af86?, 0xc000ace0c0?)
	/go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x25
created by github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).run
	/go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:1407 +0x1ba5
panic: interface conversion: interface is nil, not v1.Object [recovered]
	panic: interface conversion: interface is nil, not v1.Object
Expected results:
cluster-node-tuning-operator is functional, performance profiles applied to worker nodes
Additional info:
There is no issue on a DU node of the same deployment coming from the same repository; the DU node is configured as requested and cluster-node-tuning-operator is functioning correctly.
must gather from rc0: https://drive.google.com/file/d/1DlzrjQiKTVnQKXdcRIijBkEKjAGsOFn1/view?usp=sharing
must gather from rc1: https://drive.google.com/file/d/1qSqQtIunQe5e1hDVDYwa90L9MpEjEA4j/view?usp=sharing
performance profile: https://gitlab.cee.redhat.com/agurenko/mavenir-ztp/-/blob/airtel-4.14/policygentemplates/group-cu-mno-ranGen.yaml
Moad Zardab to fill out with something useful.
What
We appear to be missing some expected metrics for the telemeter-staging and telemeter-production namespaces.
This task is around identifying a list of the missing metrics, so that we can detect this issue in the future:
What
RHEL machines need to be able to ship CPU info metrics into MST to support metering
Why
To support billing customers as they convert from CentOS 7 to RHEL
Useful links
RHEL Observability - PoC (Google doc)
RHOBS Rhelemeter PoC (google doc)
What
As per this slack thread - the RHEL team are trying to route requests to the "rhelemeter" instance through a vendor (Akamai). There are some issues with the existing mTLS setup, because it looks like the service terminates SSL and does not propagate the client certificate in the current setup.
Ideally we transmit traffic through this service in order to maintain the allow list on RHEL machines that have their firewalls open to console.redhat.* domains.
This task is around investigating if we can enable rhelemeter to read the cert details from a different header that is encrypted using some shared key
We should also consider the possibility of enabling ip address range on the route itself.
Figure out what kind of features Akamai has: can it support this out of the box, and if it can, why can't we enable that before deciding on making the code change?
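If we go the route-level IP-range idea mentioned above, OpenShift routes already support an allow-list annotation; whether it fits the rhelemeter deployment is an open question, and the CIDRs below are placeholders:
```
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: rhelemeter
  annotations:
    # only allow the vendor's published egress ranges (placeholder CIDRs)
    haproxy.router.openshift.io/ip_whitelist: 192.0.2.0/24 198.51.100.0/24
```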
Acceptance Criteria
When quorum breaks and we are able to get a snapshot of one of the etcd members, we need a procedure to restore the etcd cluster for a given HostedCluster.
Documented here: https://docs.google.com/document/d/1sDngZF-DftU8_oHKR70E7EhU_BfyoBBs2vA5WpLV-Cs/edit?usp=sharing
Add the above documentation to the HyperShift repo documentation.
Remark that:
Slack thread:
https://redhat-internal.slack.com/archives/CCX9DB894/p1696515395395939
Acceptance criteria
The 100.88.0.0/14 IPv4 subnet is currently reserved for the transit switch in OVN-Kubernetes for east west traffic in the OVN Interconnect architecture. We need to make this value configurable so that users can avoid conflicts with their local infrastructure
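A sketch of what the configuration could look like on the cluster Network operator config; the `internalTransitSwitchSubnet` field name and placement are assumptions about the eventual API shape:
```
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  defaultNetwork:
    type: OVNKubernetes
    ovnKubernetesConfig:
      ipv4:
        # overrides the default transit switch subnet noted above
        internalTransitSwitchSubnet: 100.69.0.0/16
```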
This epic will track the backport effort to 4.15.z
We need to bump the Kubernetes version and run a library sync for OCP 4.13. Two stories will be created, one for each activity.
As a Sample Operator Developer, I will like to run the library sync process, so the new libraries can be pushed to OCP 4.15
This is a runbook we need to execute on every release of OpenShift
NA
NA
NA
Follow instructions here: https://source.redhat.com/groups/public/appservices/wiki/cluster_samples_operator_release_activities
Library Repo
Library sync PR is merged in master
Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated
Unknown
Verified
Unsatisfied
We need to bump the Kubernetes version to the latest API version OCP is using.
This is what was done last time:
https://github.com/openshift/cluster-samples-operator/pull/409
Find latest stable version from here: https://github.com/kubernetes/api
This is described in wiki: https://source.redhat.com/groups/public/appservices/wiki/cluster_samples_operator_release_activities
Epic Goal*
Improve automation of the OCP platform, e.g. implement new oc [adm] commands improving the image release or release notes process. Addressing technical debt.
Why is this important? (mandatory)
Improving our efficiency and productivity
Scenarios (mandatory)
Variable
Dependencies (internal and external) (mandatory)
Variable
Contributing Teams(and contacts) (mandatory)
Technical
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
When the release controller shows the changelog between two releases, it ends up assuming a mentioned PR belongs to the fork, and makes a link
to it, despite the PR actually belonging to upstream. It gets this data from `oc`, by running something like:
```
oc adm release info --changelog=/tmp/git/ registry.ci.openshift.org/ocp/release:4.12.0-0.nightly-2022-09-26-111919 registry.ci.openshift.org/ocp/release:4.12.0-0.nightly-2022-09-28-204419
```
If you look at that payload, you'll see the upstream kubernetes/kubernetes PRs are assumed to have belonged to openshift/kubernetes instead. This creates incorrect data, and sometimes links to the wrong PR when a PR with the same number exists in both repos.
Is there a way to only show PRs from the release payload repo?
This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were not completed when this image was assembled
The installer makes heavy use of its data/data directory, which contains hundreds of files in various subdirectories that are mostly used for inserting into ignition files. From these files, autogenerated code is created that includes the contents in the installer binary.
Unfortunately, subdirectories that do not contain .go files are not regarded as Go packages and are therefore not included when building the installer as a library: https://go.dev/wiki/Modules#some-needed-files-may-not-be-present-in-populated-vendor-directory
This is currently handled in the installer fork repo by deleting the compile-time autogeneration and instead doing a one-time autogeneration that is checked in to the repo: https://github.com/openshift/installer-aro/pull/27/commits/26a5ed5afe4df93b6dde8f0b34a1f6b8d8d3e583
Since this does not exist in the upstream installer, we will need some way to copy the data/data associated with the current installer version into the wrapper repo - we should probably encapsulate this in a make vendor target. The wiki page above links to https://github.com/goware/modvendor which unfortunately doesn't work, because it assumes you know the file extensions of all of the files (e.g. .c, .h), and it can't handle directory names matching the glob. We could probably easily fix this by forking the tool and teaching it to ignore directories in the source. Alternatively, John Hixson has a script that can do something similar.
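A rough sketch of what a `make vendor` style step could do, assuming the wrapper repo depends on github.com/openshift/installer as a Go module; paths are illustrative:
```
# locate the installer module in the local module cache
INSTALLER_DIR="$(go list -m -f '{{.Dir}}' github.com/openshift/installer)"
# copy its data/data tree into the vendor directory, which `go mod vendor`
# skips because those subdirectories contain no .go files
mkdir -p vendor/github.com/openshift/installer/data
cp -R "${INSTALLER_DIR}/data/data" vendor/github.com/openshift/installer/data/
```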
Currently the Azure client can only be mocked in unit tests of the pkg/asset/installconfig/azure package. Using the mockable interface consistently and adding a public interface to set it up will allow other packages to write unit tests for code involving the Azure client.
Epic Goal*
Transition the automated backup feature added in ETCD-81 to GA.
Why is this important? (mandatory)
Providing a way to enable automated backups would improve the disaster recovery outcomes by increasing the likelihood that admins have a recent etcd backup saved for their cluster.
The feature has been in Tech Preview since 4.14; we should make it available for everyone by default.
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
We have a currently unused golang utility to take backup snapshots in `cluster-etcd-operator cluster-backup`. After a cursory look, it works.
We should prefer the golang utility instead, for better testability of our backup feature.
This is a clone of issue OCPBUGS-44105. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-44062. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-38713. The following is the description of the original issue:
—
: [sig-network-edge] DNS should answer A and AAAA queries for a dual-stack service [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]
failed log
[sig-network-edge] DNS should answer A and AAAA queries for a dual-stack service [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel] github.com/openshift/origin/test/extended/dns/dns.go:499 STEP: Creating a kubernetes client @ 08/12/24 15:55:02.255 STEP: Building a namespace api object, basename dns @ 08/12/24 15:55:02.257 STEP: Waiting for a default service account to be provisioned in namespace @ 08/12/24 15:55:02.517 STEP: Waiting for kube-root-ca.crt to be provisioned in namespace @ 08/12/24 15:55:02.581 STEP: Creating a kubernetes client @ 08/12/24 15:55:02.646 Aug 12 15:55:03.941: INFO: configPath is now "/tmp/configfile2098808007" Aug 12 15:55:03.941: INFO: The user is now "e2e-test-dns-dualstack-9bgpm-user" Aug 12 15:55:03.941: INFO: Creating project "e2e-test-dns-dualstack-9bgpm" Aug 12 15:55:04.299: INFO: Waiting on permissions in project "e2e-test-dns-dualstack-9bgpm" ... Aug 12 15:55:04.632: INFO: Waiting for ServiceAccount "default" to be provisioned... Aug 12 15:55:04.788: INFO: Waiting for ServiceAccount "deployer" to be provisioned... Aug 12 15:55:04.972: INFO: Waiting for ServiceAccount "builder" to be provisioned... Aug 12 15:55:05.132: INFO: Waiting for RoleBinding "system:image-pullers" to be provisioned... Aug 12 15:55:05.213: INFO: Waiting for RoleBinding "system:image-builders" to be provisioned... Aug 12 15:55:05.281: INFO: Waiting for RoleBinding "system:deployers" to be provisioned... Aug 12 15:55:05.641: INFO: Project "e2e-test-dns-dualstack-9bgpm" has been fully provisioned. STEP: creating a dual-stack service on a dual-stack cluster @ 08/12/24 15:55:05.775 STEP: Running these commands:for i in `seq 1 10`; do [ "$$(dig +short +notcp +noall +answer +search v4v6.e2e-dns-2700.svc A | sort | xargs echo)" = "172.31.255.230" ] && echo "test_endpoints@v4v6.e2e-dns-2700.svc"; [ "$$(dig +short +notcp +noall +answer +search v4v6.e2e-dns-2700.svc AAAA | sort | xargs echo)" = "fd02::7321" ] && echo "test_endpoints_v6@v4v6.e2e-dns-2700.svc"; [ "$$(dig +short +notcp +noall +answer +search ipv4.v4v6.e2e-dns-2700.svc A | sort | xargs echo)" = "3.3.3.3 4.4.4.4" ] && echo "test_endpoints@ipv4.v4v6.e2e-dns-2700.svc"; [ "$$(dig +short +notcp +noall +answer +search ipv6.v4v6.e2e-dns-2700.svc AAAA | sort | xargs echo)" = "2001:4860:4860::3333 2001:4860:4860::4444" ] && echo "test_endpoints_v6@ipv6.v4v6.e2e-dns-2700.svc";sleep 1; done @ 08/12/24 15:55:05.935 STEP: creating a pod to probe DNS @ 08/12/24 15:55:05.935 STEP: submitting the pod to kubernetes @ 08/12/24 15:55:05.935 STEP: deleting the pod @ 08/12/24 16:00:06.034 [FAILED] in [It] - github.com/openshift/origin/test/extended/dns/dns.go:251 @ 08/12/24 16:00:06.074 STEP: Collecting events from namespace "e2e-test-dns-dualstack-9bgpm". @ 08/12/24 16:00:06.074 STEP: Found 0 events. @ 08/12/24 16:00:06.207 Aug 12 16:00:06.239: INFO: POD NODE PHASE GRACE CONDITIONS Aug 12 16:00:06.239: INFO: Aug 12 16:00:06.334: INFO: skipping dumping cluster info - cluster too large Aug 12 16:00:06.469: INFO: Deleted {user.openshift.io/v1, Resource=users e2e-test-dns-dualstack-9bgpm-user}, err: <nil> Aug 12 16:00:06.506: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthclients e2e-client-e2e-test-dns-dualstack-9bgpm}, err: <nil> Aug 12 16:00:06.544: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens sha256~4QgFXAn8lyosshoHOjJeddr3MJbIL2DnCsoIvJVOGb4}, err: <nil> STEP: Destroying namespace "e2e-test-dns-dualstack-9bgpm" for this suite. 
@ 08/12/24 16:00:06.544 STEP: dump namespace information after failure @ 08/12/24 16:00:06.58 STEP: Collecting events from namespace "e2e-dns-2700". @ 08/12/24 16:00:06.58 STEP: Found 2 events. @ 08/12/24 16:00:06.615 Aug 12 16:00:06.615: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for dns-test-d93fff7e-90a3-408e-a197-fc4ff0738b30: { } FailedScheduling: 0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling. Aug 12 16:00:06.615: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for dns-test-d93fff7e-90a3-408e-a197-fc4ff0738b30: { } FailedScheduling: skip schedule deleting pod: e2e-dns-2700/dns-test-d93fff7e-90a3-408e-a197-fc4ff0738b30 Aug 12 16:00:06.648: INFO: POD NODE PHASE GRACE CONDITIONS Aug 12 16:00:06.648: INFO: Aug 12 16:00:06.743: INFO: skipping dumping cluster info - cluster too large STEP: Destroying namespace "e2e-dns-2700" for this suite. @ 08/12/24 16:00:06.743 • [FAILED] [304.528 seconds] [sig-network-edge] DNS [It] should answer A and AAAA queries for a dual-stack service [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel] github.com/openshift/origin/test/extended/dns/dns.go:499 [FAILED] Failed: timed out waiting for the condition In [It] at: github.com/openshift/origin/test/extended/dns/dns.go:251 @ 08/12/24 16:00:06.074 ------------------------------ Summarizing 1 Failure: [FAIL] [sig-network-edge] DNS [It] should answer A and AAAA queries for a dual-stack service [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel] github.com/openshift/origin/test/extended/dns/dns.go:251 Ran 1 of 1 Specs in 304.528 seconds FAIL! -- 0 Passed | 1 Failed | 0 Pending | 0 Skipped fail [github.com/openshift/origin/test/extended/dns/dns.go:251]: Failed: timed out waiting for the condition Ginkgo exit error 1: exit with code 1
failure reason
TODO
: [sig-cli] oc builds complex build webhooks CRUD [apigroup:build.openshift.io] [Skipped:Disconnected] [Suite:openshift/conformance/parallel] expand_more : [sig-cli] oc builds complex build start-build [apigroup:build.openshift.io] [Skipped:Disconnected] [Suite:openshift/conformance/parallel] expand_more
failed
This is a clone of issue OCPBUGS-35535. The following is the description of the original issue:
—
: [sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should be able to connect to a service that is idled because a GET on the route will unidle it [Skipped:Disconnected] [Suite:openshift/conformance/parallel/minimal] expand_more
The reason for the failure is the incorrect configuration of the proxy.
failed log
Will run 1 of 1 specs ------------------------------ [sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should be able to connect to a service that is idled because a GET on the route will unidle it [Skipped:Disconnected] [Suite:openshift/conformance/parallel/minimal] github.com/openshift/origin/test/extended/router/idle.go:49 STEP: Creating a kubernetes client @ 06/14/24 10:24:21.443 Jun 14 10:24:21.752: INFO: configPath is now "/tmp/configfile3569155902" Jun 14 10:24:21.752: INFO: The user is now "e2e-test-router-idling-8pjjg-user" Jun 14 10:24:21.752: INFO: Creating project "e2e-test-router-idling-8pjjg" Jun 14 10:24:21.958: INFO: Waiting on permissions in project "e2e-test-router-idling-8pjjg" ... Jun 14 10:24:22.039: INFO: Waiting for ServiceAccount "default" to be provisioned... Jun 14 10:24:22.149: INFO: Waiting for ServiceAccount "deployer" to be provisioned... Jun 14 10:24:22.271: INFO: Waiting for ServiceAccount "builder" to be provisioned... Jun 14 10:24:22.400: INFO: Waiting for RoleBinding "system:image-pullers" to be provisioned... Jun 14 10:24:22.419: INFO: Waiting for RoleBinding "system:image-builders" to be provisioned... Jun 14 10:24:22.440: INFO: Waiting for RoleBinding "system:deployers" to be provisioned... Jun 14 10:24:22.740: INFO: Project "e2e-test-router-idling-8pjjg" has been fully provisioned. STEP: creating test fixtures @ 06/14/24 10:24:22.809 STEP: Waiting for pods to be running @ 06/14/24 10:24:23.146 Jun 14 10:24:24.212: INFO: Waiting for 1 pods in namespace e2e-test-router-idling-8pjjg Jun 14 10:24:26.231: INFO: All expected pods in namespace e2e-test-router-idling-8pjjg are running STEP: Getting a 200 status code when accessing the route @ 06/14/24 10:24:26.231 Jun 14 10:24:28.315: INFO: GET#1 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host Jun 14 10:25:05.256: INFO: GET#38 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host Jun 14 10:39:04.256: INFO: GET#877 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host Jun 14 10:39:05.256: INFO: GET#878 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host Jun 14 10:39:06.257: INFO: GET#879 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: 
lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host Jun 14 10:39:07.256: INFO: GET#880 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host Jun 14 10:39:08.256: INFO: GET#881 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host Jun 14 10:39:09.256: INFO: GET#882 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host Jun 14 10:39:10.256: INFO: GET#883 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host Jun 14 10:39:11.256: INFO: GET#884 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host Jun 14 10:39:12.256: INFO: GET#885 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host Jun 14 10:39:13.257: INFO: GET#886 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host Jun 14 10:39:14.256: INFO: GET#887 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host ... ... ... 
Jun 14 10:39:19.256: INFO: GET#892 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host Jun 14 10:39:20.256: INFO: GET#893 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host [INTERRUPTED] in [It] - github.com/openshift/origin/test/extended/router/idle.go:49 @ 06/14/24 10:39:20.461 ------------------------------ Interrupted by User First interrupt received; Ginkgo will run any cleanup and reporting nodes but will skip all remaining specs. Interrupt again to skip cleanup. Here's a current progress report: [sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should be able to connect to a service that is idled because a GET on the route will unidle it [Skipped:Disconnected] [Suite:openshift/conformance/parallel/minimal] (Spec Runtime: 14m59.024s) github.com/openshift/origin/test/extended/router/idle.go:49 In [It] (Node Runtime: 14m57.721s) github.com/openshift/origin/test/extended/router/idle.go:49 At [By Step] Getting a 200 status code when accessing the route (Step Runtime: 14m54.229s) github.com/openshift/origin/test/extended/router/idle.go:175 Spec Goroutine goroutine 307 [select] k8s.io/apimachinery/pkg/util/wait.waitForWithContext({0x95f5188, 0xda30720}, 0xc004cfbcf8, 0x30?) k8s.io/apimachinery@v0.29.0/pkg/util/wait/wait.go:205 k8s.io/apimachinery/pkg/util/wait.poll({0x95f5188, 0xda30720}, 0x1?, 0xc0045c2a80?, 0xc0045c2a87?) k8s.io/apimachinery@v0.29.0/pkg/util/wait/poll.go:260 k8s.io/apimachinery/pkg/util/wait.PollWithContext({0x95f5188?, 0xda30720?}, 0xc004cfbd90?, 0x88699b3?, 0x7?) k8s.io/apimachinery@v0.29.0/pkg/util/wait/poll.go:85 k8s.io/apimachinery/pkg/util/wait.Poll(0xc004cfbd00?, 0x88699b3?, 0x1?) k8s.io/apimachinery@v0.29.0/pkg/util/wait/poll.go:66 > github.com/openshift/origin/test/extended/router.waitHTTPGetStatus({0xc003d8fbc0, 0x5a}, 0xc8, 0x0?) github.com/openshift/origin/test/extended/router/idle.go:306 > github.com/openshift/origin/test/extended/router.glob..func7.2.1() github.com/openshift/origin/test/extended/router/idle.go:178 github.com/onsi/ginkgo/v2/internal.extractBodyFunction.func3({0x2e24138, 0xc0014f2d80}) github.com/onsi/ginkgo/v2@v2.13.0/internal/node.go:463 github.com/onsi/ginkgo/v2/internal.(*Suite).runNode.func3() github.com/onsi/ginkgo/v2@v2.13.0/internal/suite.go:896 github.com/onsi/ginkgo/v2/internal.(*Suite).runNode in goroutine 1 github.com/onsi/ginkgo/v2@v2.13.0/internal/suite.go:883 -----------------------------
This is intended to be a place to capture general "tech debt" items so they don't get lost. I very much doubt that this will ever get completed as a feature, but that's okay; the desire is more that stories get pulled out of here and put with feature work "opportunistically" when it makes sense.
If you find a "tech debt" item, and it doesn't have an obvious home with something else (e.g. with MCO-1 if it's metrics and alerting) then put it here, and we can start splitting these out/marrying them up with other epics when it makes sense.
This came about on a PR about the memory limit bump we did for the bootstrap controller pod. As per Openshift guidelines, we should only be setting requests and not limits.
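A minimal sketch of what the guideline implies for the pod spec (values are placeholders):
```
resources:
  requests:
    cpu: 20m
    memory: 50Mi
  # intentionally no "limits" stanza: platform pods should not be
  # CPU-throttled or OOM-killed by an arbitrary cap
```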
Done when:
Please describe what this feature is going to do.
Please describe what conditions must be met in order to mark this feature as "done".
If the answer is "yes", please make sure to check the corresponding option.
When the e2e-ai-operator-ztp-capi job fails, we don't collect the relevant logs for debugging the failure.
We should collect:
We used to get all these logs in the past but now we are getting this instead:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-assisted-service-master-edge-e2e-ai-operator-ztp-capi-periodic/1657542530343899136/artifacts/e2e-ai-operator-ztp-capi-periodic/assisted-baremetal-operator-gather/artifacts/hypershift/
Filing a ticket based on this conversation here: https://github.com/openshift/enhancements/pull/1014#discussion_r798674314
Basically the tl;dr here is that we need a way to ensure that machinesets are properly advertising the architecture that the nodes will eventually have. This is needed so the autoscaler can predict the correct pool to scale up/down. This could be accomplished through user driven means like adding node arch labels to machinesets and if we have to do this automatically, we need to do some more research and figure out a way.
For autoscaling nodes in a multi-arch compute cluster, node architecture needs to be taken into account because such a cluster could potentially have nodes of up to 4 different architectures. Labels can be propagated today from the machineset to the node group, but they have to be injected manually.
This story explores whether the autoscaler can use cloud provider APIs to derive the architecture of an instance type and set the label accordingly rather than it needing to be a manual step.
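One user-driven shape of this (today's manual injection) is the scale-from-zero capacity annotation on the MachineSet; the annotation key is the documented cluster-autoscaler one, while the names and value are illustrative:
```
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: worker-arm64
  namespace: openshift-machine-api
  annotations:
    # tells the autoscaler what architecture label nodes from this pool will carry
    capacity.cluster-autoscaler.kubernetes.io/labels: kubernetes.io/arch=arm64
```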
The DPU node can have 2 different encap IPs (one from the x86 host and one from the ARM DPU). There should always be exactly one encap IP in the OVN-K virtual network topology.
As a developer I want consistent build versions so that when the release team is organizing master images they don't encounter errors due to missing git tags.
See OCPCLOUD-2173 for background.
See https://github.com/openshift/cluster-version-operator/blob/master/hack/build-go.sh for an example implementation.
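A minimal sketch of the pattern from the linked build-go.sh: stamp the binary with a git-derived version at build time so missing tags surface immediately; the module path and variable name are placeholders:
```
VERSION="$(git describe --tags --always --dirty)"
[ -n "${VERSION}" ] || { echo "unable to derive version from git tags" >&2; exit 1; }
go build -ldflags "-X github.com/openshift/example/pkg/version.Raw=${VERSION}" ./cmd/...
```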
We will use this to address tech debt in OLM in the 4.10 timeframe.
Items to prioritize are:
CI e2e flakes
The operator framework portfolio consists of a number of projects that have an upstream project. The verify-commits makefile target had been introduced in each of the downstream projects to assist with the downstreaming efforts. The commit checking logic was eventually moved into a container and was rolled out as the verify-commits-shared test, which ran alongside the verify-commits test to ensure that it worked as expected.
We are now confident that the verify-commits-shared test is running as expected, and can replace the logic used in the verify-commits test with that of the verify-commits-shared test, which will allow us to remove the verify-commits Makefile targets from each of the repos, simplifying the code base and reducing duplication of code.
Epic is used to accumulate all Tech Debt Issues, and later move them into different Release-aimed Tech Debt Epics
The CNO go.mod is currently very tricky to work with due to hypershift deps. We can remove these deps and instead reference a dynamic API object.
Issues in the upstream ovnk repo are not reflected anywhere on jira and it's time-consuming to do that manually for each issue we have. Let's automate that with a script.
We have identified gaps in our attempted test coverage that monitors for acceptable Alerts firing during cluster upgrades that need to be addressed to make sure we are not allowing regressions into the product.
This epic is to group that work.
This is a goal to help Service Delivery with unmanageable alerts during upgrades. Reach out to SD and get an initial list of alerts that are causing problems, then see how they are appearing in our data.
If they do appear in CI, work out a testing framework for specific alerts that we can make flake/fail. This might be a threshold, but I suspect more likely will be things that shouldn't fire at all.
Add test, make it a flake, file the bug for relevant team, and graduate the test to a failure once the issue is resolved.
Christoph Blecker could you get us a list of 3-5 of your top problematic alerts?
Epic Goal*
What is our purpose in implementing this? What new capability will be available to customers?
Placeholder for any technical debt for the team. As we are required to have an epic for each story, and technical debt can vary from minor code changes to complex refactoring, creating an epic card for each such action creates a lot of paperwork, which complicates any such activity, e.g. identifying scope and providing a corresponding description in connection to existing features and other cards. We need a very simple way to organize activities that fall under improvements that make the overall maintenance and design better.
Why is this important? (mandatory)
What are the benefits to the customer or Red Hat? Does it improve security, performance, supportability, etc? Why is work a priority?
To help the team reduce the paperwork.
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Varies
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
followup to https://issues.redhat.com/browse/WRKLDS-487 and https://issues.redhat.com/browse/WRKLDS-592
remove duplicate passing of kubeconfig into route-controller-manager in hypershift
This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were completed when this image was assembled
This is a clone of issue OCPBUGS-25857. The following is the description of the original issue:
—
Future:
This is a clone of issue OCPBUGS-27027. The following is the description of the original issue:
—
W0109 17:47:02.340203 1 builder.go:109] graceful termination failed, controllers failed with error: failed to get infrastructure name: infrastructureName not set in infrastructure 'cluster'
s/depreciate/deprecate/ throughout (non-vendor) codebase.
Backport to... 4.13?
Please review the following PR: https://github.com/openshift/k8s-prometheus-adapter/pull/74
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates that the image(s) being used downstream for production builds are not consistent with the images referenced in this component's github repository. Differences in upstream and downstream builds impact the fidelity of your CI signal. If you disagree with the content of this PR, please contact @release-artists in #forum-ocp-art to discuss the discrepancy. Closing this issue without addressing the difference will cause the issue to be reopened automatically.
This is a clone of issue OCPBUGS-25897. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
4.16
How reproducible:
Always
Steps to Reproduce:
1. Create a CredentialsRequest including the spec.providerSpec.stsIAMRoleARN string.
2. Cloud Credential Operator could not populate the Secret based on the CredentialsRequest.
$ oc get secret -A | grep test-mihuang    # Secret not found.
$ oc get CredentialsRequest -n openshift-cloud-credential-operator
NAME           AGE
...
test-mihuang   44s
3.
Actual results:
The Secret is not created successfully.
Expected results:
Successfully created the secret on the hosted cluster.
Additional info:
Description of problem:
Currently the MCO updates its image registry certificate ConfigMap by deleting and re-creating it on each MCO sync. Instead, we should be patching it.
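Illustratively, the fix is an in-place merge patch rather than delete + create on every sync; the namespace, ConfigMap name, and key below are placeholders, not necessarily the MCO's actual objects:
```
$ oc -n openshift-config-managed patch configmap image-registry-ca \
    --type=merge --patch '{"data":{"registry.example.com..5000":"<PEM bundle>"}}'
```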
Version-Release number of selected component (if applicable):
4.14
How reproducible:
always
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-19830. The following is the description of the original issue:
—
Description of problem:
There are several testcases in conformance testsuite that are failing due to openshift-multus configuration.
We are running conformance testsuite as part of our Openshift on Openstack CI. We use that just to confirm correct functionality of the cluster. The command we are using to run the test suite is:
openshift-tests run --provider '{\"type\":\"openstack\"}' openshift/conformance/parallel
The name of the tests that failed are:
1. [sig-arch] Managed cluster should ensure platform components have system-* priority class associated [Suite:openshift/conformance/parallel]
Reason is:
6 pods found with invalid priority class (should be openshift-user-critical or begin with system-): openshift-multus/whereabouts-reconciler-6q6h7 (currently "") openshift-multus/whereabouts-reconciler-87dwn (currently "") openshift-multus/whereabouts-reconciler-fvhwv (currently "") openshift-multus/whereabouts-reconciler-h68h5 (currently "") openshift-multus/whereabouts-reconciler-nlz59 (currently "") openshift-multus/whereabouts-reconciler-xsch6 (currently "")
2. [sig-arch] Managed cluster should only include cluster daemonsets that have maxUnavailable or maxSurge update of 10 percent or maxUnavailable of 33 percent [Suite:openshift/conformance/parallel]
Reason is:
fail [github.com/openshift/origin/test/extended/operators/daemon_set.go:105]: Sep 23 16:12:15.283: Daemonsets found that do not meet platform requirements for update strategy: expected daemonset openshift-multus/whereabouts-reconciler to have maxUnavailable 10% or 33% (see comment) instead of 1, or maxSurge 10% instead of 0 Ginkgo exit error 1: exit with code 1
3.[sig-arch] Managed cluster should set requests but not limits [Suite:openshift/conformance/parallel]
Reason is:
fail [github.com/openshift/origin/test/extended/operators/resources.go:196]: Sep 23 16:12:17.489: Pods in platform namespaces are not following resource request/limit rules or do not have an exception granted: apps/v1/DaemonSet/openshift-multus/whereabouts-reconciler/container/whereabouts defines a limit on cpu of 50m which is not allowed (rule: "apps/v1/DaemonSet/openshift-multus/whereabouts-reconciler/container/whereabouts/limit[cpu]") apps/v1/DaemonSet/openshift-multus/whereabouts-reconciler/container/whereabouts defines a limit on memory of 100Mi which is not allowed (rule: "apps/v1/DaemonSet/openshift-multus/whereabouts-reconciler/container/whereabouts/limit[memory]") Ginkgo exit error 1: exit with code 1
4. [sig-node][apigroup:config.openshift.io] CPU Partitioning cluster platform workloads should be annotated correctly for DaemonSets [Suite:openshift/conformance/parallel]
Reason is:
fail [github.com/openshift/origin/test/extended/cpu_partitioning/pods.go:159]: Expected <[]error | len:1, cap:1>: [ <*errors.errorString | 0xc0010fa380>{ s: "daemonset (whereabouts-reconciler) in openshift namespace (openshift-multus) must have pod templates annotated with map[target.workload.openshift.io/management:{\"effect\": \"PreferredDuringScheduling\"}]", }, ] to be empty
How reproducible: Always
Steps to Reproduce: Run conformance testsuite:
https://github.com/openshift/origin/blob/master/test/extended/README.md
Actual results: Testcases failing
Expected results: Testcases passing
This is a clone of issue OCPBUGS-26073. The following is the description of the original issue:
—
Description of problem:
We want to update the trigger from auto to manual or vice versa. We can do it with the CLI: 'oc set triggers deployment/<name> --manual'. This normally changes the deployment annotation metadata.annotations.image.openshift.io/triggers to "paused: true" (or "paused: false" when set to auto). But when we enable or disable the auto trigger by editing the deployment from the web console, it overrides the annotation with "pause: false" or "pause: true", without the 'd'.
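For reference, a sketch of the annotation as the CLI writes it; the exact JSON (including whether "paused" is a boolean or a string) is illustrative here, but the key must be "paused", not "pause":
```
metadata:
  annotations:
    image.openshift.io/triggers: >-
      [{"from":{"kind":"ImageStreamTag","name":"example:latest"},
        "fieldPath":"spec.template.spec.containers[0].image",
        "paused":"true"}]
```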
Version-Release number of selected component (if applicable):
How reproducible:
Create a simple httpd application. Follow [1] to set the trigger using the CLI.
Steps to set the trigger from the console: Web console -> Deployment -> Edit deployment -> Form view -> Images section -> Enable "Deploy image from an image stream tag" -> Enable "Auto deploy when new Image is available" and save the changes -> check annotations.
[1] https://docs.openshift.com/container-platform/4.12/openshift_images/triggering-updates-on-imagestream-changes.html
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
code: https://github.com/openshift/console/blob/master/frontend/packages/dev-console/src/utils/resource-label-utils.ts#L78
This is a clone of issue OCPBUGS-43973. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42879. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42579. The following is the description of the original issue:
—
Hello Team,
When we deploy the HyperShift cluster with OpenShift Virtualization specifying the NodePort strategy for services, the requests to ignition, oauth, konnectivity (for oc rsh, oc logs, oc exec), and the virt-launcher-hypershift-node-pool pod fail, as by default the following NetworkPolicies get created automatically, restricting the traffic on all other ports.
$ oc get netpol
NAME                   POD-SELECTOR         AGE
kas                    app=kube-apiserver   153m
openshift-ingress      <none>               153m
openshift-monitoring   <none>               153m
same-namespace         <none>               153m
I resolved this by creating the following NetworkPolicies manually:
$ cat ingress-netpol
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ingress
spec:
  ingress:
  - ports:
    - port: 31032
      protocol: TCP
  podSelector:
    matchLabels:
      kubevirt.io: virt-launcher
  policyTypes:
  - Ingress

$ cat oauth-netpol
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: oauth
spec:
  ingress:
  - ports:
    - port: 6443
      protocol: TCP
  podSelector:
    matchLabels:
      app: oauth-openshift
      hypershift.openshift.io/control-plane-component: oauth-openshift
  policyTypes:
  - Ingress

$ cat ignition-netpol
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: nodeport-ignition-proxy
spec:
  ingress:
  - ports:
    - port: 8443
      protocol: TCP
  podSelector:
    matchLabels:
      app: ignition-server-proxy
  policyTypes:
  - Ingress

$ cat konn-netpol
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: konn
spec:
  ingress:
  - ports:
    - port: 8091
      protocol: TCP
  podSelector:
    matchLabels:
      app: kube-apiserver
      hypershift.openshift.io/control-plane-component: kube-apiserver
  policyTypes:
  - Ingress
The bug for ignition netpol has already been reported.
--> https://issues.redhat.com/browse/OCPBUGS-39158
--> https://issues.redhat.com/browse/OCPBUGS-39317
It would be helpful if these policies were created automatically as well, or if HyperShift offered an option to disable the automatic management of network policies so that we can manage them ourselves.
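In the meantime, a workaround sketch is to apply the manifests above into the hosted control plane namespace (the namespace name here is hypothetical):
$ for f in ingress-netpol oauth-netpol ignition-netpol konn-netpol; do
    oc apply -n clusters-my-hosted-cluster -f "$f"
  done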
This is a clone of issue OCPBUGS-31431. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
In the python script used during bug pre-dispatch, we should align status and assignee between jira and github, keeping github as the source of truth:
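A minimal sketch of the kind of lookup such a script performs, assuming the GitHub CLI and Jira's REST API (the repo, issue keys, and token variable are hypothetical):
$ gh issue view 123 --repo openshift/example-repo --json state,assignees
$ curl -s -H "Authorization: Bearer $JIRA_TOKEN" \
    "https://issues.redhat.com/rest/api/2/issue/OCPBUGS-12345?fields=status,assignee"
The script would compare the two responses and, where they disagree, push the GitHub values into Jira.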
Change the display text of the filtering option for FIPS to “Designed for FIPS” in the OperatorHub to align with the latest official Red Hat wording about FIPS.
Hence, this story tracks the display text alignment to use “Designed for FIPS” as the filtering option in the OperatorHub in the console.
(see notes in the attached screenshot: operatorhub_filtering_Designed-for-FIPS.png)
When we try to create a cluster with --secret-creds (an MCE AWS k8s secret that includes aws-creds, the pull secret, and the base domain), the binary should not ask for a pull secret. However, it does now, after the change from hypershift.
Adding the pull-secret param allows the command to continue as expected, though I would think the whole point of secret-creds is to reuse what already exists.
/usr/local/bin/hcp create cluster aws --name acmqe-hc-ad5b1f645d93464c --secret-creds test1-cred --region us-east-1 --node-pool-replicas 1 --namespace local-cluster --instance-type m6a.xlarge --release-image quay.io/openshift-release-dev/ocp-release:4.14.0-ec.4-multi --generate-ssh
Output:
Error: required flag(s) "pull-secret" not set
required flag(s) "pull-secret" not set
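As a workaround, passing the flag named in the error explicitly lets the command proceed (the pull secret path is a placeholder):
/usr/local/bin/hcp create cluster aws --name acmqe-hc-ad5b1f645d93464c \
  --secret-creds test1-cred --region us-east-1 --node-pool-replicas 1 \
  --namespace local-cluster --instance-type m6a.xlarge \
  --release-image quay.io/openshift-release-dev/ocp-release:4.14.0-ec.4-multi \
  --generate-ssh --pull-secret /path/to/pull-secret.json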
2.4.0-DOWNANDBACK-2023-08-31-13-34-02 or mce 2.4.0-137
hcp version openshift/hypershift: 8b4b52925d47373f3fe4f0d5684c88dc8a93368a. Latest supported OCP: 4.14.0
always
Please review the following PR: https://github.com/openshift/cloud-provider-nutanix/pull/19
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Tracker issue for bootimage bump in 4.15. This issue should block issues which need a bootimage bump to fix.
The previous bump was OCPBUGS-30601.
Aggregator claims these tests only ran 4 times out of what looks like 10 jobs that ran to normal completion:
[sig-network-edge] Application behind service load balancer with PDB remains available using new connections
[sig-network-edge] Application behind service load balancer with PDB remains available using reused connections
However looking at one of the jobs not in the list of passes, we can see these tests ran:
Why is the aggregator missing this result somehow?
We are currently inheriting labels:
skopeo inspect -n docker://quay.io/redhat-user-workloads/crt-redhat-acm-tenant/hypershift-operator/hypershift-operator-main@sha256:a2e9ad049c260409cb09f82396be70d60efa4ed579ac8f95cb304332b8a9920a | jq -e ".Labels"
{
  "architecture": "x86_64",
  "build-date": "2023-09-21T19:24:45",
  "com.redhat.component": "ubi9-minimal-container",
  "com.redhat.license_terms": "https://www.redhat.com/en/about/red-hat-end-user-license-agreements#UBI",
  "description": "The Universal Base Image Minimal is a stripped down image that uses microdnf as a package manager. This base image is freely redistributable, but Red Hat only supports Red Hat technologies through subscriptions for Red Hat products. This image is maintained by Red Hat and updated regularly.",
  "distribution-scope": "public",
  "io.buildah.version": "1.31.0",
  "io.k8s.description": "The Universal Base Image Minimal is a stripped down image that uses microdnf as a package manager. This base image is freely redistributable, but Red Hat only supports Red Hat technologies through subscriptions for Red Hat products. This image is maintained by Red Hat and updated regularly.",
  "io.k8s.display-name": "Red Hat Universal Base Image 9 Minimal",
  "io.openshift.expose-services": "",
  "io.openshift.hypershift.control-plane-operator-applies-management-kas-network-policy-label": "true",
  "io.openshift.hypershift.control-plane-operator-creates-aws-sg": "true",
  "io.openshift.hypershift.control-plane-operator-manages-ignition-server": "true",
  "io.openshift.hypershift.control-plane-operator-manages.cluster-autoscaler": "true",
  "io.openshift.hypershift.control-plane-operator-manages.cluster-machine-approver": "true",
  "io.openshift.hypershift.control-plane-operator-manages.decompress-decode-config": "true",
  "io.openshift.hypershift.control-plane-operator-skips-haproxy": "true",
  "io.openshift.hypershift.control-plane-operator-subcommands": "true",
  "io.openshift.hypershift.ignition-server-healthz-handler": "true",
  "io.openshift.hypershift.restricted-psa": "true",
  "io.openshift.tags": "minimal rhel9",
  "maintainer": "Red Hat, Inc.",
  "name": "ubi9-minimal",
  "release": "750",
  "summary": "Provides the latest release of the minimal Red Hat Universal Base Image 9.",
  "url": "https://access.redhat.com/containers/#/registry.access.redhat.com/ubi9-minimal/images/9.2-750",
  "vcs-ref": "7ef59505f75bf0c11c8d3addefebee5ceaaf4c41",
  "vcs-type": "git",
  "vendor": "Red Hat, Inc.",
  "version": "9.2"
}
Thus, we need to set:
Please review the following PR: https://github.com/openshift/cluster-update-keys/pull/51
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/cluster-olm-operator/pull/35
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
We have observed that, when creating clusters through OCM using the Hive provisioner (which uses the OpenShift installer), sometimes some of the AWS IAM Instance Profiles are not cleaned up when their corresponding cluster is destroyed.
Version-Release number of selected component (if applicable):
"time=\"2023-09-11T10:37:10Z\" level=debug msg=\"OpenShift Installer v4.12.0\""
How reproducible:
At the moment we have not found a way to reproduce it consistently, but it does not seem to be an isolated case, because we ended up accumulating AWS IAM Instance Profiles in the AWS account that we use for our tests.
Actual results:
Sometimes some of the AWS IAM instance profiles associated with a cluster that has been deleted are not cleaned up.
Expected results:
The AWS IAM instance profiles associated with a deleted cluster are also deleted.
Additional info:
In https://issues.redhat.com/browse/OCM-2748 we have been doing an investigation of accumulated AWS IAM Instance Profiles in one of our AWS accounts. If you are interested in full details of the investigation please take a look at the issue and its comments.
Focusing on the instance profiles associated with clusters that we create as part of our test suite, we see that the majority of them are worker instance profiles. We also see some occurrences of master and bootstrap instance profiles, but for the purposes of the investigation we focused on worker profiles, because they are the vast majority of the accumulated ones.
For the purposes of the investigation we focused on a specific cluster, 'cs-ci-2lmxd', and we have seen that the worker IAM instance profile was created by the OpenShift installer:
time="2023-09-11T10:37:43Z" level=debug msg="module.iam.aws_iam_instance_profile.worker: Creation complete after 0s [id=cs-ci-2lmxd-9qtk4-worker-profile]"
But we found that when the cluster was deleted the openshift installer didn't delete it.
However, we could see that the master profile was created:
time="2023-09-11T10:37:43Z" level=debug msg="module.masters.aws_iam_instance_profile.master: Creation complete after 0s [id=cs-ci-2lmxd-9qtk4-master-profile]"
but in this case openshift installer deleted it properly when the cluster was deleted:
time="2023-09-11T10:49:58Z" level=info msg=Deleted InstanceProfileName=cs-ci-2lmxd-9qtk4-master-profile arn="arn:aws:iam::765374464689:instance-profile/cs-ci-2lmxd-9qtk4-master-profile" id=i-079f2d1580240e3cb resourceType=instance
As additional information, I can see that the worker profile has no tags:
msoriano@localhost:~/go/src/gitlab.cee.redhat.com/service/uhc-clusters-service (master)(ocm:S)$ aws iam list-instance-profile-tags --instance-profile-name=cs-ci-2lmxd-9qtk4-worker-profile
{
"Tags": []
}
I attach the install and uninstall logs in this issue too.
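To spot leaked profiles like this one, a sketch that lists the instance profiles carrying no tags (assumes the AWS CLI with suitable credentials):
for p in $(aws iam list-instance-profiles --query 'InstanceProfiles[].InstanceProfileName' --output text); do
  tags=$(aws iam list-instance-profile-tags --instance-profile-name "$p" --query 'Tags' --output text)
  [ -z "$tags" ] && echo "untagged: $p"
done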
Please review the following PR: https://github.com/openshift/csi-driver-shared-resource/pull/156
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-26486. The following is the description of the original issue:
—
Description of problem:
The following test started to fail frequently in the periodic tests: External Storage [Driver: pd.csi.storage.gke.io] [Testpattern: Dynamic PV (block volmode)] provisioning should provision storage with pvc data source in parallel
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Sometimes, but way too often in the CI
Steps to Reproduce:
1. Run the periodic-ci-openshift-release-master-nightly-X.X-e2e-gcp-ovn-csi test
Actual results:
Provisioning of some volumes fails with time="2024-01-05T02:30:07Z" level=info msg="resulting interval message" message="{ProvisioningFailed failed to provision volume with StorageClass \"e2e-provisioning-9385-e2e-scw2z8q\": rpc error: code = Internal desc = CreateVolume failed to create single zonal disk pvc-35b558d6-60f0-40b1-9cb7-c6bdfa9f28e7: failed to insert zonal disk: unknown Insert disk operation error: rpc error: code = Internal desc = operation operation-1704421794626-60e299f9dba08-89033abf-3046917a failed (RESOURCE_OPERATION_RATE_EXCEEDED): Operation rate exceeded for resource 'projects/XXXXXXXXXXXXXXXXXXXXXXXX/zones/us-central1-a/disks/pvc-501347a5-7d6f-4a32-b0e0-cf7a896f316d'. Too frequent operations from the source resource. map[reason:ProvisioningFailed]}"
Expected results:
Test passes
Additional info:
Looks like we're hitting the API quota limits with the test
Failed test run example:
Link to Sippy:
Nothing uses these plugins in the ovnk image, and having them complicates security checking, which needs to use a different path to check RPMs instead of things built directly in the Dockerfile.
Since they're unused, just remove them.
Description of problem:
When the "start build with broken proxy should start a build and wait for the build to fail [apigroup:build.openshift.io]" test runs, it expects the build to exit with a failure before printing the text "clone" for its log. Part of attempting to add a variant of this test which exercises the same functionality using an unprivileged build involves turning up the logging level so that the builder will log information that the test can look for which confirms that it was run in an unprivileged mode. I'd like for it to print the name under which it was invoked, so that it's easier to find where a particular container's output starts in the log, but that name is openshift-git-clone. The log message which would indicate that the test failed includes the text "git clone", so I'd like to amend the test to fail when that text is found in the log instead.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Modify the test to increase the logging level for its test build. 2. Apply https://github.com/openshift/builder/pull/358 to the builder image. 3. Run the test.
Actual results:
The test always fails (or "fails").
Expected results:
The test passes, unless we broke something somewhere.
Additional info:
This is a clone of issue OCPBUGS-38129. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-38119. The following is the description of the original issue:
—
Description of problem:
It would be nice to have each of the e2e test specs shown in the test grid report (https://testgrid.k8s.io/redhat-openshift-olm#periodic-ci-openshift-operator-framework-olm-master-periodics-e2e-gcp-olm&show-stale-tests=). I noticed that the test grid for 4.14 is exhibiting the right behaviour: https://testgrid.k8s.io/redhat-openshift-olm#periodic-ci-openshift-operator-framework-olm-release-4.14-periodics-e2e-gcp-olm&show-stale-tests= So, we should make the junit e2e report look like what it looks like in the 4.14 branch.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Open browser of your choice 2. Go to the link in the description section 3. Direct eyeballs to screen
Actual results:
No e2e specs in the test grid table
Expected results:
e2e specs in the test grid table
Additional info:
Description of problem:
Bare Metal UPI cluster nodes lose communication with other nodes, and this affects pod communication on those nodes as well. This issue can be fixed with an OVN rebuild of the DBs on the nodes hitting the issue, but eventually the nodes will degrade again and lose communication again. Note that despite an OVN rebuild fixing the issue temporarily, Host Networking is set to True, so it's using the kernel routing table. **update: observed on vSphere with routingViaHost: false, ipForwarding: global configuration as well.
Version-Release number of selected component (if applicable):
4.14.7, 4.14.30
How reproducible:
Can't reproduce locally but reproducible and repeatedly occurring in customer environment
Steps to Reproduce:
Identify a host node whose pods can't be reached from other hosts in default namespaces (tested via openshift-dns). Observe that curls to that peer pod consistently time out. TCPdumps to the target pod show that packets are arriving and are acknowledged, but never route back to the client pod successfully (SYN/ACK seen at the pod network layer, not at geneve; so dropped before hitting the geneve tunnel).
Actual results:
Nodes will repeatedly degrade and lose communication despite fixing the issue with a ovn db rebuild (db rebuild only provides hours/days of respite, no permanent resolve).
Expected results:
Nodes should not be losing communication and even if they did it should not happen repeatedly
Additional info:
What's been tried so far
========================
- Multiple OVN rebuilds on different nodes (works, but the node will eventually hit the issue again)
- Flushing the conntrack (doesn't work)
- Restarting nodes (doesn't work)
Data gathered
=============
- Tcpdump from all interfaces for dns-pods going to port 7777 (to segregate traffic)
- ovnkube-trace
- SOSreports of two nodes having communication issues before an OVN rebuild
- SOSreports of two nodes having communication issues after an OVN rebuild
- OVS trace dumps of br-int and br-ex
More data in nested comments below.
linking KCS: https://access.redhat.com/solutions/7091399
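For reference, a capture like the one described above can be reproduced per node (the node name is a placeholder):
$ oc debug node/<node-name> -- chroot /host tcpdump -nn -i any port 7777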
https://github.com/openshift/origin/pull/28360
Failing unit tests.
Every row has MasterNodesUpdated null, which might have something to do with it. The fix would be in ci-tools.
Description of problem:
In the developer sandbox, the happy path to create operator-backed resources is broken. Users can only work in their assigned namespace. When they attempt to create an Operator-backed resource from the Developer console, the user interface inadvertently switches the working namespace from the user's to the `openshift` one. The console shows an error message when the user clicks the "Create" button.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Login to the Developer Sandbox
2. Choose the Developer view
3. Click Add+ -> Developer Catalog -> Operator Backed
4. Filter by "integration"
5. Notice the working namespace is still the user's one.
6. Select "Integration" (Camel K operator)
7. Click "Create"
8. Notice the working namespace has switched to `openshift`
9. Notice the custom resource in YAML view includes `namespace: openshift`
10. Click "Create"
Actual results:
An error message shows: "Danger alert: An error occurred: integrations.camel.apache.org is forbidden: User "bmesegue" cannot create resource "integrations" in API group "camel.apache.org" in the namespace "openshift""
Expected results:
On step 8, the working namespace should remain the user's one.
On step 9, in the YAML view, the namespace should be the user's one, or none.
After step 10, the creation process should trigger the creation of a Camel K integration.
Additional info:
This is a clone of issue OCPBUGS-36330. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-28974. The following is the description of the original issue:
—
Description of problem:
Machine stuck in Provisioned when the cluster is upgraded from 4.1 to 4.15
Version-Release number of selected component (if applicable):
Upgrade path from 4.1 to 4.15: 4.1.41-x86_64, 4.2.36-x86_64, 4.3.40-x86_64, 4.4.33-x86_64, 4.5.41-x86_64, 4.6.62-x86_64, 4.7.60-x86_64, 4.8.57-x86_64, 4.9.59-x86_64, 4.10.67-x86_64, 4.11 nightly, 4.12 nightly, 4.13 nightly, 4.14 nightly, 4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest
How reproducible:
Seems always; the issue was found in our Prow CI, and I also reproduced it.
Steps to Reproduce:
1. Create an aws IPI 4.1 cluster, then upgrade it one by one to 4.14.

liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2024-01-19-110702   True        True          26m     Working towards 4.12.0-0.nightly-2024-02-04-062856: 654 of 830 done (78% complete), waiting on authentication, openshift-apiserver, openshift-controller-manager
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2024-02-04-062856   True        False         5m12s   Cluster version is 4.12.0-0.nightly-2024-02-04-062856
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2024-02-04-062856   True        True          61m     Working towards 4.13.0-0.nightly-2024-02-04-042638: 713 of 841 done (84% complete), waiting up to 40 minutes on machine-config
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2024-02-04-042638   True        False         10m     Cluster version is 4.13.0-0.nightly-2024-02-04-042638
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2024-02-04-042638   True        True          17m     Working towards 4.14.0-0.nightly-2024-02-02-173828: 233 of 860 done (27% complete), waiting on control-plane-machine-set, machine-api
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2024-02-02-173828   True        False         18m     Cluster version is 4.14.0-0.nightly-2024-02-02-173828

2. When it upgrades to 4.14, check that machine scaling succeeds.

liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml
machineset.machine.openshift.io/ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa created
liuhuali@Lius-MacBook-Pro huali-test % oc get machineset
NAME                                            DESIRED   CURRENT   READY   AVAILABLE   AGE
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a    1         1         1       1           14h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa   0         0                             3s
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f    2         2         2       2           14h
liuhuali@Lius-MacBook-Pro huali-test % oc scale machineset ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa --replicas=1
machineset.machine.openshift.io/ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa scaled
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                                  PHASE     TYPE         REGION      ZONE         AGE
ci-op-trzci0vq-8a8c4-dq95h-master-0                   Running   m6a.xlarge   us-east-1   us-east-1f   15h
ci-op-trzci0vq-8a8c4-dq95h-master-1                   Running   m6a.xlarge   us-east-1   us-east-1a   15h
ci-op-trzci0vq-8a8c4-dq95h-master-2                   Running   m6a.xlarge   us-east-1   us-east-1f   15h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a-pqnqt    Running   m6a.xlarge   us-east-1   us-east-1a   15h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa-mt9kh   Running   m6a.xlarge   us-east-1   us-east-1a   15m
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-h2f9k    Running   m6a.xlarge   us-east-1   us-east-1f   15h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-lgmjb    Running   m6a.xlarge   us-east-1   us-east-1f   15h
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME                           STATUS   ROLES    AGE     VERSION
ip-10-0-128-51.ec2.internal    Ready    master   15h     v1.27.10+28ed2d7
ip-10-0-143-198.ec2.internal   Ready    worker   14h     v1.27.10+28ed2d7
ip-10-0-143-64.ec2.internal    Ready    worker   14h     v1.27.10+28ed2d7
ip-10-0-143-80.ec2.internal    Ready    master   15h     v1.27.10+28ed2d7
ip-10-0-144-123.ec2.internal   Ready    master   15h     v1.27.10+28ed2d7
ip-10-0-147-94.ec2.internal    Ready    worker   14h     v1.27.10+28ed2d7
ip-10-0-158-61.ec2.internal    Ready    worker   3m40s   v1.27.10+28ed2d7
liuhuali@Lius-MacBook-Pro huali-test % oc scale machineset ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa --replicas=0
machineset.machine.openshift.io/ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa scaled
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME                           STATUS   ROLES    AGE   VERSION
ip-10-0-128-51.ec2.internal    Ready    master   15h   v1.27.10+28ed2d7
ip-10-0-143-198.ec2.internal   Ready    worker   15h   v1.27.10+28ed2d7
ip-10-0-143-64.ec2.internal    Ready    worker   15h   v1.27.10+28ed2d7
ip-10-0-143-80.ec2.internal    Ready    master   15h   v1.27.10+28ed2d7
ip-10-0-144-123.ec2.internal   Ready    master   15h   v1.27.10+28ed2d7
ip-10-0-147-94.ec2.internal    Ready    worker   15h   v1.27.10+28ed2d7
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                                 PHASE     TYPE         REGION      ZONE         AGE
ci-op-trzci0vq-8a8c4-dq95h-master-0                  Running   m6a.xlarge   us-east-1   us-east-1f   15h
ci-op-trzci0vq-8a8c4-dq95h-master-1                  Running   m6a.xlarge   us-east-1   us-east-1a   15h
ci-op-trzci0vq-8a8c4-dq95h-master-2                  Running   m6a.xlarge   us-east-1   us-east-1f   15h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a-pqnqt   Running   m6a.xlarge   us-east-1   us-east-1a   15h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-h2f9k   Running   m6a.xlarge   us-east-1   us-east-1f   15h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-lgmjb   Running   m6a.xlarge   us-east-1   us-east-1f   15h
liuhuali@Lius-MacBook-Pro huali-test % oc delete machineset ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa
machineset.machine.openshift.io "ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa" deleted
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2024-02-02-173828   True        False         43m     Cluster version is 4.14.0-0.nightly-2024-02-02-173828

3. Upgrade to 4.15. As the upgrade to the 4.15 nightly was stuck on operator-lifecycle-manager-packageserver, which is a known bug (https://issues.redhat.com/browse/OCPBUGS-28744), I built an image with the fix PR (job build openshift/operator-framework-olm#679 succeeded) and upgraded to that image; the upgrade succeeded.

liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2024-02-02-173828   True        True          7s      Working towards 4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest: 10 of 875 done (1% complete)
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                                                    AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         23m     Cluster version is 4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest
liuhuali@Lius-MacBook-Pro huali-test % oc get co
NAME   VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   9h
baremetal   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   11h
cloud-controller-manager   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   8h
cloud-credential   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   16h
cluster-autoscaler   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   16h
config-operator   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   13h
console   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   3h19m
control-plane-machine-set   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   5h
csi-snapshot-controller   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   7h10m
dns   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   9h
etcd   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   14h
image-registry   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   33m
ingress   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   9h
insights   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   16h
kube-apiserver   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   14h
kube-controller-manager   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   14h
kube-scheduler   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   14h
kube-storage-version-migrator   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   34m
machine-api   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   16h
machine-approver   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   13h
machine-config   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   10h
marketplace   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   10h
monitoring   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   9h
network   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   16h
node-tuning   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   56m
openshift-apiserver   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   9h
openshift-controller-manager   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   4h56m
openshift-samples   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   58m
operator-lifecycle-manager   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   16h
operator-lifecycle-manager-catalog   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   16h
operator-lifecycle-manager-packageserver   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   57m
service-ca   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   16h
storage   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True   False   False   9h
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                                 PHASE     TYPE         REGION      ZONE         AGE
ci-op-trzci0vq-8a8c4-dq95h-master-0                  Running   m6a.xlarge   us-east-1   us-east-1f   16h
ci-op-trzci0vq-8a8c4-dq95h-master-1                  Running   m6a.xlarge   us-east-1   us-east-1a   16h
ci-op-trzci0vq-8a8c4-dq95h-master-2                  Running   m6a.xlarge   us-east-1   us-east-1f   16h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a-pqnqt   Running   m6a.xlarge   us-east-1   us-east-1a   16h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-h2f9k   Running   m6a.xlarge   us-east-1   us-east-1f   16h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-lgmjb   Running   m6a.xlarge   us-east-1   us-east-1f   16h

4. Check machine scaling: the new machine is stuck in Provisioned, and no CSRs are pending.

liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml
machineset.machine.openshift.io/ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a1 created
liuhuali@Lius-MacBook-Pro huali-test % oc get machineset
NAME                                            DESIRED   CURRENT   READY   AVAILABLE   AGE
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a    1         1         1       1           16h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a1   0         0                             6s
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f    2         2         2       2           16h
liuhuali@Lius-MacBook-Pro huali-test % oc scale machineset ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a1 --replicas=1
machineset.machine.openshift.io/ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a1 scaled
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                                  PHASE          TYPE         REGION      ZONE         AGE
ci-op-trzci0vq-8a8c4-dq95h-master-0                   Running        m6a.xlarge   us-east-1   us-east-1f   16h
ci-op-trzci0vq-8a8c4-dq95h-master-1                   Running        m6a.xlarge   us-east-1   us-east-1a   16h
ci-op-trzci0vq-8a8c4-dq95h-master-2                   Running        m6a.xlarge   us-east-1   us-east-1f   16h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a-pqnqt    Running        m6a.xlarge   us-east-1   us-east-1a   16h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a1-5g877   Provisioning   m6a.xlarge   us-east-1   us-east-1a   4s
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-h2f9k    Running        m6a.xlarge   us-east-1   us-east-1f   16h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-lgmjb    Running        m6a.xlarge   us-east-1   us-east-1f   16h
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                                  PHASE         TYPE         REGION      ZONE         AGE
ci-op-trzci0vq-8a8c4-dq95h-master-0                   Running       m6a.xlarge   us-east-1   us-east-1f   18h
ci-op-trzci0vq-8a8c4-dq95h-master-1                   Running       m6a.xlarge   us-east-1   us-east-1a   18h
ci-op-trzci0vq-8a8c4-dq95h-master-2                   Running       m6a.xlarge   us-east-1   us-east-1f   18h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a-pqnqt    Running       m6a.xlarge   us-east-1   us-east-1a   18h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a1-5g877   Provisioned   m6a.xlarge   us-east-1   us-east-1a   97m
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-h2f9k    Running       m6a.xlarge   us-east-1   us-east-1f   18h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-lgmjb    Running       m6a.xlarge   us-east-1   us-east-1f   18h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f1-4ln47   Provisioned   m6a.xlarge   us-east-1   us-east-1f   50m
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME                           STATUS   ROLES    AGE   VERSION
ip-10-0-128-51.ec2.internal    Ready    master   18h   v1.28.6+a373c1b
ip-10-0-143-198.ec2.internal   Ready    worker   18h   v1.28.6+a373c1b
ip-10-0-143-64.ec2.internal    Ready    worker   18h   v1.28.6+a373c1b
ip-10-0-143-80.ec2.internal    Ready    master   18h   v1.28.6+a373c1b
ip-10-0-144-123.ec2.internal   Ready    master   18h   v1.28.6+a373c1b
ip-10-0-147-94.ec2.internal    Ready    worker   18h   v1.28.6+a373c1b
liuhuali@Lius-MacBook-Pro huali-test % oc get csr
NAME        AGE   SIGNERNAME                                    REQUESTOR                                  REQUESTEDDURATION   CONDITION
csr-596n7   21m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-147-94.ec2.internal    <none>              Approved,Issued
csr-7nr9m   42m   kubernetes.io/kubelet-serving                 system:node:ip-10-0-147-94.ec2.internal    <none>              Approved,Issued
csr-bc9n7   16m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-128-51.ec2.internal    <none>              Approved,Issued
csr-dmk27   18m   kubernetes.io/kubelet-serving                 system:node:ip-10-0-128-51.ec2.internal    <none>              Approved,Issued
csr-ggkgd   64m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-143-198.ec2.internal   <none>              Approved,Issued
csr-rs9cz   70m   kubernetes.io/kubelet-serving                 system:node:ip-10-0-143-80.ec2.internal    <none>              Approved,Issued
liuhuali@Lius-MacBook-Pro huali-test %
Actual results:
Machine stuck in Provisioned
Expected results:
Machine should get Running
Additional info:
Must gather: https://drive.google.com/file/d/1TrZ_mb-cHKmrNMsuFl9qTdYo_eNPuF_l/view?usp=sharing I can see the provisioned machine on AWS console: https://drive.google.com/file/d/1-OcsmvfzU4JBeGh5cil8P2Hoe5DQsmqF/view?usp=sharing System log of ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a1-5g877: https://drive.google.com/file/d/1spVT_o0S4eqeQxE5ivttbAazCCuSzj1e/view?usp=sharing Some log on the instance: https://drive.google.com/file/d/1zjxPxm61h4L6WVHYv-w7nRsSz5Fku26w/view?usp=sharing
Description of problem:
CU has deployed an OCP 4.12 rc dev-preview release using the Agent-based installer. During installation it was observed that the [<install-dir>/.openshift_install.log] file only contains the logs of the openshift-install agent create image command; the other logs are missing [openshift-install agent wait-for]. With previous releases on IPI/UPI, the [wait-for] logs were available in the [.openshift_install.log] file. CU wants to understand whether there is a change in the functionality of openshift-install with the agent command, and whether the logs can be made available.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Install OCP 4.12 dev-release using the Agent-based installer.
2. Observe that only logs for [agent iso create] are visible, not for the "wait-for" command.
3.
Actual results:
agent wait-for Logs are missing
Expected results:
openshift-install agent wait-for install-complete should logs into .openshift_install.log file
Additional info:
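For reference, the invocation whose output should land in .openshift_install.log (the directory path is a placeholder):
$ openshift-install agent wait-for install-complete --dir <install-dir> --log-level debug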
This is a clone of issue OCPBUGS-41234. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-41233. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-39531. The following is the description of the original issue:
—
-> While upgrading the cluster from 4.13.38 to 4.14.18, it is stuck on CCO; clusterversion is complaining:
"Working towards 4.14.18: 690 of 860 done (80% complete), waiting on cloud-credential".
While checking further we see that CCO deployment is yet to rollout.
-> ClusterOperator status.versions[name=operator] isn't a narrow "CCO Deployment is updated", it's "the CCO asserts the whole CC component is updated", which requires (among other things) a functional CCO Deployment. Seems like you don't have a functional CCO Deployment, because logs have it stuck talking about asking for a leader lease. You don't have Kube API audit logs to say if it's stuck generating the Lease request, or waiting for a response from the Kube API server.
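To confirm where the rollout is stuck, a hedged set of checks against the CCO namespace:
$ oc -n openshift-cloud-credential-operator get deployment cloud-credential-operator
$ oc -n openshift-cloud-credential-operator get leases
$ oc -n openshift-cloud-credential-operator logs deployment/cloud-credential-operator | grep -i lease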
The issue:
An interesting issue came up on #forum-ui-extensibility. There was an attempt to use extensions to nest a details page under a details page that contained a horizontal nav. This caused an issue with rendering the page content when a sub link was clicked – which caused confusion.
The why:
The reason this happened was that the resource details page had a tab containing a resource list page. This list page showed a number of CR items that, when clicked, would append their name onto the URL. This confused the navigation, which treated the longer path as another tab, so no tab was selected and no content was visible. The goal was to reuse this longer path as a details page of its own, with its own horizontal nav. This issue is a conceptual misunderstanding of the way our list & details pages work in OpenShift Console.
List Pages are sometimes found via direct navigation links. List pages are almost all shown on the Search page, allowing a user to navigate to both existing nav items and other non-primary resources.
Details Pages are individual items found in the List Pages (a row). These are standalone pages that show the details of a single CR and optionally can have tabs that list other resources – but they always transition to a fresh Details page instead of compounding on the currently visible one.
The ask:
If we can document this in a fashion that helps Plugin developers share the same UX as the rest of the Console, then we will have a more unified approach to UX within the Console and through any installed Plugins.
Please review the following PR: https://github.com/openshift/kubevirt-csi-driver/pull/23
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
OCPBUGS-5469 and backports began prioritizing later target releases, but we still wait 10m between different PromQL evaluations while evaluating conditional update risks. This ticket is tracking work to speed up cache warming, and allows changes that are too invasive to be worth backporting.
Definition of done:
Acceptance Criteria:
Please review the following PR: https://github.com/openshift/cluster-authentication-operator/pull/643
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-25662. The following is the description of the original issue:
—
Description of problem:
In ROSA/OCP 4.14.z, attaching AmazonEC2ContainerRegistryReadOnly policy to the worker nodes (in ROSA's case, this was attached to the ManagedOpenShift-Worker-Role, which is assigned by the installer to all the worker nodes), has no effect on ECR Image pull. User gets an authentication error. Attaching the policy ideally should avoid the need to provide an image-pull-secret. However, the error is resolved only if the user also provides an image-pull-secret. This is proven to work correctly in 4.12.z. Seems something has changed in the recent OCP versions.
Version-Release number of selected component (if applicable):
4.14.2 (ROSA)
How reproducible:
The issue is reproducible using the below steps.
Steps to Reproduce:
1. Create a deployment in ROSA or OCP on AWS, pointing at a private ECR repository 2. The image pulling will fail with Error: ErrImagePull & authentication required errors 3.
Actual results:
The image pull fails with "Error: ErrImagePull" & "authentication required" errors. However, the image pull is successful only if the user provides an image-pull-secret to the deployment.
Expected results:
The image should be pulled successfully by virtue of the ECR-read-only policy attached to the worker node role; without needing an image-pull-secret.
Additional info:
In other words:
in OCP 4.13 (and below) if a user adds the ECR:* permissions to the worker instance profile, then the user can specify ECR images and authentication of the worker node to ECR is done using the instance profile. In 4.14 this no longer works.
It is not sufficient, as an alternative, to provide a pull secret in a deployment, because AWS rotates ECR tokens every 12 hours. That is not a viable solution for customers that, until OCP 4.13, did not have to rotate pull secrets constantly.
The experience in 4.14 should be the same as in 4.13 with ECR.
The current AWS policy that's used is this one: `arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly`
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "ecr:GetAuthorizationToken", "ecr:BatchCheckLayerAvailability", "ecr:GetDownloadUrlForLayer", "ecr:GetRepositoryPolicy", "ecr:DescribeRepositories", "ecr:ListImages", "ecr:DescribeImages", "ecr:BatchGetImage", "ecr:GetLifecyclePolicy", "ecr:GetLifecyclePolicyPreview", "ecr:ListTagsForResource", "ecr:DescribeImageScanFindings" ], "Resource": "*" } ] }
Please review the following PR: https://github.com/openshift/operator-framework-olm/pull/623
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-19008. The following is the description of the original issue:
—
Platform:
IPI on Baremetal
What happened?
In cases where no hostname is provided, hosts are automatically assigned the name "localhost" or "localhost.localdomain".
[kni@provisionhost-0-0 ~]$ oc get nodes
NAME STATUS ROLES AGE VERSION
localhost.localdomain Ready master 31m v1.22.1+6859754
master-0-1 Ready master 39m v1.22.1+6859754
master-0-2 Ready master 39m v1.22.1+6859754
worker-0-0 Ready worker 12m v1.22.1+6859754
worker-0-1 Ready worker 12m v1.22.1+6859754
What did you expect to happen?
Having all hosts come up as localhost is the worst possible user experience, because they'll fail to form a cluster but you won't know why.
However, since we know the BMH name in the image-customization-controller, it would be possible to configure the ignition to set a default hostname if we don't have one from DHCP/DNS.
If not, we should at least fail the installation with an error message specific to this situation.
----------
30/01/22 - adding how to reproduce
----------
How to Reproduce:
1) Prepare an installation with day-1 static IPs.
Add to install-config under one of the nodes:
networkConfig:
routes:
config:
2)Ensure a DNS PTR for the address IS NOT configured.
3)create manifests and cluster from install-config.yaml
installation should either:
1) Fail as early as possible, and provide some sort of feedback that no hostname was provided.
2) Derive the hostname from the BMH or the ignition files.
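To check up front whether a reverse record exists for a host's address (the address here is a placeholder), a quick sketch:
$ dig -x 192.0.2.10 +short
An empty result means no PTR record is configured, which is the situation that produces the localhost names above.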
I would like for the Azure storage account to be destroyed as part of the bootstrap destroy process, so that the storage account is not persisted for the life of the cluster which incurs costs and other management effort.
Description of criteria:
Please review the following PR: https://github.com/openshift/gcp-pd-csi-driver/pull/42
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/kube-rbac-proxy/pull/72
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Keepalived constantly fails on bootstrap, causing installation failure.
It seems the bootstrap node doesn't have a keepalived.conf file, and the keepalived monitor fails.
Version-Release number of selected component (if applicable):
4.13.12
How reproducible:
Regular installation through assisted installer
Steps to Reproduce:
1. 2. 3.
Actual results:
keepalived fails to start
Expected results:
Success
Additional info:
*
Description of problem:
OCPCLOUD-2277 restricted access to the CMA metrics. This led to a regression in HyperShift e2e tests. The long-term fix is likely for HyperShift to remove that dependency, but to get things working again we plan to revert the CMA change until the dependency can be removed.
PR removing the probes from hypershift is being worked on.
This is a clone of issue OCPBUGS-25890. The following is the description of the original issue:
—
Description of problem:
When the user clicks on the perspective switcher after a hard refresh, a flicker appears.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-12-25-100326
How reproducible:
Always after user refresh the console
Steps to Reproduce:
1. Log in to the OCP console.
2. Refresh the whole console, then click the perspective switcher.
3.
Actual results:
there is flicker when clicking on perspective switcher
Expected results:
no flickers
Additional info:
screen recording https://drive.google.com/file/d/1_2tPZ0DXNTapFP9sSz27vKbnwxxdWZSV/view?usp=drive_link
This is a clone of issue OCPBUGS-32729. The following is the description of the original issue:
—
When rolling back from 4.16 to 4.15, rollback changes made to the cluster state to allow the 4.15 version of the managed image pull secret generation to take over again.
Please review the following PR: https://github.com/openshift/agent-installer-utils/pull/31
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
I have a customer trying to deploy 4.14.1 IPI on vsphere and running into:
time="2023-11-14T14:30:35+01:00" level=fatal msg="failed to fetch Terraform Variables: failed to generate asset \"Terraform Variables\": network '/Datacenter_name/VLAN2506' not found
A similar configuration works fine with OCP 4.13
The network profile VLAN2506 is available in the given network list of the installer survey.
The network is available inside '/datacenter/network/VLAN2506' when checked with govc command.
We found https://bugzilla.redhat.com/show_bug.cgi?id=2063829; however, that bug was reported when the network is nested under a folder, whereas here the network is directly inside the DC.
We tried this with the 4.14 installer in our lab environment, but did not face this issue.
Version-Release number of selected component (if applicable):
4.14.1
This is a clone of issue OCPBUGS-28607. The following is the description of the original issue:
—
Description of problem:
HyperShift-managed components use the default RevisionHistoryLimit of 10. This significantly impacts etcd load and scalability on the management cluster.
Version-Release number of selected component (if applicable):
4.9, 4.10, 4.11, 4.12, 4.13, 4.14, 4.15, 4.16
How reproducible:
100% (may vary depending on resource availability on the management cluster)
Steps to Reproduce:
1. Create 375+ HostedClusters
2. Observe etcd performance on the management cluster
3.
Actual results:
etcd hitting storage space limits
Expected results:
Able to manage HyperShift control planes at scale (375+ HostedClusters)
Additional info:
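The fix amounts to pinning a small revision history on each HyperShift-managed workload; a hedged sketch of the Deployment field involved (the value 2 is illustrative, not taken from the actual change):
spec:
  revisionHistoryLimit: 2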
This is a clone of issue OCPBUGS-23319. The following is the description of the original issue:
—
Description of problem:
When using canary rollout, paused MCPs begin updating when the user triggers the cluster update.
Version-Release number of selected component (if applicable):
How reproducible:
Approximately 3/10 times that I have witnessed.
Steps to Reproduce:
1. Install cluster
2. Follow the canary rollout strategy: https://docs.openshift.com/container-platform/4.11/updating/update-using-custom-machine-config-pools.html
3. Start cluster update
Actual results:
Worker nodes in paused MCPs begin update
Expected results:
Worker nodes in paused MCPs will not begin update until cluster admin unpauses the MCPs
Additional info:
This has occurred with my customer in their Azure self-managed cluster and their on-prem cluster in vSphere, as well as my lab cluster in vSphere.
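For reference, the canary flow pauses and unpauses pools with a patch along these lines (the pool name follows the linked docs' example and is illustrative):
$ oc patch mcp/workerpool-canary --type merge --patch '{"spec":{"paused":true}}'
$ oc patch mcp/workerpool-canary --type merge --patch '{"spec":{"paused":false}}'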
Please review the following PR: https://github.com/openshift/cluster-api-provider-vsphere/pull/17
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
ImageRegistry became a new optional component in 4.14 (docs#64469, api#1572). And even before that, it has long been configurable for managementState: Removed. However the no-capabilities test is currently failing like:
message: Back-off pulling image "image-registry.openshift-image-registry.svc:5000/openshift/tools:latest"
in clusters without a local registry. We should teach the origin suite to be more forgiving of a lack of internal registry.
4.14 and 4.15. Possibly the 4.14 no-capabilities jobs are now stable enough to backport any fixes.
100%
1. Open a recent 4.15 no-cap run and see if it passed.
Lots of test-cases failing to pull from image-registry.openshift-image-registry.svc:5000 , which isn't expected to exist for these clusters, where the ImageRegistry capability is not requested.
Passing CI test-cases .
I'm fuzzy on the relationship between ImageStreams and the local image registry, but at the moment, the tools ImageStreams and such are still part of no-caps runs:
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-aws-ovn-no-capabilities/1709539450616287232/artifacts/e2e-aws-ovn-no-capabilities/gather-must-gather/artifacts/must-gather.tar | tar xOz 31e3c46d361008f321d02ef278f62b1fc4e5510a9902c8ac16de5b2078fed849/namespaces/openshift/image.openshift.io/imagestreams.yaml | yaml2json | jq -r '.items[] | select(.metadata.name == "tools").status'
{
  "dockerImageRepository": "",
  "tags": [
    {
      "items": [
        {
          "created": "2023-10-04T12:24:42Z",
          "dockerImageReference": "registry.ci.openshift.org/ocp/4.15-2023-10-04-015153@sha256:a83089cbb8a8f4ef868e5f37de5d305c10056e4e9761ad37b7c1ab98f465a553",
          "generation": 2,
          "image": "sha256:a83089cbb8a8f4ef868e5f37de5d305c10056e4e9761ad37b7c1ab98f465a553"
        }
      ],
      "tag": "latest"
    }
  ]
}
Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/134
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-28835. The following is the description of the original issue:
—
Description of problem:
NHC failed to watch Metal3 remediation template
Version-Release number of selected component (if applicable):
OCP4.13 and higher
How reproducible:
100%
Steps to Reproduce:
1. Create a Metal3RemediationTemplate
2. Install NHC v0.7.0
3. Create an NHC with the Metal3RemediationTemplate
Actual results:
E0131 14:07:51.603803 1 reflector.go:147] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: Failed to watch infrastructure.cluster.x-k8s.io/v1beta1, Kind=Metal3RemediationTemplate: failed to list infrastructure.cluster.x-k8s.io/v1beta1, Kind=Metal3RemediationTemplate: metal3remediationtemplates.infrastructure.cluster.x-k8s.io is forbidden: User "system:serviceaccount:openshift-workload-availability:node-healthcheck-controller-manager" cannot list resource "metal3remediationtemplates" in API group "infrastructure.cluster.x-k8s.io" at the cluster scope
E0131 14:07:59.912283 1 reflector.go:147] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: Failed to watch infrastructure.cluster.x-k8s.io/v1beta1, Kind=Metal3Remediation: unknown
W0131 14:08:24.831958 1 reflector.go:539] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: failed to list infrastructure.cluster.x-k8s.io/v1beta1, Kind=Metal3RemediationTemplate: metal3remediationtemplates.infrastructure.cluster.x-k8s.io is forbidden: User "system:serviceaccount:openshift-workload-availability:node-healthcheck-controller-manager" cannot list resource
Expected results:
No errors
Additional info:
Please review the following PR: https://github.com/openshift/route-controller-manager/pull/30
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/openshift-apiserver/pull/406
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
ca-west-1 is supported in 4.14-4.16 now (https://issues.redhat.com/browse/OCPSTRAT-1177), but it's missing from the installer's survey:
Region [Use arrows to move, type to filter, ? for more help]
  ap-southeast-3 (Asia Pacific (Jakarta))
  ap-southeast-4 (Asia Pacific (Melbourne))
  ca-central-1 (Canada (Central))
> eu-central-1 (Europe (Frankfurt))
  eu-central-2 (Europe (Zurich))
  eu-north-1 (Europe (Stockholm))
  eu-south-1 (Europe (Milan))
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2024-05-08-090106, 4.14.0-0.nightly-2024-05-08-035346
How reproducible:
Always
Steps to Reproduce:
1. openshift-install create cluster 2. 3.
Actual results:
See description
Expected results:
ca-west-1 is listed in the survey.
Additional info:
Description of problem:
When navigating to the create Channel page from Add or Topology, the default name "channel" is present, but the Create button is still disabled, with "Required" showing under the name field.
Version-Release number of selected component (if applicable):
4.12.0-0.nightly-2023-09-26-042251
How reproducible:
Always
Steps to Reproduce:
1. Install serverless operator 2. Go to Add page in developer perspective 3. Click on the channel card
Actual results:
The Create button is disabled with an error showing "Required" under the name field, even though the name field contains the default name "channel".
Expected results:
The Create button should be active.
Additional info:
If you switch to the YAML view, the Create button becomes active; if you switch back to the form view, it remains active.
This is a clone of issue OCPBUGS-25708. The following is the description of the original issue:
—
Changes made for faster risk cache-warming (the OCPBUGS-19512 series) introduced an unfortunate cycle:
1. Cincinnati serves vulnerable PromQL, like graph-data#4524.
2. Clusters pick up that broken PromQL, try to evaluate, and fail. Re-eval-and-fail loop continues.
3. Cincinnati PromQL fixed, like graph-data#4528.
4. Cases:
The regression went back via:
Updates from those releases (and later in their 4.y, until this bug lands a fix) to later releases are exposed.
Likely very reproducible for exposed releases, but only when clusters are served PromQL risks that will consistently fail evaluation.
1. Launch a cluster.
2. Point it at dummy Cincinnati data, as described in OTA-520. Initially declare a risk with broken PromQL in that data, like cluster_operator_conditions.
3. Wait until the cluster is reporting Recommended=Unknown for those risks (oc adm upgrade --include-not-recommended).
4. Update the risk to working PromQL, like group(cluster_operator_conditions). Alternatively, update anything about the update-service data (e.g. adding a new update target with a path from the cluster's version).
5. Wait 10 minutes for the CVO to have plenty of time to pull that new Cincinnati data.
6. oc get -o json clusterversion version | jq '.status.conditionalUpdates[].risks[].matchingRules[].promql.promql' | sort | uniq | jq -r .
Exposed releases will still have the broken PromQL in their output (or will lack the new update target you added, or whatever the Cincinnati data change was).
Fixed releases will have picked up the fixed PromQL in their output (or will have the new update target you added, or whatever the Cincinnati data change was).
To detect exposure in collected Insights, look for EvaluationFailed conditionalUpdates like:
$ oc get -o json clusterversion version | jq -r '.status.conditionalUpdates[].conditions[] | select(.type == "Recommended" and .status == "Unknown" and .reason == "EvaluationFailed" and (.message | contains("invalid PromQL")))'
{
  "lastTransitionTime": "2023-12-15T22:00:45Z",
  "message": "Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34\nAdding a new worker node will fail for clusters running on ARO. https://issues.redhat.com/browse/MCO-958",
  "reason": "EvaluationFailed",
  "status": "Unknown",
  "type": "Recommended"
}
To confirm in-cluster vs. other EvaluationFailed invalid PromQL issues, you can look for Cincinnati retrieval attempts in CVO logs. Example from a healthy cluster:
$ oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --tail -1 --since 30m | grep 'request updates from\|PromQL' | tail
I1221 20:36:39.783530 1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?...
I1221 20:36:39.831358 1 promql.go:118] evaluate PromQL cluster condition: "(\n group(cluster_operator_conditions{name=\"aro\"})\n or\n 0 * group(cluster_operator_conditions)\n)\n"
I1221 20:40:19.674925 1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?...
I1221 20:40:19.727998 1 promql.go:118] evaluate PromQL cluster condition: "(\n group(cluster_operator_conditions{name=\"aro\"})\n or\n 0 * group(cluster_operator_conditions)\n)\n"
I1221 20:43:59.567369 1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?...
I1221 20:43:59.620315 1 promql.go:118] evaluate PromQL cluster condition: "(\n group(cluster_operator_conditions{name=\"aro\"})\n or\n 0 * group(cluster_operator_conditions)\n)\n"
I1221 20:47:39.457582 1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?...
I1221 20:47:39.509505 1 promql.go:118] evaluate PromQL cluster condition: "(\n group(cluster_operator_conditions{name=\"aro\"})\n or\n 0 * group(cluster_operator_conditions)\n)\n"
I1221 20:51:19.348286 1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?...
I1221 20:51:19.401496 1 promql.go:118] evaluate PromQL cluster condition: "(\n group(cluster_operator_conditions{name=\"aro\"})\n or\n 0 * group(cluster_operator_conditions)\n)\n"
showing fetch lines every few minutes. And from an exposed cluster, only showing PromQL eval lines:
$ oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --tail -1 --since 30m | grep 'request updates from\|PromQL' | tail
I1221 20:50:10.165101 1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:11.166170 1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:12.166314 1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:13.166517 1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:14.166847 1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:15.167737 1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:16.168486 1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:17.169417 1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:18.169576 1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:19.170544 1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
$ oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --tail -1 --since 30m | grep 'request updates from' | tail
...no hits...
If bitten, the remediation is to address the invalid PromQL. For example, we fixed that AROBrokenDNSMasq expression in graph-data#4528. After that, the local cluster administrator should restart their CVO, for example with:
$ oc -n openshift-cluster-version delete -l k8s-app=cluster-version-operator pods
This is a clone of issue OCPBUGS-28576. The following is the description of the original issue:
—
Description of problem:
Certificate-related objects should be in certificates.hypershift.openshift.io/v1alpha1
Version-Release number of selected component (if applicable):
4.15
How reproducible:
always
Steps to Reproduce:
1. oc api-resources
Actual results:
Expected results:
Additional info:
General code cleanup and improvement
Description of problem:
Observing the below test case failure continuously in 4.14 PowerVS, which lowers the success rate of the prod CI.
[bz-monitoring][invariant] alert/Watchdog must have no gaps or changes
There is no error message apart from the following line; couldn't gather any more related logs:
{ Watchdog alert not found}
This fix contains the following changes coming from the updated version of Kubernetes, up to v1.28.5:
Changelog:
v1.28.4: https://github.com/kubernetes/kubernetes/blob/release-1.28/CHANGELOG/CHANGELOG-1.28.md#changelog-since-v1284
Please review the following PR: https://github.com/openshift/machine-config-operator/pull/3918
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-31073. The following is the description of the original issue:
—
Description of problem:
I had a version of MTC installed on my cluster when it was running a prior version. I had deleted it some time ago, long before upgrading to 4.15. I upgraded it to 4.15 and needed to reinstall to take a look at something, but found the operator would not install.
I originally tried with 4.15.0; on failure I upgraded to 4.15.3 to see if that would resolve the issue, but it did not.
Version-Release number of selected component (if applicable):
$ oc version
Client Version: 4.15.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.15.3
Kubernetes Version: v1.28.7+6e2789b
How reproducible:
Always as far as I can tell. I have at least two clusters where I was able to reproduce it.
Steps to Reproduce:
1. Install Migration Toolkit for Containers on OpenShift 4.14
2. Uninstall it
3. Upgrade to 4.15
4. Try to install it again
Actual results:
The operator never installs. The UI just shows "Upgrade status: Unknown Failure".
Observe the catalog operator logs and note errors like:
E0319 21:35:57.350591 1 queueinformer_operator.go:319] sync {"update" "openshift-migration"} failed: bundle unpacking failed with an error: [roles.rbac.authorization.k8s.io "c1572438804f004fb90b6768c203caad96c47331f7ecc4f68c3cf6b43b0acfd" already exists, roles.rbac.authorization.k8s.io "724788f6766aa5ba19b24ef4619b6a8e8e856b8b5fb96e1380f0d3f5b9dcb7a" already exists]
If you delete the roles, you'll get the same for rolebindings, then the same for jobs.batch, and then configmaps.
Expected results:
Operator just installs
Additional info:
If you clean up all these resources the operator will install successfully.
This is a clone of issue OCPBUGS-23362. The following is the description of the original issue:
—
A hostedcluster/hostedcontrolplane was stuck uninstalling. Inspecting the CPO logs showed:
"error": "failed to delete AWS default security group: failed to delete security group sg-04abe599e5567b025: DependencyViolation: resource sg-04abe599e5567b025 has a dependent object\n\tstatus code: 400, request id: f776a43f-8750-4f04-95ce-457659f59095"
Unfortunately, I do not have enough access to the AWS account to inspect this security group, though I know it is the default worker security group because it is recorded in the hostedcluster's .status.platform.aws.defaultWorkerSecurityGroupID.
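For reference, one way to hunt for the dependent object might be the AWS CLI's network-interface filter (hypothetical usage; the group ID is the one from the error above, and ENIs are only the most common kind of dependency):
$ aws ec2 describe-network-interfaces --filters Name=group-id,Values=sg-04abe599e5567b025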
Version-Release number of selected component (if applicable):
4.14.1
How reproducible:
I haven't tried to reproduce it yet, but can do so and update this ticket when I do. My theory is:
Steps to Reproduce:
1. Create an AWS HostedCluster, wait for it to create/populate defaultWorkerSecurityGroupID
2. Attach the defaultWorkerSecurityGroupID to anything else in the AWS account unrelated to the HCP cluster
3. Attempt to delete the HostedCluster
Actual results:
CPO logs: "error": "failed to delete AWS default security group: failed to delete security group sg-04abe599e5567b025: DependencyViolation: resource sg-04abe599e5567b025 has a dependent object\n\tstatus code: 400, request id: f776a43f-8750-4f04-95ce-457659f59095"
HostedCluster status condition:
- lastTransitionTime: "2023-11-09T22:18:09Z"
  message: ""
  observedGeneration: 3
  reason: StatusUnknown
  status: Unknown
  type: CloudResourcesDestroyed
Expected results:
I would expect that the CloudResourcesDestroyed status condition on the hostedcluster would reflect this security group as holding up the deletion instead of having to parse through logs.
Additional info:
This is a clone of issue OCPBUGS-25766. The following is the description of the original issue:
—
Seen in this 4.15 to 4.16 CI run:
: [sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers 0s
{ event [namespace/openshift-machine-api node/ip-10-0-62-147.us-west-2.compute.internal pod/cluster-baremetal-operator-574577fbcb-z8nd4 hmsg/bf39bb17ae - Back-off restarting failed container cluster-baremetal-operator in pod cluster-baremetal-operator-574577fbcb-z8nd4_openshift-machine-api(441969c1-b430-412c-b67f-4ae2f7797f4f)] happened 26 times
event [namespace/openshift-machine-api node/ip-10-0-62-147.us-west-2.compute.internal pod/cluster-baremetal-operator-574577fbcb-z8nd4 hmsg/bf39bb17ae - Back-off restarting failed container cluster-baremetal-operator in pod cluster-baremetal-operator-574577fbcb-z8nd4_openshift-machine-api(441969c1-b430-412c-b67f-4ae2f7797f4f)] happened 51 times}
The operator recovered, and the update completed, but it's still probably worth cleaning up whatever's happening to avoid alarming anyone.
Seems like all recent CI runs that match this string touch 4.15, 4.16, or development branches:
$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&type=junit&search=Back-off+restarting+failed+container+cluster-baremetal-operator+in+pod+cluster-baremetal-operator' | grep 'failures match'
pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-upgrade-local-gateway (all) - 11 runs, 36% failed, 25% of failures match = 9% impact
periodic-ci-openshift-multiarch-master-nightly-4.16-ocp-e2e-upgrade-azure-ovn-heterogeneous (all) - 15 runs, 20% failed, 33% of failures match = 7% impact
pull-ci-openshift-kubernetes-master-e2e-aws-ovn-downgrade (all) - 3 runs, 67% failed, 50% of failures match = 33% impact
periodic-ci-openshift-multiarch-master-nightly-4.16-ocp-e2e-aws-ovn-heterogeneous-upgrade (all) - 15 runs, 27% failed, 25% of failures match = 7% impact
periodic-ci-openshift-release-master-ci-4.16-upgrade-from-stable-4.15-e2e-azure-sdn-upgrade (all) - 32 runs, 91% failed, 7% of failures match = 6% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-ovn-upgrade (all) - 40 runs, 25% failed, 20% of failures match = 5% impact
periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-aws-sdn-upgrade (all) - 3 runs, 33% failed, 100% of failures match = 33% impact
pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-ovn-upgrade-out-of-change (all) - 4 runs, 25% failed, 100% of failures match = 25% impact
periodic-ci-openshift-release-master-ci-4.16-upgrade-from-stable-4.15-e2e-aws-ovn-upgrade (all) - 40 runs, 8% failed, 33% of failures match = 3% impact
pull-ci-openshift-azure-file-csi-driver-operator-main-e2e-azure-ovn-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-okd-4.15-e2e-aws-ovn-upgrade (all) - 7 runs, 43% failed, 33% of failures match = 14% impact
pull-ci-openshift-origin-master-e2e-aws-ovn-upgrade (all) - 10 runs, 30% failed, 33% of failures match = 10% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-upgrade-gcp-ovn-arm64 (all) - 6 runs, 33% failed, 50% of failures match = 17% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-aws-ovn-heterogeneous-upgrade (all) - 11 runs, 18% failed, 50% of failures match = 9% impact
Looks like ~8% impact.
Steps to Reproduce:
1. Run ~20 exposed job types.
2. Check for ": [sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers" failures with "Back-off restarting failed container cluster-baremetal-operator" messages.
Actual results:
~8% impact.
Expected results:
~0% impact.
Additional info:
Dropping into Loki for the run I'd picked:
{invoker="openshift-internal-ci/periodic-ci-openshift-release-master-ci-4.16-upgrade-from-stable-4.15-e2e-aws-ovn-upgrade/1737335551998038016"} | unpack | pod="cluster-baremetal-operator-574577fbcb-z8nd4" container="cluster-baremetal-operator" |~ "220 06:0"
includes:
E1220 06:04:18.794548 1 main.go:131] "unable to create controller" err="unable to put \"baremetal\" ClusterOperator in Available state: Operation cannot be fulfilled on clusteroperators.config.openshift.io \"baremetal\": the object has been modified; please apply your changes to the latest version and try again" controller="Provisioning"
I1220 06:05:40.753364 1 listener.go:44] controller-runtime/metrics "msg"="Metrics server is starting to listen" "addr"=":8080"
I1220 06:05:40.766200 1 webhook.go:104] WebhookDependenciesReady: everything ready for webhooks
I1220 06:05:40.780426 1 clusteroperator.go:217] "new CO status" reason="WaitingForProvisioningCR" processMessage="" message="Waiting for Provisioning CR on BareMetal Platform"
E1220 06:05:40.795555 1 main.go:131] "unable to create controller" err="unable to put \"baremetal\" ClusterOperator in Available state: Operation cannot be fulfilled on clusteroperators.config.openshift.io \"baremetal\": the object has been modified; please apply your changes to the latest version and try again" controller="Provisioning"
I1220 06:08:21.730591 1 listener.go:44] controller-runtime/metrics "msg"="Metrics server is starting to listen" "addr"=":8080"
I1220 06:08:21.747466 1 webhook.go:104] WebhookDependenciesReady: everything ready for webhooks
I1220 06:08:21.768138 1 clusteroperator.go:217] "new CO status" reason="WaitingForProvisioningCR" processMessage="" message="Waiting for Provisioning CR on BareMetal Platform"
E1220 06:08:21.781058 1 main.go:131] "unable to create controller" err="unable to put \"baremetal\" ClusterOperator in Available state: Operation cannot be fulfilled on clusteroperators.config.openshift.io \"baremetal\": the object has been modified; please apply your changes to the latest version and try again" controller="Provisioning"
So some kind of ClusterOperator-modification race?
This is a clone of issue OCPBUGS-34390. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-31446. The following is the description of the original issue:
—
Description of problem:
ImageStreams on hosted clusters pointing to images on private registries are failing TLS verification although the registry is correctly trusted. Example:
$ oc create namespace e2e-test
$ oc --namespace=e2e-test tag virthost.ostest.test.metalkube.org:5000/localimages/local-test-image:e2e-7-registry-k8s-io-e2e-test-images-busybox-1-29-4-4zE9mRvED4RQoUxQ busybox:latest
$ oc --namespace=e2e-test set image-lookup busybox
stirabos@t14s:~$ oc get imagestream -n e2e-test
NAME      IMAGE REPOSITORY                                                     TAGS     UPDATED
busybox   image-registry.openshift-image-registry.svc:5000/e2e-test/busybox   latest
stirabos@t14s:~$ oc get imagestream -n e2e-test busybox -o yaml
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  annotations:
    openshift.io/image.dockerRepositoryCheck: "2024-03-27T12:43:56Z"
  creationTimestamp: "2024-03-27T12:43:56Z"
  generation: 3
  name: busybox
  namespace: e2e-test
  resourceVersion: "49021"
  uid: 847281e7-e307-4057-ab57-ccb7bfc49327
spec:
  lookupPolicy:
    local: true
  tags:
  - annotations: null
    from:
      kind: DockerImage
      name: virthost.ostest.test.metalkube.org:5000/localimages/local-test-image:e2e-7-registry-k8s-io-e2e-test-images-busybox-1-29-4-4zE9mRvED4RQoUxQ
    generation: 2
    importPolicy:
      importMode: Legacy
    name: latest
    referencePolicy:
      type: Source
status:
  dockerImageRepository: image-registry.openshift-image-registry.svc:5000/e2e-test/busybox
  tags:
  - conditions:
    - generation: 2
      lastTransitionTime: "2024-03-27T12:43:56Z"
      message: 'Internal error occurred: virthost.ostest.test.metalkube.org:5000/localimages/local-test-image:e2e-7-registry-k8s-io-e2e-test-images-busybox-1-29-4-4zE9mRvED4RQoUxQ: Get "https://virthost.ostest.test.metalkube.org:5000/v2/": tls: failed to verify certificate: x509: certificate signed by unknown authority'
      reason: InternalError
      status: "False"
      type: ImportSuccess
    items: null
    tag: latest
Meanwhile, the image virthost.ostest.test.metalkube.org:5000/localimages/local-test-image:e2e-7-registry-k8s-io-e2e-test-images-busybox-1-29-4-4zE9mRvED4RQoUxQ can be properly consumed if used directly for a container in a pod on the same cluster.
user-ca-bundle config map is properly propagated from hypershift:
$ oc get configmap -n openshift-config user-ca-bundle
NAME             DATA   AGE
user-ca-bundle   1      3h32m
$ openssl x509 -text -noout -in <(oc get cm -n openshift-config user-ca-bundle -o json | jq -r '.data["ca-bundle.crt"]')
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            11:3f:15:23:97:ac:c2:d5:f6:54:06:1a:9a:22:f2:b5:bf:0c:5a:00
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = US, ST = NC, L = Raleigh, O = Test Company, OU = Testing, CN = test.metalkube.org
        Validity
            Not Before: Mar 27 08:28:07 2024 GMT
            Not After : Mar 27 08:28:07 2025 GMT
        Subject: C = US, ST = NC, L = Raleigh, O = Test Company, OU = Testing, CN = test.metalkube.org
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:c1:49:1f:18:d2:12:49:da:76:05:36:3e:6b:1a:
                    82:a7:22:0d:be:f5:66:dc:97:44:c7:ca:31:4d:f3:
                    7f:0a:d3:de:df:f2:b6:23:f9:09:b1:7a:3f:19:cc:
                    22:c9:70:90:30:a7:eb:49:28:b6:d1:e0:5a:14:42:
                    02:93:c4:ac:cc:da:b1:5a:8f:9c:af:60:19:1a:e3:
                    b1:34:c2:b6:2f:78:ec:9f:fe:38:75:91:0f:a6:09:
                    78:28:36:9e:ab:1c:0d:22:74:d5:52:fe:0a:fc:db:
                    5a:7c:30:9d:84:7d:f7:6a:46:fe:c5:6f:50:86:98:
                    cc:35:1f:6c:b0:e6:21:fc:a5:87:da:81:2c:7b:e4:
                    4e:20:bb:35:cc:6c:81:db:b3:95:51:cf:ff:9f:ed:
                    00:78:28:1d:cd:41:1d:03:45:26:45:d4:36:98:bd:
                    bf:5c:78:0f:c7:23:5c:44:5d:a6:ae:85:2b:99:25:
                    ae:c0:73:b1:d2:87:64:3e:15:31:8e:63:dc:be:5c:
                    ed:e3:fe:97:29:10:fb:5c:43:2f:3a:c2:e4:1a:af:
                    80:18:55:bc:40:0f:12:26:6b:f9:41:da:e2:a4:6b:
                    fd:66:ae:bc:9c:e8:2a:5a:3b:e7:2b:fc:a6:f6:e2:
                    73:9b:79:ee:0c:86:97:ab:2e:cc:47:e7:1b:e5:be:
                    0c:9f
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Basic Constraints:
                CA:TRUE, pathlen:0
            X509v3 Subject Alternative Name:
                DNS:virthost.ostest.test.metalkube.org
    Signature Algorithm: sha256WithRSAEncryption
    Signature Value:
        58:d2:da:f9:2a:c0:2d:7a:d9:9f:1f:97:e1:fd:36:a7:32:d3:
        ab:3f:15:cd:68:8e:be:7c:11:ec:5e:45:50:c4:ec:d8:d3:c5:
        22:3c:79:5a:01:63:9e:5a:bd:02:0c:87:69:c6:ff:a2:38:05:
        21:e4:96:78:40:db:52:c8:08:44:9a:96:6a:70:1e:1e:ae:74:
        e2:2d:fa:76:86:4d:06:b1:cf:d5:5c:94:40:17:5d:9f:84:2c:
        8b:65:ca:48:2b:2d:00:3b:42:b9:3c:08:1b:c5:5d:d2:9c:e9:
        bc:df:9a:7c:db:30:07:be:33:2a:bb:2d:69:72:b8:dc:f4:0e:
        62:08:49:93:d5:0f:db:35:98:18:df:e6:87:11:ce:65:5b:dc:
        6f:f7:f0:1c:b0:23:40:1e:e3:45:17:04:1a:bc:d1:57:d7:0d:
        c8:26:6d:99:fe:28:52:fe:ba:6a:a1:b8:d1:d1:50:a9:fa:03:
        bb:b7:ad:0e:82:d2:e8:34:91:fa:b4:f9:81:d1:9b:6d:0f:a3:
        8c:9d:c4:4a:1e:08:26:71:b9:1a:e8:49:96:0f:db:5c:76:db:
        ae:c7:6b:2e:ea:89:5d:7f:a3:ba:ea:7e:12:97:12:bc:1e:7f:
        49:09:d4:08:a6:4a:34:73:51:9e:a2:9a:ec:2a:f7:fc:b5:5c:
        f8:20:95:ad
This is probably a side effect of https://issues.redhat.com/browse/RFE-3093 (imagestreams trusting the CA added during installation), which also affects imagestreams that require a CA cert injected by hypershift during hosted-cluster creation in the disconnected use case.
Version-Release number of selected component (if applicable):
v4.14, v4.15, v4.16
How reproducible:
100%
Steps to Reproduce:
Once connected to a disconnected hosted cluster, create an imagestream pointing to an image on the internal mirror registry:
1. $ oc --namespace=e2e-test tag virthost.ostest.test.metalkube.org:5000/localimages/local-test-image:e2e-7-registry-k8s-io-e2e-test-images-busybox-1-29-4-4zE9mRvED4RQoUxQ busybox:latest
2. $ oc --namespace=e2e-test set image-lookup busybox
3. Then check the imagestream.
Actual results:
status:
  dockerImageRepository: image-registry.openshift-image-registry.svc:5000/e2e-test/busybox
  tags:
  - conditions:
    - generation: 2
      lastTransitionTime: "2024-03-27T12:43:56Z"
      message: 'Internal error occurred: virthost.ostest.test.metalkube.org:5000/localimages/local-test-image:e2e-7-registry-k8s-io-e2e-test-images-busybox-1-29-4-4zE9mRvED4RQoUxQ: Get "https://virthost.ostest.test.metalkube.org:5000/v2/": tls: failed to verify certificate: x509: certificate signed by unknown authority'
although the same image can be directly consumed by a pod on the same cluster
Expected results:
status:
  dockerImageRepository: image-registry.openshift-image-registry.svc:5000/e2e-test/busybox
  tags:
  - conditions:
    - generation: 8
      lastTransitionTime: "2024-03-27T13:30:46Z"
      message: dockerimage.image.openshift.io "virthost.ostest.test.metalkube.org:5000/localimages/local-test-image:e2e-7-registry-k8s-io-e2e-test-images-busybox-1-29-4-4zE9mRvED4RQoUxQ" not found
      reason: NotFound
      status: "False"
      type: ImportSuccess
Additional info:
This is probably a side effect of https://issues.redhat.com/browse/RFE-3093. Marking the imagestream tag as:
  importPolicy:
    importMode: Legacy
    insecure: true
is enough to work around this.
Description of problem:
Attempting to perform a GCP XPN internal cluster installation, the install fails when the master nodes are added to a second [internal] instance group (k8s-ig-xxxx).
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. The following install-config was used:
additionalTrustBundlePolicy: Proxyonly
apiVersion: v1
baseDomain: installer.gcp.devcluster.openshift.com
credentialsMode: Passthrough
featureSet: TechPreviewNoUpgrade
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3
metadata:
  creationTimestamp: null
  name: bbarbach-xpn
networking:
  clusterNetwork:
  - cidr: 10.124.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.128.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  gcp:
    projectID: openshift-installer-shared-vpc
    region: us-central1
    network: bbarbach-internal-vpc
    computeSubnet: bbarbach-internal-vpc
    controlPlaneSubnet: bbarbach-internal-vpc
    networkProjectID: openshift-dev-installer
publish: Internal
2. This is a shared VPC install, so the service and host projects need to be used in the install-config above.
3. Set the release image to 4.13-nightly
4. openshift-install create cluster --log-level=DEBUG
Actual results:
ERROR
ERROR Error: Error waiting for Updating RegionBackendService: Validation failed for instance 'projects/openshift-installer-shared-vpc/zones/us-central1-a/instances/bbarbach-xpn-4t8zl-master-0': instance may belong to at most one load-balanced instance group.
ERROR
ERROR
ERROR   with google_compute_region_backend_service.api_internal,
ERROR   on main.tf line 13, in resource "google_compute_region_backend_service" "api_internal":
ERROR   13: resource "google_compute_region_backend_service" "api_internal" {
ERROR
FATAL failed disabling bootstrap load balancing: failed to apply Terraform: exit status 1
FATAL
FATAL Error: Error waiting for Updating RegionBackendService: Validation failed for instance 'projects/openshift-installer-shared-vpc/zones/us-central1-a/instances/bbarbach-xpn-4t8zl-master-0': instance may belong to at most one load-balanced instance group.
FATAL
FATAL
FATAL   with google_compute_region_backend_service.api_internal,
FATAL   on main.tf line 13, in resource "google_compute_region_backend_service" "api_internal":
FATAL   13: resource "google_compute_region_backend_service" "api_internal" {
FATAL
FATAL
Expected results:
Successful install
Additional info:
The normal GCP internal cluster installation succeeds. Checking the instance groups, the internal cluster creates the k8s-ig-xxxx instance groups, and the workers are added to their respective groups; the masters are NOT added to the instance groups. The failure during the XPN install occurs because the masters are added to the instance groups.
MGMT-11443 added an API for users to download the rendered nmconnection files used in the ISO, but when using the kube-api that URL isn't given to the user.
This should be added to the InfraEnv status in the debug info section.
This is a clone of issue OCPBUGS-39293. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-39225. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-38474. The following is the description of the original issue:
—
Description of problem:
AdditionalTrustedCA is not wired correctly, so the configmap is not found by its operator. This feature is meant to be exposed by XCMSTRAT-590, but at the moment it seems to be broken.
Version-Release number of selected component (if applicable):
4.16.5
How reproducible:
Always
Steps to Reproduce:
1. Create a configmap containing a registry and PEM cert, like https://github.com/openshift/openshift-docs/blob/ef75d891786604e78dcc3bcb98ac6f1b3a75dad1/modules/images-configuration-cas.adoc#L17
2. Refer to it in .spec.configuration.image.additionalTrustedCA.name
3. image-registry-config-operator is not able to find the cm and the CO is degraded
Actual results:
CO is degraded
Expected results:
certs are used.
Additional info:
I think we may be missing a copy of the configmap from the cluster namespace to the target namespace. The copy should also be deleted when the original is deleted.
% oc get hc -n ocm-adecorte-2d525fsstsvtbv1h8qss14pkv171qhdd -o jsonpath="{.items[0].spec.configuration.image.additionalTrustedCA}" | jq
{
  "name": "registry-additional-ca-q9f6x5i4"
}
% oc get cm -n ocm-adecorte-2d525fsstsvtbv1h8qss14pkv171qhdd registry-additional-ca-q9f6x5i4
NAME                              DATA   AGE
registry-additional-ca-q9f6x5i4   1      16m
logs of cluster-image-registry operator
E0814 13:22:32.586416 1 imageregistrycertificates.go:141] ImageRegistryCertificatesController: unable to sync: failed to update object *v1.ConfigMap, Namespace=openshift-image-registry, Name=image-registry-certificates: image-registry-certificates: configmap "registry-additional-ca-q9f6x5i4" not found, requeuing
CO is degraded
% oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
console 4.16.5 True False False 3h58m
csi-snapshot-controller 4.16.5 True False False 4h11m
dns 4.16.5 True False False 3h58m
image-registry 4.16.5 True False True 3h58m ImageRegistryCertificatesControllerDegraded: failed to update object *v1.ConfigMap, Namespace=openshift-image-registry, Name=image-registry-certificates: image-registry-certificates: configmap "registry-additional-ca-q9f6x5i4" not found
ingress 4.16.5 True False False 3h59m
insights 4.16.5 True False False 4h
kube-apiserver 4.16.5 True False False 4h11m
kube-controller-manager 4.16.5 True False False 4h11m
kube-scheduler 4.16.5 True False False 4h11m
kube-storage-version-migrator 4.16.5 True False False 166m
monitoring 4.16.5 True False False 3h55m
Description of problem:
When using a route to expose the API server endpoint in a HostedCluster, the .status.controlPlaneEndpoint.port is reported as 6443 (the internal port) instead of 443, which is the port that is externally exposed via the route.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create a HostedCluster with a custom DNS name, using Route as the service publishing strategy
2. Inspect .status.controlPlaneEndpoint
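For example, a sketch of step 2 (the cluster name and namespace are placeholders, and the port shown is the buggy value being reported):
$ oc get hostedcluster my-cluster -n clusters -o jsonpath='{.status.controlPlaneEndpoint}'
{"host":"api.my-cluster.example.com","port":6443}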
Actual results:
It has 6443 as the port
Expected results:
It has 443 as the port
Additional info:
This is a clone of issue OCPBUGS-28388. The following is the description of the original issue:
—
Description of problem:
The status controller of CCO reconciles 500+ times/h on average on a resting 6-node mint-mode OCP cluster on AWS.
Steps to Reproduce:
1. Install a 6-node mint-mode OCP cluster on AWS
2. Do nothing with it and wait for a couple of hours
3. Plot the following metric in the metrics dashboard of the OCP console:
rate(controller_runtime_reconcile_total{controller="status"}[1h]) * 3600
Actual results:
500+ reconciles/h on a resting cluster
Expected results:
12-50 reconciles/h on a resting cluster
Note: the reconcile() function always requeues after 5 min, so the theoretical minimum is 12 reconciles/h.
This is a clone of issue OCPBUGS-38062. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-37052. The following is the description of the original issue:
—
Description of problem:
This is a follow-up of https://issues.redhat.com/browse/OCPBUGS-34996, in which comments led us to better understand the issue customers are facing. LDAP IDP traffic from the oauth pod seems to be going through the configured HTTP(S) proxy, while it should not, since it is a different protocol. This results in customers adding the LDAP endpoint to their no-proxy config to circumvent the issue.
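A minimal sketch of the expected dialing behavior, assuming plain Go networking (the host, port, and function are illustrative, not the actual oauth-server code): an LDAP endpoint should be dialed directly over TCP, without consulting HTTP_PROXY/HTTPS_PROXY, because LDAP is not HTTP.

package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// dialLDAP connects straight to the LDAP endpoint with a plain
// net.Dialer; nothing here reads the HTTP(S) proxy environment,
// which is the behavior this bug report expects.
func dialLDAP(ctx context.Context, addr string) (net.Conn, error) {
	d := &net.Dialer{Timeout: 10 * time.Second}
	return d.DialContext(ctx, "tcp", addr)
}

func main() {
	conn, err := dialLDAP(context.Background(), "ldap.example.com:636")
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	defer conn.Close()
	fmt.Println("connected to", conn.RemoteAddr())
}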
Version-Release number of selected component (if applicable):
4.15.11
How reproducible:
Steps to Reproduce:
(From the customer)
1. Configure LDAP IDP
2. Configure Proxy
3. LDAP IDP communication from the control plane oauth pod goes through the proxy instead of going to the LDAP endpoint directly
Actual results:
LDAP IDP communication from the control plane oauth pod goes through proxy
Expected results:
LDAP IDP communication from the control plane oauth pod should go to the ldap endpoint directly using the ldap protocol, it should not go through the proxy settings
Additional info:
For more information, see linked tickets.
This is a clone of issue OCPBUGS-30212. The following is the description of the original issue:
—
The command does not honor Windows path separators.
Related to https://issues.redhat.com//browse/OCPBUGS-28864 (access restricted and not publicly visible). This report serves as a target issue for the fix and its backport to older OCP versions. Please see more details in https://issues.redhat.com//browse/OCPBUGS-28864.
This is a clone of issue OCPBUGS-38911. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-38412. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-32773. The following is the description of the original issue:
—
Description of problem:
In the OpenShift WebConsole, when using the Instantiate Template screen, the values entered into the form are automatically cleared. This issue occurs for users with developer roles who do not have administrator privileges, but does not occur for users with the cluster-admin cluster role.
Additionally, using the web browser's developer tools, I observed the following console logs when the values were cleared:
https://console-openshift-console.apps.mmatsuta-blue.apac.aws.cee.support/api/prometheus/api/v1/rules 403 (Forbidden)
https://console-openshift-console.apps.mmatsuta-blue.apac.aws.cee.support/api/alertmanager/api/v2/silences 403 (Forbidden)
It appears that a script attempting to periodically fetch information from PrometheusRule and Alertmanager's silences encounters a 403 error due to insufficient permissions, which causes the script to halt and the values in the form to be reset and cleared. This bug prevents users from successfully creating instances from templates in the WebConsole.
Version-Release number of selected component (if applicable):
4.15 4.14
How reproducible:
YES
Steps to Reproduce:
1. Log in with a non-administrator account.
2. Select a template from the developer catalog and click on Instantiate Template.
3. Enter values into the initially empty form.
4. Wait for several seconds, and the entered values will disappear.
Actual results:
Entered values disappear.
Expected results:
Entered values remain visible.
Additional info:
I could not find the appropriate component to report this issue. I reluctantly chose Dev Console, but please adjust it to the correct component.
All our tunnel traffic, whether GENEVE or VXLAN, should skip conntrack in the host network namespace because it's pointless to track it. It's UDP and it's point-to-point; there are no connections to care about.
We already skip the GENEVE traffic in OVN-K and the VXLAN traffic in SDN, but we aren't skipping the VXLAN traffic that Hybrid Overlay and ICNIv1 generate.
CNO's ovnkube-node YAML should add a couple of lines to, if Hybrid Overlay is enabled, -j NOTRACK for .OVNHybridOverlayVXLANPort. Note that .OVNHybridOverlayVXLANPort will be empty if the default VXLAN port is used, so we'd need a bit of if/else logic to -j NOTRACK the default port when .OVNHybridOverlayVXLANPort is empty; a sketch follows.
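A minimal sketch of what that could look like in the templated manifest, assuming iptables raw-table NOTRACK rules and the default VXLAN port 4789; the template variable matches the one named above, but the surrounding conditional (including the .OVNHybridOverlayEnable name) is illustrative rather than the actual CNO manifest:

{{ if .OVNHybridOverlayEnable }}
{{ if .OVNHybridOverlayVXLANPort }}
# Hybrid Overlay uses a custom VXLAN port; skip conntrack for it.
iptables -t raw -A PREROUTING -p udp --dport {{ .OVNHybridOverlayVXLANPort }} -j NOTRACK
iptables -t raw -A OUTPUT -p udp --dport {{ .OVNHybridOverlayVXLANPort }} -j NOTRACK
{{ else }}
# An empty port means the default VXLAN port (4789) is in use.
iptables -t raw -A PREROUTING -p udp --dport 4789 -j NOTRACK
iptables -t raw -A OUTPUT -p udp --dport 4789 -j NOTRACK
{{ end }}
{{ end }}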
Issue 33 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624
In the left navigation menu in the dev perspective, there is extra space after the divider.
Screenshot: https://drive.google.com/file/d/1ROcHXCLmPPhr30nGTUblMTL-JQqKEsCY/view?usp=drive_link
Please review the following PR: https://github.com/openshift/cluster-api-provider-azure/pull/291
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/266
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-25881. The following is the description of the original issue:
—
Description of problem:
Copying BZ https://bugzilla.redhat.com/show_bug.cgi?id=2250911 on the OCP side (as the fix is needed in the console). [UI] In the openshift-storage-client namespace, an 'RWX' access mode RBD PVC with volume mode 'Filesystem' can be created from the Client. However, this is an invalid combination for RBD PVC creation. In the ODF Operator UI on other platforms, the volume mode selector is not shown when the ceph-rbd storage class and the RWX access mode are selected, but it is visible in the client operator view. The resulting PVC gets stuck in Pending state.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Deploy a Provider-Client setup.
2. From the UI, create a PVC: select the ceph-rbd storage class and the RWX access mode. In the case of this bug, both 'Filesystem' and 'Block' volume modes are visible in the UI; select volume mode Filesystem and create the PVC.
Actual results:
The PVC is created and stuck in Pending status. The PVC event shows an error like:
Generated from openshift-storage-client.rbd.csi.ceph.com_csi-rbdplugin-provisioner-6d9dcb9fc7-vjj22_2bd4ede5-9418-4c8e-80ae-169b5cb4fa8012 times in the last 13 minutes
failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = InvalidArgument desc = multi node access modes are only supported on rbd `block` type volumes
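For contrast, a minimal PVC sketch of the combination the provisioner error above says RBD does support for multi-node access (RWX with volumeMode: Block); the PVC name is hypothetical, and the storage class is the one from the error:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-rwx-block
spec:
  accessModes:
  - ReadWriteMany
  volumeMode: Block
  resources:
    requests:
      storage: 10Gi
  storageClassName: ocs-storagecluster-ceph-rbd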
Expected results:
The volume mode selector should not be visible on the page when a PVC with the RWX access mode and the RBD storage class is selected.
Additional info:
Screenshots are attached to the BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2250911 https://bugzilla.redhat.com/show_bug.cgi?id=2250911#c3
This is a clone of issue OCPBUGS-35486. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-34706. The following is the description of the original issue:
—
Description of problem:
Regression of OCPBUGS-12739
level=warning msg="Couldn't unmarshall OVN annotations: ''. Skipping." err="unexpected end of JSON input"
Upstream OVN changed the node annotation from "k8s.ovn.org/host-addresses" to "k8s.ovn.org/host-cidrs" in OpenShift 4.14
https://github.com/ovn-org/ovn-kubernetes/pull/3915
We might need to fix baremetal-runtimecfg
diff --git a/pkg/config/node.go b/pkg/config/node.go
index 491dd4f..078ad77 100644
--- a/pkg/config/node.go
+++ b/pkg/config/node.go
@@ -367,10 +367,10 @@ func getNodeIpForRequestedIpStack(node v1.Node, filterIps []string, machineNetwo
 	log.Debugf("For node %s can't find address using NodeInternalIP. Fallback to OVN annotation.", node.Name)
 
 	var ovnHostAddresses []string
-	if err := json.Unmarshal([]byte(node.Annotations["k8s.ovn.org/host-addresses"]), &ovnHostAddresses); err != nil {
+	if err := json.Unmarshal([]byte(node.Annotations["k8s.ovn.org/host-cidrs"]), &ovnHostAddresses); err != nil {
 		log.WithFields(logrus.Fields{
 			"err": err,
-		}).Warnf("Couldn't unmarshall OVN annotations: '%s'. Skipping.", node.Annotations["k8s.ovn.org/host-addresses"])
+		}).Warnf("Couldn't unmarshall OVN annotations: '%s'. Skipping.", node.Annotations["k8s.ovn.org/host-cidrs"])
 	}
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-05-30-130713
How reproducible:
Frequent
Steps to Reproduce:
1. Deploy a vSphere IPv4 cluster
2. Convert to dual-stack IPv4/IPv6
3. Add the machine network and IPv6 apiServerInternalIPs and ingressIPs
4. Check keepalived.conf:
for f in $(oc get pods -n openshift-vsphere-infra -l app=vsphere-infra-vrrp --no-headers -o custom-columns=N:.metadata.name ) ; do oc -n openshift-vsphere-infra exec -c keepalived $f -- cat /etc/keepalived/keepalived.conf | tee $f-keepalived.conf ; done
Actual results:
IPv6 VIP is not in keepalived.conf
Expected results:
Something like:
vrrp_instance rbrattai_INGRESS_1 {
    state BACKUP
    interface br-ex
    virtual_router_id 129
    priority 20
    advert_int 1
    unicast_src_ip fd65:a1a8:60ad:271c::cc
    unicast_peer {
        fd65:a1a8:60ad:271c:9af:16a9:cb4f:d75c
        fd65:a1a8:60ad:271c:86ec:8104:1bc2:ab12
        fd65:a1a8:60ad:271c:5f93:c9cf:95f:9a6d
        fd65:a1a8:60ad:271c:bb4:de9e:6d58:89e7
        fd65:a1a8:60ad:271c:3072:2921:890:9263
    }
    ...
    virtual_ipaddress {
        fd65:a1a8:60ad:271c::1117/128
    }
    ...
}
Backport to 4.15 of AUTH-482 specifically for the cluster-monitoring-operator.
Namespaces with workloads that need pinning:
This is a clone of issue OCPBUGS-37645. The following is the description of the original issue:
—
Description of problem:
Debug into one of the worker nodes on the hosted cluster:
oc debug node/ip-10-1-0-97.ca-central-1.compute.internal

nslookup kubernetes.default.svc.cluster.local
Server:         10.1.0.2
Address:        10.1.0.2#53
** server can't find kubernetes.default.svc.cluster.local: NXDOMAIN

curl -k https://172.30.0.1:443/readyz
curl: (7) Failed to connect to 172.30.0.1 port 443: Connection refused

sh-5.1# curl -k https://172.20.0.1:443/readyz
ok
Version-Release number of selected component (if applicable):
4.15.20
Steps to Reproduce:
Unknown
Actual results:
Pods on a hosted cluster's workers are unable to connect to their internal kube apiserver via the service IP.
Expected results:
Pods on a hosted cluster's workers have connectivity to their kube apiserver via the service IP.
Additional info:
Checked the "Konnectivity server" logs on Dynatrace and found that the error below occurs repeatedly:
E0724 01:02:00.223151 1 server.go:895] "DIAL_RSP contains failure" err="dial tcp 172.30.176.80:8443: i/o timeout" dialID=8375732890105363305 agentID="1eab211f-6ea1-46ea-bc78-14d75d6ba325"
E0724 01:02:00.223482 1 tunnel.go:150] "Received failure on connection" err="read tcp 10.128.17.15:8090->10.128.82.107:52462: use of closed network connection"
Relevant OHSS Ticket: https://issues.redhat.com/browse/OHSS-36053
Description of problem:
The ConfigObserver controller waits until all given informers are marked as synced, including the build informer. However, when the Build capability is disabled, this blocks ConfigObserver and it never runs. This likely only happens on 4.15, because the capability-watching mechanism was bound to ConfigObserver in 4.15.
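A minimal illustration of the failure mode, assuming client-go style cache syncing (the function and variable names are illustrative, not the actual operator code): if the build informer can never sync because the Build API is disabled, the wait below never returns and the controller never starts.

package main

import (
	"fmt"

	"k8s.io/client-go/tools/cache"
)

// startController blocks until every listed informer has synced.
// If buildsSynced belongs to an informer for a disabled API, it
// never reports true, so WaitForCacheSync only returns once the
// stop channel closes, and the controller never actually runs.
func startController(stopCh <-chan struct{}, buildsSynced, configsSynced cache.InformerSynced) error {
	if !cache.WaitForCacheSync(stopCh, buildsSynced, configsSynced) {
		return fmt.Errorf("informer caches never synced")
	}
	// ... run the ConfigObserver loop here ...
	return nil
}

func main() {
	stop := make(chan struct{})
	close(stop) // simulate shutdown so this demo returns promptly
	neverSynced := func() bool { return false }
	alwaysSynced := func() bool { return true }
	fmt.Println(startController(stop, neverSynced, alwaysSynced))
}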
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Launch cluster-bot cluster via "launch 4.15.0-0.nightly-2023-11-05-192858,openshift/cluster-openshift-controller-manager-operator#315 no-capabilities"
Steps to Reproduce:
1. 2. 3.
Actual results:
ConfigObserver controller stuck in failure
Expected results:
ConfigObserver controller runs and successfully clears all deployer service accounts when the deploymentconfig capability is disabled.
Additional info:
Description of problem:
IPI or UPI installation of a private cluster on GCP always fails, with the cluster operator ingress reporting LoadBalancerPending and CanaryChecksRepetitiveFailures.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-11-07-233748
How reproducible:
Always
Steps to Reproduce:
1. create a private cluster on GCP, either IPI or UPI
Actual results:
The installation failed, with ingress operator degraded.
Expected results:
The installation can succeed.
Additional info:
Some PROW CI tests:
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-arm64-nightly-gcp-ipi-private-f28-longduration-cloud/1722352860160593920 (Must-gather https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-arm64-nightly-gcp-ipi-private-f28-longduration-cloud/1722352860160593920/artifacts/gcp-ipi-private-f28-longduration-cloud/gather-must-gather/artifacts/must-gather.tar)
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-gcp-ipi-xpn-private-f28/1722176483704705024
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-gcp-ipi-private-fips-f6-disasterrecovery/1722066338567950336
FYI QE Flexy-install jobs: IPI Flexy-install/245364/, UPI Flexy-install/245524/

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          14h     Unable to apply 4.15.0-0.nightly-2023-11-07-233748: some cluster operators are not available

$ oc get nodes
NAME                                                           STATUS   ROLES                  AGE   VERSION
jiwei-1108-priv-kx7b4-master-0.c.openshift-qe.internal         Ready    control-plane,master   14h   v1.28.3+4cbdd29
jiwei-1108-priv-kx7b4-master-1.c.openshift-qe.internal         Ready    control-plane,master   14h   v1.28.3+4cbdd29
jiwei-1108-priv-kx7b4-master-2.c.openshift-qe.internal         Ready    control-plane,master   14h   v1.28.3+4cbdd29
jiwei-1108-priv-kx7b4-worker-a-l28pl.c.openshift-qe.internal   Ready    worker                 14h   v1.28.3+4cbdd29
jiwei-1108-priv-kx7b4-worker-b-84bx5.c.openshift-qe.internal   Ready    worker                 14h   v1.28.3+4cbdd29

$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.15.0-0.nightly-2023-11-07-233748   False       False         True       14h     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.jiwei-1108-priv.qe.gcp.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jiwei-1108-priv.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
baremetal                                  4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
cloud-controller-manager                   4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
cloud-credential                           4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
cluster-autoscaler                         4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
config-operator                            4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
console                                    4.15.0-0.nightly-2023-11-07-233748   False       True          False      14h     DeploymentAvailable: 0 replicas available for console deployment...
control-plane-machine-set                  4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
csi-snapshot-controller                    4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
dns                                        4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
etcd                                       4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
image-registry                             4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
ingress                                                                         False       True          True       7h37m   The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending)
insights                                   4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
kube-apiserver                             4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
kube-controller-manager                    4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
kube-scheduler                             4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
kube-storage-version-migrator              4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
machine-api                                4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
machine-approver                           4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
machine-config                             4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
marketplace                                4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
monitoring                                 4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
network                                    4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
node-tuning                                4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
openshift-apiserver                        4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
openshift-controller-manager               4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
openshift-samples                          4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
operator-lifecycle-manager                 4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
operator-lifecycle-manager-catalog         4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
operator-lifecycle-manager-packageserver   4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
service-ca                                 4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
storage                                    4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h

$ oc describe co ingress
Name:         ingress
Namespace:
Labels:       <none>
Annotations:  include.release.openshift.io/ibm-cloud-managed: true
              include.release.openshift.io/self-managed-high-availability: true
              include.release.openshift.io/single-node-developer: true
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2023-11-08T10:38:15Z
  Generation:          1
  Owner References:
    API Version:     config.openshift.io/v1
    Controller:      true
    Kind:            ClusterVersion
    Name:            version
    UID:             dbaae892-1b6d-480d-a201-0549d0a3149d
  Resource Version:  172514
  UID:               3922a9fe-584f-458f-ac4f-b62b4842758e
Spec:
Status:
  Conditions:
    Last Transition Time:  2023-11-08T17:49:01Z
    Message:               The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending)
    Reason:                IngressUnavailable
    Status:                False
    Type:                  Available
    Last Transition Time:  2023-11-08T11:02:27Z
    Message:               Not all ingress controllers are available.
    Reason:                Reconciling
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2023-11-08T17:51:01Z
    Message:               The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending), CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
    Reason:                IngressDegraded
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2023-11-08T10:52:36Z
    Reason:                IngressControllersUpgradeable
    Status:                True
    Type:                  Upgradeable
    Last Transition Time:  2023-11-08T10:52:36Z
    Reason:                AsExpected
    Status:                False
    Type:                  EvaluationConditionsDetected
  Extension:  <nil>
  Related Objects:
    Group:
    Name:       openshift-ingress-operator
    Resource:   namespaces
    Group:      operator.openshift.io
    Name:
    Namespace:  openshift-ingress-operator
    Resource:   ingresscontrollers
    Group:      ingress.operator.openshift.io
    Name:
    Namespace:  openshift-ingress-operator
    Resource:   dnsrecords
    Group:
    Name:       openshift-ingress
    Resource:   namespaces
    Group:
    Name:       openshift-ingress-canary
    Resource:   namespaces
Events:  <none>

$ oc get pods -n openshift-ingress-operator -o wide
NAME                                READY   STATUS    RESTARTS      AGE   IP            NODE                                                     NOMINATED NODE   READINESS GATES
ingress-operator-57c555c75b-gqbk6   2/2     Running   2 (14h ago)   14h   10.129.0.36   jiwei-1108-priv-kx7b4-master-1.c.openshift-qe.internal   <none>           <none>

$ oc -n openshift-ingress-operator logs ingress-operator-57c555c75b-gqbk6
...output omitted...
2023-11-08T10:56:53.715Z ERROR operator.ingress_controller controller/controller.go:118 got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), DeploymentReplicasMinAvailable=False (DeploymentMinimumReplicasNotMet: 0/2 of replicas are available, max unavailable is 1: Some pods are not scheduled: Pod \"router-default-7c86c4f4b5-jsljz\" cannot be scheduled: 0/3 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.. Pod \"router-default-7c86c4f4b5-pltz4\" cannot be scheduled: 0/3 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.. Make sure you have sufficient worker nodes.), LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: googleapi: Error 400: INSTANCE_IN_MULTIPLE_LOAD_BALANCED_IGS - Validation failed for instance 'projects/openshift-qe/zones/us-central1-a/instances/jiwei-1108-priv-kx7b4-master-0': instance may belong to at most one load-balanced instance group.\nThe kube-controller-manager logs may contain more details.)"}
...output omitted...
2023-11-08T15:13:41.323Z ERROR operator.ingress_controller controller/controller.go:118 got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: googleapi: Error 400: Resource 'projects/openshift-qe/zones/us-central1-b/instances/jiwei-1108-priv-kx7b4-worker-b-84bx5' is expected to be in the subnetwork 'projects/openshift-qe/regions/us-central1/subnetworks/jiwei-1108-priv-master-subnet' but is in the subnetwork 'projects/openshift-qe/regions/us-central1/subnetworks/jiwei-1108-priv-worker-subnet'., wrongSubnetwork\nThe kube-controller-manager logs may contain more details.), CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"}
...output omitted...
$

Must-gather https://drive.google.com/file/d/1zwhJ4ga0-tQuRorha4XnUGUKbSTx1fx4/view?usp=drive_link
This is a clone of issue OCPBUGS-27509. The following is the description of the original issue:
—
When a MachineAutoscaler references a currently-zero-Machine MachineSet that includes spec.template.spec.taints, the autoscaler fails to deserialize that MachineSet, which causes it to fail to autoscale that MachineSet. The autoscaler's deserialization logic should be improved to avoid failing on the presence of taints.
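A rough sketch of the kind of conversion involved, using the shape of the data seen in the warning logs below (map[string]interface{} entries decoded from an unstructured MachineSet); this is illustrative, not the actual clusterapi_unstructured.go code:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// taintFromMap converts one entry of spec.template.spec.taints, as
// decoded from an unstructured MachineSet, into a core/v1 Taint.
// A robust implementation should tolerate a missing optional value
// rather than rejecting the whole taint (and the whole MachineSet).
func taintFromMap(m map[string]interface{}) (corev1.Taint, bool) {
	key, okKey := m["key"].(string)
	effect, okEffect := m["effect"].(string)
	if !okKey || !okEffect {
		return corev1.Taint{}, false
	}
	value, _ := m["value"].(string) // optional field
	return corev1.Taint{Key: key, Value: value, Effect: corev1.TaintEffect(effect)}, true
}

func main() {
	raw := map[string]interface{}{"effect": "NoSchedule", "key": "node-role.kubernetes.io/ci", "value": "ci"}
	t, ok := taintFromMap(raw)
	fmt.Println(t, ok)
}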
Version-Release number of selected component (if applicable):
Reproduced on 4.14.10 and 4.16.0-ec.1. Expected to affect every release going back to at least 4.12, based on code inspection.
How reproducible:
Always.
Steps to Reproduce:
With a `launch 4.14.10 gcp` Cluster Bot cluster (logs):
$ oc adm upgrade Cluster version is 4.14.10 Upstream: https://api.integration.openshift.com/api/upgrades_info/graph Channel: candidate-4.14 (available channels: candidate-4.14, candidate-4.15) No updates available. You may still upgrade to a specific release image with --to-image or wait for new updates to be available. $ oc -n openshift-machine-api get machinesets.machine.openshift.io NAME DESIRED CURRENT READY AVAILABLE AGE ci-ln-s48f02k-72292-5z2hn-worker-a 1 1 1 1 29m ci-ln-s48f02k-72292-5z2hn-worker-b 1 1 1 1 29m ci-ln-s48f02k-72292-5z2hn-worker-c 1 1 1 1 29m ci-ln-s48f02k-72292-5z2hn-worker-f 0 0 29m
Pick the MachineSet with 0 Machines. MachineSets don't come with taints by default:
$ oc -n openshift-machine-api get -o json machineset.machine.openshift.io ci-ln-s48f02k-72292-5z2hn-worker-f | jq '.spec.template.spec.taints' null
So patch one in:
$ oc -n openshift-machine-api patch machineset.machine.openshift.io ci-ln-s48f02k-72292-5z2hn-worker-f --type json -p '[{"op": "add", "path": "/spec/template/spec/taints", "value": [{"effect":"NoSchedule","key":"node-role.kubernetes.io/ci","value":"ci"} ]}]' machineset.machine.openshift.io/ci-ln-s48f02k-72292-5z2hn-worker-f patched
And set up autoscaling:
$ cat cluster-autoscaler.yaml apiVersion: autoscaling.openshift.io/v1 kind: ClusterAutoscaler metadata: name: default spec: maxNodeProvisionTime: 30m scaleDown: enabled: true $ oc apply -f cluster-autoscaler.yaml clusterautoscaler.autoscaling.openshift.io/default created
I'm not all that familiar with autoscaling. Maybe the ClusterAutoscaler doesn't matter, and you need a MachineAutoscaler aimed at the chosen MachineSet?
$ cat machine-autoscaler.yaml apiVersion: autoscaling.openshift.io/v1beta1 kind: MachineAutoscaler metadata: name: test namespace: openshift-machine-api spec: maxReplicas: 2 minReplicas: 1 scaleTargetRef: apiVersion: machine.openshift.io/v1beta1 kind: MachineSet name: ci-ln-s48f02k-72292-5z2hn-worker-f $ oc apply -f machine-autoscaler.yaml machineautoscaler.autoscaling.openshift.io/test created
Checking the autoscaler's logs:
$ oc -n openshift-machine-api logs -l k8s-app=cluster-autoscaler --tail -1 | grep taint W0122 19:18:47.246369 1 clusterapi_unstructured.go:217] Unable to convert data to taint: %vmap[effect:NoSchedule key:node-role.kubernetes.io/ci value:ci] W0122 19:18:58.474000 1 clusterapi_unstructured.go:217] Unable to convert data to taint: %vmap[effect:NoSchedule key:node-role.kubernetes.io/ci value:ci] W0122 19:19:09.703748 1 clusterapi_unstructured.go:217] Unable to convert data to taint: %vmap[effect:NoSchedule key:node-role.kubernetes.io/ci value:ci] W0122 19:19:20.929617 1 clusterapi_unstructured.go:217] Unable to convert data to taint: %vmap[effect:NoSchedule key:node-role.kubernetes.io/ci value:ci] ...
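As an aside, the literal "%v" glued directly to "map[" in those warnings suggests the message is emitted with a Sprint-style log call rather than a formatting one. This is an assumption about the log call, not a quote of the autoscaler's code; a small stdlib illustration of the difference:
~~~go
package main

import "fmt"

func main() {
	data := map[string]string{"effect": "NoSchedule", "key": "node-role.kubernetes.io/ci", "value": "ci"}
	// Sprint-style: the %v verb is not interpreted, and the map is concatenated
	// directly after the string, reproducing the "%vmap[...]" seen in the logs.
	fmt.Println(fmt.Sprint("Unable to convert data to taint: %v", data))
	// Printf-style: the verb is formatted as intended.
	fmt.Println(fmt.Sprintf("Unable to convert data to taint: %v", data))
}
~~~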
And the MachineSet is failing to scale:
$ oc -n openshift-machine-api get machinesets.machine.openshift.io ci-ln-s48f02k-72292-5z2hn-worker-f NAME DESIRED CURRENT READY AVAILABLE AGE ci-ln-s48f02k-72292-5z2hn-worker-f 0 0 50m
While if I remove the taint:
$ oc -n openshift-machine-api patch machineset.machine.openshift.io ci-ln-s48f02k-72292-5z2hn-worker-f --type json -p '[{"op": "remove", "path": "/spec/template/spec/taints"}]' machineset.machine.openshift.io/ci-ln-s48f02k-72292-5z2hn-worker-f patched
The autoscaler... well, it's not scaling up new Machines like I'd expected, but at least it seems to have calmed down about the taint deserialization issue:
$ oc -n openshift-machine-api get machines.machine.openshift.io NAME PHASE TYPE REGION ZONE AGE ci-ln-s48f02k-72292-5z2hn-master-0 Running e2-custom-6-16384 us-central1 us-central1-a 53m ci-ln-s48f02k-72292-5z2hn-master-1 Running e2-custom-6-16384 us-central1 us-central1-b 53m ci-ln-s48f02k-72292-5z2hn-master-2 Running e2-custom-6-16384 us-central1 us-central1-c 53m ci-ln-s48f02k-72292-5z2hn-worker-a-fwskf Running e2-standard-4 us-central1 us-central1-a 45m ci-ln-s48f02k-72292-5z2hn-worker-b-qkwlt Running e2-standard-4 us-central1 us-central1-b 45m ci-ln-s48f02k-72292-5z2hn-worker-c-rlw4m Running e2-standard-4 us-central1 us-central1-c 45m $ oc -n openshift-machine-api get machinesets.machine.openshift.io ci-ln-s48f02k-72292-5z2hn-worker-f NAME DESIRED CURRENT READY AVAILABLE AGE ci-ln-s48f02k-72292-5z2hn-worker-f 0 0 53m $ oc -n openshift-machine-api logs -l k8s-app=cluster-autoscaler --tail 50 I0122 19:23:17.284762 1 static_autoscaler.go:552] No unschedulable pods I0122 19:23:17.687036 1 legacy.go:296] No candidates for scale down W0122 19:23:27.924167 1 clusterapi_unstructured.go:217] Unable to convert data to taint: %vmap[effect:NoSchedule key:node-role.kubernetes.io/ci value:ci] I0122 19:23:28.510701 1 static_autoscaler.go:552] No unschedulable pods I0122 19:23:28.909507 1 legacy.go:296] No candidates for scale down W0122 19:23:39.148266 1 clusterapi_unstructured.go:217] Unable to convert data to taint: %vmap[effect:NoSchedule key:node-role.kubernetes.io/ci value:ci] I0122 19:23:39.737359 1 static_autoscaler.go:552] No unschedulable pods I0122 19:23:40.135580 1 legacy.go:296] No candidates for scale down W0122 19:23:50.376616 1 clusterapi_unstructured.go:217] Unable to convert data to taint: %vmap[effect:NoSchedule key:node-role.kubernetes.io/ci value:ci] I0122 19:23:50.963064 1 static_autoscaler.go:552] No unschedulable pods I0122 19:23:51.364313 1 legacy.go:296] No candidates for scale down W0122 19:24:01.601764 1 clusterapi_unstructured.go:217] Unable to convert data to taint: %vmap[effect:NoSchedule key:node-role.kubernetes.io/ci value:ci] I0122 19:24:02.191330 1 static_autoscaler.go:552] No unschedulable pods I0122 19:24:02.589766 1 legacy.go:296] No candidates for scale down I0122 19:24:13.415183 1 static_autoscaler.go:552] No unschedulable pods I0122 19:24:13.815851 1 legacy.go:296] No candidates for scale down I0122 19:24:24.641190 1 static_autoscaler.go:552] No unschedulable pods I0122 19:24:25.040894 1 legacy.go:296] No candidates for scale down I0122 19:24:35.867194 1 static_autoscaler.go:552] No unschedulable pods I0122 19:24:36.266400 1 legacy.go:296] No candidates for scale down I0122 19:24:47.097656 1 static_autoscaler.go:552] No unschedulable pods I0122 19:24:47.498099 1 legacy.go:296] No candidates for scale down I0122 19:24:58.326025 1 static_autoscaler.go:552] No unschedulable pods I0122 19:24:58.726034 1 legacy.go:296] No candidates for scale down I0122 19:25:04.927980 1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache I0122 19:25:04.938213 1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 10.036399ms I0122 19:25:09.552086 1 static_autoscaler.go:552] No unschedulable pods I0122 19:25:09.952094 1 legacy.go:296] No candidates for scale down I0122 19:25:20.778317 1 static_autoscaler.go:552] No unschedulable pods I0122 19:25:21.178062 1 legacy.go:296] No candidates for scale down I0122 19:25:32.005246 1 static_autoscaler.go:552] No unschedulable pods I0122 19:25:32.404966 1 legacy.go:296] 
No candidates for scale down I0122 19:25:43.233637 1 static_autoscaler.go:552] No unschedulable pods I0122 19:25:43.633889 1 legacy.go:296] No candidates for scale down I0122 19:25:54.462009 1 static_autoscaler.go:552] No unschedulable pods I0122 19:25:54.861513 1 legacy.go:296] No candidates for scale down I0122 19:26:05.688410 1 static_autoscaler.go:552] No unschedulable pods I0122 19:26:06.088972 1 legacy.go:296] No candidates for scale down I0122 19:26:16.915156 1 static_autoscaler.go:552] No unschedulable pods I0122 19:26:17.315987 1 legacy.go:296] No candidates for scale down I0122 19:26:28.143877 1 static_autoscaler.go:552] No unschedulable pods I0122 19:26:28.543998 1 legacy.go:296] No candidates for scale down I0122 19:26:39.369085 1 static_autoscaler.go:552] No unschedulable pods I0122 19:26:39.770386 1 legacy.go:296] No candidates for scale down I0122 19:26:50.596923 1 static_autoscaler.go:552] No unschedulable pods I0122 19:26:50.997262 1 legacy.go:296] No candidates for scale down I0122 19:27:01.823577 1 static_autoscaler.go:552] No unschedulable pods I0122 19:27:02.223290 1 legacy.go:296] No candidates for scale down I0122 19:27:04.938943 1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache I0122 19:27:04.947353 1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 8.319938ms
Actual results:
Scale-from-zero MachineAutoscaler fails on taint deserialization when the referenced MachineSet contains spec.template.spec.taints.
Expected results:
Scale-from-zero MachineAutoscaler works, even when the referenced MachineSet contains spec.template.spec.taints.
This is a clone of issue OCPBUGS-32059. The following is the description of the original issue:
—
Description of problem:
The Helm Plugin's index view parses a given chart entry's versions into multiple tiles if the individual entry names vary. This is inconsistent with the Helm CLI experience, which treats all items in an index entry (i.e., all versions of a given chart) as part of the same chart.
Version-Release number of selected component (if applicable):
All
How reproducible:
100%
Steps to Reproduce:
1. Open the Developer Console, Helm Plugin 2. Select a namespace and Click to create a helm release 3. Search for the developer-hub chart in the catalog (this is an example demonstrating the problem)
Actual results:
There are two tiles for Developer Hub, but only one index entry in the corresponding index (https://charts.openshift.io)
Expected results:
A single tile should exist for this single index entry.
Additional info:
The cause of this is an indexing inconsistency that Helm itself tolerates, but the experience should align with the Helm CLI's behavior and still present a single catalog tile per index entry.
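To make the intended behavior concrete, here is a minimal sketch of grouping catalog tiles by index entry key rather than by each version's name field. The types are simplified stand-ins, not the console plugin's actual model:
~~~go
package main

import "fmt"

// chartVersion is a simplified stand-in for one version in a Helm index entry.
type chartVersion struct {
	Name    string // may vary across versions of the same entry
	Version string
}

// tiles returns one tile per index entry key, mirroring the Helm CLI's view
// that all versions under an entry belong to the same chart.
func tiles(index map[string][]chartVersion) []string {
	out := make([]string, 0, len(index))
	for entryKey := range index {
		out = append(out, entryKey)
	}
	return out
}

func main() {
	index := map[string][]chartVersion{
		"developer-hub": {
			{Name: "developer-hub", Version: "1.1.0"},
			{Name: "Developer Hub", Version: "1.0.0"}, // divergent name, same entry
		},
	}
	fmt.Println(tiles(index)) // one tile: [developer-hub]
}
~~~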
Description of problem:
The DomainMapping CRD still uses API version v1alpha1, but v1alpha1 will be removed in Serverless Operator 1.33. Upgrade the API version to v1beta1, which has been available since Serverless Operator 1.21.
Additional info:
NOTE: This should be backported to 4.11; also check the minimum Serverless Operator version supported in 4.11. Slack thread: https://redhat-internal.slack.com/archives/CJYKV1YAH/p1693809331579619
This fix contains the following changes from updating Kubernetes to v1.28.4:
Changelog:
v1.28.4: https://github.com/kubernetes/kubernetes/blob/release-1.28/CHANGELOG/CHANGELOG-1.28.md#changelog-since-v1283
Currently, the Topology feature is enabled by default by the openstack-cinder-csi-driver-operator. As seen in OCPBUGS-4697, this is problematic in environments where there is a mismatch between Nova and Cinder AZs, such as DCN environments where there may be multiple Nova AZs but only a single Cinder AZ. On an initial read, the [BlockStorage] ignore-volume-az option would appear to offer a way out, but as I noted in OCPBUGS-4697 and upstream, it doesn't actually do what you'd think it does.
We should allow the user to configure this functionality via an operator-level configuration option. We may wish to go one step further and also attempt to auto-detect the correct value by inspecting the available Nova and Cinder AZs. The latter step would require OpenStack API access from the operator, but both services provide non-admin APIs to retrieve this information.
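A minimal sketch of the proposed auto-detection, assuming the zone lists have already been fetched from the Nova and Cinder non-admin AZ APIs; the function and its policy are illustrative, not a settled design:
~~~go
package main

import "fmt"

// topologySafe reports whether every Nova AZ has a matching Cinder AZ;
// if not, enabling the Topology feature by default would be unsafe.
func topologySafe(novaZones, cinderZones []string) bool {
	cinder := make(map[string]bool, len(cinderZones))
	for _, z := range cinderZones {
		cinder[z] = true
	}
	for _, z := range novaZones {
		if !cinder[z] {
			return false
		}
	}
	return true
}

func main() {
	// DCN-style mismatch: several Nova AZs, one Cinder AZ.
	fmt.Println(topologySafe([]string{"az0", "az1", "az2"}, []string{"nova"})) // false
}
~~~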
We also explored other options:
OVN-IC doesn't use RAFT and doesn't need to wait for the cluster to converge, so we no longer need the 90s delay for the readiness probe on the NB and SB containers.
I think we only want to do this for multi-zone-interconnect, though, since the other deployment types would still use some RAFT.
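For illustration, the shape of the change on the NB/SB container probes, sketched with the corev1 types; the probe details beyond the delay are made up:
~~~go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// RAFT-based deployments gave the cluster 90s to converge before the
	// first readiness check; with OVN-IC (multi-zone interconnect) there is
	// no RAFT cluster, so the initial delay can be dropped.
	raft := corev1.Probe{InitialDelaySeconds: 90, PeriodSeconds: 10}
	interconnect := corev1.Probe{InitialDelaySeconds: 0, PeriodSeconds: 10}
	fmt.Println(raft.InitialDelaySeconds, interconnect.InitialDelaySeconds)
}
~~~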
Please review the following PR: https://github.com/openshift/cloud-provider-vsphere/pull/58
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-28659. The following is the description of the original issue:
—
Description of problem:
The ValidatingAdmissionPolicy admission plugin is set in OpenShift 4.14+ kube-apiserver config, but is missing from the HyperShift config. It should be set.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
4.15: https://github.com/openshift/hypershift/blob/release-4.15/control-plane-operator/controllers/hostedcontrolplane/kas/config.go#L293-L341 4.14: https://github.com/openshift/hypershift/blob/release-4.14/control-plane-operator/controllers/hostedcontrolplane/kas/config.go#L283-L331
Expected results:
Expect to see ValidatingAdmissionPolicy
Additional info:
This is a clone of issue OCPBUGS-28879. The following is the description of the original issue:
—
Description of problem:
In a recently installed cluster running 4.13.29, after configuring the cluster-wide proxy, the "vsphere-problem-detector" does not pick up the proxy configuration. Because the pod cannot reach vSphere, it fails to run checks: 2024-02-01T09:28:00.150332407Z E0201 09:28:00.150292 1 operator.go:199] failed to run checks: failed to connect to vsphere.local: Post "https://vsphere.local/sdk": dial tcp 172.16.1.3:443: i/o timeout The pod doesn't get the cluster proxy settings that are expected: - name: HTTPS_PROXY value: http://proxy.local:3128 - name: HTTP_PROXY value: http://proxy.local:3128 Other storage-related pods do get the expected configuration shown above. This causes the vsphere-problem-detector to fail its connections to vSphere, hence failing the health checks.
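For context on why the env vars matter: the detector is a Go binary, and Go's default HTTP transport resolves the proxy from HTTPS_PROXY/HTTP_PROXY via ProxyFromEnvironment, so without those variables injected the pod dials vSphere directly. A minimal sketch:
~~~go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// With HTTPS_PROXY/HTTP_PROXY unset in the pod, this resolves to nil
	// and the request goes direct (and, behind a proxy-only egress, times out).
	req, err := http.NewRequest("GET", "https://vsphere.local/sdk", nil)
	if err != nil {
		panic(err)
	}
	proxyURL, _ := http.ProxyFromEnvironment(req)
	if proxyURL == nil {
		fmt.Println("no proxy configured: direct connection")
	} else {
		fmt.Println("using proxy:", proxyURL)
	}
}
~~~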
Version-Release number of selected component (if applicable):
4.13.29
How reproducible:
Always
Steps to Reproduce:
1.Configure cluster-wide proxy in the environment. 2. Wait for the change 3. Check the pod configuration
Actual results:
vSphere health checks failing
Expected results:
vSphere health checks working through the cluster proxy
Additional info:
Description of problem:
Inspection is failing on hosts which special characters found in serial number of block devices: Jul 03 09:16:11 master3.xxxxxx.yyy ironic-agent[2272]: 2024-07-03 09:16:11.325 1 DEBUG ironic_python_agent.inspector [-] collected data: {'inventory'....'error': "The following errors were encountered:\n* collector logs failed: 'utf-8' codec can't decode byte 0xff in position 12: invalid start byte"} call_inspector /usr/lib/python3.9/site-packages/ironic_python_agent/inspector.py:128 Serial found: "serial": "2HC015KJ0000\udcff\udcff\udcff\udcff\udcff\udcff\udcff\udcff" Interesting stacktrace error: Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: UnicodeEncodeError: 'utf-8' codec can't encode characters in position 1260-1267: surrogates not allowed Full stack trace: ~~~ Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: 2024-07-03 09:16:11.628 1 DEBUG oslo_concurrency.processutils [-] CMD "lsblk -bia --json -oKNAME,MODEL,SIZE,ROTA,TYPE,UUID,PARTUUID,SERIAL" returned: 0 in 0.006s e xecute /usr/lib/python3.9/site-packages/oslo_concurrency/processutils.py:422 Jul 03 09:16:11 master3.xxxxxx.yyy ironic-agent[2272]: --- Logging error --- Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: --- Logging error --- Jul 03 09:16:11 master3.xxxxxx.yyy ironic-agent[2272]: Traceback (most recent call last): Jul 03 09:16:11 master3.xxxxxx.yyy ironic-agent[2272]: File "/usr/lib64/python3.9/logging/__init__.py", line 1086, in emit Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: Traceback (most recent call last): Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: File "/usr/lib64/python3.9/logging/__init__.py", line 1086, in emit Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: stream.write(msg + self.terminator) Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: UnicodeEncodeError: 'utf-8' codec can't encode characters in position 1260-1267: surrogates not allowed Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: Call stack: Jul 03 09:16:11 master3.xxxxxx.yyy ironic-agent[2272]: stream.write(msg + self.terminator) Jul 03 09:16:11 master3.xxxxxx.yyy ironic-agent[2272]: UnicodeEncodeError: 'utf-8' codec can't encode characters in position 1260-1267: surrogates not allowed Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: File "/usr/bin/ironic-python-agent", line 10, in <module> Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: sys.exit(run()) Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: File "/usr/lib/python3.9/site-packages/ironic_python_agent/cmd/agent.py", line 50, in run Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: agent.IronicPythonAgent(CONF.api_url, Jul 03 09:16:11 master3.xxxxxx.yyy ironic-agent[2272]: Call stack: Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: File "/usr/lib/python3.9/site-packages/ironic_python_agent/agent.py", line 485, in run Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: self.process_lookup_data(content) Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: File "/usr/lib/python3.9/site-packages/ironic_python_agent/agent.py", line 400, in process_lookup_data Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: hardware.cache_node(self.node) Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: File "/usr/lib/python3.9/site-packages/ironic_python_agent/hardware.py", line 3179, in cache_node Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: dispatch_to_managers('wait_for_disks') Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: File "/usr/lib/python3.9/site-packages/ironic_python_agent/hardware.py", line 3124, in dispatch_to_managers Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: return 
getattr(manager, method)(*args, **kwargs) Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: File "/usr/lib/python3.9/site-packages/ironic_python_agent/hardware.py", line 997, in wait_for_disks Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: self.get_os_install_device() Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: File "/usr/lib/python3.9/site-packages/ironic_python_agent/hardware.py", line 1518, in get_os_install_device Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: block_devices = self.list_block_devices_check_skip_list( Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: File "/usr/lib/python3.9/site-packages/ironic_python_agent/hardware.py", line 1495, in list_block_devices_check_skip_list Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: block_devices = self.list_block_devices( Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: File "/usr/lib/python3.9/site-packages/ironic_python_agent/hardware.py", line 1460, in list_block_devices Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: block_devices = list_all_block_devices() Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: File "/usr/lib/python3.9/site-packages/ironic_python_agent/hardware.py", line 526, in list_all_block_devices Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: report = il_utils.execute('lsblk', '-bia', '--json', Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: File "/usr/lib/python3.9/site-packages/ironic_lib/utils.py", line 111, in execute Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: _log(result[0], result[1]) Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: File "/usr/lib/python3.9/site-packages/ironic_lib/utils.py", line 99, in _log Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: LOG.debug('Command stdout is: "%s"', stdout) Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: Message: 'Command stdout is: "%s"' Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: Arguments: ('{\n "blockdevices": [\n {\n "kname": "loop0",\n "model": null,\n "size": 67467313152,\n "rota": false,\n "type": "loop",\n "uuid": "28f5ff52-7f5b-4e5a-bcf2-59813e5aef5a",\n "partuuid": null,\n "serial": null\n },{\n "kname": "loop1",\n "model": null,\n "size": 1027846144,\n "rota": false,\n "type": "loop",\n "uuid": null,\n "partuuid": null,\n "serial": null\n },{\n "kname": "sda",\n "model": "LITEON IT ECE-12",\n "size": 120034123776,\n "rota": false,\n "type": "disk",\n "uuid": null,\n "partuuid": null,\n "serial": "XXXXXXXXXXXXXXXXXX"\n },{\n "kname": "sdb",\n "model": "LITEON IT ECE-12",\n "size": 120034123776,\n "rota": false,\n "type": "disk",\n "uuid": null,\n "partuuid": null,\n "serial": "XXXXXXXXXXXXXXXXXXXX"\n },{\n "kname": "sdc",\n "model": "External",\n "size": 0,\n "rota": true,\n "type": "disk",\n "uuid": null,\n "partuuid": null,\n "serial": "2HC015KJ0000\udcff\udcff\udcff\udcff\udcff\udcff\udcff\udcff"\n }\n ]\n}\n',) ~~~
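Ironic's agent is Python, where the serial is decoded with surrogate escapes that then cannot be re-encoded while logging; the snippet below only illustrates the failure class and a generic mitigation, in Go, and is not ironic's actual fix:
~~~go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

func main() {
	// Hypothetical serial containing raw non-UTF-8 bytes, as reported by
	// lsblk for the faulty device.
	serial := "2HC015KJ0000\xff\xff\xff\xff"
	// Logging or serializing this raw string is what breaks.
	fmt.Println(utf8.ValidString(serial)) // false
	// Generic mitigation: replace invalid bytes before logging or serializing.
	fmt.Println(strings.ToValidUTF8(serial, "\ufffd"))
}
~~~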
Version-Release number of selected component (if applicable):
OCP 4.14.28
How reproducible:
Always
Steps to Reproduce:
1. Add a BMH with a bad utf-8 characters in serial 2. 3.
Actual results:
Inspection fail
Expected results:
Inspection works
Additional info:
Description of problem:
Setting the key "a" for platform.gcp.userLabels produces an error message that doesn't explain what exactly is wrong.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-10-15-164249
How reproducible:
Always
Steps to Reproduce:
1. "create install-config" 2. edit the install-config.yaml to insert userLabels settings (see [1]) 3. "create cluster"
Actual results:
An error message shows up saying the label key "a" is invalid.
Expected results:
There should be no error, according to the statement "A label key can have a maximum of 63 characters and cannot be empty. Label must begin with a lowercase letter, and must contain only lowercase letters, numeric characters, and the following special characters `_-`".
Additional info:
$ openshift-install version openshift-install 4.14.0-0.nightly-2023-10-15-164249 built from commit 359866f9f6d8c86e566b0aea7506dad22f59d860 release image registry.ci.openshift.org/ocp/release@sha256:3c5976a39479e11395334f1705dbd3b56580cd1dcbd514a34d9c796b0a0d9f8e release architecture amd64 $ openshift-install explain installconfig.platform.gcp.userLabels KIND: InstallConfig VERSION: v1 RESOURCE: <[]object> userLabels has additional keys and values that the installer will add as labels to all resources that it creates on GCP. Resources created by the cluster itself may not include these labels. This is a TechPreview feature and requires setting CustomNoUpgrade featureSet with GCPLabelsTags featureGate enabled or TechPreviewNoUpgrade featureSet to configure labels. FIELDS: key <string> -required- key is the key part of the label. A label key can have a maximum of 63 characters and cannot be empty. Label must begin with a lowercase letter, and must contain only lowercase letters, numeric characters, and the following special characters `_-`. value <string> -required- value is the value part of the label. A label value can have a maximum of 63 characters and cannot be empty. Value must contain only lowercase letters, numeric characters, and the following special characters `_-`. $ [1] $ yq-3.3.0 r test12/install-config.yaml platform gcp: projectID: openshift-qe region: us-central1 userLabels: - key: createdby value: installer-qe - key: a value: hello $ yq-3.3.0 r test12/install-config.yaml featureSet TechPreviewNoUpgrade $ yq-3.3.0 r test12/install-config.yaml credentialsMode Passthrough $ openshift-install create cluster --dir test12 ERROR failed to fetch Metadata: failed to load asset "Install Config": failed to create install config: invalid "install-config.yaml" file: platform.gcp.userLabels[a]: Invalid value: "hello": label key is invalid or contains invalid characters. Label key can have a maximum of 63 characters and cannot be empty. Label key must begin with a lowercase letter, and must contain only lowercase letters, numeric characters, and the following special characters `_-` $
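For reference, a validator matching the documented rule accepts "a". A plausible, unconfirmed cause of the report is a pattern that requires at least two characters, as the hypothetical strictRule below does:
~~~go
package main

import (
	"fmt"
	"regexp"
)

// documentedRule encodes the documented constraint: a lowercase letter first,
// then up to 62 more lowercase letters, digits, '_' or '-'.
var documentedRule = regexp.MustCompile(`^[a-z][a-z0-9_-]{0,62}$`)

// strictRule is hypothetical: requiring {1,62} trailing characters would
// reject the single-character key "a" while accepting "createdby".
var strictRule = regexp.MustCompile(`^[a-z][a-z0-9_-]{1,62}$`)

func main() {
	for _, key := range []string{"a", "createdby"} {
		fmt.Println(key, documentedRule.MatchString(key), strictRule.MatchString(key))
	}
}
~~~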
Please review the following PR: https://github.com/openshift/machine-api-provider-powervs/pull/66
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-32469. The following is the description of the original issue:
—
Description of problem:
TuneD unnecessarily restarts twice when the current TuneD profile's content changes and a new TuneD profile is selected at the same time.
Version-Release number of selected component (if applicable):
All NTO versions are affected.
How reproducible:
Depends on the order of k8s object updates (races), but nearly 100% reproducible.
Steps to Reproduce:
1. Install SNO 2. Label your SNO node with label "profile" 3. Create the following CR: apiVersion: tuned.openshift.io/v1 kind: Tuned metadata: name: openshift-profile namespace: openshift-cluster-node-tuning-operator spec: profile: - data: | [main] summary=Custom OpenShift profile 1 include=openshift-node [sysctl] kernel.pty.max=4096 name: openshift-profile-1 - data: | [main] summary=Custom OpenShift profile 2 include=openshift-node [sysctl] kernel.pty.max=8192 name: openshift-profile-2 recommend: - match: - label: profile priority: 20 profile: openshift-profile-1 4. Apply the following CR: apiVersion: tuned.openshift.io/v1 kind: Tuned metadata: name: openshift-profile namespace: openshift-cluster-node-tuning-operator spec: profile: - data: | [main] summary=Custom OpenShift profile 1 include=openshift-node [sysctl] kernel.pty.max=8192 name: openshift-profile-1 - data: | [main] summary=Custom OpenShift profile 2 include=openshift-node [sysctl] kernel.pty.max=8192 name: openshift-profile-2 recommend: - match: - label: profile priority: 20 profile: openshift-profile-2
Actual results:
You'll see two restarts/applications of the openshift-profile-1 $ cat tuned-operand.log |grep "profile-1' applied" 2024-04-19 06:10:54,685 INFO tuned.daemon.daemon: static tuning from profile 'openshift-profile-1' applied 2024-04-19 06:13:23,627 INFO tuned.daemon.daemon: static tuning from profile 'openshift-profile-1' applied
Expected results:
Only 1 application of openshift-profile-1: $ cat tuned-operand.log |grep "profile-1' applied" 2024-04-19 07:20:31,600 INFO tuned.daemon.daemon: static tuning from profile 'openshift-profile-1' applied
Additional info:
Description of problem:
Automated case OCP-57089 failed on a 4.14 Azure platform; checking manually, the created load-balancer service couldn't get an external IP address.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-09-09-164123
How reproducible:
100% on the cluster
Steps to Reproduce:
1. Add a wait in the auto script, then run the case g.By("check if the lb services have obtained the EXTERNAL-IPs") regExp := "([0-9]+.[0-9]+.[0-9]+.[0-9]+)" time.Sleep(3600 * time.Second) % ./bin/extended-platform-tests run all --dry-run | grep 57089 | ./bin/extended-platform-tests run -f - 2. % oc get ns | grep e2e-test-router e2e-test-router-ingressclass-n2z2c Active 2m51s 3. It was pending in EXTERNAL-IP column for internal-lb-57089 service % oc -n e2e-test-router-ingressclass-n2z2c get svc NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE external-lb-57089 LoadBalancer 172.30.198.7 20.42.34.61 28443:30193/TCP 3m6s internal-lb-57089 LoadBalancer 172.30.214.30 <pending> 29443:31507/TCP 3m6s service-secure ClusterIP 172.30.47.70 <none> 27443/TCP 3m13s service-unsecure ClusterIP 172.30.175.59 <none> 27017/TCP 3m13s % 4. % oc -n e2e-test-router-ingressclass-n2z2c get svc internal-lb-57089 -oyaml apiVersion: v1 kind: Service metadata: annotations: service.beta.kubernetes.io/azure-load-balancer-internal: "true" creationTimestamp: "2023-09-12T07:56:42Z" finalizers: - service.kubernetes.io/load-balancer-cleanup name: internal-lb-57089 namespace: e2e-test-router-ingressclass-n2z2c resourceVersion: "209376" uid: b163bc03-b1c6-4e7b-b4e1-c996e9d135f4 spec: allocateLoadBalancerNodePorts: true clusterIP: 172.30.214.30 clusterIPs: - 172.30.214.30 externalTrafficPolicy: Cluster internalTrafficPolicy: Cluster ipFamilies: - IPv4 ipFamilyPolicy: SingleStack ports: - name: https nodePort: 31507 port: 29443 protocol: TCP targetPort: 8443 selector: name: web-server-rc sessionAffinity: None type: LoadBalancer status: loadBalancer: {} %
Actual results:
internal-lb-57089 service couldn't get an external-IP address
Expected results:
internal-lb-57089 service can get an external-IP address
Additional info:
This bug is to track the work needed to merge the CNO IPsec API backports
This is a clone of issue OCPBUGS-37046. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-35879. The following is the description of the original issue:
—
Description of problem:
A customer reports that in OpenShift Container Platform, for a single namespace, they are seeing a "TypeError: Cannot read properties of null (reading 'metadata')" error when navigating to the Topology view (Developer Console):
TypeError: Cannot read properties of null (reading 'metadata') at s (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1220454) at s (https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:424007) at t.a (https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:330465) at na (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:263:58879) at Hs (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:263:111315) at xl (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:263:98327) at Cl (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:263:98255) at _l (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:263:98118) at pl (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:263:95105) at https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:263:44774
Screenshot is available in the linked Support Case. The following Stack Trace is shown:
at t.a (https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:330387) at g at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249) at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249) at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249) at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249) at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249) at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249) at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249) at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249) at g at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249) at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249) at g at a (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:245070) at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249) at g at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249) at g at t.a (https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:426770) at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249) at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249) at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249) at f 
(https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249) at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249) at g at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249) at a (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:242507) at svg at div at https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:603940 at u (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:602181) at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249) at e.a (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:398426) at div at https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:353461 at https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:354168 at s (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1405970) at S (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:98:86864) at i (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:452052) at withFallback(Connect(withUserSettingsCompatibility(undefined))) at div at div at c (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:62178) at div at div at c (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:545565) at d (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:775077) at div at d (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:458280) at div at div at c (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:719437) at div at c (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:9899) at div at https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:512628 at S (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:98:86864) at t (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:123:75018) at https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:511867 at https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:150:220157 at 
https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:375316 at div at R (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:183146) at N (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:183594) at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249) at https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:509351 at https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:548866 at S (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:98:86864) at div at div at t.b (https://console.apps.example.com/static/dev-console/code-refs/common-chunk-5e4f38c02bde64a97ae5.min.js:1:113711) at t.a (https://console.apps.example.com/static/dev-console/code-refs/common-chunk-5e4f38c02bde64a97ae5.min.js:1:116541) at u (https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:305613) at https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:509656 at i (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:452052) at withFallback() at t.a (https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:553554) at t (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:21:67625) at I (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1533554) at t (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:21:69670) at Suspense at i (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:452052) at section at m (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:720427) at div at div at t.a (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1533801) at div at div at c (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:545565) at d (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:775077) at div at d (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:458280) at l (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1175827) at https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:458912 at S (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:98:86864) at main at div at v (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:264220) at div at div at c (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:62178) at div at div at c (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:545565) at d (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:775077) at div at d (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:458280) at Un (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:36:183620) at t.default 
(https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:880042) at e.default (https://console.apps.example.com/static/quick-start-chunk-794085a235e14913bdf3.min.js:1:3540) at s (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:239711) at t.a (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1610459) at ee (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1628636) at _t (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:36:142374) at ee (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1628636) at ee (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1628636) at ee (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1628636) at i (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:830807) at t.a (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1604651) at t.a (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1604840) at t.a (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1602256) at te (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1628767) at https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1631899 at r (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:36:121910) at t (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:21:67625) at t (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:21:69670) at t (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:21:64230) at re (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1632210) at t.a (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:804787) at t.a (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1079398) at s (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:654118) at t.a (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:150:195887) at Suspense
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.13.38 Developer Console
How reproducible:
Only on customer side, in a single namespace on a single cluster
Steps to Reproduce:
1. On a particular cluster, enter the Developer Console 2. Navigate to "Topology"
Actual results:
Loading the page fails with the error "TypeError: Cannot read properties of null (reading 'metadata')"
Expected results:
No error is shown. The Topology view is shown
Additional info:
- Screenshot available in linked Support Case - HAR file is available in linked Support Case
Environment: OCP 4.12.24
Installation Method: IPI: Manual Mode + STS using a customer-provided AWS IAM Role
I am trying to deploy an OCP4 cluster on AWS for my customer. The customer does not permit creation of IAM users so I am performing a Manual Mode with STS IPI installation instead. I have been given an IAM role to assume for the OCP installation, but unfortunately the customer's AWS Organizational Service Control Policy (SCP) does not permit the use of the iam:GetUser{} permission.
(I have informed my customer that iam:GetUser is an installation requirement - it's clearly documented in our docs, and I have raised a ticket with their internal support team requesting that their SCP is amended to include iam:getUser, however I have been informed that my request is likely to be rejected).
With this limitation understood, I still attempted to install OCP4. Surprisingly, I was able to deploy an OCP (4.12) cluster without any apparent issues; however, when I tried to destroy the cluster I encountered the following error from the installer (note: fields in brackets <> have been redacted):
DEBUG search for IAM roles
DEBUG iterating over a page of 74 IAM roles
DEBUG search for IAM users
DEBUG iterating over a page of 1 IAM users
INFO get tags for <ARN of the IAM user>: AccessDenied: User:<ARN of my user> is not authorized to perform: iam:GetUser on resource: <IAM username> with an explicit deny in a service control policy
INFO status code: 403, request id: <request ID>
DEBUG search for IAM instance profiles
INFO error while finding resources to delete error=get tags for <ARN of IAM user> AccessDenied: User:<ARN of my user> is not authorized to perform: iam:GetUser on resource: <IAM username> with an explicit deny in a service control policy status code: 403, request id: <request ID>
Similarly, the error in AWS CloudTrail logs shows the following (note: some fields in brackets have been redacted):
User: arn:aws:sts::<AWS account no>:assumed-role/<role-name>/<user name> is not authorized to perform: iam:GetUser on resource <IAM User> with an explicit deny in a service control policy
It appears that the destroy operation is failing when the installer is trying to list tags on the only IAM user in the customer's AWS account. As discussed, the SCP does not permit the use of iam:GetUser and consequently this API call on the IAM user is denied. The installer then enters an endless loop as it continuously retries the operation. We have potentially identified the iamUserSearch function within the installer code at pkg/destroy/aws/iamhelpers.go as the area where this call is failing.
There does not appear to be a handler for the "AccessDenied" API error in this function. Therefore we request that the access denied event be gracefully handled and skipped over when processing IAM users, allowing the installer to continue with the destroy operation, much in the same way that a similar access denied event is handled within the iamRoleSearch function when processing IAM roles.
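A minimal sketch of the requested handling, using the aws-sdk-go v1 error types the installer already depends on; the helper is illustrative, not the installer's actual code:
~~~go
package main

import (
	"errors"
	"fmt"

	"github.com/aws/aws-sdk-go/aws/awserr"
)

// skippableAccessDenied reports whether err is an AWS AccessDenied error,
// in which case the IAM user should be skipped (with a log line) rather than
// retried forever, mirroring the handling in iamRoleSearch.
func skippableAccessDenied(err error) bool {
	var aerr awserr.Error
	if errors.As(err, &aerr) {
		return aerr.Code() == "AccessDenied"
	}
	return false
}

func main() {
	err := awserr.New("AccessDenied", "not authorized to perform: iam:GetUser", nil)
	if skippableAccessDenied(err) {
		fmt.Println("skipping IAM user we cannot inspect; continuing destroy")
	}
}
~~~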
We therefore request that the following is considered and addressed:
1. Re-assess if the iam:GetUser permission is actually needed for cluster installation/cluster operations.
2. If the permission is required then the installer should provide a warning or halt the installation.
3. During a "destroy" cluster operation - the installer should gracefully handle AccessDenied errors from the API and "skip over" any IAM Users that the installer does not have permission to list tags for and then continue gracefully with the destroy operation.
The cluster-api webhook configuration was renamed between ec.1 and ec.2:
$ oc adm release extract --to ec.1 quay.io/openshift-release-dev/ocp-release:4.15.0-ec.1-x86_64 $ oc adm release extract --to ec.2 quay.io/openshift-release-dev/ocp-release:4.15.0-ec.2-x86_64 $ yaml2json <ec.1/0000_30_cluster-api_10_webhooks.yaml | jq -r .metadata.name validating-webhook-configuration $ yaml2json <ec.2/0000_30_cluster-api_10_webhooks.yaml | jq -r .metadata.name cluster-capi-operator
And the presence of the old config breaks updates across the gap, as the operator tries to act on resources that are still guarded by a webhook config, despite there no longer being anything serving the hooks it had pointed at. Or something like that. In any case, the cluster-api ClusterOperator goes Degraded=True on SyncingFailed with "Failed to resync for operator: 4.15.0-ec.2 because unable to reconcile CoreProvider: unable to create or update CoreProvider: Internal error occurred: failed calling webhook "vcoreprovider.operator.cluster.x-k8s.io": failed to call webhook: the server could not find the requested resource" until the old ValidatingWebhookConfiguration is deleted, and after that deletion, the ClusterOperator recovers.
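For reference, the manual cleanup amounts to deleting the stale object (oc delete validatingwebhookconfiguration validating-webhook-configuration); a client-go sketch of the same, assuming a kubeconfig-derived clientset:
~~~go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// Delete the ec.1-era webhook config whose hooks nothing serves anymore.
	err = clientset.AdmissionregistrationV1().ValidatingWebhookConfigurations().
		Delete(context.TODO(), "validating-webhook-configuration", metav1.DeleteOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("stale ValidatingWebhookConfiguration deleted")
}
~~~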
Version-Release number of selected component (if applicable):
4.15.0-ec.2.
How reproducible:
Untested, but I'd guess 100%.
Steps to Reproduce:
1. Install a tech-preview 4.15.0-ec.1 cluster.
2. Request an update to 4.15.0-ec.2.
3. Wait an hour or so.
Actual results:
The cluster-api ClusterOperator is Degraded=True, blocking further progress in the ClusterVersion update.
Expected results:
The ClusterVersion update happily completes.
Please review the following PR: https://github.com/openshift/prometheus-operator/pull/260
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-27508. The following is the description of the original issue:
—
Description of problem:
The MCO logic today allows users to avoid a reboot when changing the registries.conf file (through ICSP/IDMS/ITMS objects), but the MCO will sometimes drain the node if the change is deemed "unsafe" (deleting a mirror, for example). This behaviour is very disruptive for some customers who wish to make all image registry changes non-disruptive. We will address this long term with admin-defined policies via the API properly, but we would like to have a backportable solution (as a support exception) for users to do so.
Version-Release number of selected component (if applicable):
4.14->4.16
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-33592. The following is the description of the original issue:
—
Description of problem:
While investigating a problem with OpenShift Container Platform 4 - Node scaling, I found the below messages reported in my OpenShift Container Platform 4 - Cluster. E0513 11:15:09.331353 1 orchestrator.go:450] Couldn't get autoscaling options for ng: MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c E0513 11:15:09.331365 1 orchestrator.go:450] Couldn't get autoscaling options for ng: MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c I0513 11:15:09.331529 1 orchestrator.go:546] Pod project-100/curl-67f84bd857-h92wb can't be scheduled on MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c, predicate checking error: node(s) didn't match Pod's node affinity/selector; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity/selector; debugInfo= I0513 11:15:09.331684 1 orchestrator.go:157] No pod can fit to MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c E0513 11:15:09.332076 1 orchestrator.go:507] Failed to get autoscaling options for node group MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c: Not implemented I0513 11:15:09.332100 1 orchestrator.go:185] Best option to resize: MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c I0513 11:15:09.332110 1 orchestrator.go:189] Estimated 1 nodes needed in MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c I0513 11:15:09.332135 1 orchestrator.go:295] Final scale-up plan: [{MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c 0->1 (max: 12)}] The same events are reported in must-gather reviewed from customers. Given that we have https://github.com/kubernetes/autoscaler/issues/6037 and https://github.com/kubernetes/autoscaler/issues/6676 that appear to be solved via https://github.com/kubernetes/autoscaler/pull/6677 and https://github.com/kubernetes/autoscaler/pull/6038 I'm wondering whether we should pull in those changes as they seem to eventually impact automated scaling of OpenShift Container Platform 4 - Node(s).
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.15
How reproducible:
Always
Steps to Reproduce:
1. Setup OpenShift Container Platform 4 with ClusterAutoscaler configured 2. Trigger scaling activity and verify the cluster-autoscaler-default logs
Actual results:
Logs like the below are being reported. E0513 11:15:09.331353 1 orchestrator.go:450] Couldn't get autoscaling options for ng: MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c E0513 11:15:09.331365 1 orchestrator.go:450] Couldn't get autoscaling options for ng: MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c I0513 11:15:09.331529 1 orchestrator.go:546] Pod project-100/curl-67f84bd857-h92wb can't be scheduled on MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c, predicate checking error: node(s) didn't match Pod's node affinity/selector; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity/selector; debugInfo= I0513 11:15:09.331684 1 orchestrator.go:157] No pod can fit to MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c E0513 11:15:09.332076 1 orchestrator.go:507] Failed to get autoscaling options for node group MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c: Not implemented I0513 11:15:09.332100 1 orchestrator.go:185] Best option to resize: MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c I0513 11:15:09.332110 1 orchestrator.go:189] Estimated 1 nodes needed in MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c I0513 11:15:09.332135 1 orchestrator.go:295] Final scale-up plan: [{MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c 0->1 (max: 12)}]
Expected results:
Scale-up of OpenShift Container Platform 4 - Node to happen without error being reported I0513 11:15:09.331529 1 orchestrator.go:546] Pod project-100/curl-67f84bd857-h92wb can't be scheduled on MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c, predicate checking error: node(s) didn't match Pod's node affinity/selector; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity/selector; debugInfo= I0513 11:15:09.331684 1 orchestrator.go:157] No pod can fit to MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c I0513 11:15:09.332100 1 orchestrator.go:185] Best option to resize: MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c I0513 11:15:09.332110 1 orchestrator.go:189] Estimated 1 nodes needed in MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c I0513 11:15:09.332135 1 orchestrator.go:295] Final scale-up plan: [{MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c 0->1 (max: 12)}]
Additional info:
Please review https://github.com/kubernetes/autoscaler/issues/6037 and https://github.com/kubernetes/autoscaler/issues/6676 as they seem to document the problem and also have a solution linked/merged
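The linked upstream PRs move the clusterapi provider toward returning defaults instead of a not-implemented error; a simplified sketch of that direction, with types from the cluster-autoscaler config package and a stand-in receiver (not the provider's actual code):
~~~go
package main

import (
	"fmt"

	"k8s.io/autoscaler/cluster-autoscaler/config"
)

// nodegroup is a stand-in for the provider's node group type.
type nodegroup struct{}

// GetOptions returns the global defaults instead of a "Not implemented"
// error, which is what produced the noisy "Failed to get autoscaling
// options" log lines during otherwise healthy scale-ups.
func (ng *nodegroup) GetOptions(defaults config.NodeGroupAutoscalingOptions) (*config.NodeGroupAutoscalingOptions, error) {
	return &defaults, nil
}

func main() {
	ng := &nodegroup{}
	opts, _ := ng.GetOptions(config.NodeGroupAutoscalingOptions{ScaleDownUtilizationThreshold: 0.5})
	fmt.Println(opts.ScaleDownUtilizationThreshold)
}
~~~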
Issue 43 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624
The active and hover states for the topology list view are incorrect.
Screenshot: https://drive.google.com/file/d/1DMwmYsvdHXvMBYr0gOD9mActmJNMaH6z/view?usp=share_link
This is a clone of issue OCPBUGS-27908. The following is the description of the original issue:
—
Description of problem:
Navigation: Workloads -> Deployments -> (select any Deployment from the list) -> Details -> Volumes -> Remove volume. Issue: The message "Are you sure you want to remove volume audit-policies from Deployment: apiserver?" is in English. Observation: The translation is present in branch release-4.15, in file frontend/public/locales/ja/public.json
Version-Release number of selected component (if applicable):
4.15.0-rc.3
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
Content is in English
Expected results:
Content should be in selected language
Additional info:
Reference screenshot attached.
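As a quick verification of the observation above (a hedged sketch; console translations are keyed by the English source string, so grep for the visible text rather than an assumed i18n key):
```
# A hit means the Japanese translation exists in the locale file from the
# report, pointing at key resolution rather than a missing translation.
grep -n "Are you sure you want to remove volume" frontend/public/locales/ja/public.json
```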
Description of problem:
Pods fail to get scheduled: they remain in ContainerCreating status, and the journal logs show some OVN errors.
Version-Release number of selected component (if applicable):
OCP 4.14.16 and nightlies after OpenShift 4.14 nightly 2024-03-08 18:06
How reproducible:
Randomly, and until now, only 1 node in the cluster shows this behaviour (not always the same node)
Steps to Reproduce:
1. Prepare an NMState manifest to use dual-stack through DHCP for LACP bond0 (br-ex) and bond0.vlanY (secondary bridge br-ex1)
2. Deploy OCP 4.14 via IPI with the latest nightly GA on a baremetal cluster with OVN-K and the NMState configuration in install-config.yaml as day1 (dedicated worker nodes)
3. After the cluster is ready, apply a Performance Profile
4. Create a basic application with a Deployment (replica count of 3) and check the pods; sometimes a pod remains in ContainerCreating, and when checking other pods on that node, most of them are in the same status
5. Check the journal logs of the worker and look for errors such as *error adding container to network "ovn-kubernetes": CNI request failed with status 400* (a quick grep for this is sketched below)
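For step 5, a hedged way to pull the error out of the journal (the node name is a placeholder for the affected worker):
```
# Grep the worker's journal for the CNI 400 errors described in step 5.
oc debug node/worker-2 -- chroot /host \
  sh -c 'journalctl --no-pager --since "-1h" | grep "CNI request failed with status 400"'
```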
Actual results:
Pods are not scheduled on one of the worker nodes; on that (random) worker they remain in ContainerCreating status.
Expected results:
Pods should be scheduled in any worker, and their status should be "Running"
Affected Platforms:
Only tested in Baremetal deployments with IPI and OVN-kubernetes
Additional info:
If we restart the ovnkube-node-* pod on that worker (delete the pod so it gets recreated), the pods are created and marked Running, and the log errors in the journal disappear.
More details:
We noticed several pods not running, all of them on the same worker node:
$ oc get pods -A -o wide | grep -Eiv "running|complete"
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
myns webserver-6dc5cb556d-5pb9g 0/1 ContainerCreating 0 49m <none> worker-2 <none> <none>
openshift-logging cluster-logging-operator-666468c794-snd77 0/1 ContainerCreating 0 10m <none> worker-2 <none> <none>
openshift-monitoring thanos-querier-647c9db798-tbtjk 0/6 ContainerCreating 0 10m <none> worker-2 <none> <none>
spk-data f5-tmm-557bd77784-qvdww 0/3 ContainerCreating 0 10m <none> worker-2 <none> <none>
spk-dns46 f5-tmm-78d4fbc46d-shxrs 0/3 ContainerCreating 0 10m <none> worker-2 <none> <none>
spk-test f5-hello-world-74d48dc4c6-689jp 0/1 ContainerCreating 0 10m <none> worker-2 <none> <none>
spk-utilities f5-cert-manager-webhook-6674ddd499-bzpb2 0/1 ContainerCreating 0 10m <none> worker-2 <none> <none>
spk-utilities f5-rabbit-565d9cc79d-fjl4s 0/1 ContainerCreating 0 10m <none> worker-2 <none> <none>
spk-utilities f5-spk-cwc-7b44fbbcdf-tksxx 0/2 ContainerCreating 0 10m <none> worker-2 <none> <none>
spk-utilities spk-utilities-f5-dssm-db-1 0/3 ContainerCreating 0 10m <none> worker-2 <none> <none>
spk-utilities spk-utilities-f5-dssm-sentinel-0 0/3 ContainerCreating 0 10m <none> worker-2 <none> <none>
trident trident-controller-86867589c8-bl2wt 0/6 ContainerCreating 0 10m <none> worker-2 <none> <none>
In the journal log of the worker we could see messages like this:
Warning FailedCreatePodSandBox 2m10s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_webserver-6dc5cb556d-5pb9g_myns_acf475b2-3b7b-4861-9bc8-9c2b14285b85_0(fccefe3b404bc01d645602d2c55283396f7c854cb9abedbed4bf75c9886b9601): error adding pod myns_webserver-6dc5cb556d-5pb9g to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: '&{ContainerID:fccefe3b404bc01d645602d2c55283396f7c854cb9abedbed4bf75c9886b9601 Netns:/var/run/netns/6c2cd468-28b9-42cf-b8b8-33719c353888 IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=myns;K8S_POD_NAME=webserver-6dc5cb556d-5pb9g;K8S_POD_INFRA_CONTAINER_ID=fccefe3b404bc01d645602d2c55283396f7c854cb9abedbed4bf75c9886b9601;K8S_POD_UID=acf475b2-3b7b-4861-9bc8-9c2b14285b85 Path: StdinData:[... decimal byte dump of the multus CNI config, garbled in the original paste and elided here ...]}
ContainerID:"fccefe3b404bc01d645602d2c55283396f7c854cb9abedbed4bf75c9886b9601" Netns:"/var/run/netns/6c2cd468-28b9-42cf-b8b8-33719c353888" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=myns;K8S_POD_NAME=webserver-6dc5cb556d-5pb9g;K8S_POD_INFRA_CONTAINER_ID=fccefe3b404bc01d645602d2c55283396f7c854cb9abedbed4bf75c9886b9601;K8S_POD_UID=acf475b2-3b7b-4861-9bc8-9c2b14285b85" Path:"" ERRORED: error configuring pod [myns/webserver-6dc5cb556d-5pb9g] networking: [myns/webserver-6dc5cb556d-5pb9g/acf475b2-3b7b-4861-9bc8-9c2b14285b85:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[myns/webserver-6dc5cb556d-5pb9g fccefe3b404bc01d645602d2c55283396f7c854cb9abedbed4bf75c9886b9601 network default NAD default] [myns/webserver-6dc5cb556d-5pb9g fccefe3b404bc01d645602d2c55283396f7c854cb9abedbed4bf75c9886b9601 network default NAD default] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
The OVN pod running on worker-2 shows no issues in its logs, and the br-ex and br-ex1 interfaces look healthy (they have both IPv4 and IPv6 addresses):
$ oc -n openshift-ovn-kubernetes get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ovnkube-control-plane-588d654c6d-bdrjl 2/2 Running 0 3h 192.168.12.22 master-1 <none> <none>
ovnkube-control-plane-588d654c6d-kgltz 2/2 Running 0 3h14m 192.168.12.23 master-2 <none> <none>
ovnkube-node-4wlw9 8/8 Running 16 3h49m 192.168.12.25 worker-1 <none> <none>
ovnkube-node-786qv 8/8 Running 9 (3h14m ago) 4h28m 192.168.12.23 master-2 <none> <none>
ovnkube-node-7ltvf 8/8 Running 9 (3h1m ago) 4h28m 192.168.12.22 master-1 <none> <none>
ovnkube-node-dmhm2 8/8 Running 16 3h51m 192.168.12.27 worker-3 <none> <none>
ovnkube-node-phm6h 8/8 Running 25 3h49m 192.168.12.24 worker-0 <none> <none>
ovnkube-node-vsmdx 8/8 Running 32 3h49m 192.168.12.26 worker-2 <none> <none>
ovnkube-node-zxhqh 8/8 Running 9 (168m ago) 4h28m 192.168.12.21 master-0 <none> <none>
$ oc -n openshift-ovn-kubernetes logs ovnkube-node-vsmdx | tail
Defaulted container "ovn-controller" out of: ovn-controller, ovn-acl-logging, kube-rbac-proxy-node, kube-rbac-proxy-ovn-metrics, northd, nbdb, sbdb, ovnkube-controller, kubecfg-setup (init)
2024-03-13T14:43:16.092Z|00064|binding|INFO|Setting lport openshift-network-diagnostics_network-check-target-9m5mg ovn-installed in OVS
2024-03-13T14:43:16.092Z|00065|binding|INFO|Setting lport openshift-network-diagnostics_network-check-target-9m5mg up in Southbound
2024-03-13T14:43:16.092Z|00066|binding|INFO|Setting lport openshift-dns_dns-default-2gs7f ovn-installed in OVS
2024-03-13T14:43:16.092Z|00067|binding|INFO|Setting lport openshift-dns_dns-default-2gs7f up in Southbound
2024-03-13T14:43:16.236Z|00068|binding|INFO|Claiming lport openshift-ingress-canary_ingress-canary-vmshk for this chassis.
2024-03-13T14:43:16.236Z|00069|binding|INFO|openshift-ingress-canary_ingress-canary-vmshk: Claiming 0a:58:0a:80:02:07 10.128.2.7 fd02:0:0:5::7
2024-03-13T14:43:16.237Z|00070|binding|INFO|Setting lport openshift-ingress-canary_ingress-canary-vmshk down in Southbound
2024-03-13T14:43:16.248Z|00071|binding|INFO|Setting lport openshift-ingress-canary_ingress-canary-vmshk ovn-installed in OVS
2024-03-13T14:43:16.248Z|00072|binding|INFO|Setting lport openshift-ingress-canary_ingress-canary-vmshk up in Southbound
2024-03-13T14:43:47.006Z|00073|memory_trim|INFO|Detected inactivity (last active 30003 ms ago): trimming memory
[core@worker-2 ~]$ ip a s br-ex
23: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether b8:83:03:8e:0e:dc brd ff:ff:ff:ff:ff:ff
    inet 192.168.12.26/24 brd 192.168.12.255 scope global dynamic noprefixroute br-ex
       valid_lft 2716sec preferred_lft 2716sec
    inet 169.254.169.2/29 brd 169.254.169.7 scope global br-ex
       valid_lft forever preferred_lft forever
    inet6 fd69::2/125 scope global nodad
       valid_lft forever preferred_lft forever
    inet6 fd1c:61fe:bdf1:12::1a/128 scope global dynamic noprefixroute
       valid_lft 6128sec preferred_lft 6128sec
    inet6 fe80::ba83:3ff:fe8e:edc/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
[core@worker-2 ~]$ ip a s br-ex1
24: br-ex1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether b8:83:03:8e:0e:dc brd ff:ff:ff:ff:ff:ff
    inet 192.168.16.143/26 brd 192.168.16.191 scope global dynamic noprefixroute br-ex1
       valid_lft 2718sec preferred_lft 2718sec
    inet6 fd48:de67:5083:16::36/128 scope global dynamic noprefixroute
       valid_lft 6209sec preferred_lft 6209sec
    inet6 fe80::ba83:3ff:fe8e:edc/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
I deleted the OVN pod on worker-2 to see if this would clear the issue:
$ oc -n openshift-ovn-kubernetes get pods -o wide | egrep "NAME|worker-2"
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ovnkube-node-vsmdx 8/8 Running 32 3h59m 192.168.12.26 worker-2 <none> <none>
$ oc -n openshift-ovn-kubernetes delete pod ovnkube-node-vsmdx
pod "ovnkube-node-vsmdx" deleted
$ oc -n openshift-ovn-kubernetes get pods -o wide | egrep "NAME|worker-2"
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ovnkube-node-l6v76 8/8 Running 0 36s 192.168.12.26 worker-2 <none> <none>
After waiting for a while, all of the previously stuck pods were running on worker-2 with no issues:
$ oc get pods -A -o wide | grep -Eiv "running|complete"
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
[kni@provisioner.cluster1.dfwt5g.lab ~]$ oc get pods -A -o wide | grep worker-2
kube-system istio-cni-node-f266h 1/1 Running 2 3h1m 10.128.2.19 worker-2 <none> <none>
myns webserver-6dc5cb556d-5pb9g 1/1 Running 0 84m 10.128.2.16 worker-2 <none> <none>
openshift-cluster-node-tuning-operator tuned-85g6g 1/1 Running 4 4h23m 192.168.12.26 worker-2 <none> <none>
openshift-dns dns-default-2gs7f 2/2 Running 8 4h22m 10.128.2.6 worker-2 <none> <none>
openshift-dns node-resolver-srqvf 1/1 Running 4 4h23m 192.168.12.26 worker-2 <none> <none>
openshift-image-registry node-ca-pwlln 1/1 Running 4 4h23m 192.168.12.26 worker-2 <none> <none>
openshift-ingress-canary ingress-canary-vmshk 1/1 Running 4 4h22m 10.128.2.7 worker-2 <none> <none>
openshift-kni-infra coredns-worker-2 2/2 Running 8 4h23m 192.168.12.26 worker-2 <none> <none>
openshift-kni-infra keepalived-worker-2 2/2 Running 8 4h23m 192.168.12.26 worker-2 <none> <none>
openshift-logging cluster-logging-operator-666468c794-snd77 1/1 Running 0 45m 10.128.2.14 worker-2 <none> <none>
openshift-machine-config-operator machine-config-daemon-c8xrh 2/2 Running 8 4h23m 192.168.12.26 worker-2 <none> <none>
openshift-monitoring node-exporter-w9h6l 2/2 Running 8 4h21m 192.168.12.26 worker-2 <none> <none>
openshift-monitoring thanos-querier-647c9db798-tbtjk 6/6 Running 0 45m 10.128.2.10 worker-2 <none> <none>
openshift-multus multus-additional-cni-plugins-hq884 1/1 Running 4 4h23m 192.168.12.26 worker-2 <none> <none>
openshift-multus multus-zrg4b 1/1 Running 6 (22m ago) 4h23m 192.168.12.26 worker-2 <none> <none>
openshift-multus network-metrics-daemon-52v4p 2/2 Running 8 4h23m 10.128.2.4 worker-2 <none> <none>
openshift-network-diagnostics network-check-target-9m5mg 1/1 Running 4 4h23m 10.128.2.3 worker-2 <none> <none>
openshift-ovn-kubernetes ovnkube-node-l6v76 8/8 Running 0 22m 192.168.12.26 worker-2 <none> <none>
openshift-sriov-network-operator sriov-device-plugin-w76nv 1/1 Running 0 40m 192.168.12.26 worker-2 <none> <none>
openshift-sriov-network-operator sriov-network-config-daemon-h6clk 1/1 Running 4 4h4m 192.168.12.26 worker-2 <none> <none>
spk-data f5-tmm-557bd77784-qvdww 3/3 Running 0 45m 10.128.2.12 worker-2 <none> <none>
spk-dns46 f5-tmm-78d4fbc46d-shxrs 3/3 Running 0 45m 10.128.2.17 worker-2 <none> <none>
spk-test f5-hello-world-74d48dc4c6-689jp 1/1 Running 0 45m 10.128.2.13 worker-2 <none> <none>
spk-utilities f5-cert-manager-webhook-6674ddd499-bzpb2 1/1 Running 0 45m 10.128.2.8 worker-2 <none> <none>
spk-utilities f5-rabbit-565d9cc79d-fjl4s 1/1 Running 0 45m 10.128.2.15 worker-2 <none> <none>
spk-utilities f5-spk-cwc-7b44fbbcdf-tksxx 2/2 Running 0 45m 10.128.2.5 worker-2 <none> <none>
spk-utilities spk-utilities-f5-dssm-db-1 3/3 Running 0 45m 10.128.2.18 worker-2 <none> <none>
spk-utilities spk-utilities-f5-dssm-sentinel-0 3/3 Running 0 45m 10.128.2.9 worker-2 <none> <none>
trident trident-controller-86867589c8-bl2wt 6/6 Running 0 45m 10.128.2.11 worker-2 <none> <none>
trident trident-node-linux-6tl8z 2/2 Running 4 3h2m 192.168.12.26 worker-2 <none> <none>
When creating a HostedCluster from the CLI with the KubeVirt platform and an external infra-cluster, the creation fails with this message:
hypershift_framework.go:223: failed to create cluster, tearing down: failed to apply object "e2e-clusters-jqrxx/example-kk2sm": admission webhook "hostedclusters.hypershift.openshift.io" denied the request: Secret "example-kk2sm-infra-credentials" not found
The reason is that the HostedCluster CR is created before the kubeconfig secret of the external infra-cluster. The HostedCluster creation webhook tries to access the external infra-cluster and fails to find the secret, which has not been created yet.
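A workaround consistent with that explanation is to make sure the secret exists before the HostedCluster is applied. A minimal sketch, with names taken from the error message above and the kubeconfig path and manifest file as placeholders:
```
# 1) Create the external infra-cluster kubeconfig secret first.
oc -n e2e-clusters-jqrxx create secret generic example-kk2sm-infra-credentials \
  --from-file=kubeconfig=/path/to/infra-kubeconfig
# 2) Only then apply the HostedCluster that references it, so the
#    validating webhook can resolve the secret.
oc apply -f hostedcluster-example-kk2sm.yaml
```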
Please review the following PR: https://github.com/openshift/egress-router-cni/pull/76
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/oc/pull/1545
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-39447. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-39419. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-38794. The following is the description of the original issue:
—
Description of problem:
HCP cluster is being updated but the nodepool is stuck updating:
~~~
NAME CLUSTER DESIRED NODES CURRENT NODES AUTOSCALING AUTOREPAIR VERSION UPDATINGVERSION UPDATINGCONFIG MESSAGE
nodepool-dev-cluster dev 2 2 False False 4.15.22 True True
~~~
Version-Release number of selected component (if applicable):
Hosting OCP cluster: 4.15
HCP: 4.15.23
How reproducible:
N/A
Steps to Reproduce:
1. 2. 3.
Actual results:
Nodepool stuck in upgrade
Expected results:
Upgrade success
Additional info:
I have found this error repeating continually in the ignition-server pods:
~~~
{"level":"error","ts":"2024-08-20T09:02:19Z","msg":"Reconciler error","controller":"secret","controllerGroup":"","controllerKind":"Secret","Secret":{"name":"token-nodepool-dev-cluster-3146da34","namespace":"dev-dev"},"namespace":"dev-dev","name":"token-nodepool-dev-cluster-3146da34","reconcileID":"ec1f0a7f-1657-4245-99ef-c984977ff0f8","error":"error getting ignition payload: failed to download binaries: failed to extract image file: failed to extract image file: file not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
{"level":"info","ts":"2024-08-20T09:02:20Z","logger":"get-payload","msg":"discovered machine-config-operator image","image":"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f3b55cc8f88b9e6564fe6ad0bc431cd7270c0586a06d9b4a19ff2b518c461ede"}
{"level":"info","ts":"2024-08-20T09:02:20Z","logger":"get-payload","msg":"created working directory","dir":"/payloads/get-payload4089452863"}
{"level":"info","ts":"2024-08-20T09:02:28Z","logger":"get-payload","msg":"extracted image-references","time":"8s"}
{"level":"info","ts":"2024-08-20T09:02:38Z","logger":"get-payload","msg":"extracted templates","time":"10s"}
{"level":"info","ts":"2024-08-20T09:02:38Z","logger":"image-cache","msg":"retrieved cached file","imageRef":"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f3b55cc8f88b9e6564fe6ad0bc431cd7270c0586a06d9b4a19ff2b518c461ede","file":"usr/lib/os-release"}
{"level":"info","ts":"2024-08-20T09:02:38Z","logger":"get-payload","msg":"read os-release","mcoRHELMajorVersion":"8","cpoRHELMajorVersion":"9"}
{"level":"info","ts":"2024-08-20T09:02:38Z","logger":"get-payload","msg":"copying file","src":"usr/bin/machine-config-operator.rhel9","dest":"/payloads/get-payload4089452863/bin/machine-config-operator"}
~~~
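A hedged way to confirm the extraction failure is still recurring (the `dev-dev` namespace and the `ignition-server` deployment name are taken from the log above):
```
# Tail the ignition-server for the recurring extraction error.
oc -n dev-dev logs deploy/ignition-server --since=10m \
  | grep 'failed to extract image file'
```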
Description of problem:
As part of the forbidden node label e2e test, we execute the `oc debug` command to set the forbidden labels on the node. The `oc debug` command is expected to fail while applying the forbidden label.
In our testing, we observed that even though the actual command on the node (kubectl label node/<node> <forbidden_label>) fails as expected, `oc debug` does not propagate the return code correctly (it returns 0 even though `kubectl label` fails with an error).
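A minimal sketch of the flake (node name is a placeholder, and $FORBIDDEN_LABEL stands in for the label the test applies; the inner command is expected to be rejected, so a correct `oc debug` should exit non-zero):
```
# The inner label command mirrors the test's forbidden label and should
# fail on the node; the bug is that oc debug sometimes still exits 0.
oc debug node/worker-0 -- chroot /host \
  kubectl label node worker-0 "$FORBIDDEN_LABEL"
echo "oc debug exited with: $?"   # flaky: 0 observed even when the label fails
```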
Version-Release number of selected component (if applicable):
4.14
How reproducible:
flaky
Steps to Reproduce:
1. Run the test at https://gist.github.com/harche/c9143c382cfe94d7836414d5ccc0ba45 2. Observe that sometimes it flakes at https://gist.github.com/harche/c9143c382cfe94d7836414d5ccc0ba45#file-test-go-L39
Actual results:
oc debug return value flakes
Expected results:
oc debug return value should be consistent.
Additional info:
This is a clone of issue OCPBUGS-14478. The following is the description of the original issue:
—
Description of problem:
This was discovered during Contrail testing when a large number of additional manifests specific to Contrail were added to the openshift/ dir. The additional manifests are here: https://github.com/Juniper/contrail-networking/tree/main/releases/23.1/ocp. When creating the agent image, the following error occurred:
failed to fetch Agent Installer ISO: failed to generate asset "Agent Installer ISO": failed to create overwrite reader for ignition: content length (802204) exceeds embed area size (262144)
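A rough pre-flight check under the same assumptions as the report (the extra manifests live in openshift/ and 262144 bytes is the embed area limit quoted in the error; the real embedded ignition is compressed, so this is only an upper-bound sanity check):
```
# Total bytes of the additional manifests that feed the embedded ignition,
# to compare against the 262144-byte embed area.
du -cb openshift/*.yaml | tail -n1
```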
Description of the problem:
When trying to create a cluster with the s390x architecture, an error occurs that stops cluster creation. The error is "cannot use Skip MCO reboot because it's not compatible with the s390x architecture on version 4.15.0-ec.3 of OpenShift".
How reproducible:
Always
Steps to reproduce:
Create cluster with architecture s390x
Actual results:
Create failed
Expected results:
Create should succeed
Description of problem:
If we replace the cluster's global pull secret with an empty one, the MCO keeps the original secret file at /etc/machine-config-daemon/orig/var/lib/kubelet/config.json.mcdorig.
Version-Release number of selected component (if applicable):
4.12.z
Steps to Reproduce:
1. Create an SNO cluster using cluster-bot: launch 4.12.9 aws,single-node
2. Replace the pull secret:
```
$ cat <<EOF | oc replace -f -
apiVersion: v1
data:
  .dockerconfigjson: e30K
kind: Secret
metadata:
  name: pull-secret
  namespace: openshift-config
type: kubernetes.io/dockerconfigjson
EOF
```
3. Wait for the cluster to be reconciled:
```
$ oc get mc
NAME GENERATEDBYCONTROLLER IGNITIONVERSION AGE
00-master f6c21976e39cf6cb9e2ca71141478d5e612fb53f 3.2.0 56m
00-worker f6c21976e39cf6cb9e2ca71141478d5e612fb53f 3.2.0 56m
01-master-container-runtime f6c21976e39cf6cb9e2ca71141478d5e612fb53f 3.2.0 56m
01-master-kubelet f6c21976e39cf6cb9e2ca71141478d5e612fb53f 3.2.0 56m
01-worker-container-runtime f6c21976e39cf6cb9e2ca71141478d5e612fb53f 3.2.0 56m
01-worker-kubelet f6c21976e39cf6cb9e2ca71141478d5e612fb53f 3.2.0 56m
99-master-generated-kubelet f6c21976e39cf6cb9e2ca71141478d5e612fb53f 3.2.0 56m
99-master-generated-registries f6c21976e39cf6cb9e2ca71141478d5e612fb53f 3.2.0 56m
99-master-ssh 3.2.0 60m
99-worker-generated-registries f6c21976e39cf6cb9e2ca71141478d5e612fb53f 3.2.0 56m
99-worker-ssh 3.2.0 60m
rendered-master-50d505c46c5e1dae8f1d91c81b2e0d1e f6c21976e39cf6cb9e2ca71141478d5e612fb53f 3.2.0 56m
rendered-master-619b2780e8787c88c3acb0c68de45a9f f6c21976e39cf6cb9e2ca71141478d5e612fb53f 3.2.0 36m
rendered-master-801d3c549c0fb3267cafc7e48968a8ac f6c21976e39cf6cb9e2ca71141478d5e612fb53f 3.2.0 56m
rendered-worker-86690adc0446e7f7feb68f9b9690632d f6c21976e39cf6cb9e2ca71141478d5e612fb53f 3.2.0 36m
rendered-worker-d7e635328a14333ed6ad27603fe5b5db f6c21976e39cf6cb9e2ca71141478d5e612fb53f 3.2.0 56m
```
4. Debug to the node and check the file:
```
$ cat /etc/machine-config-daemon/orig/var/lib/kubelet/config.json.mcdorig
```
Actual results:
The orig file contains the actual pull secret that was used in the initial cluster provisioning.
Expected results:
There shouldn't be any file with this info
Additional info:
This is a clone of issue OCPBUGS-44338. The following is the description of the original issue:
—
Description of problem:
Under some circumstances, the live migration runs the MTU migration phase, it ends correctly, then while running the second MCO rollout to make the target CNI become in-use, it tries to run the MTU migration phase again. This happens more than once and ultimately causes the live migration to never complete.
Version-Release number of selected component (if applicable):
Tested in-house in 4.16.19
How reproducible:
Always under certain circumstances, sometimes otherwise.
Steps to Reproduce:
This is a way to reproduce it with 100% chance, but it may not be the only way to reproduce:
1. Start with a 4.16 cluster upgraded from 4.14 that has openshift-sdn plugin and a custom machine config pool (that inherits the worker machineconfigs, as required).
2. Start the live migration to OVN-Kubernetes
3. Once the MTU migration phase has completed for the first time, pause the custom machineconfigpool (a way to watch for the repetition is sketched below)
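One hedged way to observe the repetition: the live migration drives the MTU phase through the migration stanza on the cluster network configuration, so re-running this check after the phase completes shows whether the MTU block reappears:
```
# Print the migration stanza; an MTU block reappearing here after the
# MTU phase already completed matches the reported behavior.
oc get networks.operator.openshift.io cluster -o jsonpath='{.spec.migration}{"\n"}'
```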
Actual results:
MTU phase retried again and again.
Expected results:
The MTU phase should never be repeated after it has run for the first time. If an MCP is paused, the MCO rollout stays on hold and the live migration should stay on hold with it. If no MCP is paused, the live migration should complete successfully. In either case, the MTU phase must never be attempted more than once.
Additional info:
This is a customer issue that can be reproduced as per the instructions. More details about what I have studied about the code behavior will be placed in comments (any required data will be shared privately).
Bump k8s.io/pod-security-admission to v0.28.3
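For reference, the bump itself is the standard Go module update (the vendoring step is an assumption, as OpenShift repos typically vendor dependencies):
```
go get k8s.io/pod-security-admission@v0.28.3
go mod tidy && go mod vendor
```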
Description of problem:
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-upgrade/job/upgrade-pipeline/46064/consoleFull
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-upgrade/job/upgrade-pipeline/46126/console
Version-Release number of selected component (if applicable):
How reproducible:
two upgrades, two failed.
Steps to Reproduce:
Triggered 2 upgrades for template 11_UPI on vSphere 8.0 & FIPS ON & OVN IPSEC & Static Network & Bonding & HW19 & Secureboot (IPSEC E-W only)
1. From 4.13.26-x86_64 -> 4.14.0-0.nightly-2023-12-08-072853 -> 4.15.0-0.nightly-2023-12-09-012410
12-11 16:28:56.968 oc get clusteroperators:
12-11 16:28:56.968 NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
12-11 16:28:56.968 authentication 4.15.0-0.nightly-2023-12-09-012410 False False True 104m APIServicesAvailable: "oauth.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request...
12-11 16:28:56.968 baremetal 4.15.0-0.nightly-2023-12-09-012410 True False False 5h39m
12-11 16:28:56.968 cloud-controller-manager 4.15.0-0.nightly-2023-12-09-012410 True False False 5h43m
12-11 16:28:56.968 cloud-credential 4.15.0-0.nightly-2023-12-09-012410 True False False 5h45m
12-11 16:28:56.968 cluster-autoscaler 4.15.0-0.nightly-2023-12-09-012410 True False False 5h40m
12-11 16:28:56.968 config-operator 4.15.0-0.nightly-2023-12-09-012410 True False False 5h40m
12-11 16:28:56.968 console 4.15.0-0.nightly-2023-12-09-012410 False False False 107m RouteHealthAvailable: console route is not admitted
12-11 16:28:56.968 control-plane-machine-set 4.15.0-0.nightly-2023-12-09-012410 True False False 5h39m
12-11 16:28:56.968 csi-snapshot-controller 4.15.0-0.nightly-2023-12-09-012410 True False False 5h40m
12-11 16:28:56.968 dns 4.15.0-0.nightly-2023-12-09-012410 True False False 5h39m
12-11 16:28:56.968 etcd 4.15.0-0.nightly-2023-12-09-012410 True False False 5h38m
12-11 16:28:56.968 image-registry 4.15.0-0.nightly-2023-12-09-012410 True False False 109m
12-11 16:28:56.968 ingress 4.15.0-0.nightly-2023-12-09-012410 True False False 108m
12-11 16:28:56.968 insights 4.15.0-0.nightly-2023-12-09-012410 True False False 5h33m
12-11 16:28:56.968 kube-apiserver 4.15.0-0.nightly-2023-12-09-012410 True False False 5h35m
12-11 16:28:56.968 kube-controller-manager 4.15.0-0.nightly-2023-12-09-012410 True False True 5h37m GarbageCollectorDegraded: error querying alerts: Post "[https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query]": dial tcp 172.30.77.136:9091: i/o timeout
12-11 16:28:56.968 kube-scheduler 4.15.0-0.nightly-2023-12-09-012410 True False False 5h37m
12-11 16:28:56.968 kube-storage-version-migrator 4.15.0-0.nightly-2023-12-09-012410 True False False 109m
12-11 16:28:56.968 machine-api 4.15.0-0.nightly-2023-12-09-012410 True False False 5h36m
12-11 16:28:56.968 machine-approver 4.15.0-0.nightly-2023-12-09-012410 True False False 5h39m
12-11 16:28:56.968 machine-config 4.14.0-0.nightly-2023-12-08-072853 True False False 5h39m
12-11 16:28:56.968 marketplace 4.15.0-0.nightly-2023-12-09-012410 True False False 5h39m
12-11 16:28:56.968 monitoring 4.15.0-0.nightly-2023-12-09-012410 False True True 63s UpdatingThanosQuerier: reconciling Thanos Querier Route failed: retrieving Route object failed: the server is currently unable to handle the request (get routes.route.openshift.io thanos-querier), UpdatingAlertmanager: reconciling Alertmanager Route failed: updating Route object failed: the server is currently unable to handle the request (put routes.route.openshift.io alertmanager-main), UpdatingUserWorkloadThanosRuler: reconciling Thanos Ruler Route failed: retrieving Route object failed: the server is currently unable to handle the request (get routes.route.openshift.io thanos-ruler), UpdatingPrometheus: reconciling Prometheus API Route failed: retrieving Route object failed: the server is currently unable to handle the request (get routes.route.openshift.io prometheus-k8s), UpdatingUserWorkloadPrometheus: reconciling UserWorkload federate Route failed: updating Route object failed: the server is currently unable to handle the request (put routes.route.openshift.io federate)
12-11 16:28:56.968 network 4.15.0-0.nightly-2023-12-09-012410 True False False 5h40m
12-11 16:28:56.968 node-tuning 4.15.0-0.nightly-2023-12-09-012410 True False False 124m
12-11 16:28:56.968 openshift-apiserver 4.15.0-0.nightly-2023-12-09-012410 False False False 97m APIServicesAvailable: "image.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request
12-11 16:28:56.968 openshift-controller-manager 4.15.0-0.nightly-2023-12-09-012410 True False False 5h39m
12-11 16:28:56.968 openshift-samples 4.15.0-0.nightly-2023-12-09-012410 True False False 124m
12-11 16:28:56.968 operator-lifecycle-manager 4.15.0-0.nightly-2023-12-09-012410 True False False 5h40m
12-11 16:28:56.968 operator-lifecycle-manager-catalog 4.15.0-0.nightly-2023-12-09-012410 True False False 5h40m
12-11 16:28:56.968 operator-lifecycle-manager-packageserver 4.15.0-0.nightly-2023-12-09-012410 True False False 100m
12-11 16:28:56.968 service-ca 4.15.0-0.nightly-2023-12-09-012410 True False False 5h40m
12-11 16:28:56.968 storage 4.15.0-0.nightly-2023-12-09-012410 True False False 104m
2. From 4.14.5-x86_64 -> 4.15.0-0.nightly-2023-12-11-033133
% oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.15.0-0.nightly-2023-12-11-033133 False False True 3h32m APIServicesAvailable: "user.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request...
baremetal 4.15.0-0.nightly-2023-12-11-033133 True False False 5h45m
cloud-controller-manager 4.15.0-0.nightly-2023-12-11-033133 True False False 5h47m
cloud-credential 4.15.0-0.nightly-2023-12-11-033133 True False False 5h50m
cluster-autoscaler 4.15.0-0.nightly-2023-12-11-033133 True False False 5h45m
config-operator 4.15.0-0.nightly-2023-12-11-033133 True False False 5h46m
console 4.15.0-0.nightly-2023-12-11-033133 False False False 3h30m RouteHealthAvailable: console route is not admitted
control-plane-machine-set 4.15.0-0.nightly-2023-12-11-033133 True False False 5h45m
csi-snapshot-controller 4.15.0-0.nightly-2023-12-11-033133 True False False 5h45m
dns 4.15.0-0.nightly-2023-12-11-033133 True False False 5h45m
etcd 4.15.0-0.nightly-2023-12-11-033133 True False False 5h43m
image-registry 4.15.0-0.nightly-2023-12-11-033133 True False False 3h34m
ingress 4.15.0-0.nightly-2023-12-11-033133 True False False 4h22m
insights 4.15.0-0.nightly-2023-12-11-033133 True False False 5h39m
kube-apiserver 4.15.0-0.nightly-2023-12-11-033133 True False False 5h42m
kube-controller-manager 4.15.0-0.nightly-2023-12-11-033133 True False True 5h42m GarbageCollectorDegraded: error fetching rules: Get "[https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules]": dial tcp 172.30.237.96:9091: i/o timeout
kube-scheduler 4.15.0-0.nightly-2023-12-11-033133 True False False 5h42m
kube-storage-version-migrator 4.15.0-0.nightly-2023-12-11-033133 True False False 3h34m
machine-api 4.15.0-0.nightly-2023-12-11-033133 True False False 5h41m
machine-approver 4.15.0-0.nightly-2023-12-11-033133 True False False 5h45m
machine-config 4.14.5 True False False 5h44m
marketplace 4.15.0-0.nightly-2023-12-11-033133 True False False 5h45m
monitoring 4.15.0-0.nightly-2023-12-11-033133 False True True 4m32s UpdatingAlertmanager: reconciling Alertmanager Route failed: updating Route object failed: the server is currently unable to handle the request (put routes.route.openshift.io alertmanager-main), UpdatingUserWorkloadThanosRuler: reconciling Thanos Ruler Route failed: retrieving Route object failed: the server is currently unable to handle the request (get routes.route.openshift.io thanos-ruler), UpdatingThanosQuerier: reconciling Thanos Querier Route failed: retrieving Route object failed: the server is currently unable to handle the request (get routes.route.openshift.io thanos-querier), UpdatingPrometheus: reconciling Prometheus API Route failed: retrieving Route object failed: the server is currently unable to handle the request (get routes.route.openshift.io prometheus-k8s), UpdatingUserWorkloadPrometheus: reconciling UserWorkload federate Route failed: retrieving Route object failed: the server is currently unable to handle the request (get routes.route.openshift.io federate)
network 4.15.0-0.nightly-2023-12-11-033133 True False False 5h44m
node-tuning 4.15.0-0.nightly-2023-12-11-033133 True False False 3h48m
openshift-apiserver 4.15.0-0.nightly-2023-12-11-033133 False False False 11m APIServicesAvailable: "apps.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request...
openshift-controller-manager 4.15.0-0.nightly-2023-12-11-033133 True False False 5h41m
openshift-samples 4.15.0-0.nightly-2023-12-11-033133 True False False 3h49m
operator-lifecycle-manager 4.15.0-0.nightly-2023-12-11-033133 True False False 5h45m
operator-lifecycle-manager-catalog 4.15.0-0.nightly-2023-12-11-033133 True False False 5h45m
operator-lifecycle-manager-packageserver 4.15.0-0.nightly-2023-12-11-033133 True False False 2m57s
service-ca 4.15.0-0.nightly-2023-12-11-033133 True False False 5h46m
storage 4.15.0-0.nightly-2023-12-11-033133 True False False 3h28m
% oc get pods -n openshift-ovn-kubernetes
NAME READY STATUS RESTARTS AGE
ovn-ipsec-host-bn5mm 1/1 Running 0 3h17m
ovn-ipsec-host-dlg5c 1/1 Running 0 3h20m
ovn-ipsec-host-dztzf 1/1 Running 0 3h14m
ovn-ipsec-host-tfflr 1/1 Running 0 3h11m
ovn-ipsec-host-wvkwq 1/1 Running 0 3h10m
ovnkube-control-plane-85b45bf6cf-78tbq 2/2 Running 0 3h30m
ovnkube-control-plane-85b45bf6cf-n5pqn 2/2 Running 0 3h33m
ovnkube-node-4rwk4 8/8 Running 8 3h40m
ovnkube-node-567rz 8/8 Running 8 3h34m
ovnkube-node-c7hv4 8/8 Running 8 3h40m
ovnkube-node-qmw49 8/8 Running 8 3h35m
ovnkube-node-s2nsw 8/8 Running 0 3h36m
Multiple pods on different nodes have the connection problems.
% oc get pods -n openshift-network-diagnostics -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
network-check-source-5cd74f77cc-mlqvz 1/1 Running 0 134m 10.131.0.25 huirwang-46126-g66cb-compute-0 <none> <none>
network-check-target-824mt 1/1 Running 1 139m 10.130.0.212 huirwang-46126-g66cb-control-plane-2 <none> <none>
network-check-target-dzl7m 1/1 Running 1 140m 10.128.2.46 huirwang-46126-g66cb-compute-1 <none> <none>
network-check-target-l224m 1/1 Running 1 133m 10.129.0.173 huirwang-46126-g66cb-control-plane-1 <none> <none>
network-check-target-qd48q 1/1 Running 1 138m 10.128.0.148 huirwang-46126-g66cb-control-plane-0 <none> <none>
network-check-target-sc8hn 1/1 Running 0 134m 10.131.0.3 huirwang-46126-g66cb-compute-0 <none> <none>
% oc rsh -n openshift-network-diagnostics network-check-source-5cd74f77cc-mlqvz
sh-5.1$ curl 10.130.0.212:8080 --connect-timeout 5
curl: (28) Connection timed out after 5000 milliseconds
sh-5.1$ curl 10.128.2.46:8080 --connect-timeout 5
curl: (28) Connection timed out after 5001 milliseconds
sh-5.1$ curl 10.129.0.173:8080 --connect-timeout 5
curl: (28) Connection timed out after 5001 milliseconds
sh-5.1$ curl 10.128.0.148:8080 --connect-timeout 5
curl: (28) Connection timed out after 5001 milliseconds
sh-5.1$ curl 10.131.0.3:8080 --connect-timeout 5
Hello, 10.131.0.25. You have reached 10.131.0.3 on huirwang-46126-g66cb-compute-0
sh-5.1$
Actual results:
Upgrade failed.
Expected results:
Upgrade succeeded.
Additional info:
Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.
Affected Platforms:
Is it an
# internal CI failure
# customer issue / SD
# internal RedHat testing failure
If it is an internal RedHat testing failure:
* Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (specially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).
If it is a CI failure:
* Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
* Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
* Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
* When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
* If it's a connectivity issue,
* What is the srcNode, srcIP and srcNamespace and srcPodName?
* What is the dstNode, dstIP and dstNamespace and dstPodName?
* What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
If it is a customer / SD issue:
* Provide enough information in the bug description that Engineering doesn’t need to read the entire case history.
* Don’t presume that Engineering has access to Salesforce.
* Please provide must-gather and sos-report with an exact link to the comment in the support case with the attachment. The format should be: [https://access.redhat.com/support/cases/#/case/]<case number>/discussion?attachmentId=<attachment id>
* Describe what each attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).
* Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
** If the issue is in a customer namespace then provide a namespace inspect.
** If it is a connectivity issue:
*** What is the srcNode, srcNamespace, srcPodName and srcPodIP?
*** What is the dstNode, dstNamespace, dstPodName and dstPodIP?
*** What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
*** Please provide the UTC timestamp networking outage window from must-gather
*** Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
** If it is not a connectivity issue:
*** Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.
* For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
* For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, labels with “sbr-untriaged”
* Note: bugs that do not meet these minimum standards will be closed with label “SDN-Jira-template”
This fix contains the following changes coming from the updated version of Kubernetes, up to v1.28.7:
Changelog:
v1.28.7: https://github.com/kubernetes/kubernetes/blob/release-1.28/CHANGELOG/CHANGELOG-1.28.md#changelog-since-v1286
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. oc -n openshift-machine-api get role/cluster-autoscaler-operator -o yaml
2. Observe the missing watch verb
3. Tail the cluster-autoscaler logs to see the error:
status.go:444] No ClusterAutoscaler. Reporting available.
I0919 16:40:52.877216 1 status.go:244] Operator status available: at version 4.14.0-rc.1
E0919 16:40:53.719592 1 reflector.go:148] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: Failed to watch *v1.ClusterOperator: unknown (get clusteroperators.config.openshift.io)
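A quick check for step 2 (a hedged sketch; this only shows whether any rule grants watch, not which resource it applies to):
```
# If nothing matches, no rule in the role grants the watch verb.
oc -n openshift-machine-api get role cluster-autoscaler-operator -o yaml \
  | grep -n 'watch' || echo "no watch verb granted"
```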
Actual results:
Expected results:
Additional info:
Kube 1.26 introduced the warning level TopologyAwareHintsDisabled event. TopologyAwareHintsDisabled is fired by the EndpointSliceController whenever reconciling a service that has activated topology aware hints via the service.kubernetes.io/topology-aware-hints annotation, but there is not enough information in the existing cluster resources (typically nodes) to apply the topology aware hints.
When rebasing OpenShift onto Kube 1.26, our CI builds are failing (except on AWS) because these events are firing "pathologically", for example:
: [sig-arch] events should not repeat pathologically
events happened too frequently event happened 83 times, something is wrong: ns/openshift-dns service/dns-default - reason/TopologyAwareHintsDisabled Insufficient Node information: allocatable CPU or zone not specified on one or more nodes, addressType: IPv4 result=reject
AWS nodes seem to have the proper values in the nodes. GCP has the values also, but they are not "right" for the purposes of the EndpointSliceController:
event happened 38 times, something is wrong: ns/openshift-dns service/dns-default - reason/TopologyAwareHintsDisabled Unable to allocate minimum required endpoints to each zone without exceeding overload threshold (5 endpoints, 3 zones), addressType: IPv4 result=reject }
https://github.com/openshift/origin/pull/27666 will mask this problem (make it stop erroring in CI) but changes still need to be made in the product so end users are not subjected to these events.
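For reference, the annotation the description mentions can be checked directly on the affected service (a hedged one-liner; dns-default per the events above):
```
# Prints the hints annotation value if set (e.g. "auto").
oc -n openshift-dns get service dns-default \
  -o jsonpath="{.metadata.annotations['service\.kubernetes\.io/topology-aware-hints']}"
```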
Now links to:
[sig-arch] events should not repeat pathologically for ns/openshift-dns
Description of problem:
It's blocking the Prow CI test: https://github.com/openshift/release/pull/42822#issuecomment-1760704535
[cloud-user@preserve-olm-env2 jian]$ oc image extract registry.ci.openshift.org/ocp/4.15:cli --path /usr/bin/oc:. --confirm
[cloud-user@preserve-olm-env2 jian]$ sudo chmod 777 oc
[cloud-user@preserve-olm-env2 jian]$ ./oc version
Client Version: v4.2.0-alpha.0-2030-g0307852
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
[cloud-user@preserve-olm-env2 jian]$ ./oc image mirror --insecure=true --skip-missing=true --skip-verification=true --keep-manifest-list=true --filter-by-os='.*' quay.io/openshifttest/ociimage:multiarch localhost:5000/olmqe/ociimage3:multiarch
localhost:5000/
  olmqe/ociimage3
    error: the manifest type *ocischema.DeserializedImageIndex is not supported
  manifests:
    sha256:d58e3e003ddec723dd14f72164beaa609d24c5e5e366579e23bc8b34b9a58324 -> multiarch
  stats: shared=0 unique=0 size=0B
error: the manifest type *ocischema.DeserializedImageIndex is not supported
error: an error occurred during planning
Version-Release number of selected component (if applicable):
The master branch of https://github.com/openshift/oc : https://github.com/openshift/oc/commit/03078525c97d612c2070081d0e9f322f946360f4
[cloud-user@preserve-olm-env2 jian]$ podman inspect registry.ci.openshift.org/ocp/4.15:cli
[
  {
    "Id": "feac27a180964dff0a0ff0a9fcdb593fcf87a7d80177e6c79ab804fb8477f55b",
    "Digest": "sha256:8fcc83d3c72c66867c38456a217298239d99626d96012dbece5c669e3ad5952c",
    "RepoTags": [
      "registry.ci.openshift.org/ocp/4.15:cli"
    ],
    "RepoDigests": [
      "registry.ci.openshift.org/ocp/4.15@sha256:8fcc83d3c72c66867c38456a217298239d99626d96012dbece5c669e3ad5952c",
      "registry.ci.openshift.org/ocp/4.15@sha256:cf4f54e2f20af19afe3c5c0685aa95ab3296d177204b01a3d8bfddf7c3d45f49"
    ],
    ...
    "summary": "Provides the latest release of the Red Hat Extended Life Base Image.",
    "url": "https://access.redhat.com/containers/#/registry.access.redhat.com/openshift/ose-base/images/v4.15.0-202310111407.p0.g16dbf5e.assembly.stream",
    "vcs-ref": "03078525c97d612c2070081d0e9f322f946360f4",
    "vcs-type": "git",
    "vcs-url": "https://github.com/openshift/oc",
    "vendor": "Red Hat, Inc.",
    "version": "v4.15.0"
    ...
    "created": "2023-10-12T23:06:08.279786979Z",
    "created_by": "/bin/sh -c #(nop) LABEL \"io.openshift.build.name\"=\"cli-amd64\" \"io.openshift.build.namespace\"=\"ci-op-37527gwf\" \"io.openshift.build.commit.author\"=\"\" \"io.openshift.build.commit.date\"=\"\" \"io.openshift.build.commit.id\"=\"03078525c97d612c2070081d0e9f322f946360f4\" \"io.openshift.build.commit.message\"=\"\" \"io.openshift.build.commit.ref\"=\"master\" \"io.openshift.build.name\"=\"\" \"io.openshift.build.namespace\"=\"\" \"io.openshift.build.source-context-dir\"=\"\" \"io.openshift.build.source-location\"=\"https://github.com/openshift/oc\" \"io.openshift.ci.from.base\"=\"sha256:d7a2588527405101eeb1578a0e97e465ec83b0b927b71cf689703554e81cb585\" \"vcs-ref\"=\"03078525c97d612c2070081d0e9f322f946360f4\" \"vcs-type\"=\"git\" \"vcs-url\"=\"https://github.com/openshift/oc\"",
How reproducible:
always
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
4.15.0-0.nightly-2023-10-09-101435 (the `oc` commit 1bbfec243e5910a5a86df985489700c3d3137aed) works well:
[cloud-user@preserve-olm-env2 client]$ ./oc version
Client Version: 4.15.0-0.nightly-2023-10-09-101435
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
[cloud-user@preserve-olm-env2 client]$ ./oc image mirror --insecure=true --skip-missing=true --skip-verification=true --keep-manifest-list=true --filter-by-os='.*' quay.io/openshifttest/ociimage:multiarch localhost:5000/olmqe/ociimage2:multiarch2
localhost:5000/
  olmqe/ociimage2
...
sha256:d58e3e003ddec723dd14f72164beaa609d24c5e5e366579e23bc8b34b9a58324 localhost:5000/olmqe/ociimage2:multiarch2
info: Mirroring completed in 2.47s (72.87MB/s)
[cloud-user@preserve-olm-env2 oc]$ oc adm release info registry.ci.openshift.org/ocp/release:4.15.0-0.nightly-2023-10-09-101435 --commits | grep oc
Pull From: registry.ci.openshift.org/ocp/release@sha256:b5d1f88597d49d0e34ed4acfe3149817d02774d4c0661cbcb0c04896d1a852c6
...
tools https://github.com/openshift/oc 1bbfec243e5910a5a86df985489700c3d3137aed
This is a clone of issue OCPBUGS-37441. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-36296. The following is the description of the original issue:
—
Currently the manifests directory has:
0000_30_cluster-api_00_credentials-request.yaml 0000_30_cluster-api_00_namespace.yaml ...
CredentialsRequests go into the openshift-cloud-credential-operator namespace, so they can come before or after the openshift-cluster-api namespace. But because they ask for Secrets in the openshift-cluster-api namespace, there would be less race and drama if the CredentialsRequest manifests were given a name that sorted them after the namespace. Like 0000_30_cluster-api_01_credentials-request.yaml.
I haven't gone digging in history, it may have been like this since forever.
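If the fix is the rename suggested above, in the repository that ships these manifests it would look something like this (hedged; the exact repo layout is an assumption):
```
git mv manifests/0000_30_cluster-api_00_credentials-request.yaml \
       manifests/0000_30_cluster-api_01_credentials-request.yaml
```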
Every time.
With a release image pullspec like registry.ci.openshift.org/ocp/release:4.17.0-0.nightly-2024-06-27-184535:
$ oc adm release extract --to manifests registry.ci.openshift.org/ocp/release:4.17.0-0.nightly-2024-06-27-184535
$ ls manifests/0000_30_cluster-api_* | grep 'namespace\|credentials-request'
Current output:
manifests/0000_30_cluster-api_00_credentials-request.yaml
manifests/0000_30_cluster-api_00_namespace.yaml
Expected output:
manifests/0000_30_cluster-api_00_namespace.yaml
manifests/0000_30_cluster-api_01_credentials-request.yaml
Description of problem:
When trying to create sriov pods, pods are stuck in state ContainerCreating.
pod definition:
apiVersion: v1
kind: Pod
metadata:
  name: test-sriov-pod
  namespace: default
  annotations:
    v1.multus-cni.io/default-network: default/ftnetattach
  labels:
    pod-name: ft-iperf-server-pod-v4
spec:
  containers:
  - name: ft-iperf-server-pod-v4
    image: quay.io/wizhao/ft-base-image:0.8-x86_64
net-attach-def:
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  annotations:
    k8s.v1.cni.cncf.io/resourceName: openshift.io/mlxnics
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"k8s.cni.cncf.io/v1","kind":"NetworkAttachmentDefinition","metadata":{"annotations":{"k8s.v1.cni.cncf.io/resourceName":"openshift.io/mlxnics"},"name":"ftnetattach","namespace":"default"},"spec":{"config":"{\"cniVersion\":\"0.3.1\",\"name\":\"ftnetattach\",\"type\":\"ovn-k8s-cni-overlay\",\"logFile\":\"/var/log/ovn-kubernetes/flowtest.log\",\"logLevel\":\"4\",\"ipam\":{},\"dns\":{}}"}}
  creationTimestamp: "2023-10-27T20:59:38Z"
  generation: 1
  name: ftnetattach
  namespace: default
  resourceVersion: "241792"
  uid: c394f8bc-20bc-4d0f-b5ce-9f5baad7c3de
spec:
  config: '{"cniVersion":"0.3.1","name":"ftnetattach","type":"ovn-k8s-cni-overlay","logFile":"/var/log/ovn-kubernetes/flowtest.log","logLevel":"4","ipam":{},"dns":{}}'
From a bisect of when this error started occurring, it appears this error was triggered with this change: https://github.com/ovn-org/ovn-kubernetes/pull/3958
Version-Release number of selected component (if applicable):
How reproducible:
Every time
Steps to Reproduce:
1. Deploy the SR-IOV network operator
2. Apply the ovn-k8s-cni-overlay net-attach-def
3. Create the pod
Actual results:
$ oc get pod test-sriov-pod
NAME READY STATUS RESTARTS AGE
test-sriov-pod 0/1 ContainerCreating 0 2d18h
$ oc describe pod test-sriov-pod
<....>
Warning FailedCreatePodSandBox 36s (x18366 over 2d18h) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_test-sriov-pod_default_12194f6e-96ea-4255-be89-a05c57e7d85b_0(cfd3586aa90898cb4197f9c659b80f9e50989fc847e7722a529d137d450a9feb): error adding pod default_test-sriov-pod to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: '&{ContainerID:cfd3586aa90898cb4197f9c659b80f9e50989fc847e7722a529d137d450a9feb Netns:/var/run/netns/58ad326c-68fe-487a-b449-ff1e0d9bbb64 IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=test-sriov-pod;K8S_POD_INFRA_CONTAINER_ID=cfd3586aa90898cb4197f9c659b80f9e50989fc847e7722a529d137d450a9feb;K8S_POD_UID=12194f6e-96ea-4255-be89-a05c57e7d85b Path: StdinData:[... decimal byte dump of the multus CNI config, elided ...]}
ContainerID:"cfd3586aa90898cb4197f9c659b80f9e50989fc847e7722a529d137d450a9feb" Netns:"/var/run/netns/58ad326c-68fe-487a-b449-ff1e0d9bbb64" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=test-sriov-pod;K8S_POD_INFRA_CONTAINER_ID=cfd3586aa90898cb4197f9c659b80f9e50989fc847e7722a529d137d450a9feb;K8S_POD_UID=12194f6e-96ea-4255-be89-a05c57e7d85b" Path:"" ERRORED: error configuring pod [default/test-sriov-pod] networking: [default/test-sriov-pod/12194f6e-96ea-4255-be89-a05c57e7d85b:ftnetattach]: error adding container to network "ftnetattach": failed to send CNI request: Post "http://dummy/": EOF
Expected results:
pod is created and allocated device
Additional info:
Red Hat CoreOS: 414.92.202310270216-0
Cluster version: 4.14.0-0.nightly-multi-2023-10-27-070855
Dependabot is not updating dependencies. Investigate & fix.
Description of problem:
HostedClusters with a .status.controlPlaneEndpoint.port: 443 unexpectedly also expose the KAS on port 6443. This causes four security group rules to be consumed per LoadBalancer service (443/6443 for router and 443/6443 for private-router) instead of just two (443 for router and 443 for private-router). This directly impacts the number of HostedClusters on a Management Cluster, since there is a hard cap of 200 security group rules per security group.
Version-Release number of selected component (if applicable):
4.14.0
How reproducible:
100%
Steps to Reproduce:
1. Create a HostedCluster resulting in its .status.controlPlaneEndpoint.port: 443
2. Observe that the router/private-router LoadBalancer services expose both ports 6443 and 443 (a one-liner for this is sketched below)
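A sketch for step 2 (hedged; $HCP_NAMESPACE stands in for the hosted control plane namespace):
```
# List the ports each router service exposes; per the expected results,
# only 443 should appear when .status.controlPlaneEndpoint.port is 443.
oc -n "$HCP_NAMESPACE" get svc router private-router \
  -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.ports[*].port}{"\n"}{end}'
```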
Actual results:
The router/private-router LoadBalancer services expose both ports 6443 and 443
Expected results:
The router/private-router LoadBalancer services exposes only port 443
Additional info:
This is a clone of issue OCPBUGS-26513. The following is the description of the original issue:
—
Description of problem:
oc-mirror with v2 will create the idms file as output, but the source looks like this:
apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
  creationTimestamp: null
  name: idms-2024-01-08t04-19-04z
spec:
  imageDigestMirrors:
  - mirrors:
    - ec2-3-144-29-184.us-east-2.compute.amazonaws.com:5000/ocp2/openshift
    source: localhost:55000/openshift
  - mirrors:
    - ec2-3-144-29-184.us-east-2.compute.amazonaws.com:5000/ocp2/openshift-release-dev
    source: quay.io/openshift-release-dev
status: {}
The source should always be the origin registry, like quay.io/openshift-release-dev.
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. Run the command with v2, using this ImageSetConfiguration:

apiVersion: mirror.openshift.io/v1alpha2
kind: ImageSetConfiguration
mirror:
  platform:
    channels:
    - name: stable-4.14
      minVersion: 4.14.3
      maxVersion: 4.14.3
    graph: true

`oc-mirror --config config.yaml file://out --v2`
`oc-mirror --config config.yaml --from file://out --v2 docker://xxxx:5000/ocp2`

2. Check the idms file
Actual results:
2. cat idms-2024-01-08t04-19-04z.yaml

apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
  creationTimestamp: null
  name: idms-2024-01-08t04-19-04z
spec:
  imageDigestMirrors:
  - mirrors:
    - xxxx.com:5000/ocp2/openshift
    source: localhost:55000/openshift
  - mirrors:
    - xxxx.com:5000/ocp2/openshift-release-dev
    source: quay.io/openshift-release-dev
Expected results:
The source should not be localhost:55000; it should be the origin registry.
Additional info:
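For illustration, a corrected idms file would keep the origin registry as the source (the mirror hostname is a placeholder, as in the output above):

apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
  name: idms-2024-01-08t04-19-04z
spec:
  imageDigestMirrors:
  - mirrors:
    - xxxx.com:5000/ocp2/openshift-release-dev
    source: quay.io/openshift-release-dev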
This is a clone of issue OCPBUGS-32402. The following is the description of the original issue:
—
Description of problem:
It is noticed that ovs-monitor-ipsec fails to import the cert into the NSS DB with the following error:

2024-04-17T19:57:21.140989157Z 2024-04-17T19:57:21Z | 6 | reconnect | INFO | unix:/var/run/openvswitch/db.sock: connecting...
2024-04-17T19:57:21.142234972Z 2024-04-17T19:57:21Z | 9 | reconnect | INFO | unix:/var/run/openvswitch/db.sock: connected
2024-04-17T19:57:21.170709468Z 2024-04-17T19:57:21Z | 14 | ovs-monitor-ipsec | INFO | Tunnel ovn-69b991-0 appeared in OVSDB
2024-04-17T19:57:21.171379359Z 2024-04-17T19:57:21Z | 16 | ovs-monitor-ipsec | INFO | Tunnel ovn-52bc87-0 appeared in OVSDB
2024-04-17T19:57:21.171826906Z 2024-04-17T19:57:21Z | 18 | ovs-monitor-ipsec | INFO | Tunnel ovn-3e78bb-0 appeared in OVSDB
2024-04-17T19:57:21.172300675Z 2024-04-17T19:57:21Z | 20 | ovs-monitor-ipsec | INFO | Tunnel ovn-12fb32-0 appeared in OVSDB
2024-04-17T19:57:21.172726970Z 2024-04-17T19:57:21Z | 22 | ovs-monitor-ipsec | INFO | Tunnel ovn-8a4d01-0 appeared in OVSDB
2024-04-17T19:57:21.178644919Z 2024-04-17T19:57:21Z | 24 | ovs-monitor-ipsec | ERR | Import cert and key failed.
2024-04-17T19:57:21.178644919Z b"No cert in -in file '/etc/openvswitch/keys/ipsec-cert.pem' matches private key\n80FBF36CDE7F0000:error:05800074:x509 certificate routines:X509_check_private_key:key values mismatch:crypto/x509/x509_cmp.c:405:\n"
2024-04-17T19:57:21.179581526Z 2024-04-17T19:57:21Z | 25 | ovs-monitor-ipsec | ERR | traceback
2024-04-17T19:57:21.179581526Z Traceback (most recent call last):
2024-04-17T19:57:21.179581526Z   File "/usr/share/openvswitch/scripts/ovs-monitor-ipsec", line 1382, in <module>
2024-04-17T19:57:21.179581526Z     main()
2024-04-17T19:57:21.179581526Z   File "/usr/share/openvswitch/scripts/ovs-monitor-ipsec", line 1369, in main
2024-04-17T19:57:21.179581526Z     monitor.run()
2024-04-17T19:57:21.179581526Z   File "/usr/share/openvswitch/scripts/ovs-monitor-ipsec", line 1176, in run
2024-04-17T19:57:21.179581526Z     if self.ike_helper.config_global(self):
2024-04-17T19:57:21.179581526Z   File "/usr/share/openvswitch/scripts/ovs-monitor-ipsec", line 521, in config_global
2024-04-17T19:57:21.179581526Z     self._nss_import_cert_and_key(cert, key, name)
2024-04-17T19:57:21.179581526Z   File "/usr/share/openvswitch/scripts/ovs-monitor-ipsec", line 809, in _nss_import_cert_and_key
2024-04-17T19:57:21.179581526Z     os.remove(path)
2024-04-17T19:57:21.179581526Z FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ovs_certkey_ef9cf1a5-bfb2-4876-8fb3-69c6b22561a2.p12'
Version-Release number of selected component (if applicable):
4.16.0
How reproducible:
Hit on the CI: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/50690/rehearse-50690-pull-ci-openshift-cluster-network-operator-master-e2e-ovn-ipsec-step-registry/1780660589492703232
Steps to Reproduce:
1. 2. 3.
Actual results:
openshift-install failed with error:

time="2024-04-17T19:34:47Z" level=error msg="Cluster initialization failed because one or more operators are not functioning properly.\nThe cluster should be accessible for troubleshooting as detailed in the documentation linked below,\nhttps://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html\nThe 'wait-for install-complete' subcommand can then be used to continue the installation"
time="2024-04-17T19:34:47Z" level=error msg="failed to initialize the cluster: Multiple errors are preventing progress:\n* Cluster operator authentication is degraded\n* Cluster operators monitoring, openshift-apiserver are not available"

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_release/50690/rehearse-50690-pull-ci-openshift-cluster-network-operator-master-e2e-ovn-ipsec-step-registry/1780660589492703232/artifacts/e2e-ovn-ipsec-step-registry/ipi-install-install/artifacts/.openshift_install-1713382487.log
Expected results:
The cluster must come up with all cluster operators running and IPsec enabled for east-west traffic.
Additional info:
It seems the ovn-ipsec-host pod's ovn-keys init container writes empty content into /etc/openvswitch/keys/ipsec-cert.pem, even though the corresponding CSR contains the certificate in its status.
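A minimal diagnostic sketch, assuming cluster access (the node name is a placeholder):

$ oc debug node/<node> -- chroot /host stat -c '%s' /etc/openvswitch/keys/ipsec-cert.pem    # a size of 0 confirms the empty cert file
$ oc get csr -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.certificate}{"\n"}{end}'    # a populated .status.certificate shows the CSR was actually signed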
Please review the following PR: https://github.com/openshift/cloud-provider-vsphere/pull/48
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/cluster-openshift-controller-manager-operator/pull/319
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-42369. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42060. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-41228. The following is the description of the original issue:
—
Description of problem:
The console crashes when the user selects SSH as the Authentication type for the git server under add secret in the start pipeline form
Version-Release number of selected component (if applicable):
How reproducible:
Every time. Only in the Developer perspective and only if the Pipelines dynamic plugin is enabled.
Steps to Reproduce:
1. Create a pipeline through the add flow and open the start pipeline page
2. Under show credentials, select add secret
3. In the secret form, select `Access to` as Git server and `Authentication type` as SSH key
Actual results:
Console crashes
Expected results:
UI should work as expected
Additional info:
Attaching console log screenshot
https://drive.google.com/file/d/1bGndbq_WLQ-4XxG5ylU7VuZWZU15ywTI/view?usp=sharing
Description:
Now that the huge number of e2e test case failures in CI jobs has been resolved, an "Undiagnosed panic detected in pod" issue has been observed in the recent jobs.
Error:
{ pods/openshift-image-registry_cluster-image-registry-operator-7f7bd7c9b4-k8fmh_cluster-image-registry-operator_previous.log.gz:E0825 02:44:06.686400 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
pods/openshift-image-registry_cluster-image-registry-operator-7f7bd7c9b4-k8fmh_cluster-image-registry-operator_previous.log.gz:E0825 02:44:06.686630 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)}

Some observations:
1) While starting ImageConfigController, it failed to watch *v1.Route: "the server could not find the requested resource".
2) This eventually led to the sync problem "E0825 01:26:52.428694 1 clusteroperator.go:104] unable to sync ClusterOperatorStatusController: config.imageregistry.operator.openshift.io "cluster" not found, requeuing".
3) Then, while creating the deployment resource for "cluster-image-registry-operator", it caused a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference).
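To retrieve the full panic from the restarted operator, the previous container logs can be inspected (the pod name is taken from the log above):

$ oc -n openshift-image-registry logs cluster-image-registry-operator-7f7bd7c9b4-k8fmh --previous | grep -A 10 'Observed a panic'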
Description of problem:
tested https://github.com/openshift/console/pull/13114 with cluster-bot
launch 4.15,openshift/console#13114 gcp
the below functions are unavailable, see recording: https://drive.google.com/file/d/1yBS_xGWgJwfIoOdLdIjZ6riSL_cOARrb/view?usp=sharing
1. time interval drop-down
2. Actions drop-down:
Add query
Collapse all query tables
3. Add query button
4. kebab menu:
Disable query
Delete query
Duplicate query
5. disable/enable query toggle button
NOTE: also checked on 4.15.0-0.nightly-arm64-2023-09-19-235618, no such issues
Version-Release number of selected component (if applicable):
test https://github.com/openshift/console/pull/13114 with cluster-bot
How reproducible:
always
Steps to Reproduce:
1. Run regression testing for console PR 13114
Actual results:
console PR 13114 makes many functions under "Observe > Metrics" unavailable
Expected results:
no issue
Please review the following PR: https://github.com/openshift/apiserver-network-proxy/pull/44
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
In the OCP upgrades from 4.13 to 4.14, the canary route configuration is changed as below:
Canary route configuration in OCP 4.13:

$ oc get route -n openshift-ingress-canary canary -oyaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  labels:
    ingress.openshift.io/canary: canary_controller
  name: canary
  namespace: openshift-ingress-canary
spec:
  host: canary-openshift-ingress-canary.apps.<cluster-domain>.com    <---- canary route configured with .spec.host

Canary route configuration in OCP 4.14:

$ oc get route -n openshift-ingress-canary canary -oyaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  labels:
    ingress.openshift.io/canary: canary_controller
  name: canary
  namespace: openshift-ingress-canary
spec:
  port:
    targetPort: 8080
  subdomain: canary-openshift-ingress-canary    <---- canary route configured with .spec.subdomain
After the upgrade, the following messages are printed in the ingress-operator pod:
2024-04-24T13:16:34.637Z ERROR operator.init controller/controller.go:265 Reconciler error {"controller": "canary_controller", "object": {"name":"default","namespace":"openshift-ingress-operator"}, "namespace": "openshift-ingress-operator", "name": "default", "reconcileID": "46290893-d755-4735-bb01-e8b707be4053", "error": "failed to ensure canary route: failed to update canary route openshift-ingress-canary/canary: Route.route.openshift.io \"canary\" is invalid: spec.subdomain: Invalid value: \"canary-openshift-ingress-canary\": field is immutable"}
The issue is resolved when the canary route is deleted.
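A minimal workaround sketch, matching the audit trail below (the ingress operator recreates the route with .spec.subdomain after deletion):

$ oc -n openshift-ingress-canary delete route canary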
See below the audit logs from the process:
# The route can't be updated, error 422:
{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"4e8bfb36-21cc-422b-9391-ef8ff42970ca","stage":"ResponseComplete","requestURI":"/apis/route.openshift.io/v1/namespaces/openshift-ingress-canary/routes/canary","verb":"update","user":{"username":"system:serviceaccount:openshift-ingress-operator:ingress-operator","groups":["system:serviceaccounts","system:serviceaccounts:openshift-ingress-operator","system:authenticated"],"extra":{"authentication.kubernetes.io/pod-name":["ingress-operator-746cd8598-hq2st"],"authentication.kubernetes.io/pod-uid":["f3ebccdf-f3b3-420d-8ea5-e33d98945403"]}},"sourceIPs":["10.128.0.93","10.128.0.2"],"userAgent":"Go-http-client/2.0","objectRef":{"resource":"routes","namespace":"openshift-ingress-canary","name":"canary","uid":"3e179946-d4e3-45ad-9380-c305baefd14e","apiGroup":"route.openshift.io","apiVersion":"v1","resourceVersion":"297888"},"responseStatus":{"metadata":{},"status":"Failure","message":"Route.route.openshift.io \"canary\" is invalid: spec.subdomain: Invalid value: \"canary-openshift-ingress-canary\": field is immutable","reason":"Invalid","details":{"name":"canary","group":"route.openshift.io","kind":"Route","causes":[{"reason":"FieldValueInvalid","message":"Invalid value: \"canary-openshift-ingress-canary\": field is immutable","field":"spec.subdomain"}]},"code":422},"requestReceivedTimestamp":"2024-04-24T13:16:34.630249Z","stageTimestamp":"2024-04-24T13:16:34.636869Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"openshift-ingress-operator\" of ClusterRole \"openshift-ingress-operator\" to ServiceAccount \"ingress-operator/openshift-ingress-operator\""}}

# Route is deleted manually:
{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"70821b58-dabc-4593-ba6d-5e81e5d27d21","stage":"ResponseComplete","requestURI":"/apis/route.openshift.io/v1/namespaces/openshift-ingress-canary/routes/canary","verb":"delete","user":{"username":"system:admin","groups":["system:masters","system:authenticated"]},"sourceIPs":["10.0.91.78","10.128.0.2"],"userAgent":"oc/4.13.0 (linux/amd64) kubernetes/7780c37","objectRef":{"resource":"routes","namespace":"openshift-ingress-canary","name":"canary","apiGroup":"route.openshift.io","apiVersion":"v1"},"responseStatus":{"metadata":{},"status":"Success","details":{"name":"canary","group":"route.openshift.io","kind":"routes","uid":"3e179946-d4e3-45ad-9380-c305baefd14e"},"code":200},"requestReceivedTimestamp":"2024-04-24T13:24:39.558620Z","stageTimestamp":"2024-04-24T13:24:39.561267Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":""}}

# Route is created again:
{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"92e6132a-aa1d-482d-a1dc-9ce021ae4c37","stage":"ResponseComplete","requestURI":"/apis/route.openshift.io/v1/namespaces/openshift-ingress-canary/routes","verb":"create","user":{"username":"system:serviceaccount:openshift-ingress-operator:ingress-operator","groups":["system:serviceaccounts","system:serviceaccounts:openshift-ingress-operator","system:authenticated"],"extra":{"authentication.kubernetes.io/pod-name":["ingress-operator-746cd8598-hq2st"],"authentication.kubernetes.io/pod-uid":["f3ebccdf-f3b3-420d-8ea5-e33d98945403"]}},"sourceIPs":["10.128.0.93","10.128.0.2"],"userAgent":"Go-http-client/2.0","objectRef":{"resource":"routes","namespace":"openshift-ingress-canary","name":"canary","apiGroup":"route.openshift.io","apiVersion":"v1"},"responseStatus":{"metadata":{},"code":201},"requestReceivedTimestamp":"2024-04-24T13:24:39.577255Z","stageTimestamp":"2024-04-24T13:24:39.584371Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"openshift-ingress-operator\" of ClusterRole \"openshift-ingress-operator\" to ServiceAccount \"ingress-operator/openshift-ingress-operator\""}}
Version-Release number of selected component (if applicable):
Ocp upgrade between 4.13 and 4.14
How reproducible:
Upgrade the cluster from OCP 4.13 to 4.14 and check the ingress operator pod logs
Steps to Reproduce:
1. Install cluster in OCP 4.13 2. Upgrade to OCP 4.14 3. Check the ingress operator logs
Actual results:
Reported errors above
Expected results:
The ingress canary route should be updated without issues
Additional info:
This is a clone of issue OCPBUGS-25025. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-29855. The following is the description of the original issue:
—
Description of problem:
To operate HyperShift at high scale, we need an option to disable dedicated request serving isolation, if not used.
Version-Release number of selected component (if applicable):
4.16, 4.15, 4.14, 4.13
How reproducible:
100%
Steps to Reproduce:
1. Install the hypershift operator for versions 4.16, 4.15, 4.14, or 4.13
2. Observe start-up logs
3. Dedicated request serving isolation controllers are started
Actual results:
Dedicated request serving isolation controllers are started
Expected results:
Dedicated request serving isolation controllers to not start, if unneeded
Additional info:
Description of problem:
The test case https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-50926 was created for the NE-577 epic. When we increase 'spec.tuningOptions.maxConnections' to the maximum value of 2000000, the default ingress controller gets stuck in progressing.
Version-Release number of selected component (if applicable):
How reproducible:
https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-50926
Steps to Reproduce:
1. Edit the default controller with the max value 2000000:
oc -n openshift-ingress-operator edit ingresscontroller default
tuningOptions:
  maxConnections: 2000000
2. melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress-operator get ingresscontroller default -o yaml | grep -A1 tuningOptions
tuningOptions:
  maxConnections: 2000000
3. melvinjoseph@mjoseph-mac openshift-tests-private % oc get co/ingress
NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
ingress   4.15.0-0.nightly-2023-10-16-231617   True        True          False      3h42m   ingresscontroller "default" is progressing: IngressControllerProgressing: One or more status conditions indicate progressing: DeploymentRollingOut=True (DeploymentRollingOut: Waiting for router deployment rollout to finish: 1 old replica(s) are pending termination......
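An equivalent non-interactive way to set the value (a sketch; the interactive edit above achieves the same result):

$ oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge -p '{"spec":{"tuningOptions":{"maxConnections":2000000}}}'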
Actual results:
The default ingress controller is stuck in progressing
Expected results:
The ingress controller should work as normal
Additional info:
melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress get po NAME READY STATUS RESTARTS AGE router-default-7cf67f448-gb7mr 0/1 Running 0 38s router-default-7cf67f448-qmvks 0/1 Running 0 38s router-default-7dcd556587-kvk8d 0/1 Terminating 0 3h53m router-default-7dcd556587-vppk4 1/1 Running 0 3h53m melvinjoseph@mjoseph-mac openshift-tests-private % melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress get po NAME READY STATUS RESTARTS AGE router-default-7cf67f448-gb7mr 0/1 Running 0 111s router-default-7cf67f448-qmvks 0/1 Running 0 111s router-default-7dcd556587-vppk4 1/1 Running 0 3h55m melvinjoseph@mjoseph-mac openshift-tests-private % oc get co NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE authentication 4.15.0-0.nightly-2023-10-16-231617 True False False 3h28m baremetal 4.15.0-0.nightly-2023-10-16-231617 True False False 3h55m cloud-controller-manager 4.15.0-0.nightly-2023-10-16-231617 True False False 3h58m cloud-credential 4.15.0-0.nightly-2023-10-16-231617 True False False 3h59m cluster-autoscaler 4.15.0-0.nightly-2023-10-16-231617 True False False 3h55m config-operator 4.15.0-0.nightly-2023-10-16-231617 True False False 3h56m console 4.15.0-0.nightly-2023-10-16-231617 True False False 3h34m control-plane-machine-set 4.15.0-0.nightly-2023-10-16-231617 True False False 3h43m csi-snapshot-controller 4.15.0-0.nightly-2023-10-16-231617 True False False 3h39m dns 4.15.0-0.nightly-2023-10-16-231617 True False False 3h54m etcd 4.15.0-0.nightly-2023-10-16-231617 True False False 3h47m image-registry 4.15.0-0.nightly-2023-10-16-231617 True False False 176m ingress 4.15.0-0.nightly-2023-10-16-231617 True True False 3h39m ingresscontroller "default" is progressing: IngressControllerProgressing: One or more status conditions indicate progressing: DeploymentRollingOut=True (DeploymentRollingOut: Waiting for router deployment rollout to finish: 1 old replica(s) are pending termination...... 
insights 4.15.0-0.nightly-2023-10-16-231617 True False False 3h49m kube-apiserver 4.15.0-0.nightly-2023-10-16-231617 True False False 3h45m kube-controller-manager 4.15.0-0.nightly-2023-10-16-231617 True False False 3h46m kube-scheduler 4.15.0-0.nightly-2023-10-16-231617 True False False 3h46m kube-storage-version-migrator 4.15.0-0.nightly-2023-10-16-231617 True False False 3h56m machine-api 4.15.0-0.nightly-2023-10-16-231617 True False False 3h45m machine-approver 4.15.0-0.nightly-2023-10-16-231617 True False False 3h55m machine-config 4.15.0-0.nightly-2023-10-16-231617 True False False 3h53m marketplace 4.15.0-0.nightly-2023-10-16-231617 True False False 3h55m monitoring 4.15.0-0.nightly-2023-10-16-231617 True False False 3h35m network 4.15.0-0.nightly-2023-10-16-231617 True False False 3h57m node-tuning 4.15.0-0.nightly-2023-10-16-231617 True False False 3h39m openshift-apiserver 4.15.0-0.nightly-2023-10-16-231617 True False False 3h43m openshift-controller-manager 4.15.0-0.nightly-2023-10-16-231617 True False False 3h39m openshift-samples 4.15.0-0.nightly-2023-10-16-231617 True False False 3h39m operator-lifecycle-manager 4.15.0-0.nightly-2023-10-16-231617 True False False 3h54m operator-lifecycle-manager-catalog 4.15.0-0.nightly-2023-10-16-231617 True False False 3h54m operator-lifecycle-manager-packageserver 4.15.0-0.nightly-2023-10-16-231617 True False False 3h43m service-ca 4.15.0-0.nightly-2023-10-16-231617 True False False 3h56m storage 4.15.0-0.nightly-2023-10-16-231617 True False False 3h36m melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress-operator get po NAME READY STATUS RESTARTS AGE ingress-operator-c6fd989fd-jsrzv 2/2 Running 4 (3h45m ago) 3h58m melvinjoseph@mjoseph-mac openshift-tests-private % melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress-operator logs ingress-operator-c6fd989fd-jsrzv -c ingress-operator --tail=20 2023-10-17T11:34:54.327Z INFO operator.ingress_controller handler/enqueue_mapped.go:81 queueing ingress {"name": "default", "related": ""} 2023-10-17T11:34:54.348Z INFO operator.ingress_controller handler/enqueue_mapped.go:81 queueing ingress {"name": "default", "related": ""} 2023-10-17T11:34:54.348Z INFO operator.ingress_controller handler/enqueue_mapped.go:81 queueing ingress {"name": "default", "related": ""} 2023-10-17T11:34:54.394Z INFO operator.ingressclass_controller controller/controller.go:118 reconciling {"request": {"name":"default","namespace":"openshift-ingress-operator"}} 2023-10-17T11:34:54.394Z INFO operator.route_metrics_controller controller/controller.go:118 reconciling {"request": {"name":"default","namespace":"openshift-ingress-operator"}} 2023-10-17T11:34:54.394Z INFO operator.status_controller controller/controller.go:118 Reconciling {"request": {"name":"default","namespace":"openshift-ingress-operator"}} 2023-10-17T11:34:54.397Z INFO operator.ingress_controller controller/controller.go:118 reconciling {"request": {"name":"default","namespace":"openshift-ingress-operator"}} 2023-10-17T11:34:54.429Z INFO operator.status_controller controller/controller.go:118 Reconciling {"request": {"name":"default","namespace":"openshift-ingress-operator"}} 2023-10-17T11:34:54.446Z INFO operator.certificate_controller controller/controller.go:118 Reconciling {"request": {"name":"default","namespace":"openshift-ingress-operator"}} 2023-10-17T11:34:54.553Z INFO operator.ingressclass_controller controller/controller.go:118 reconciling {"request": 
{"name":"default","namespace":"openshift-ingress-operator"}} 2023-10-17T11:34:54.553Z INFO operator.route_metrics_controller controller/controller.go:118 reconciling {"request": {"name":"default","namespace":"openshift-ingress-operator"}} 2023-10-17T11:34:54.553Z INFO operator.status_controller controller/controller.go:118 Reconciling {"request": {"name":"default","namespace":"openshift-ingress-operator"}} 2023-10-17T11:34:54.557Z ERROR operator.ingress_controller controller/controller.go:118 got retryable error; requeueing {"after": "59m59.9999758s", "error": "IngressController may become degraded soon: DeploymentReplicasAllAvailable=False"} 2023-10-17T11:34:54.558Z INFO operator.ingress_controller controller/controller.go:118 reconciling {"request": {"name":"default","namespace":"openshift-ingress-operator"}} 2023-10-17T11:34:54.583Z INFO operator.status_controller controller/controller.go:118 Reconciling {"request": {"name":"default","namespace":"openshift-ingress-operator"}} 2023-10-17T11:34:54.657Z ERROR operator.ingress_controller controller/controller.go:118 got retryable error; requeueing {"after": "59m59.345629987s", "error": "IngressController may become degraded soon: DeploymentReplicasAllAvailable=False"} 2023-10-17T11:34:54.794Z INFO operator.certificate_controller controller/controller.go:118 Reconciling {"request": {"name":"default","namespace":"openshift-ingress-operator"}} 2023-10-17T11:36:11.151Z INFO operator.ingress_controller handler/enqueue_mapped.go:81 queueing ingress {"name": "default", "related": ""} 2023-10-17T11:36:11.151Z INFO operator.ingress_controller controller/controller.go:118 reconciling {"request": {"name":"default","namespace":"openshift-ingress-operator"}} 2023-10-17T11:36:11.248Z ERROR operator.ingress_controller controller/controller.go:118 got retryable error; requeueing {"after": "58m42.755479533s", "error": "IngressController may become degraded soon: DeploymentReplicasAllAvailable=False"} melvinjoseph@mjoseph-mac openshift-tests-private % melvinjoseph@mjoseph-mac openshift-tests-private % oc get po -n openshift-ingress NAME READY STATUS RESTARTS AGE router-default-7cf67f448-gb7mr 0/1 Running 1 (71s ago) 3m57s router-default-7cf67f448-qmvks 0/1 Running 1 (70s ago) 3m57s router-default-7dcd556587-vppk4 1/1 Running 0 3h57m melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress logs router-default-7cf67f448-gb7mr --tail=20 I1017 11:39:22.623928 1 healthz.go:261] backend-http check failed: healthz [-]backend-http failed: backend reported failure I1017 11:39:23.623924 1 healthz.go:261] backend-http check failed: healthz [-]backend-http failed: backend reported failure I1017 11:39:24.623373 1 healthz.go:261] backend-http check failed: healthz [-]backend-http failed: backend reported failure I1017 11:39:25.627359 1 healthz.go:261] backend-http check failed: healthz [-]backend-http failed: backend reported failure I1017 11:39:26.623337 1 healthz.go:261] backend-http check failed: healthz [-]backend-http failed: backend reported failure I1017 11:39:27.623603 1 healthz.go:261] backend-http check failed: healthz [-]backend-http failed: backend reported failure I1017 11:39:28.623866 1 healthz.go:261] backend-http check failed: healthz [-]backend-http failed: backend reported failure I1017 11:39:29.623183 1 healthz.go:261] backend-http check failed: healthz [-]backend-http failed: backend reported failure I1017 11:39:30.623475 1 healthz.go:261] backend-http check failed: healthz [-]backend-http failed: backend reported failure I1017 
11:39:31.623949 1 healthz.go:261] backend-http check failed: healthz [-]backend-http failed: backend reported failure melvinjoseph@mjoseph-mac openshift-tests-private % melvinjoseph@mjoseph-mac openshift-tests-private % melvinjoseph@mjoseph-mac openshift-tests-private % melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress logs router-default-7cf67f448-qmvks --tail=20 I1017 11:39:34.553475 1 healthz.go:261] backend-http check failed: healthz [-]backend-http failed: backend reported failure I1017 11:39:35.551412 1 healthz.go:261] backend-http check failed: healthz [-]backend-http failed: backend reported failure I1017 11:39:36.551421 1 healthz.go:261] backend-http check failed: healthz [-]backend-http failed: backend reported failure E1017 11:39:37.052068 1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory I1017 11:39:37.551648 1 healthz.go:261] backend-http check failed: healthz [-]backend-http failed: backend reported failure I1017 11:39:38.551632 1 healthz.go:261] backend-http check failed: healthz [-]backend-http failed: backend reported failure I1017 11:39:39.551410 1 healthz.go:261] backend-http check failed: healthz [-]backend-http failed: backend reported failure I1017 11:39:40.552620 1 healthz.go:261] backend-http check failed: healthz [-]backend-http failed: backend reported failure I1017 11:39:41.552050 1 healthz.go:261] backend-http check failed: healthz [-]backend-http failed: backend reported failure I1017 11:39:42.551076 1 healthz.go:261] backend-http check failed: healthz [-]backend-http failed: backend reported failure I1017 11:39:42.564293 1 template.go:828] router "msg"="Shutdown requested, waiting 45s for new connections to cease" melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress-operator get ingresscontroller NAME AGE default 3h59m melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress-operator get ingresscontroller default -o yaml apiVersion: operator.openshift.io/v1 <-----snip----> status: availableReplicas: 1 conditions: - lastTransitionTime: "2023-10-17T07:41:42Z" reason: Valid status: "True" type: Admitted - lastTransitionTime: "2023-10-17T07:57:01Z" message: The deployment has Available status condition set to True reason: DeploymentAvailable status: "True" type: DeploymentAvailable - lastTransitionTime: "2023-10-17T07:57:01Z" message: Minimum replicas requirement is met reason: DeploymentMinimumReplicasMet status: "True" type: DeploymentReplicasMinAvailable - lastTransitionTime: "2023-10-17T11:34:54Z" message: 1/2 of replicas are available reason: DeploymentReplicasNotAvailable status: "False" type: DeploymentReplicasAllAvailable - lastTransitionTime: "2023-10-17T11:34:54Z" message: | Waiting for router deployment rollout to finish: 1 old replica(s) are pending termination... 
reason: DeploymentRollingOut status: "True" type: DeploymentRollingOut - lastTransitionTime: "2023-10-17T07:41:43Z" message: The endpoint publishing strategy supports a managed load balancer reason: WantedByEndpointPublishingStrategy status: "True" type: LoadBalancerManaged - lastTransitionTime: "2023-10-17T07:57:24Z" message: The LoadBalancer service is provisioned reason: LoadBalancerProvisioned status: "True" type: LoadBalancerReady - lastTransitionTime: "2023-10-17T07:41:43Z" message: LoadBalancer is not progressing reason: LoadBalancerNotProgressing status: "False" type: LoadBalancerProgressing - lastTransitionTime: "2023-10-17T07:41:43Z" message: DNS management is supported and zones are specified in the cluster DNS config. reason: Normal status: "True" type: DNSManaged - lastTransitionTime: "2023-10-17T07:57:26Z" message: The record is provisioned in all reported zones. reason: NoFailedZones status: "True" type: DNSReady - lastTransitionTime: "2023-10-17T07:57:26Z" status: "True" type: Available - lastTransitionTime: "2023-10-17T11:34:54Z" message: |- One or more status conditions indicate progressing: DeploymentRollingOut=True (DeploymentRollingOut: Waiting for router deployment rollout to finish: 1 old replica(s) are pending termination... ) reason: IngressControllerProgressing status: "True" type: Progressing - lastTransitionTime: "2023-10-17T07:57:28Z" status: "False" type: Degraded - lastTransitionTime: "2023-10-17T07:41:43Z" <-----snip---->
This is a clone of issue OCPBUGS-42231. The following is the description of the original issue:
—
Description of problem:
OCP Conformance MonitorTests can fail depending on the order in which the CSI driver pods and ClusterRole are applied. The ServiceAccount, ClusterRole, and ClusterRoleBinding should likely be applied before the deployment/pods.
Version-Release number of selected component (if applicable):
4.18.0
How reproducible:
60%
Steps to Reproduce:
1. Create an IPI cluster on IBM Cloud
2. Run OCP Conformance w/ MonitorTests
Actual results:
: [sig-auth][Feature:SCC][Early] should not have pod creation failures during install [Suite:openshift/conformance/parallel] { fail [github.com/openshift/origin/test/extended/authorization/scc.go:76]: 1 pods failed before test on SCC errors Error creating: pods "ibm-vpc-block-csi-node-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[3]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[4]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[5]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[6]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[7]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[9]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, provider restricted-v2: .containers[0].runAsUser: Invalid value: 0: must be in the ranges: [1000180000, 1000189999], provider restricted-v2: .containers[1].runAsUser: Invalid value: 0: must be in the ranges: [1000180000, 1000189999], provider restricted-v2: .containers[1].privileged: Invalid value: true: Privileged containers are not allowed, provider restricted-v2: .containers[2].runAsUser: Invalid value: 0: must be in the ranges: [1000180000, 1000189999], provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount] for DaemonSet.apps/v1/ibm-vpc-block-csi-node -n openshift-cluster-csi-drivers happened 7 times Ginkgo exit error 1: exit with code 1}
Expected results:
No pod creation failures using the wrong SCC, because the ClusterRole/ClusterRoleBinding, etc. had not been applied yet.
Additional info:
Sorry, I did not see an IBM Cloud Storage component listed in the targeted Components for this bug, so I selected the generic Storage component. Please forward as necessary/possible.

Items to consider:
ClusterRole: https://github.com/openshift/ibm-vpc-block-csi-driver-operator/blob/master/assets/rbac/privileged_role.yaml
ClusterRoleBinding: https://github.com/openshift/ibm-vpc-block-csi-driver-operator/blob/master/assets/rbac/node_privileged_binding.yaml

The ibm-vpc-block-csi-node-* pods eventually reach running using the privileged SCC. I do not know whether it is possible to stage the resources that get created first, within the CSI Driver Operator: https://github.com/openshift/ibm-vpc-block-csi-driver-operator/blob/9288e5078f2fe3ce2e69a4be3d94622c164c3dbd/pkg/operator/starter.go#L98-L99. Prior to the CSI Driver daemonset (`node.yaml`), perhaps order matters within the list.

Example of failure in CI: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_installer/8235/pull-ci-openshift-installer-master-e2e-ibmcloud-ovn/1836521032031145984
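A quick sketch for checking which SCC the node pods were actually admitted under (the label selector is an assumption; the openshift.io/scc annotation is set by SCC admission):

$ oc -n openshift-cluster-csi-drivers get pods -l app=ibm-vpc-block-csi-node -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.openshift\.io/scc}{"\n"}{end}'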
Service CA operator creates certificates and secrets to inject cert info into configmaps that request via annotation.
Those secrets and configmaps need to have ownership and description annotations to support cert ownership validation.
Please review the following PR: https://github.com/openshift/cloud-provider-gcp/pull/47
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-39124. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-39005. The following is the description of the original issue:
—
Hello Team,
After a hard reboot of all nodes due to a power outage, a failed image pull of the NTO image prevents "ocp-tuned-one-shot.service" from starting, which results in a dependency failure for the kubelet and crio services:
------------
journalctl_--no-pager
Aug 26 17:07:46 ocp05 systemd[1]: Reached target The firstboot OS update has completed.
Aug 26 17:07:46 ocp05 resolv-prepender.sh[3577]: NM resolv-prepender: Starting download of baremetal runtime cfg image
Aug 26 17:07:46 ocp05 systemd[1]: Starting Writes IP address configuration so that kubelet and crio services select a valid node IP...
Aug 26 17:07:46 ocp05 systemd[1]: Starting TuneD service from NTO image...
Aug 26 17:07:46 ocp05 nm-dispatcher[3687]: NM resolv-prepender triggered by lo up.
Aug 26 17:07:46 ocp05 resolv-prepender.sh[3644]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cf4faeb258c222ba4e04806fd3a7373d3bc1f43a66e141d4b7ece0307f597c72...
Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + [[ OVNKubernetes == \O\V\N\K\u\b\e\r\n\e\t\e\s ]]
Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + [[ lo == \W\i\r\e\d\ \C\o\n\n\e\c\t\i\o\n ]]
Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + '[' -z ']'
Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + echo 'Not a DHCP4 address. Ignoring.'
Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: Not a DHCP4 address. Ignoring.
Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + exit 0
Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: + '[' -z '' ']'
Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: + echo 'Not a DHCP6 address. Ignoring.'
Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: Not a DHCP6 address. Ignoring.
Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: + exit 0
Aug 26 17:07:46 ocp05 bash[3655]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cf4faeb258c222ba4e04806fd3a7373d3bc1f43a66e141d4b7ece0307f597c72...
Aug 26 17:07:46 ocp05 podman[3661]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b6ace44ba73bc0cef451bcf755c7fcddabe66b79df649058dc4b263e052ae26...
Aug 26 17:07:46 ocp05 podman[3661]: Error: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b6ace44ba73bc0cef451bcf755c7fcddabe66b79df649058dc4b263e052ae26: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 10.112.227.10:53: server misbehaving
Aug 26 17:07:46 ocp05 systemd[1]: ocp-tuned-one-shot.service: Main process exited, code=exited, status=125/n/a
Aug 26 17:07:46 ocp05 nm-dispatcher[3793]: NM resolv-prepender triggered by brtrunk up.
Aug 26 17:07:46 ocp05 systemd[1]: ocp-tuned-one-shot.service: Failed with result 'exit-code'.
Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + [[ OVNKubernetes == \O\V\N\K\u\b\e\r\n\e\t\e\s ]]
Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + [[ brtrunk == \W\i\r\e\d\ \C\o\n\n\e\c\t\i\o\n ]]
Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + '[' -z ']'
Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + echo 'Not a DHCP4 address. Ignoring.'
Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: Not a DHCP4 address. Ignoring.
Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + exit 0
Aug 26 17:07:46 ocp05 systemd[1]: Failed to start TuneD service from NTO image.
Aug 26 17:07:46 ocp05 systemd[1]: Dependency failed for Dependencies necessary to run kubelet.
Aug 26 17:07:46 ocp05 systemd[1]: Dependency failed for Kubernetes Kubelet.
Aug 26 17:07:46 ocp05 systemd[1]: kubelet.service: Job kubelet.service/start failed with result 'dependency'.
Aug 26 17:07:46 ocp05 systemd[1]: Dependency failed for Container Runtime Interface for OCI (CRI-O).
Aug 26 17:07:46 ocp05 systemd[1]: crio.service: Job crio.service/start failed with result 'dependency'.
Aug 26 17:07:46 ocp05 systemd[1]: kubelet-dependencies.target: Job kubelet-dependencies.target/start failed with result 'dependency'.
Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: + '[' -z '' ']'
Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: + echo 'Not a DHCP6 address. Ignoring.'
Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: Not a DHCP6 address. Ignoring.
Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: + exit 0
-----------
-----------
$ oc get proxy config cluster -oyaml
status:
httpProxy: http://proxy_ip:8080
httpsProxy: http://proxy_ip:8080
$ cat /etc/mco/proxy.env
HTTP_PROXY=http://proxy_ip:8080
HTTPS_PROXY=http://proxy_ip:8080
-----------
-----------
× ocp-tuned-one-shot.service - TuneD service from NTO image
Loaded: loaded (/etc/systemd/system/ocp-tuned-one-shot.service; enabled; preset: disabled)
Active: failed (Result: exit-code) since Mon 2024-08-26 17:07:46 UTC; 2h 30min ago
Main PID: 3661 (code=exited, status=125)
Aug 26 17:07:46 ocp05 podman[3661]: Error: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b6ace44ba73bc0cef451bcf755c7fcddabe66b79df649058dc4b263e052ae26: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 10.112.227.10:53: server misbehaving
-----------
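A manual recovery sketch once DNS/proxy and the registry are reachable again (an assumption, not an official procedure):

# systemctl start ocp-tuned-one-shot.service
# systemctl start kubelet.service crio.service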
Description of problem:
hypershift dump fails to acquire the localhost-kubeconfig when impersonating. When attempting to dump the guest cluster, it fails to read Secrets from the HCP namespace on the management cluster. As a result, it can't access anything from the guest cluster and fails to dump it successfully.
Version-Release number of selected component (if applicable):
Hypershift 0.1.11 Supported OCP version 4.15.0
How reproducible:
100%
Steps to Reproduce:
Execute hypershift dump cluster --as backplane-cluster-admin --name ${CLUSTER_NAME} --namespace ocm-${ENVIRONMENT}-${CLUSTER_ID} --dump-guest-cluster --artifact-dir ${DIR_NAME}
Actual results:
After a while, a failure message will appear showing a permission issue when attempting to acquire the localhost-kubeconfig
Expected results:
localhost-kubeconfig should be acquired correctly and the command should dump the guest cluster successfully
Additional info:
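A sketch for verifying whether the impersonated identity can read Secrets in the HCP namespace (reusing the placeholders from the command above):

$ oc auth can-i get secrets -n ocm-${ENVIRONMENT}-${CLUSTER_ID} --as backplane-cluster-admin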
This is a clone of issue OCPBUGS-43840. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-43746. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-38132. The following is the description of the original issue:
—
The CPO reconciliation aborts when the OIDC/LDAP IDP validation check fails, and this results in a failure to reconcile any components that are reconciled after that point in the code.
This failure should not be fatal to the CPO reconcile and should likely be reported as a condition on the HC.
xref
Customer incident
https://issues.redhat.com/browse/OCPBUGS-38071
RFE for bypassing the check
https://issues.redhat.com/browse/RFE-5638
PR to proxy the IDP check through the data plane network
https://github.com/openshift/hypershift/pull/4273
The ServiceMonitors and other related resources were moved in https://issues.redhat.com/browse/MON-669.
We thought moving the RBAC would make more sense as well: https://github.com/openshift/cluster-monitoring-operator/pull/2039#discussion_r1262307325
Description of problem:
Worker CSRs are pending, so no worker nodes are available
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-10-06-234925
How reproducible:
Always
Steps to Reproduce:
Create a cluster with profile - aws-c2s-ipi-disconnected-private-fips
Actual results:
Worker CSRs are pending
Expected results:
Workers should be up and running, with all CSRs approved
Additional info:
"failed to find machine for node ip-10-143-1-120" appears in the logs of cluster-machine-approver. It seems we should have IPs like "ip-10-143-1-120.ec2.internal". Failing here: https://github.com/openshift/cluster-machine-approver/blob/master/pkg/controller/csr_check.go#L263
Must-gather - https://drive.google.com/file/d/15tz9TLdTXrH6bSBSfhlIJ1l_nzeFE1R3/view?usp=sharing
template for installation - https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blob/master/functionality-testing/aos-4_14/ipi-on-aws/versioned-installer-customer_vpc-disconnected_private_cluster-fips-c2s-ci
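A sketch for inspecting and, if appropriate, manually approving the pending CSRs while the root cause is investigated (blanket approval; use with care):

$ oc get csr | grep Pending
$ oc get csr -o name | xargs oc adm certificate approve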
MON-2967 and cmo#1890 moved the Observe console menu into a console plugin (in 4.15? 4.14?). Sometimes If-Modified-Since browser caching results in failures that leave the Observe menu missing, and when the user eventually finds /k8s/cluster/operator.openshift.io~v1~Console/cluster/console-plugins, the plugin render fails with:
Failed to get a valid plugin manifest from /api/plugins/monitoring-plugin/ SyntaxError: Unexpected end of JSON input
This appears to be the result of the browser's If-Modified-Since caching:
$ curl -sH Accept:application/json -H Cache-Control:max-age=0 -H 'Cookie: openshift-session-token=...; login-state=...; ...; csrf-token=...' -H 'If-Modified-Since: Fri, 03 Nov 2023 00:47:45 GMT' -i https://console.build02.ci.openshift.org/api/plugins/monitoring-plugin/plugin-manifest.json
HTTP/1.1 200 OK
date: Tue, 21 Nov 2023 16:52:55 GMT
etag: "65444331-9a2"
last-modified: Fri, 03 Nov 2023 00:47:45 GMT
referrer-policy: strict-origin-when-cross-origin
server: nginx/1.20.1
x-content-type-options: nosniff
x-dns-prefetch-control: off
x-frame-options: DENY
x-xss-protection: 1; mode=block
content-length: 0
While a more recent If-Modified-Since returns populated JSON:
$ curl -sH Accept:application/json -H 'If-Modified-Since: Fri, 10 Nov 2023 10:47:45 GMT' -H 'Cookie: openshift-session-token=...; login-state=...; ...; csrf-token=...' https://console.build02.ci.openshift.org/api/plugins/monitoring-plugin/plugin-manifest.json | jq . | head
{
  "name": "monitoring-plugin",
  "version": "1.0.0",
  "displayName": "OpenShift console monitoring plugin",
  "description": "This plugin adds the monitoring UI to the OpenShift web console",
  "dependencies": {
    "@console/pluginAPI": "*"
  },
  "extensions": [
    {
Disabling caching on the monitoring-plugin side would avoid this issue. But fixing 304 handling in the console's proxy would likely also resolve it.
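A minimal sketch of what disabling conditional caching could look like in the plugin's nginx config (directive placement is an assumption; the response headers above show nginx serving the manifest):

location /plugin-manifest.json {
    if_modified_since off;                # ignore If-Modified-Since entirely
    add_header Cache-Control "no-store";  # tell browsers not to cache the manifest
}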
Seen in 4.15.0-ec.2. Reproduced in ec.2. Failed to reproduce in ec.1. Possibly a regression from ec.1 to ec.2, although I haven't identified a regressing commit yet.
Seen multiple times by multiple users in 4.15.0-ec.2 in two long-lived clusters, and also reproduced in an ec.2 Cluster Bot cluster. Likely consistently reproducible on ec.2.
1. Install a cluster, e.g. with launch 4.15.0-ec.2 gcp.
2. Log into the console and use the developer tab to get an openshift-session-token value from a successful HTTPS request.
3.
$ curl -ksi -H "Cookie: openshift-session-token=${TOKEN}" "https://${HOST}/api/plugins/monitoring-plugin/plugin-manifest.json" | grep 'HTTP\|content-\|last-modified'
with your ${TOKEN} and ${HOST}, to confirm 200 responses and find the last-modified value.
4.
$ curl -ksi -H "If-Modified-Since: ${LAST_MODIFIED}" -H "Cookie: openshift-session-token=${TOKEN}" "https://${HOST}/api/plugins/monitoring-plugin/plugin-manifest.json"
with your ${TOKEN}, ${HOST}, and ${LAST_MODIFIED}.
Observe menu is missing, with browser-console logs like:
Failed to get a valid plugin manifest from /api/plugins/monitoring-plugin/ SyntaxError: Unexpected end of JSON input
200 responses with no content when If-Modified-Since is greater than or equal to the content's last-modified.
Reliably successful loading of the monitoring console plugin, with a 304 when If-Modified-Since is greater than or equal to the content's last-modified.
Possibly more obvious warnings pointing at /k8s/cluster/operator.openshift.io~v1~Console/cluster/console-plugins when plugins fail to load.
Using the browser's development tools to disable caching while loading the console avoids the problematic caching interaction.
This is a clone of issue OCPBUGS-28787. The following is the description of the original issue:
—
Description of problem:
It was found when testing OCP-71263 and regression testing OCP-35770 for 4.15. For GCP in Mint mode, the root credential can be removed after cluster installation. But after removing the root credential, CCO becomes degraded.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2024-01-25-051548 4.15.0-rc.3
How reproducible:
Always
Steps to Reproduce:
1. Install a GCP cluster with Mint mode
2. After install, remove the root credential:
jianpingshu@jshu-mac ~ % oc delete secret -n kube-system gcp-credentials
secret "gcp-credentials" deleted
3. Wait some time (about 1/2h to 1h); CCO becomes degraded:
jianpingshu@jshu-mac ~ % oc get co cloud-credential
NAME               VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
cloud-credential   4.15.0-rc.3   True        True          True       6h45m   6 of 7 credentials requests are failing to sync.
jianpingshu@jshu-mac ~ % oc -n openshift-cloud-credential-operator get -o json credentialsrequests | jq -r '.items[] | select(tostring | contains("InfrastructureMismatch") | not) | .metadata.name as $n | .status.conditions // [{type: "NoConditions"}] | .[] | .type + "=" + .status + " " + $n + " " + .reason + ": " + .message' | sort
CredentialsProvisionFailure=False openshift-cloud-network-config-controller-gcp CredentialsProvisionSuccess: successfully granted credentials request
CredentialsProvisionFailure=True cloud-credential-operator-gcp-ro-creds CredentialsProvisionFailure: failed to grant creds: unable to fetch root cloud cred secret: Secret "gcp-credentials" not found
CredentialsProvisionFailure=True openshift-gcp-ccm CredentialsProvisionFailure: failed to grant creds: unable to fetch root cloud cred secret: Secret "gcp-credentials" not found
CredentialsProvisionFailure=True openshift-gcp-pd-csi-driver-operator CredentialsProvisionFailure: failed to grant creds: unable to fetch root cloud cred secret: Secret "gcp-credentials" not found
CredentialsProvisionFailure=True openshift-image-registry-gcs CredentialsProvisionFailure: failed to grant creds: unable to fetch root cloud cred secret: Secret "gcp-credentials" not found
CredentialsProvisionFailure=True openshift-ingress-gcp CredentialsProvisionFailure: failed to grant creds: unable to fetch root cloud cred secret: Secret "gcp-credentials" not found
CredentialsProvisionFailure=True openshift-machine-api-gcp CredentialsProvisionFailure: failed to grant creds: unable to fetch root cloud cred secret: Secret "gcp-credentials" not found

openshift-cloud-network-config-controller-gcp has no failure because it doesn't have a customized role in 4.15.0-rc.3
Actual results:
CCO becomes degraded
Expected results:
CCO should not be degraded; only the "Upgradeable" condition should be updated to reflect the missing root credential
Additional info:
Tested the same case on 4.14.10, no issue
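A sketch for watching the relevant conditions while reproducing (standard ClusterOperator status fields):

$ oc get co cloud-credential -o jsonpath='{range .status.conditions[*]}{.type}={.status}: {.message}{"\n"}{end}'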
Description of problem:
If a ROSA HCP customer uses the default worker security group that the CPO creates for some other purpose (e.g. creates their own VPC Endpoint or EC2 instance using this security group) and then starts an uninstallation, the uninstallation will hang indefinitely because the CPO is unable to delete the security group. https://github.com/openshift/hypershift/blob/9e6255e5e44c8464da0850f8c19dc085bdbaf8cb/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go#L317-L331
Version-Release number of selected component (if applicable):
4.14.8
How reproducible:
100%
Steps to Reproduce:
1. Create a ROSA HCP cluster
2. Attach the default worker security group to some other object unrelated to the cluster, like an EC2 instance or VPC Endpoint
3. Uninstall the ROSA HCP cluster
Actual results:
The uninstall hangs without much feedback to the customer
Expected results:
Either that the uninstall gives up and moves on eventually, or that clear feedback is provided to the customer, so that they know that the uninstall is held up because of an inability to delete a specific security group id. If this feedback mechanism is already in place, but not wired through to OCM, this may not be an OCPBUGS and could just be an OCM bug instead!
Additional info:
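A sketch for identifying what still references the security group and blocks its deletion (the group ID is a placeholder):

$ aws ec2 describe-network-interfaces --filters Name=group-id,Values=<sg-id> --query 'NetworkInterfaces[].{Id:NetworkInterfaceId,Desc:Description}'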
The official openshift doc does not contain this issue https://docs.openshift.com/container-platform/4.14/installing/installing_openstack/installing-openstack-user.html
Only the upstream docs have it.
Please review the following PR: https://github.com/openshift/prometheus-operator/pull/244
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
In the python script used during bug pre-dispatch, include the "networking / network-tools" component.
This is a clone of issue OCPBUGS-24956. The following is the description of the original issue:
—
The Cloud Credential operator was made optional in OCP 4.15, see https://issues.redhat.com/browse/OCPEDGE-69. The CloudCredential cap was added as a new capability.
However, for OCP 4.15 the disablement of CCO is only supported on BareMetal platforms, see https://issues.redhat.com/browse/OCPEDGE-69?focusedId=23595076&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-23595076.
We propose to guard against installations on non-BareMetal platforms without the CloudCredential cap, which could be implemented similarly to https://issues.redhat.com/browse/OCPBUGS-15659.
This is a clone of issue OCPBUGS-26594. The following is the description of the original issue:
—
Component Readiness has found a potential regression in [sig-arch] events should not repeat pathologically for ns/openshift-monitoring.
Probability of significant regression: 100.00%
Sample (being evaluated) Release: 4.15
Start Time: 2024-01-04T00:00:00Z
End Time: 2024-01-10T23:59:59Z
Success Rate: 42.31%
Successes: 11
Failures: 15
Flakes: 0
Base (historical) Release: 4.14
Start Time: 2023-10-04T00:00:00Z
End Time: 2023-10-31T23:59:59Z
Success Rate: 100.00%
Successes: 151
Failures: 0
Flakes: 0
Description of problem:
After enabling user-defined monitoring on a HyperShift-hosted cluster, PrometheusOperatorRejectedResources starts firing.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Start a HyperShift-hosted cluster with cluster-bot
2. Enable user-defined monitoring
Actual results:
PrometheusOperatorRejectedResources alert becomes firing
Expected results:
No alert firing
Additional info:
Need to reach out to the HyperShift folks as the fix should probably be in their code base.
This is a clone of issue OCPBUGS-30855. The following is the description of the original issue:
—
Description of problem:
The Port_Security setting has been overridden although it is set to false in the worker MachineSet configuration.
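A sketch for checking the resulting port attributes on the OpenStack side (the port ID is a placeholder):

$ openstack port show <port-id> -c port_security_enabled -c security_group_ids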
Version-Release number of selected component (if applicable):
OCP=4.14.14 RHOSP=17.1
How reproducible:
NFV Perf lab ShiftonStack Deployment mode = IPI
Steps to Reproduce:
1.Network configuration resources for Worker node $ oc get machinesets.machine.openshift.io -n openshift-machine-api | grep worker 5kqfbl3y0rhocpnfv-wj2jj-worker-0 1 1 1 1 5d23h $ oc describe machinesets.machine.openshift.io -n openshift-machine-api 5kqfbl3y0rhocpnfv-wj2jj-worker-0 Name: 5kqfbl3y0rhocpnfv-wj2jj-worker-0 Namespace: openshift-machine-api Labels: machine.openshift.io/cluster-api-cluster=5kqfbl3y0rhocpnfv-wj2jj machine.openshift.io/cluster-api-machine-role=worker machine.openshift.io/cluster-api-machine-type=worker Annotations: machine.openshift.io/memoryMb: 47104 machine.openshift.io/vCPU: 26 API Version: machine.openshift.io/v1beta1 Kind: MachineSet Metadata: Creation Timestamp: 2024-03-07T05:24:07Z Generation: 3 Resource Version: 226098 UID: 8cb06872-9b62-4c2c-b66b-bf91a03efa2d Spec: Replicas: 1 Selector: Match Labels: machine.openshift.io/cluster-api-cluster: 5kqfbl3y0rhocpnfv-wj2jj machine.openshift.io/cluster-api-machineset: 5kqfbl3y0rhocpnfv-wj2jj-worker-0 Template: Metadata: Labels: machine.openshift.io/cluster-api-cluster: 5kqfbl3y0rhocpnfv-wj2jj machine.openshift.io/cluster-api-machine-role: worker machine.openshift.io/cluster-api-machine-type: worker machine.openshift.io/cluster-api-machineset: 5kqfbl3y0rhocpnfv-wj2jj-worker-0 Spec: Lifecycle Hooks: Metadata: Provider Spec: Value: API Version: machine.openshift.io/v1alpha1 Availability Zone: worker Cloud Name: openstack Clouds Secret: Name: openstack-cloud-credentials Namespace: openshift-machine-api Config Drive: true Flavor: sos-worker Image: 5kqfbl3y0rhocpnfv-wj2jj-rhcos Kind: OpenstackProviderSpec Metadata: Networks: Filter: Subnets: Filter: Id: 7fb7d2d6-325d-49e1-b3f8-b4dbb1197e34 Ports: Fixed I Ps: Subnet ID: 1a892dcf-bf93-46ef-bf37-bda6cf923471 Name Suffix: provider3p1 Network ID: 50a557b5-34c2-4c47-b539-963688f7167c Port Security: false Tags: sriov Trunk: false Vnic Type: direct Fixed I Ps: Subnet ID: 76430b9e-302f-428d-916a-77482d9cfb19 Name Suffix: provider4p1 Network ID: e2106b16-8f83-4e2e-bdbd-20e2c12ec279 Port Security: false Tags: sriov Trunk: false Vnic Type: direct Fixed I Ps: Subnet ID: 1a892dcf-bf93-46ef-bf37-bda6cf923471 Name Suffix: provider3p2 Network ID: 50a557b5-34c2-4c47-b539-963688f7167c Port Security: false Tags: sriov Trunk: false Vnic Type: direct Fixed I Ps: Subnet ID: 76430b9e-302f-428d-916a-77482d9cfb19 Name Suffix: provider4p2 Network ID: e2106b16-8f83-4e2e-bdbd-20e2c12ec279 Port Security: false Tags: sriov Trunk: false Vnic Type: direct Fixed I Ps: Subnet ID: 1a892dcf-bf93-46ef-bf37-bda6cf923471 Name Suffix: provider3p3 Network ID: 50a557b5-34c2-4c47-b539-963688f7167c Port Security: false Tags: sriov Trunk: false Vnic Type: direct Fixed I Ps: Subnet ID: 76430b9e-302f-428d-916a-77482d9cfb19 Name Suffix: provider4p3 Network ID: e2106b16-8f83-4e2e-bdbd-20e2c12ec279 Port Security: false Tags: sriov Trunk: false Vnic Type: direct Fixed I Ps: Subnet ID: 1a892dcf-bf93-46ef-bf37-bda6cf923471 Name Suffix: provider3p4 Network ID: 50a557b5-34c2-4c47-b539-963688f7167c Port Security: false Tags: sriov Trunk: false Vnic Type: direct Fixed I Ps: Subnet ID: 76430b9e-302f-428d-916a-77482d9cfb19 Name Suffix: provider4p4 Network ID: e2106b16-8f83-4e2e-bdbd-20e2c12ec279 Port Security: false Tags: sriov Trunk: false Vnic Type: direct Primary Subnet: 7fb7d2d6-325d-49e1-b3f8-b4dbb1197e34 Security Groups: Filter: Name: 5kqfbl3y0rhocpnfv-wj2jj-worker Server Group Name: 5kqfbl3y0rhocpnfv-wj2jj-worker-worker Server Metadata: Name: 5kqfbl3y0rhocpnfv-wj2jj-worker Openshift Cluster ID: 5kqfbl3y0rhocpnfv-wj2jj 
Tags: openshiftClusterID=5kqfbl3y0rhocpnfv-wj2jj Trunk: true User Data Secret: Name: worker-user-data Status: Available Replicas: 1 Fully Labeled Replicas: 1 Observed Generation: 3 Ready Replicas: 1 Replicas: 1 Events: <none> $ oc get nodes NAME STATUS ROLES AGE VERSION 5kqfbl3y0rhocpnfv-wj2jj-master-0 Ready control-plane,master 5d23h v1.27.10+28ed2d7 5kqfbl3y0rhocpnfv-wj2jj-master-1 Ready control-plane,master 5d23h v1.27.10+28ed2d7 5kqfbl3y0rhocpnfv-wj2jj-master-2 Ready control-plane,master 5d23h v1.27.10+28ed2d7 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr Ready worker 5d22h v1.27.10+28ed2d7 $ oc describe nodes 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr Name: 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr Roles: worker Labels: beta.kubernetes.io/arch=amd64 beta.kubernetes.io/instance-type=sos-worker beta.kubernetes.io/os=linux failure-domain.beta.kubernetes.io/region=regionOne failure-domain.beta.kubernetes.io/zone=worker feature.node.kubernetes.io/network-sriov.capable=true kubernetes.io/arch=amd64 kubernetes.io/hostname=5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr kubernetes.io/os=linux node-role.kubernetes.io/worker= node.kubernetes.io/instance-type=sos-worker node.openshift.io/os_id=rhcos topology.cinder.csi.openstack.org/zone=worker topology.kubernetes.io/region=regionOne topology.kubernetes.io/zone=worker Annotations: alpha.kubernetes.io/provided-node-ip: 192.168.0.91 csi.volume.kubernetes.io/nodeid: {"cinder.csi.openstack.org":"aa5cfdcb-eb46-46d8-8ac2-5bb6f0c0d879"} machine.openshift.io/machine: openshift-machine-api/5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable machineconfiguration.openshift.io/currentConfig: rendered-worker-8c613531f97974a9561f8b0ada0c2cd0 machineconfiguration.openshift.io/desiredConfig: rendered-worker-8c613531f97974a9561f8b0ada0c2cd0 machineconfiguration.openshift.io/desiredDrain: uncordon-rendered-worker-8c613531f97974a9561f8b0ada0c2cd0 machineconfiguration.openshift.io/lastAppliedDrain: uncordon-rendered-worker-8c613531f97974a9561f8b0ada0c2cd0 machineconfiguration.openshift.io/lastSyncedControllerConfigResourceVersion: 505735 machineconfiguration.openshift.io/reason: machineconfiguration.openshift.io/state: Done sriovnetwork.openshift.io/state: Idle tuned.openshift.io/bootcmdline: skew_tick=1 tsc=reliable rcupdate.rcu_normal_after_boot=1 nohz=on rcu_nocbs=10-25 tuned.non_isolcpus=000003ff systemd.cpu_affinity=0,1,2,3... 
volumes.kubernetes.io/controller-managed-attach-detach: true CreationTimestamp: Thu, 07 Mar 2024 06:09:31 +0000 Taints: <none> Unschedulable: false Lease: HolderIdentity: 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr AcquireTime: <unset> RenewTime: Wed, 13 Mar 2024 04:55:28 +0000 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- MemoryPressure False Wed, 13 Mar 2024 04:55:33 +0000 Thu, 07 Mar 2024 15:18:00 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available DiskPressure False Wed, 13 Mar 2024 04:55:33 +0000 Thu, 07 Mar 2024 15:18:00 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure PIDPressure False Wed, 13 Mar 2024 04:55:33 +0000 Thu, 07 Mar 2024 15:18:00 +0000 KubeletHasSufficientPID kubelet has sufficient PID available Ready True Wed, 13 Mar 2024 04:55:33 +0000 Thu, 07 Mar 2024 15:18:05 +0000 KubeletReady kubelet is posting ready status Addresses: InternalIP: 192.168.0.91 Hostname: 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr Capacity: cpu: 26 ephemeral-storage: 104266732Ki hugepages-1Gi: 20Gi hugepages-2Mi: 0 memory: 47264764Ki openshift.io/intl_provider3: 4 openshift.io/intl_provider4: 4 pods: 250 Allocatable: cpu: 16 ephemeral-storage: 95018478229 hugepages-1Gi: 20Gi hugepages-2Mi: 0 memory: 25166844Ki openshift.io/intl_provider3: 4 openshift.io/intl_provider4: 4 pods: 250 System Info: Machine ID: aa5cfdcbeb4646d88ac25bb6f0c0d879 System UUID: aa5cfdcb-eb46-46d8-8ac2-5bb6f0c0d879 Boot ID: 77573755-0d27-4717-80fe-4579692d9c2c Kernel Version: 5.14.0-284.54.1.el9_2.x86_64 OS Image: Red Hat Enterprise Linux CoreOS 414.92.202402201520-0 (Plow) Operating System: linux Architecture: amd64 Container Runtime Version: cri-o://1.27.3-6.rhaos4.14.git7eb2281.el9 Kubelet Version: v1.27.10+28ed2d7 Kube-Proxy Version: v1.27.10+28ed2d7 ProviderID: openstack:///aa5cfdcb-eb46-46d8-8ac2-5bb6f0c0d879 Non-terminated Pods: (19 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age --------- ---- ------------ ---------- --------------- ------------- --- crucible-rickshaw testpmd-host-device-e810-sriov 10 (62%) 10 (62%) 10000Mi (40%) 10000Mi (40%) 3d13h openshift-cluster-csi-drivers openstack-cinder-csi-driver-node-hnv49 30m (0%) 0 (0%) 150Mi (0%) 0 (0%) 5d22h openshift-cluster-node-tuning-operator tuned-fcjfp 10m (0%) 0 (0%) 50Mi (0%) 0 (0%) 5d22h openshift-dns dns-default-v7s59 60m (0%) 0 (0%) 110Mi (0%) 0 (0%) 5d22h openshift-dns node-resolver-gkz8b 5m (0%) 0 (0%) 21Mi (0%) 0 (0%) 5d22h openshift-image-registry node-ca-p5dn5 10m (0%) 0 (0%) 10Mi (0%) 0 (0%) 5d22h openshift-ingress-canary ingress-canary-fk59t 10m (0%) 0 (0%) 20Mi (0%) 0 (0%) 5d22h openshift-machine-config-operator machine-config-daemon-9qw8z 40m (0%) 0 (0%) 100Mi (0%) 0 (0%) 5d22h openshift-monitoring node-exporter-czcmj 9m (0%) 0 (0%) 47Mi (0%) 0 (0%) 5d22h openshift-monitoring prometheus-adapter-7696787779-vj5wk 1m (0%) 0 (0%) 40Mi (0%) 0 (0%) 5d4h openshift-multus multus-additional-cni-plugins-l7rpv 10m (0%) 0 (0%) 10Mi (0%) 0 (0%) 5d22h openshift-multus multus-nxr6k 10m (0%) 0 (0%) 65Mi (0%) 0 (0%) 5d22h openshift-multus network-metrics-daemon-tb7sq 20m (0%) 0 (0%) 120Mi (0%) 0 (0%) 5d22h openshift-network-diagnostics network-check-target-pqtp9 10m (0%) 0 (0%) 15Mi (0%) 0 (0%) 5d22h openshift-openstack-infra coredns-5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr 200m (1%) 0 (0%) 400Mi (1%) 0 (0%) 5d22h openshift-openstack-infra keepalived-5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr 200m (1%) 0 (0%) 400Mi (1%) 0 (0%) 
5d22h openshift-sdn sdn-9mdnb 110m (0%) 0 (0%) 220Mi (0%) 0 (0%) 5d22h openshift-sriov-network-operator sriov-device-plugin-tr68w 10m (0%) 0 (0%) 50Mi (0%) 0 (0%) 5d13h openshift-sriov-network-operator sriov-network-config-daemon-dtf95 100m (0%) 0 (0%) 100Mi (0%) 0 (0%) 5d22h Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) Resource Requests Limits -------- -------- ------ cpu 10845m (67%) 10 (62%) memory 11928Mi (48%) 10000Mi (40%) ephemeral-storage 0 (0%) 0 (0%) hugepages-1Gi 8Gi (40%) 8Gi (40%) hugepages-2Mi 0 (0%) 0 (0%) openshift.io/intl_provider3 4 4 openshift.io/intl_provider4 4 4 Events: <none> 2. OpenStack Network resource for Worker node $ openstack server list --all --fit-width +--------------------------------------+----------------------------------------+--------+----------------------------------------------------------------------------------------------------------------------+-------------------------------+------------+ | ID | Name | Status | Networks | Image | Flavor | +--------------------------------------+----------------------------------------+--------+----------------------------------------------------------------------------------------------------------------------+-------------------------------+------------+ | aa5cfdcb-eb46-46d8-8ac2-5bb6f0c0d879 | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr | ACTIVE | management=192.168.0.91; provider-3=192.168.177.197, 192.168.177.59, 192.168.177.66, 192.168.177.83; | 5kqfbl3y0rhocpnfv-wj2jj-rhcos | sos-worker | | | | | provider-4=192.168.178.108, 192.168.178.121, 192.168.178.144, 192.168.178.18 | | | | 1a24baf3-acde-49a0-ab8e-4f4afcc9d3cc | 5kqfbl3y0rhocpnfv-wj2jj-master-2 | ACTIVE | management=192.168.0.62 | 5kqfbl3y0rhocpnfv-wj2jj-rhcos | sos-master | | 3e545ab5-6e28-4189-8d94-9272dfa1cd05 | 5kqfbl3y0rhocpnfv-wj2jj-master-1 | ACTIVE | management=192.168.0.78 | 5kqfbl3y0rhocpnfv-wj2jj-rhcos | sos-master | | 97e5c382-0fb0-4a70-b58e-0469d3869a4e | 5kqfbl3y0rhocpnfv-wj2jj-master-0 | ACTIVE | management=192.168.0.93 | 5kqfbl3y0rhocpnfv-wj2jj-rhcos | sos-master | +--------------------------------------+----------------------------------------+--------+----------------------------------------------------------------------------------------------------------------------+-------------------------------+------------+$ openstack port list --server aa5cfdcb-eb46-46d8-8ac2-5bb6f0c0d879 +--------------------------------------+----------------------------------------------------+-------------------+--------------------------------------------------------------------------------+--------+ | ID | Name | MAC Address | Fixed IP Addresses | Status | +--------------------------------------+----------------------------------------------------+-------------------+--------------------------------------------------------------------------------+--------+ | 0a562c29-4ddc-41c4-82e8-13934d3ee273 | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-0 | fa:16:3e:16:9a:c3 | ip_address='192.168.0.91', subnet_id='7fb7d2d6-325d-49e1-b3f8-b4dbb1197e34' | ACTIVE | | 0c1814db-cd4f-4f6a-a0c6-4f8e569b6767 | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider4p4 | fa:16:3e:15:88:d7 | ip_address='192.168.178.108', subnet_id='76430b9e-302f-428d-916a-77482d9cfb19' | ACTIVE | | 1778cb62-5fbf-42be-8847-53a7b092bdf5 | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider3p2 | fa:16:3e:2a:64:e4 | ip_address='192.168.177.197', subnet_id='1a892dcf-bf93-46ef-bf37-bda6cf923471' | ACTIVE | | 557f205b-2674-4f6e-91a2-643fe1702be2 | 
5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider3p1 | fa:16:3e:56:a3:48 | ip_address='192.168.177.83', subnet_id='1a892dcf-bf93-46ef-bf37-bda6cf923471' | ACTIVE | | 721b5f15-2dc9-4509-a4ba-09f364ae8771 | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider3p3 | fa:16:3e:dd:c3:28 | ip_address='192.168.177.59', subnet_id='1a892dcf-bf93-46ef-bf37-bda6cf923471' | ACTIVE | | 9da4b1be-27d7-4428-a194-9eb4b02f6ac5 | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider4p3 | fa:16:3e:fb:06:1b | ip_address='192.168.178.144', subnet_id='76430b9e-302f-428d-916a-77482d9cfb19' | ACTIVE | | a72fcbd2-83d3-4fa9-be3d-e9fbde27d4bf | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider3p4 | fa:16:3e:a9:28:0e | ip_address='192.168.177.66', subnet_id='1a892dcf-bf93-46ef-bf37-bda6cf923471' | ACTIVE | | ba5cd10f-c6bc-4bed-b978-3b8a3560ad5c | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider4p1 | fa:16:3e:33:e4:c4 | ip_address='192.168.178.18', subnet_id='76430b9e-302f-428d-916a-77482d9cfb19' | ACTIVE | | bf2ce123-76fc-4e5c-9e4f-0473febbdeac | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider4p2 | fa:16:3e:ce:91:10 | ip_address='192.168.178.121', subnet_id='76430b9e-302f-428d-916a-77482d9cfb19' | ACTIVE | +--------------------------------------+----------------------------------------------------+-------------------+--------------------------------------------------------------------------------+--------+ $ openstack port show --fit-width 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider4p4 +-------------------------+---------------------------------------------------------------------------------------------------------------------------+ | Field | Value | +-------------------------+---------------------------------------------------------------------------------------------------------------------------+ | admin_state_up | UP | | allowed_address_pairs | | | binding_host_id | nfv-intel-11.perflab.com | | binding_profile | pci_slot='0000:b1:11.2', pci_vendor_info='8086:1889', physical_network='provider4' | | binding_vif_details | connectivity='l2', port_filter='False', vlan='178' | | binding_vif_type | hw_veb | | binding_vnic_type | direct | | created_at | 2024-03-07T06:03:43Z | | data_plane_status | None | | description | Created by cluster-api-provider-openstack cluster openshift-machine-api-5kqfbl3y0rhocpnfv-wj2jj | | device_id | aa5cfdcb-eb46-46d8-8ac2-5bb6f0c0d879 | | device_owner | compute:worker | | device_profile | None | | dns_assignment | fqdn='host-192-168-178-108.openstacklocal.', hostname='host-192-168-178-108', ip_address='192.168.178.108' | | dns_domain | | | dns_name | | | extra_dhcp_opts | | | fixed_ips | ip_address='192.168.178.108', subnet_id='76430b9e-302f-428d-916a-77482d9cfb19' | | id | 0c1814db-cd4f-4f6a-a0c6-4f8e569b6767 | | ip_allocation | None | | mac_address | fa:16:3e:15:88:d7 | | name | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider4p4 | | network_id | e2106b16-8f83-4e2e-bdbd-20e2c12ec279 | | numa_affinity_policy | None | | port_security_enabled | True | | project_id | 927450d0f06647a99d86214acd822679 | | propagate_uplink_status | None | | qos_network_policy_id | None | | qos_policy_id | None | | resource_request | None | | revision_number | 6 | | security_group_ids | f0df9265-c7fd-4f47-875f-d346e5cb5074 | | status | ACTIVE | | tags | cluster-api-provider-openstack, openshift-machine-api-5kqfbl3y0rhocpnfv-wj2jj, openshiftClusterID=5kqfbl3y0rhocpnfv-wj2jj | | trunk_details | None | | updated_at | 2024-03-07T06:04:10Z | 
+-------------------------+---------------------------------------------------------------------------------------------------------------------------+$ openstack port show --fit-width 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider3p2 +-------------------------+---------------------------------------------------------------------------------------------------------------------------+ | Field | Value | +-------------------------+---------------------------------------------------------------------------------------------------------------------------+ | admin_state_up | UP | | allowed_address_pairs | | | binding_host_id | nfv-intel-11.perflab.com | | binding_profile | pci_slot='0000:b1:01.1', pci_vendor_info='8086:1889', physical_network='provider3' | | binding_vif_details | connectivity='l2', port_filter='False', vlan='177' | | binding_vif_type | hw_veb | | binding_vnic_type | direct | | created_at | 2024-03-07T06:03:41Z | | data_plane_status | None | | description | Created by cluster-api-provider-openstack cluster openshift-machine-api-5kqfbl3y0rhocpnfv-wj2jj | | device_id | aa5cfdcb-eb46-46d8-8ac2-5bb6f0c0d879 | | device_owner | compute:worker | | device_profile | None | | dns_assignment | fqdn='host-192-168-177-197.openstacklocal.', hostname='host-192-168-177-197', ip_address='192.168.177.197' | | dns_domain | | | dns_name | | | extra_dhcp_opts | | | fixed_ips | ip_address='192.168.177.197', subnet_id='1a892dcf-bf93-46ef-bf37-bda6cf923471' | | id | 1778cb62-5fbf-42be-8847-53a7b092bdf5 | | ip_allocation | None | | mac_address | fa:16:3e:2a:64:e4 | | name | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider3p2 | | network_id | 50a557b5-34c2-4c47-b539-963688f7167c | | numa_affinity_policy | None | | port_security_enabled | True | | project_id | 927450d0f06647a99d86214acd822679 | | propagate_uplink_status | None | | qos_network_policy_id | None | | qos_policy_id | None | | resource_request | None | | revision_number | 9 | | security_group_ids | f0df9265-c7fd-4f47-875f-d346e5cb5074 | | status | ACTIVE | | tags | cluster-api-provider-openstack, openshift-machine-api-5kqfbl3y0rhocpnfv-wj2jj, openshiftClusterID=5kqfbl3y0rhocpnfv-wj2jj | | trunk_details | None | | updated_at | 2024-03-07T06:10:42Z | +-------------------------+---------------------------------------------------------------------------------------------------------------------------+ $ openstack network list +--------------------------------------+-------------+--------------------------------------+ | ID | Name | Subnets | +--------------------------------------+-------------+--------------------------------------+ | 50a557b5-34c2-4c47-b539-963688f7167c | provider-3 | 1a892dcf-bf93-46ef-bf37-bda6cf923471 | | e2106b16-8f83-4e2e-bdbd-20e2c12ec279 | provider-4 | 76430b9e-302f-428d-916a-77482d9cfb19 | | 5fdddf1c-3a71-4752-94bd-bdb5b9674500 | management | 7fb7d2d6-325d-49e1-b3f8-b4dbb1197e34 | +--------------------------------------+-------------+--------------------------------------+$ openstack network show provider-3 +---------------------------+--------------------------------------+ | Field | Value | +---------------------------+--------------------------------------+ | admin_state_up | UP | | availability_zone_hints | | | availability_zones | | | created_at | 2024-03-01T16:45:48Z | | description | | | dns_domain | | | id | 50a557b5-34c2-4c47-b539-963688f7167c | | ipv4_address_scope | None | | ipv6_address_scope | None | | is_default | None | | is_vlan_transparent | None | | mtu | 9216 | | name | provider-3 | | 
port_security_enabled | True | | project_id | ad4b9a972ac64bd9916ad7ee80288353 | | provider:network_type | vlan | | provider:physical_network | provider3 | | provider:segmentation_id | 177 | | qos_policy_id | None | | revision_number | 2 | | router:external | Internal | | segments | None | | shared | True | | status | ACTIVE | | subnets | 1a892dcf-bf93-46ef-bf37-bda6cf923471 | | tags | | | updated_at | 2024-03-01T16:45:52Z | +---------------------------+--------------------------------------+$ openstack network show provider-4 +---------------------------+--------------------------------------+ | Field | Value | +---------------------------+--------------------------------------+ | admin_state_up | UP | | availability_zone_hints | | | availability_zones | | | created_at | 2024-03-01T16:45:57Z | | description | | | dns_domain | | | id | e2106b16-8f83-4e2e-bdbd-20e2c12ec279 | | ipv4_address_scope | None | | ipv6_address_scope | None | | is_default | None | | is_vlan_transparent | None | | mtu | 9216 | | name | provider-4 | | port_security_enabled | True | | project_id | ad4b9a972ac64bd9916ad7ee80288353 | | provider:network_type | vlan | | provider:physical_network | provider4 | | provider:segmentation_id | 178 | | qos_policy_id | None | | revision_number | 2 | | router:external | Internal | | segments | None | | shared | True | | status | ACTIVE | | subnets | 76430b9e-302f-428d-916a-77482d9cfb19 | | tags | | | updated_at | 2024-03-01T16:46:01Z | +---------------------------+--------------------------------------+ 3.
Actual results:
Expected results:
Additional info:
This comes from this bug https://issues.redhat.com/browse/OCPBUGS-29940
After applying the workaround suggested in [1][2] with "oc adm must-gather --node-name", we found another issue: must-gather creates the debug pod on all master nodes and gets stuck for a while because of the loop in the gather_network_logs_basics script. Filtering out the NotReady nodes would allow the workaround to be applied.
The script gather_network_logs_basics gets the master nodes by label (node-role.kubernetes.io/master) and saves them in the CLUSTER_NODES variable. It then passes this as a parameter to the function gather_multus_logs $CLUSTER_NODES, where it loops through the list of master nodes and performs debugging for each node.
collection-scripts/gather_network_logs_basics
...
CLUSTER_NODES="${@:-$(oc get node -l node-role.kubernetes.io/master -oname)}"
/usr/bin/gather_multus_logs $CLUSTER_NODES
...
collection-scripts/gather_multus_logs
...
function gather_multus_logs {
  for NODE in "$@"; do
    nodefilename=$(echo "$NODE" | sed -e 's|node/||')
    out=$(oc debug "${NODE}" -- \
      /bin/bash -c "cat $INPUT_LOG_PATH" 2>/dev/null) && echo "$out" 1> "${OUTPUT_LOG_PATH}/multus-log-$nodefilename.log"
  done
}
...
This could be resolved with something similar to this:
CLUSTER_NODES="${@:-$(oc get node -l node-role.kubernetes.io/master -o json | jq -r '.items[] | select(.status.conditions[] | select(.type=="Ready" and .status=="True")).metadata.name')}"
/usr/bin/gather_multus_logs $CLUSTER_NODES
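As a sanity check, a quick way to compare the two selections against a live cluster (a sketch; requires jq):

# Current behavior: all masters, including NotReady ones.
oc get node -l node-role.kubernetes.io/master -oname

# Proposed behavior: Ready masters only.
oc get node -l node-role.kubernetes.io/master -o json \
  | jq -r '.items[]
      | select(.status.conditions[] | select(.type=="Ready" and .status=="True"))
      | .metadata.name'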
[1] - https://access.redhat.com/solutions/6962230
[2] - https://issues.redhat.com/browse/OCPBUGS-29940
This is a clone of issue OCPBUGS-35476. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-35416. The following is the description of the original issue:
—
Description of problem:
The presubmit test that expects an inactive CPMS to be regenerated resets the state at the end of the test. In doing so, it causes the CPMS generator to regenerate back to the original state. Part of regeneration involves deleting and recreating the CPMS. If the regeneration is not quick enough, the next part of the test can fail, as it expects the CPMS to exist. We should change this check to an Eventually to avoid the race between the generator and the test. See https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-control-plane-machine-set-operator/304/pull-ci-openshift-cluster-control-plane-machine-set-operator-release-4.13-e2e-aws-operator/1801195115868327936 as an example failure
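A minimal sketch of the suggested change, assuming a Gomega-based test and the openshift client-go machine clientset (the function name, timeout, and interval are illustrative, not the actual test code):

import (
	"context"
	"time"

	. "github.com/onsi/gomega"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	machineclient "github.com/openshift/client-go/machine/clientset/versioned"
)

func expectCPMSRecreated(ctx context.Context, c machineclient.Interface) {
	// Poll for the CPMS instead of asserting its existence once,
	// so the test no longer races the generator's delete/recreate cycle.
	Eventually(func() error {
		_, err := c.MachineV1().ControlPlaneMachineSets("openshift-machine-api").
			Get(ctx, "cluster", metav1.GetOptions{})
		return err
	}, 2*time.Minute, 5*time.Second).Should(Succeed(),
		"expected the CPMS to be recreated by the generator")
}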
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
In 4.14, the MCO became the default provider of image registry certificates. However, all of these certs are put onto disk and into config in the cluster. We need a way for components like hypershift to provide the certificates they need to run properly during their bootstrap process.
Version-Release number of selected component (if applicable):
How reproducible:
always with hypershift
Steps to Reproduce:
1. Bootstrap a hypershift cluster
2. It will fail due to image pull errors
Actual results:
failure due to lack of IR certs
Expected results:
IR certs are provided via a cmd flag by the component that needs them, and bootstrap succeeds.
Additional info:
Please review the following PR: https://github.com/openshift/images/pull/148
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/cluster-image-registry-operator/pull/918
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-32354. The following is the description of the original issue:
—
Description of problem:
Recent changes to the multi-arch CI steps [1] now require yq-v4 to be present in the libvirt-installer container image.
[1] https://github.com/openshift/release/pull/50310
Version-Release number of selected component (if applicable):
all
How reproducible:
always
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
yq-v4 is present on the libvirt-installer CI image.
Additional info:
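For reference, a minimal sketch of adding yq v4 to the image (the version, install path, and release URL scheme for mikefarah/yq are assumptions, not the actual fix):

# Illustrative: fetch a yq v4 release binary and expose it as yq-v4.
YQ_VERSION=v4.43.1   # assumed version, pin as appropriate
curl -sSfL "https://github.com/mikefarah/yq/releases/download/${YQ_VERSION}/yq_linux_amd64" \
  -o /usr/local/bin/yq-v4
chmod +x /usr/local/bin/yq-v4
yq-v4 --version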
Description of problem:
The pod of catalogsource without registryPoll wasn't recreated during the node failure
jiazha-mac:~ jiazha$ oc get pods
NAME                                    READY   STATUS        RESTARTS       AGE
certified-operators-rcs64               1/1     Running       0              123m
community-operators-8mxh6               1/1     Running       0              123m
marketplace-operator-769fbb9898-czsfn   1/1     Running       4 (117m ago)   136m
qe-app-registry-5jxlx                   1/1     Running       0              106m
redhat-marketplace-4bgv9                1/1     Running       0              123m
redhat-operators-ww5tb                  1/1     Running       0              123m
test-2xvt8                              1/1     Terminating   0              12m
jiazha-mac:~ jiazha$ oc get pods test-2xvt8 -o wide
NAME         READY   STATUS    RESTARTS   AGE    IP            NODE                                          NOMINATED NODE   READINESS GATES
test-2xvt8   1/1     Running   0          7m6s   10.129.2.26   qe-daily-417-0708-cv2p6-worker-westus-gcrrc   <none>           <none>
jiazha-mac:~ jiazha$ oc get node qe-daily-417-0708-cv2p6-worker-westus-gcrrc
NAME                                          STATUS     ROLES    AGE    VERSION
qe-daily-417-0708-cv2p6-worker-westus-gcrrc   NotReady   worker   116m   v1.30.2+421e90e
Version-Release number of selected component (if applicable):
Cluster version is 4.17.0-0.nightly-2024-07-07-131215
How reproducible:
always
Steps to Reproduce:
1. Create a catalogsource without the registryPoll configuration.

jiazha-mac:~ jiazha$ cat cs-32183.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: test
  namespace: openshift-marketplace
spec:
  displayName: Test Operators
  image: registry.redhat.io/redhat/redhat-operator-index:v4.16
  publisher: OpenShift QE
  sourceType: grpc
jiazha-mac:~ jiazha$ oc create -f cs-32183.yaml
catalogsource.operators.coreos.com/test created
jiazha-mac:~ jiazha$ oc get pods test-2xvt8 -o wide
NAME         READY   STATUS    RESTARTS   AGE     IP            NODE                                          NOMINATED NODE   READINESS GATES
test-2xvt8   1/1     Running   0          3m18s   10.129.2.26   qe-daily-417-0708-cv2p6-worker-westus-gcrrc   <none>           <none>

2. Stop the node.

jiazha-mac:~ jiazha$ oc debug node/qe-daily-417-0708-cv2p6-worker-westus-gcrrc
Temporary namespace openshift-debug-q4d5k is created for debugging node...
Starting pod/qe-daily-417-0708-cv2p6-worker-westus-gcrrc-debug-v665f ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.5
If you don't see a command prompt, try pressing enter.
sh-5.1# chroot /host
sh-5.1# systemctl stop kubelet; sleep 600; systemctl start kubelet

Removing debug pod ...
Temporary namespace openshift-debug-q4d5k was removed.
jiazha-mac:~ jiazha$ oc get node qe-daily-417-0708-cv2p6-worker-westus-gcrrc
NAME                                          STATUS     ROLES    AGE    VERSION
qe-daily-417-0708-cv2p6-worker-westus-gcrrc   NotReady   worker   115m   v1.30.2+421e90e

3. Check whether this catalogsource's pod is recreated.
Actual results:
No new pod was generated.
jiazha-mac:~ jiazha$ oc get pods
NAME                                    READY   STATUS        RESTARTS       AGE
certified-operators-rcs64               1/1     Running       0              123m
community-operators-8mxh6               1/1     Running       0              123m
marketplace-operator-769fbb9898-czsfn   1/1     Running       4 (117m ago)   136m
qe-app-registry-5jxlx                   1/1     Running       0              106m
redhat-marketplace-4bgv9                1/1     Running       0              123m
redhat-operators-ww5tb                  1/1     Running       0              123m
test-2xvt8                              1/1     Terminating   0              12m
Once the node recovered, a new pod was generated.
jiazha-mac:~ jiazha$ oc get node qe-daily-417-0708-cv2p6-worker-westus-gcrrc
NAME STATUS ROLES AGE VERSION
qe-daily-417-0708-cv2p6-worker-westus-gcrrc Ready worker 127m v1.30.2+421e90e
jiazha-mac:~ jiazha$ oc get pods
NAME READY STATUS RESTARTS AGE
certified-operators-rcs64 1/1 Running 0 127m
community-operators-8mxh6 1/1 Running 0 127m
marketplace-operator-769fbb9898-czsfn 1/1 Running 4 (121m ago) 140m
qe-app-registry-5jxlx 1/1 Running 0 109m
redhat-marketplace-4bgv9 1/1 Running 0 127m
redhat-operators-ww5tb 1/1 Running 0 127m
test-wqxvg 1/1 Running 0 27s
Expected results:
During the node failure, a new catalog source pod should be generated.
Additional info:
Hi Team,
After investigating the operator-lifecycle-manager source code some more, we figured out the reason. The affected CatalogSource has no registryPoll configured:
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: redhat-operator-index
  namespace: openshift-marketplace
spec:
  image: quay-server.bastion.tokyo.com:5000/redhat/redhat-operator-index-logging:logging-vstable-5.8-v5.8.5
  sourceType: grpc
And we verified that the catalog pod can be recreated on other node if we add the configuration of registryPoll to catalogsource as the following (The lines with <==).
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: redhat-operator-index
  namespace: openshift-marketplace
spec:
  image: quay-server.bastion.tokyo.com:5000/redhat/redhat-operator-index-logging:logging-vstable-5.8-v5.8.5
  sourceType: grpc
  updateStrategy:     <==
    registryPoll:     <==
      interval: 10m   <==
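For convenience, the same workaround can be applied to an existing CatalogSource with a merge patch (a sketch using standard oc syntax; the name is from the report):

oc -n openshift-marketplace patch catalogsource redhat-operator-index \
  --type merge \
  -p '{"spec":{"updateStrategy":{"registryPoll":{"interval":"10m"}}}}'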
registryPoll is NOT a required field for a catalogsource.
So the commit [1] that tries to fix the issue in EnsureRegistryServer() is not the proper fix.
[1] https://github.com/operator-framework/operator-lifecycle-manager/pull/3201/files
[2] https://github.com/joelanford/operator-lifecycle-manager/blob/82f499723e52e85f28653af0610b6e7feff096cf/pkg/controller/registry/reconciler/grpc.go#L290
[3] https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/operators/catalog/operator.go#L1009
[4] https://docs.openshift.com/container-platform/4.16/operators/admin/olm-managing-custom-catalogs.html
We are seeing CNO pod restart flakes in hypershift CI on the hypershift control plane
W0905 11:42:53.359515 1 builder.go:106] graceful termination failed, controllers failed with error: failed to get infrastructure name: infrastructureName not set in infrastructure 'cluster'
The current backoff is set to retry.DefaultBackoff, which is appropriate for 409 conflicts and only retries for < 1s:
var DefaultBackoff = wait.Backoff{
Steps: 4,
Duration: 10 * time.Millisecond,
Factor: 5.0,
Jitter: 0.1,
}
Elsewhere in the codebase, retry.DefaultBackoff is used with retry.RetryOnConflict() where it is appropriate, but we need to retry for much longer here and much less frequently.
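A sketch of what a more suitable backoff could look like here (the variable name and exact values are illustrative, not the final fix):

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// Retry for on the order of a minute, polling every few seconds,
// instead of DefaultBackoff's sub-second, conflict-oriented budget.
var infraNameBackoff = wait.Backoff{
	Steps:    10,
	Duration: 2 * time.Second,
	Factor:   1.5,
	Jitter:   0.1,
}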
Description of problem:
OSDOCS-7408 lists some commands to be removed from the documentation for MicroShift because they are not supported.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Issue 44 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624
The Observe tab has Metrics and Events within an Accordion component whose blue border sits flush against the sidebar container. Either remove the border or add spacing between them.
Screenshot: https://drive.google.com/file/d/1i8SMUwTYXZL4CG0r1UXnxnm5e8QdAhQK/view?usp=sharing
This is a clone of issue OCPBUGS-27473. The following is the description of the original issue:
—
BuildRun logs cannot be displayed in the console, and the following error is shown:
The buildrun is created and started using the shp cli (similar behavior is observed when the build is created & started via console/yaml too):
shp build create goapp-buildah \
  --strategy-name="buildah" \
  --source-url="https://github.com/shipwright-io/sample-go" \
  --source-context-dir="docker-build" \
  --output-image="image-registry.openshift-image-registry.svc:5000/demo/go-app"
The issue occurs on OCP 4.14.6. Investigation showed that this works correctly on OCP 4.14.5.
Description of problem:
Catalog pods in hypershift control plane in ImagePullBackOff
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. Create a cluster with 4.14 HO + OCP 4.14.0-0.ci-2023-09-07-120503
2. Check the control plane pods; the catalog pods in the control plane namespace are in ImagePullBackOff
3.
Actual results:
jiezhao-mac:hypershift jiezhao$ oc get pods -n clusters-jie-test | grep catalog
catalog-operator-64fd787d9c-98wx5             2/2   Running            0   2m43s
certified-operators-catalog-7766fc5b8-4s66z   0/1   ImagePullBackOff   0   2m43s
community-operators-catalog-847cdbff6-wsf74   0/1   ImagePullBackOff   0   2m43s
redhat-marketplace-catalog-fccc6bbb5-2d5x4    0/1   ImagePullBackOff   0   2m43s
redhat-operators-catalog-86b6f66d5d-mpdsc     0/1   ImagePullBackOff   0   2m43s

Events:
  Type     Reason          Age                 From               Message
  ----     ------          ----                ----               -------
  Normal   Scheduled       65m                 default-scheduler  Successfully assigned clusters-jie-test/certified-operators-catalog-7766fc5b8-4s66z to ip-10-0-64-135.us-east-2.compute.internal
  Normal   AddedInterface  65m                 multus             Add eth0 [10.128.2.141/23] from openshift-sdn
  Normal   Pulling         63m (x4 over 65m)   kubelet            Pulling image "from:imagestream"
  Warning  Failed          63m (x4 over 65m)   kubelet            Failed to pull image "from:imagestream": rpc error: code = Unknown desc = reading manifest imagestream in docker.io/library/from: requested access to the resource is denied
  Warning  Failed          63m (x4 over 65m)   kubelet            Error: ErrImagePull
  Warning  Failed          63m (x6 over 65m)   kubelet            Error: ImagePullBackOff
  Normal   BackOff         9s (x280 over 65m)  kubelet            Back-off pulling image "from:imagestream"

jiezhao-mac:hypershift jiezhao$
Expected results:
catalog pods are running
Additional info:
slack: https://redhat-internal.slack.com/archives/C01C8502FMM/p1694170060144859
This is a clone of issue OCPBUGS-33181. The following is the description of the original issue:
—
Description of problem:
The audit-logs container for the kas, oapi, and oauth apiservers does not terminate within the `TerminationGracePeriodSeconds` timer, because the container does not act on a `SIGTERM` signal. When testing without the audit-logs container, the oapi and oauth apiservers terminate gracefully within a 90-110 second range. The kas still does not terminate with that container gone, and I have a hunch that the konnectivity container also does not follow `SIGTERM` (I've waited 10 minutes and it still does not time out). So this issue is to change the audit-logs logic to terminate gracefully and to increase TerminationGracePeriodSeconds from the default of 30s to 120s.
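For reference, the pod spec field in question (a sketch; the real change belongs in the HyperShift control-plane manifests):

spec:
  # Default is 30s; the proposal is to allow up to 120s for the
  # audit-logs container to flush and exit on SIGTERM.
  terminationGracePeriodSeconds: 120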
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Create a hypershift cluster with auditing enabled
2. Try deleting apiserver pods and watch the pods being force-deleted after 30 seconds (95 for kas) instead of gracefully terminated
3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-37048. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-36339. The following is the description of the original issue:
—
Description of problem:
The option "Auto deploy when new image is available" becomes unchecked when editing a deployment from web console
Version-Release number of selected component (if applicable):
4.15.17
How reproducible:
100%
Steps to Reproduce:
1. Go to Workloads --> Deployments --> Edit Deployment --> under the Images section, tick the option "Auto deploy when new Image is available" and save the deployment.
2. Edit the deployment again and observe that the option "Auto deploy when new Image is available" is unchecked.
3. The same test works fine on a 4.14 cluster.
Actual results:
Option "Auto deploy when new Image is available" is in unchecked state.
Expected results:
Option "Auto deploy when new Image is available" remains in checked state.
Additional info:
The Observe -> Alerting, Metrics, and Targets pages do not load as expected; a blank page is shown
4.15.0-0.nightly-2023-12-07-041003
Always
1. Navigate to an Observe -> Alerting, Metrics, or Targets page directly 2. 3.
Blank page; no data is loaded
Pages work as normal
Failed to load resource: the server responded with a status of 404 (Not Found) /api/accounts_mgmt/v1/subscriptions?page=1&search=external_cluster_id%3D%2715ace915-53d3-4455-b7e3-b7a5a4796b5c%27:1
Failed to load resource: the server responded with a status of 403 (Forbidden) main-chunk-bb9ed989a7f7c65da39a.min.js:1
API call to get support level has failed r: Access denied due to cluster policy.
    at https://console-openshift-console.apps.ci-ln-9fl1l5t-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-bb9ed989a7f7c65da39a.min.js:1:95279
(anonymous) @ main-chunk-bb9ed989a7f7c65da39a.min.js:1
/api/kubernetes/apis/operators.coreos.com/v1alpha1/namespaces/#ALL_NS#/clusterserviceversions?:1 Failed to load resource: the server responded with a status of 404 (Not Found)
vendor-patternfly-5~main-chunk-95cb256d9fa7738d2c46.min.js:1 Modal: When using hasNoBodyWrapper or setting a custom header, ensure you assign an accessible name to the modal container with aria-label or aria-labelledby.
This is a clone of issue OCPBUGS-41672. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-38845. The following is the description of the original issue:
—
Description of problem:
The css of some components isn't loading properly (Banner, Jumplinks)
See screenshot: https://photos.app.goo.gl/2Z1cK5puufGBVBcu5
On the screen cast, ex-aao in namespace default is a banner, and should look like: https://photos.app.goo.gl/n4LUgrGNzQT7n1Pr8
The vertical jumplinks should look like: https://photos.app.goo.gl/8GAs71S43PnAS7wH7
You can test our plugin: https://github.com/artemiscloud/activemq-artemis-self-provisioning-plugin/pull/278
1. yarn
2. yarn start
3. navigate to http://localhost:9000/k8s/ns/default/add-broker
This is a clone of issue OCPBUGS-26952. The following is the description of the original issue:
—
Description of problem:
No IPsec on the cluster after deleting the NS MCs while ipsecConfig mode is `Full`, on a cluster upgraded from 4.14 to a 4.15 build
Version-Release number of selected component (if applicable):
bot build on https://github.com/openshift/cluster-network-operator/pull/2191
How reproducible:
Always
Steps to Reproduce:
1. Cluster with EW+NS IPsec (4.14), upgraded to the above bot build to check ipsecConfig modes
2. ipsecConfig mode changed to Full
3. Deleted the NS MCs
4. New MCs spawned up as `80-ipsec-master-extensions` and `80-ipsec-worker-extensions`
5. Cluster settled with no IPsec at all (no ovn-ipsec-host ds)
6. Mode still Full
Actual results:
Mode Full actually replicated the Disabled state in the steps above
Expected results:
Just the NS IPsec should have gone away; the EW IPsec should have persisted.
Additional info:
This is a clone of issue OCPBUGS-28663. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Some drivers have parts written in C++, like the NVIDIA Open GPU drivers. We need to add gcc-c++ to the DTK image
In the Reliability test (a loaded long run; the load is stable), the memory of the 3 multus pods increased from under 100 MiB to over 700 MiB in 7 days. The multus pods have a memory request of 65Mi and no memory limit. If the test runs for longer and the memory keeps increasing, this issue can impact the nodes' resources.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-11-13-174800
How reproducible:
Met this the first time. I did not see this in 4.14's Reliability test.
Steps to Reproduce:
1. Install an AWS compact cluster with 3 masters; workers are on the master nodes too.
2. Run the reliability-v2 test https://github.com/openshift/svt/tree/master/reliability-v2. The test runs for a long time and simulates multiple customers' usage on the cluster. Config: 1 admin, 5 dev-test, 5 dev-prod, 1 dev-cron.
3. Monitor the metric: container_memory_rss{container="kube-multus",namespace="openshift-multus"}
Actual results:
The memory of the 3 multus pods increased from under 100 MiB to over 700 MiB in 7 days. After the test load stopped, the memory increase stopped, but the usage didn't drop back down.
Expected results:
Memory should not continuously increase
Additional info:
% oc adm top pod -n openshift-multus --containers=true --sort-by memory -l app=multus
POD            NAME          CPU(cores)   MEMORY(bytes)
multus-xp474   kube-multus   12m          1275Mi
multus-xp474   POD           0m           0Mi
multus-xt64s   kube-multus   21m          971Mi
multus-xt64s   POD           0m           0Mi
multus-d9xcs   kube-multus   6m           757Mi
multus-d9xcs   POD           0m           0Mi
The monitoring screenshots:
multus-memory-increase-stop.png
Must-gather: must-gather.local.4628887688332215806.tar.gz
Description of problem:
GCP CCM should be using granular permissions rather than pre-defined roles.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-26014. The following is the description of the original issue:
—
While testing oc adm upgrade status against b02, I noticed some COs do not have any annotations, while I expected them to have the include/exclude.release.openshift.io/* ones (to recognize COs that come from the payload).
$ b02 get clusteroperator etcd -o jsonpath={.metadata.annotations}
$ ota-stage get clusteroperator etcd -o jsonpath={.metadata.annotations}
{"exclude.release.openshift.io/internal-openshift-hosted":"true","include.release.openshift.io/self-managed-high-availability":"true","include.release.openshift.io/single-node-developer":"true"}
CVO does not reconcile CO resources once they exist; it only precreates them. Build02 does not have COs with reconciled metadata because it was born as 4.2, which (AFAIK) predates OCP's use of the exclude/include annotations.
4.16 (development branch)
deterministic
1. delete an annotation on a ClusterOperator resource
The annotation won't be recreated
The annotation should be recreated
Description of problem:
I have a customer (IHAC) on OCP 4.9 who has configured the IngressControllers with a long httpLogFormat, and the routers print the following every time haproxy reloads:
I0927 13:29:45.495077 1 router.go:612] template "msg"="router reloaded" "output"="[WARNING] 269/132945 (9167) : config : truncating capture length to 63 bytes for frontend 'public'.\n[WARNING] 269/132945 (9167) : config : truncating capture length to 63 bytes for frontend 'fe_sni'.\n[WARNING] 269/132945 (9167) : config : truncating capture length to 63 bytes for frontend 'fe_no_sni'.\n - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
This is the Ingress Contoller configuration:
logging:
  access:
    destination:
      syslog:
        address: 10.X.X.X
        port: 10514
      type: Syslog
    httpCaptureCookies:
    - matchType: Exact
      maxLength: 128
      name: ITXSESSIONID
    httpCaptureHeaders:
      request:
      - maxLength: 128
        name: Host
      - maxLength: 128
        name: itxrequestid
    httpLogFormat: actconn="%ac",backend_name="%b",backend_queue="%bq",backend_source_ip="%bi",backend_source_port="%bp",beconn="%bc",bytes_read="%B",bytes_uploaded="%U",captrd_req_cookie="%CC",captrd_req_headers="%hr",captrd_res_cookie="%CS",captrd_res_headers="%hs",client_ip="%ci",client_port="%cp",cluster="ieec1ocp1",datacenter="ieec1",environment="pro",fe_name_transport="%ft",feconn="%fc",frontend_name="%f",hostname="%H",http_version="%HV",log_type="http",method="%HM",query_string="%HQ",req_date="%tr",request="%HP",res_time="%TR",retries="%rc",server_ip="%si",server_name="%s",server_port="%sp",srv_queue="%sq",srv_conn="%sc",srv_queue="%sq",status_code="%ST",Ta="%Ta",Tc="%Tc",tenant="bk",term_state="%tsc",tot_wait_q="%Tw",Tr="%Tr"
    logEmptyRequests: Ignore
Any way to avoid this truncate warning?
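One possible mitigation (an assumption based on the warning text, not a verified fix): the warning says haproxy truncates captures to 63 bytes, so keeping the configured capture lengths at or below 63 should avoid the message:

httpCaptureCookies:
- matchType: Exact
  maxLength: 63   # was 128; per the warning, haproxy truncates beyond 63 bytes
  name: ITXSESSIONID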
How reproducible:
For every reload of haproxy config
Steps to Reproduce:
You can reproduce easily with the following configuration in the default ingress controller:
logging:
access:
destination:
type: Container
httpCaptureCookies:
2022-10-18T14:13:53.068164+00:00 xxxx xxxxxx haproxy[38]: 10.39.192.203:40698 [18/Oct/2022:14:13:52.488] fe_sni~ be_secure:openshift-console:console/pod:console-5976495467-zxgxr:console:https:10.128.1.116:8443 0/0/0/10/580 200 1130598 _abck=B7EA642C9E828FA8210F329F80B7B2D80YAAQnVozuFVfkOaDAQAADk - --VN 78/37/33/33/0 0/0 "GET /api/kubernetes/openapi/v2 HTTP/1.1"
Fix jq command in local cmo run
Description of problem:
If the `currentConfig` is removed from the master node, the Machine Config Daemon will not recreate it. The logs will say:

~~~
W0726 23:57:35.890645 3013426 daemon.go:1097] Got an error from auxiliary tools: could not get current config from disk: open /etc/machine-config-daemon/currentconfig: no such file or directory
~~~

However, the MCD won't recreate that currentconfig. Is this the desired state? The workaround is to create the correct annotation.
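A rough sketch of that workaround (assuming the node's desiredConfig annotation still holds the correct rendered config; the node name is illustrative):

NODE=ip-10-0-55-111.example.internal   # illustrative node name
RENDERED=$(oc get node "$NODE" \
  -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig}')
oc annotate node "$NODE" --overwrite \
  "machineconfiguration.openshift.io/currentConfig=$RENDERED"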
Version-Release number of selected component (if applicable):
OpenShift 4.12 and tested on 4.13
How reproducible:
- remove the currentConfig from the node
- check the status of the MCD
Steps to Reproduce:
1. 2. 3.
Actual results:
- the currentconfig is missing
- the MCD is stopping
Expected results:
- if the currentconfig is missing, MCD should reconcile based on the desiredconfig label of the node
Additional info:
This is a clone of issue OCPBUGS-30239. The following is the description of the original issue:
—
kdump crash logs are not created on the SSH remote target when OVN is configured.
See https://issues.redhat.com/browse/OCPBUGS-28239
This is a clone of issue OCPBUGS-36766. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-34820. The following is the description of the original issue:
—
Description of problem:
Removing imageContentSources from HostedCluster does not update IDMS for the cluster.
Version-Release number of selected component (if applicable):
Tested with 4.15.14
How reproducible:
100%
Steps to Reproduce:
1. Add imageContentSources to the HostedCluster
2. Verify it is applied to the IDMS
3. Remove imageContentSources from the HostedCluster
Actual results:
IDMS is not updated to remove imageDigestMirrors contents
Expected results:
IDMS is updated to remove imageDigestMirrors contents
Additional info:
Workaround: set imageContentSources=[]
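A minimal sketch of that workaround (the hostedcluster name and namespace are illustrative):

oc -n clusters patch hostedcluster my-cluster \
  --type merge \
  -p '{"spec":{"imageContentSources":[]}}'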
This shows tls: bad certificate errors from the kube-apiserver operator; for example, in https://reportportal-openshift.apps.ocp-c1.prod.psi.redhat.com/ui/#prow/launches/all/470214, we checked its must-gather: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-aws-ipi-imdsv2-fips-f14/1726036030588456960/artifacts/aws-ipi-imdsv2-fips-f14/gather-must-gather/artifacts/
MacBook-Pro:~ jianzhang$ omg logs prometheus-operator-admission-webhook-6bbdbc47df-jd5mb | grep "TLS handshake"
2023-11-27 10:11:50.687 | WARNING | omg.utils.load_yaml:<module>:10 - yaml.CSafeLoader failed to load, using SafeLoader
2023-11-19T00:57:08.318983249Z ts=2023-11-19T00:57:08.318923708Z caller=stdlib.go:105 caller=server.go:3215 msg="http: TLS handshake error from 10.129.0.35:48334: remote error: tls: bad certificate"
2023-11-19T00:57:10.336569986Z ts=2023-11-19T00:57:10.336505695Z caller=stdlib.go:105 caller=server.go:3215 msg="http: TLS handshake error from 10.129.0.35:48342: remote error: tls: bad certificate"
...
MacBook-Pro:~ jianzhang$ omg get pods -A -o wide | grep "10.129.0.35"
2023-11-27 10:12:16.382 | WARNING | omg.utils.load_yaml:<module>:10 - yaml.CSafeLoader failed to load, using SafeLoader
openshift-kube-apiserver-operator   kube-apiserver-operator-f78c754f9-rbhw9   1/1   Running   2   5h27m   10.129.0.35   ip-10-0-107-238.ec2.internal
For more information, see Slack: https://redhat-internal.slack.com/archives/CC3CZCQHM/p1700473278471309
This is a clone of issue OCPBUGS-36698. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-36185. The following is the description of the original issue:
—
Description of problem:
The MAPI for IBM Cloud currently only checks the first group of subnets (50) when searching for Subnet details by name. It should provide pagination support to search all subnets.
Version-Release number of selected component (if applicable):
4.17
How reproducible:
100%, though dependent on the order of subnets returned by the IBM Cloud APIs
Steps to Reproduce:
1. Create 50+ IBM Cloud VPC Subnets
2. Create a new IPI cluster (with or without BYON)
3. MAPI will attempt to find Subnet details by name, likely failing as it only checks the first group (50), depending on the order returned by the IBM Cloud API
Actual results:
MAPI fails to find the Subnet ID and thus cannot create/manage cluster nodes.
Expected results:
Successful IPI deployment.
Additional info:
IBM Cloud is working on a patch to MAPI to handle the ListSubnets API call and pagination results.
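A minimal sketch of name lookup with pagination, assuming the IBM vpc-go-sdk and go-sdk-core helpers (the function is illustrative; the overall shape is an assumption about the eventual fix, not the actual patch):

import (
	"fmt"

	"github.com/IBM/go-sdk-core/v5/core"
	"github.com/IBM/vpc-go-sdk/vpcv1"
)

// findSubnetByName walks every page of ListSubnets instead of only the
// first group of 50, returning the subnet whose name matches.
func findSubnetByName(svc *vpcv1.VpcV1, target string) (*vpcv1.Subnet, error) {
	opts := svc.NewListSubnetsOptions()
	for {
		page, _, err := svc.ListSubnets(opts)
		if err != nil {
			return nil, err
		}
		for i := range page.Subnets {
			s := page.Subnets[i]
			if s.Name != nil && *s.Name == target {
				return &s, nil
			}
		}
		if page.Next == nil || page.Next.Href == nil {
			return nil, fmt.Errorf("subnet %q not found", target)
		}
		// The "start" token for the next page is embedded in the Next href.
		start, err := core.GetQueryParam(page.Next.Href, "start")
		if err != nil || start == nil {
			return nil, fmt.Errorf("subnet %q not found", target)
		}
		opts.SetStart(*start)
	}
}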
Bring the downstream rukpak repo up-to-date with the v0.15.0 upstream release.
Description of problem:
When we deploy a cluster in AWS using this template https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blob/master/functionality-testing/aos-4_14/ipi-on-aws/versioned-installer-customer_vpc-disconnected_private_cluster-sts-private-s3-custom_endpoints-ci the master MCP is degraded and reports this error:

- lastTransitionTime: "2023-04-25T07:48:45Z"
  message: 'Node ip-10-0-55-111.us-east-2.compute.internal is reporting: "machineconfig.machineconfiguration.openshift.io \"rendered-master-8ef3f9cb45adb7bbe5f819eb831ffd7d\" not found", Node ip-10-0-60-138.us-east-2.compute.internal is reporting: "machineconfig.machineconfiguration.openshift.io \"rendered-master-8ef3f9cb45adb7bbe5f819eb831ffd7d\" not found", Node ip-10-0-69-137.us-east-2.compute.internal is reporting: "machineconfig.machineconfiguration.openshift.io \"rendered-master-8ef3f9cb45adb7bbe5f819eb831ffd7d\" not found"'
  reason: 3 nodes are reporting degraded status on sync
  status: "True"
  type: NodeDegraded
Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       False         3h12m   Error while reconciling 4.14.0-0.nightly-2023-04-19-125337: the cluster operator machine-config is degraded
How reproducible:
2 out of 2.
Steps to Reproduce:
1. Install OCP using this template: https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blob/master/functionality-testing/aos-4_14/ipi-on-aws/versioned-installer-customer_vpc-disconnected_private_cluster-sts-private-s3-custom_endpoints-ci

Examples of this installation can be seen here: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/198964/ and here: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/199028/

The builds have been marked as keep forever, but just in case, the parameters are:
INSTANCE_NAME_PREFIX: your ID; any short string, just make sure it is unique.
VARIABLES_LOCATION: private-templates/functionality-testing/aos-4_14/ipi-on-aws/versioned-installer-customer_vpc-disconnected_private_cluster-sts-private-s3-custom_endpoints-ci
LAUNCHER_VARS: <leave empty>
BUSHSLICER_CONFIG: <leave empty>
Actual results:
The installation failed, reporting a degraded master MCP:

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       False         3h12m   Error while reconciling 4.14.0-0.nightly-2023-04-19-125337: the cluster operator machine-config is degraded

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master                                                      False     True       True       3              0                   0                     3                      4h21m
worker   rendered-worker-166729d2617b1b63cf5d9bb818dd9cf8   True      False      False      3              3                   3                     0                      4h21m
Expected results:
Installation should finish without problems and no MCP should be degraded
Additional info:
Must gather linked in the first comment
This is a clone of issue OCPBUGS-34408. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-33963. The following is the description of the original issue:
—
Description of problem:
kube-apiserver was stuck updating versions when upgrading from 4.1 to 4.16 with an AWS IPI installation
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-05-01-111315
How reproducible:
always
Steps to Reproduce:
1. IPI-install an AWS 4.1 cluster and upgrade it to 4.16
2. The upgrade was stuck going from 4.15 to 4.16, waiting on etcd and kube-apiserver to update
Actual results:
The upgrade was stuck going from 4.15 to 4.16, waiting on etcd and kube-apiserver to update.

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.nightly-2024-05-16-091947   True        True          39m     Working towards 4.16.0-0.nightly-2024-05-16-092402: 111 of 894 done (12% complete)
Expected results:
Upgrade should be successful.
Additional info:
Must-gather: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.1-aws-ipi-f30/1791391925467615232/artifacts/aws-ipi-f30/gather-must-gather/artifacts/must-gather.tar Checked the must-gather logs, $ omg get clusterversion -oyaml ... conditions: - lastTransitionTime: '2024-05-17T09:35:29Z' message: Done applying 4.15.0-0.nightly-2024-05-16-091947 status: 'True' type: Available - lastTransitionTime: '2024-05-18T06:31:41Z' message: 'Multiple errors are preventing progress: * Cluster operator kube-apiserver is updating versions * Could not update flowschema "openshift-etcd-operator" (82 of 894): the server does not recognize this resource, check extension API servers' reason: MultipleErrors status: 'True' type: Failing $ omg get co | grep -v '.*True.*False.*False' NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE kube-apiserver 4.15.0-0.nightly-2024-05-16-091947 True True False 10m $ omg get pod -n openshift-kube-apiserver NAME READY STATUS RESTARTS AGE installer-40-ip-10-0-136-146.ec2.internal 0/1 Succeeded 0 2h29m installer-41-ip-10-0-143-206.ec2.internal 0/1 Succeeded 0 2h25m installer-43-ip-10-0-154-116.ec2.internal 0/1 Succeeded 0 2h22m installer-44-ip-10-0-154-116.ec2.internal 0/1 Succeeded 0 1h35m kube-apiserver-guard-ip-10-0-136-146.ec2.internal 1/1 Running 0 2h24m kube-apiserver-guard-ip-10-0-143-206.ec2.internal 1/1 Running 0 2h24m kube-apiserver-guard-ip-10-0-154-116.ec2.internal 0/1 Running 0 2h24m kube-apiserver-ip-10-0-136-146.ec2.internal 5/5 Running 0 2h27m kube-apiserver-ip-10-0-143-206.ec2.internal 5/5 Running 0 2h24m kube-apiserver-ip-10-0-154-116.ec2.internal 4/5 Running 17 1h34m revision-pruner-39-ip-10-0-136-146.ec2.internal 0/1 Succeeded 0 2h44m revision-pruner-39-ip-10-0-143-206.ec2.internal 0/1 Succeeded 0 2h50m revision-pruner-39-ip-10-0-154-116.ec2.internal 0/1 Succeeded 0 2h52m revision-pruner-40-ip-10-0-136-146.ec2.internal 0/1 Succeeded 0 2h29m revision-pruner-40-ip-10-0-143-206.ec2.internal 0/1 Succeeded 0 2h29m revision-pruner-40-ip-10-0-154-116.ec2.internal 0/1 Succeeded 0 2h29m revision-pruner-41-ip-10-0-136-146.ec2.internal 0/1 Succeeded 0 2h26m revision-pruner-41-ip-10-0-143-206.ec2.internal 0/1 Succeeded 0 2h26m revision-pruner-41-ip-10-0-154-116.ec2.internal 0/1 Succeeded 0 2h26m revision-pruner-42-ip-10-0-136-146.ec2.internal 0/1 Succeeded 0 2h24m revision-pruner-42-ip-10-0-143-206.ec2.internal 0/1 Succeeded 0 2h23m revision-pruner-42-ip-10-0-154-116.ec2.internal 0/1 Succeeded 0 2h23m revision-pruner-43-ip-10-0-136-146.ec2.internal 0/1 Succeeded 0 2h23m revision-pruner-43-ip-10-0-143-206.ec2.internal 0/1 Succeeded 0 2h23m revision-pruner-43-ip-10-0-154-116.ec2.internal 0/1 Succeeded 0 2h23m revision-pruner-44-ip-10-0-136-146.ec2.internal 0/1 Succeeded 0 1h35m revision-pruner-44-ip-10-0-143-206.ec2.internal 0/1 Succeeded 0 1h35m revision-pruner-44-ip-10-0-154-116.ec2.internal 0/1 Succeeded 0 1h35m Checked the kube-apiserver kube-apiserver-ip-10-0-154-116.ec2.internal logs, seems something wring with informers, $ grep 'informers not started yet' current.log | wc -l 360 $ grep 'informers not started yet' current.log 2024-05-18T06:34:51.888804183Z [-]informer-sync failed: 4 informers not started yet: [*v1.PriorityLevelConfiguration *v1.Secret *v1.FlowSchema *v1.ConfigMap] 2024-05-18T06:34:51.889350484Z [-]informer-sync failed: 4 informers not started yet: [*v1.PriorityLevelConfiguration *v1.FlowSchema *v1.Secret 
*v1.ConfigMap] 2024-05-18T06:34:52.004808401Z [-]informer-sync failed: 2 informers not started yet: [*v1.FlowSchema *v1.PriorityLevelConfiguration] 2024-05-18T06:34:52.095516498Z [-]informer-sync failed: 2 informers not started yet: [*v1.PriorityLevelConfiguration *v1.FlowSchema] ...
Backport to 4.15 of AUTH-482 specifically for the cluster-kube-storage-version-migrator-operator.
Namespaces with workloads that need pinning:
This is a clone of issue OCPBUGS-37418. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-37334. The following is the description of the original issue:
—
Description of problem:
ci/prow/security is failing: k8s.io/client-go/transport
Version-Release number of selected component (if applicable):
4.16
How reproducible:
always
Steps to Reproduce:
1. trigger ci/prow/security on a pull request 2. 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/cluster-dns-operator/pull/396
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
AdmissionWebhookMatchConditions are enabled by default in Kubernetes 1.28, but we are currently disabling the feature gate in openshift/api. As a result, e2e tests are failing with Kubernetes 1.28 bump: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_kubernetes/1646/pull-ci-openshift-kubernetes-master-e2e-aws-ovn-fips/1684354421837795328
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
AdmissionWebhookMatchConditions tests are failing
Expected results:
AdmissionWebhookMatchConditions should pass
Additional info:
Let me know once this is fixed so that we can drop the commit that skips these tests.
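For context, a minimal sketch of what the feature gate enables once it is on: a validating webhook carrying a CEL matchCondition (the webhook name and service below are hypothetical; the matchCondition expression is the one from the upstream documentation):
$ oc apply -f - <<'EOF'
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-webhook        # hypothetical
webhooks:
- name: example.webhook.openshift.io   # hypothetical
  admissionReviewVersions: ["v1"]
  sideEffects: None
  clientConfig:
    service:
      name: example-webhook    # hypothetical backing service
      namespace: default
      path: /validate
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
  # matchConditions are the Kubernetes 1.28 feature under test: CEL
  # expressions that filter requests before they reach the webhook.
  matchConditions:
  - name: exclude-kubelet-requests
    expression: '!("system:nodes" in request.userInfo.groups)'
EOF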
Two payloads in a row failed; the first had more failures, the second fewer, but both were still broken.
Both exhibit this status on the console operator:
status:
  conditions:
  - lastTransitionTime: "2023-11-17T06:06:57Z"
    message: 'OAuthClientSyncDegraded: the server is currently unable to handle the request (get oauthclients.oauth.openshift.io console)'
    reason: OAuthClientSync_FailedRegister
    status: "True"
    type: Degraded
We are suspicious of this PR, however this change was before the payloads started failing, perhaps the issue only surfaces on upgrades once the change was in an accepted payload: https://github.com/openshift/console-operator/pull/808
There is also a hypershift PR that was only present in the second failed payload, possibly a reaction to the problem that didn't fully fix it; there were fewer failures in the second payload than in the first: https://github.com/openshift/hypershift/pull/3151 If so, this will complicate a revert.
Discussion: https://redhat-internal.slack.com/archives/C01C8502FMM/p1700226091335339
Tracker issue for bootimage bump in 4.15. This issue should block issues which need a bootimage bump to fix.
The previous bump was OCPBUGS-23473.
Description of problem:
If authentication.config/cluster has Type=="" but the OAuth/User APIs are already missing, the console-operator won't update the authentication.config/cluster status with its own client, because it crashes when it cannot retrieve OAuthClients.
Version-Release number of selected component (if applicable):
4.15.0
How reproducible:
100%
Steps to Reproduce:
1. Scale the oauth-apiserver to 0 replicas
2. Set the feature gates to TechPreviewNoUpgrade
3. Watch authentication.config/cluster .status.oidcClients
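A minimal sketch of those steps as commands, assuming the oauth-apiserver Deployment is named apiserver in the openshift-oauth-apiserver namespace:
$ oc scale deployment/apiserver -n openshift-oauth-apiserver --replicas=0
$ oc patch featuregate cluster --type merge -p '{"spec":{"featureSet":"TechPreviewNoUpgrade"}}'
# Watch the OIDC client list; the console's client should eventually appear here.
$ oc get authentication.config cluster -o jsonpath='{.status.oidcClients}{"\n"}' -w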
Actual results:
The client for the console does not appear.
Expected results:
The client for the console should appear.
Additional info:
Update the OWNERS file in the openshift-state-metrics repository: add new teammates and remove old ones.
Please review the following PR: https://github.com/openshift/cloud-provider-powervs/pull/43
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-18699. The following is the description of the original issue:
—
Description of problem:
The OpenShift console shows "Info alert: Non-printable file detected. File contains non-printable characters. Preview is not available." when editing ConfigMaps that contain XML files.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Create a ConfigMap from a file:
# oc create cm test-cm --from-file=server.xml=server.xml
configmap/test-cm created
2. Try to edit the ConfigMap in the OCP console; the following error is shown:
Info alert: Non-printable file detected. File contains non-printable characters. Preview is not available.
Actual results:
Expected results:
Additional info:
Seth Jennings noticed that HyperShift HostedCluster update CI is struggling with new DNS operators vs. old ClusterRoles since CFE-852 landed both configInformers.Config().V1().FeatureGates() calls to the 0000_70_dns-operator_02-deployment.yaml operator and FeatureGate RBAC to the 0000_70_dns-operator_00-cluster-role.yaml ClusterRole. In standalone clusters, the cluster-version operator ensures that ClusterRole is reconciled before bumping the operator Deployment, so all goes smoothly. In HyperShift, the HostedControlPlane controller is rolling out the new Deployment in parallel with the cluster-version operator rolling out the new ClusterRole, and when the Deployment wins that race, there can be a few rounds of crash-looping like:
: TestUpgradeControlPlane/Main/EnsureNoCrashingPods 0s {Failed === RUN TestUpgradeControlPlane/Main/EnsureNoCrashingPods util.go:488: Container dns-operator in pod dns-operator-687bd5d756-c48qm has a restartCount > 0 (3) --- FAIL: TestUpgradeControlPlane/Main/EnsureNoCrashingPods (0.02s) }
with pod logs like:
...
W0410 22:58:02.495248 1 reflector.go:535] github.com/openshift/client-go/config/informers/externalversions/factory.go:116: failed to list *v1.FeatureGate: featuregates.config.openshift.io is forbidden: User "system:serviceaccount:openshift-dns-operator:dns-operator" cannot list resource "featuregates" in API group "config.openshift.io" at the cluster scope
E0410 22:58:02.495277 1 reflector.go:147] github.com/openshift/client-go/config/informers/externalversions/factory.go:116: Failed to watch *v1.FeatureGate: failed to list *v1.FeatureGate: featuregates.config.openshift.io is forbidden: User "system:serviceaccount:openshift-dns-operator:dns-operator" cannot list resource "featuregates" in API group "config.openshift.io" at the cluster scope
time="2024-04-10T22:58:22Z" level=error msg="<nil>timed out waiting for FeatureGate detection"
time="2024-04-10T22:58:22Z" level=fatal msg="failed to create operator: timed out waiting for FeatureGate detection"
Eventually RBAC will catch up, and the cluster will heal. But the crash-looping fails the CI test-case, which is expecting a more elegant transition.
Updates that cross CFE-852, e.g. 4.15 to new 4.16 nightlies.
Racy. Sometimes the CVO gets the ClusterRole bumped quickly enough for the Deployment bump to happen smoothly. I'm unclear on odds for the race.
Run a bunch of HyperShift e2e.
Racy failures for the TestUpgradeControlPlane/Main/EnsureNoCrashingPods test case.
Reliable success for this test case, with smooth updates.
There are a number of possible approaches to make HyperShift e2e happier about these updates. Personally I think we want something like OTA-951 long-term, so HyperShift would have the same "ClusterRole will be bumped first" handling that standalone is getting today. But that's a bigger architectural lift. One simpler pivot to cover the current HyperShift approach would be to backport the RBAC additions to the 4.15.z ClusterRole and raise minor_min to push clusters through that newer 4.15.z or later, where they'd pick up the new RBAC, before they were recommended to head off to 4.16 releases that would require the new RBAC to be in place. That 4.15.z 0000_70_dns-operator_00-cluster-role.yaml RBAC backport is what this ticket is asking for. Although if folks have even less invasive ideas for denoising these HyperShift updates, that would be great.
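Based on the forbidden errors above, the missing grant is read access to FeatureGates for the dns-operator service account. A minimal sketch of the rule the backported ClusterRole would need to carry (the role name and overall manifest layout are assumptions; only the rule itself is taken from the error message):
$ oc apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: openshift-dns-operator   # assumed name; the real 0000_70 manifest may differ
rules:
# Grants the list/watch the FeatureGate informer needs, per the log above.
- apiGroups: ["config.openshift.io"]
  resources: ["featuregates"]
  verbs: ["get", "list", "watch"]
EOF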
This is a clone of issue OCPBUGS-27222. The following is the description of the original issue:
—
Description of problem:
On ipv6primary dualstack cluster, creating an ipv6 egressIP following this procedure:
is not working. ovnkube-cluster-manager shows below error:
2024-01-16T14:48:18.156140746Z I0116 14:48:18.156053 1 obj_retry.go:358] Adding new object: *v1.EgressIP egress-dualstack-ipv6
2024-01-16T14:48:18.161367817Z I0116 14:48:18.161269 1 obj_retry.go:370] Retry add failed for *v1.EgressIP egress-dualstack-ipv6, will try again later: cloud add request failed for CloudPrivateIPConfig: fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333, err: CloudPrivateIPConfig.cloud.network.openshift.io "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333" is invalid: [<nil>: Invalid value: "": "metadata.name" must validate at least one schema (anyOf), metadata.name: Invalid value: "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333": metadata.name in body must be of type ipv4: "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333"]
2024-01-16T14:48:18.161416023Z I0116 14:48:18.161357 1 event.go:298] Event(v1.ObjectReference{Kind:"EgressIP", Namespace:"", Name:"egress-dualstack-ipv6", UID:"", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'CloudAssignmentFailed' egress IP: fd2e:6f44:5dd8:c956:f816:3eff:fef0:3333 for object EgressIP: egress-dualstack-ipv6 could not be created, err: CloudPrivateIPConfig.cloud.network.openshift.io "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333" is invalid: [<nil>: Invalid value: "": "metadata.name" must validate at least one schema (anyOf), metadata.name: Invalid value: "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333": metadata.name in body must be of type ipv4: "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333"]
2024-01-16T14:49:37.714410622Z I0116 14:49:37.714342 1 reflector.go:790] k8s.io/client-go/informers/factory.go:159: Watch close - *v1.Service total 8 items received
2024-01-16T14:49:48.155826915Z I0116 14:49:48.155330 1 obj_retry.go:296] Retry object setup: *v1.EgressIP egress-dualstack-ipv6
2024-01-16T14:49:48.156172766Z I0116 14:49:48.155899 1 obj_retry.go:358] Adding new object: *v1.EgressIP egress-dualstack-ipv6
2024-01-16T14:49:48.168795734Z I0116 14:49:48.168520 1 obj_retry.go:370] Retry add failed for *v1.EgressIP egress-dualstack-ipv6, will try again later: cloud add request failed for CloudPrivateIPConfig: fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333, err: CloudPrivateIPConfig.cloud.network.openshift.io "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333" is invalid: [<nil>: Invalid value: "": "metadata.name" must validate at least one schema (anyOf), metadata.name: Invalid value: "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333": metadata.name in body must be of type ipv4: "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333"]
2024-01-16T14:49:48.169400971Z I0116 14:49:48.168937 1 event.go:298] Event(v1.ObjectReference{Kind:"EgressIP", Namespace:"", Name:"egress-dualstack-ipv6", UID:"", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'CloudAssignmentFailed' egress IP: fd2e:6f44:5dd8:c956:f816:3eff:fef0:3333 for object EgressIP: egress-dualstack-ipv6 could not be created, err: CloudPrivateIPConfig.cloud.network.openshift.io "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333" is invalid: [<nil>: Invalid value: "": "metadata.name" must validate at least one schema (anyOf), metadata.name: Invalid value: "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333": metadata.name in body must be of type ipv4: "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333"]
Same is observed with ipv6 subnet on slaac mode.
Version-Release number of selected component (if applicable):
How reproducible: Always.
Steps to Reproduce:
Applying below:
$ oc label node/ostest-8zrlf-worker-0-4h78l k8s.ovn.org/egress-assignable=""
$ cat egressip_ipv4.yaml && cat egressip_ipv6.yaml
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egress-dualstack-ipv4
spec:
  egressIPs:
  - 192.168.192.111
  namespaceSelector:
    matchLabels:
      app: egress
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egress-dualstack-ipv6
spec:
  egressIPs:
  - fd2e:6f44:5dd8:c956:f816:3eff:fef0:3333
  namespaceSelector:
    matchLabels:
      app: egress
$ oc apply -f egressip_ipv4.yaml
$ oc apply -f egressip_ipv6.yaml
But it shows only info about the ipv4 egressIP. The IPv6 port is not even created in OpenStack:
$ oc logs -n openshift-cloud-network-config-controller cloud-network-config-controller-67cbc4bc84-786jm
I0116 13:15:48.914323 1 controller.go:182] Assigning key: 192.168.192.111 to cloud-private-ip-config workqueue
I0116 13:15:48.928927 1 cloudprivateipconfig_controller.go:357] CloudPrivateIPConfig: "192.168.192.111" will be added to node: "ostest-8zrlf-worker-0-4h78l"
I0116 13:15:48.942260 1 cloudprivateipconfig_controller.go:381] Adding finalizer to CloudPrivateIPConfig: "192.168.192.111"
I0116 13:15:48.943718 1 controller.go:182] Assigning key: 192.168.192.111 to cloud-private-ip-config workqueue
I0116 13:15:49.758484 1 openstack.go:760] Getting port lock for portID 8854b2e9-3139-49d2-82dd-ee576b0a0cce and IP 192.168.192.111
I0116 13:15:50.547268 1 cloudprivateipconfig_controller.go:439] Added IP address to node: "ostest-8zrlf-worker-0-4h78l" for CloudPrivateIPConfig: "192.168.192.111"
I0116 13:15:50.602277 1 controller.go:160] Dropping key '192.168.192.111' from the cloud-private-ip-config workqueue
I0116 13:15:50.614413 1 controller.go:160] Dropping key '192.168.192.111' from the cloud-private-ip-config workqueue
$ openstack port list --network network-dualstack | grep -e 192.168.192.111 -e 6f44:5dd8:c956:f816:3eff:fef0:3333
| 30fe8d9a-c1c6-46c3-a873-9a02e1943cb7 | egressip-192.168.192.111 | fa:16:3e:3c:23:2a | ip_address='192.168.192.111', subnet_id='ae8a4c1f-d3e4-4ea2-bc14-ef1f6f5d0bbe' | DOWN |
Actual results: ipv6 egressIP object is ignored.
Expected results: ipv6 egressIP is created and can be attached to a pod.
Additional info: must-gather linked in private comment.
This fix contains the following changes from updated versions of Kubernetes, up to v1.28.15. Changelog: v1.28.15: https://github.com/kubernetes/kubernetes/blob/release-1.28/CHANGELOG/CHANGELOG-1.28.md#changelog-since-v12814
Description of problem:
A label is missing from the aws-ebs-csi-driver-operator deployment in an HCP guest cluster.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-09-12-195514
How reproducible:
Always
Steps to Reproduce:
1. Install a HyperShift kind cluster from flexy template aos-4_14/ipi-on-aws/versioned-installer-ovn-hypershift-ci
2. Check the deployment labels:
$ oc get deployment/aws-ebs-csi-driver-operator -n clusters-hypershift-ci-3366 -o jsonpath='{.spec.template.metadata.labels}'
{"name":"aws-ebs-csi-driver-operator"}
Actual results:
{"name":"aws-ebs-csi-driver-operator"}
Expected results:
The deployment should carry the hypershift.openshift.io/need-management-kas-access label (compare cluster-storage-operator below).
Additional info:
$ oc get deployment/cluster-storage-operator -n clusters-hypershift-ci-3366 -o jsonpath='{.spec.template.metadata.labels}'
{"hypershift.openshift.io/hosted-control-plane":"clusters-hypershift-ci-3366","hypershift.openshift.io/need-management-kas-access":"true","name":"cluster-storage-operator"}
Discussion: https://redhat-internal.slack.com/archives/GK0DA0JR5/p1694782231463969
Please review the following PR: https://github.com/openshift/machine-os-images/pull/30
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/cluster-autoscaler-operator/pull/286
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/machine-api-provider-aws/pull/82
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
After upgrading the cluster from 4.14 to 4.15.36, the observedGeneration count increased tremendously.
$ oc get mcp worker -oyaml
observedGeneration: 4240
$ oc get mcp master -oyaml
observedGeneration: 4724
$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-688b23a51eb5ca3e34a7e9c76a28f82c   True      False      False      3              3                   3                      0                      3y208d
worker   rendered-worker-2c029b3defc09ae342d1874ab1755b3d   True      False      False      9              9                   9                      0                      3y208d
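A quick way to confirm the symptom (a sketch; any pool works): watch whether observedGeneration keeps climbing even though nothing about the pool is changing:
$ oc get mcp worker -o jsonpath='{.status.observedGeneration}{"\n"}' -w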
Description of problem:
The in-cluster prometheus-operator and UWM prometheus-operator pods are scheduled to master nodes.
After enabling UWM and adding topologySpreadConstraints for both the in-cluster and the UWM prometheus-operator (with topologyKey set to node-role.kubernetes.io/master), the topologySpreadConstraints take effect for the in-cluster prometheus-operator but not for the UWM prometheus-operator.
apiVersion: v1
data:
  config.yaml: |
    enableUserWorkload: true
    prometheusOperator:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: node-role.kubernetes.io/master
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: prometheus-operator
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
For the in-cluster prometheus-operator, the topologySpreadConstraints settings are loaded into the prometheus-operator deployment and pod:
$ oc -n openshift-monitoring get deploy prometheus-operator -oyaml | grep topologySpreadConstraints -A7
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: prometheus-operator
        maxSkew: 1
        topologyKey: node-role.kubernetes.io/master
        whenUnsatisfiable: DoNotSchedule
      volumes:
$ oc -n openshift-monitoring get pod -l app.kubernetes.io/name=prometheus-operator -o wide
NAME                                   READY   STATUS    RESTARTS   AGE    IP            NODE                                                 NOMINATED NODE   READINESS GATES
prometheus-operator-65496d5b78-fb9nq   2/2     Running   0          105s   10.128.0.71   juzhao-0813-szb9h-master-0.c.openshift-qe.internal   <none>           <none>
$ oc -n openshift-monitoring get pod prometheus-operator-65496d5b78-fb9nq -oyaml | grep topologySpreadConstraints -A7
  topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/name: prometheus-operator
    maxSkew: 1
    topologyKey: node-role.kubernetes.io/master
    whenUnsatisfiable: DoNotSchedule
  volumes:
But the topologySpreadConstraints settings are not loaded into the UWM prometheus-operator deployment and pod:
$ oc -n openshift-user-workload-monitoring get cm user-workload-monitoring-config -oyaml
apiVersion: v1
data:
  config.yaml: |
    prometheusOperator:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: node-role.kubernetes.io/master
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: prometheus-operator
kind: ConfigMap
metadata:
  creationTimestamp: "2023-08-14T08:10:49Z"
  labels:
    app.kubernetes.io/managed-by: cluster-monitoring-operator
    app.kubernetes.io/part-of: openshift-monitoring
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
  resourceVersion: "212490"
  uid: 048f91cb-4da6-4b1b-9e1f-c769096ab88c
$ oc -n openshift-user-workload-monitoring get deploy prometheus-operator -oyaml | grep topologySpreadConstraints -A7
(no result)
$ oc -n openshift-user-workload-monitoring get pod -l app.kubernetes.io/name=prometheus-operator
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-77bcdcbd9c-m5x8z   2/2     Running   0          15m
$ oc -n openshift-user-workload-monitoring get pod prometheus-operator-77bcdcbd9c-m5x8z -oyaml | grep topologySpreadConstraints
(no result)
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-08-11-055332
How reproducible:
always
Steps to Reproduce:
1. see the description 2. 3.
Actual results:
topologySpreadConstraints settings are not loaded to UWM prometheus-operator pod and deployment
Expected results:
topologySpreadConstraints settings are loaded into the UWM prometheus-operator deployment and pod
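Once fixed, a minimal check that the constraints actually landed on the UWM deployment (same names as in the reproduction above):
$ oc -n openshift-user-workload-monitoring get deploy prometheus-operator -o jsonpath='{.spec.template.spec.topologySpreadConstraints}{"\n"}'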
This is a clone of issue OCPBUGS-25780. The following is the description of the original issue:
—
Description of problem:
When there is a new update available for the cluster, clicking "Select a version" on the Cluster Settings page produces no reaction.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-12-19-033450
How reproducible:
Always
Steps to Reproduce:
1. Prepare a cluster with an available update.
2. Go to the Cluster Settings page and try to choose a version by clicking the "Select a version" button.
Actual results:
2. There is no response when clicking the button; the user cannot select a version from the page.
Expected results:
2. A modal should show up for the user to select a version after clicking the "Select a version" button.
Additional info:
screenshot: https://drive.google.com/file/d/1Kpyu0kUKFEQczc5NVEcQFbf_uly_S60Y/view?usp=sharing
Please review the following PR: https://github.com/openshift/baremetal-operator/pull/302
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-33184. The following is the description of the original issue:
—
Description of problem:
There are built-in cluster roles to provide access to the default OpenShift SCCs. The "hostmount-anyuid" SCC does not have a functioning built-in cluster role, as it appears to have a typo in its name.
Version-Release number of selected component (if applicable):
How reproducible:
Consistent
Steps to Reproduce:
1. Attempt to use "system:openshift:scc:hostmount" cluster role 2. 3.
Actual results:
No access is provided because the name of the SCC in the cluster role is misspelled.
Expected results:
Access provided to use the SCC
Additional info:
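A possible workaround while the shipped role is broken, sketched with oc (the role name below is arbitrary): create a cluster role that grants use of the hostmount-anyuid SCC directly.
# If discovery is ambiguous, the resource may need to be qualified as
# securitycontextconstraints.security.openshift.io.
$ oc create clusterrole hostmount-anyuid-scc-user \
    --verb=use \
    --resource=securitycontextconstraints \
    --resource-name=hostmount-anyuid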
This is a clone of issue OCPBUGS-29012. The following is the description of the original issue:
—
Description of problem:
We see failures in this test:
[Jira:"Networking / router"] monitor test service-type-load-balancer-availability setup 15m1s { failed during setup error waiting for load balancer: timed out waiting for service "service-test" to have a load balancer: timed out waiting for the condition}
See https://search.ci.openshift.org/?search=error+waiting+for+load+balancer&maxAge=168h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job to find recent ones.
Example job: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade/1754402739040817152
This has failed payloads like:
https://amd64.ocp.releases.ci.openshift.org/releasestream/4.16.0-0.ci/release/4.16.0-0.ci-2024-02-01-211543
https://amd64.ocp.releases.ci.openshift.org/releasestream/4.15.0-0.ci/release/4.15.0-0.ci-2024-02-02-061913
https://amd64.ocp.releases.ci.openshift.org/releasestream/4.15.0-0.ci/release/4.15.0-0.ci-2024-02-02-001913
Version-Release number of selected component (if applicable):
4.15 and 4.16
How reproducible:
intermittent as shown in the search.ci query above
Steps to Reproduce:
1. run the e2e tests on 4.15 and 4.16 2. 3.
Actual results:
timeouts on getting load balancer
Expected results:
no timeout and successful load balancer
Additional info:
https://issues.redhat.com/browse/TRT-1486 has more info thread: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1707142256956139
This is a clone of issue OCPBUGS-26069. The following is the description of the original issue:
—
Component Readiness has found a potential regression in [sig-arch][Late] operators should not create watch channels very often [apigroup:apiserver.openshift.io] [Suite:openshift/conformance/parallel].
Probability of significant regression: 98.46%
Sample (being evaluated) Release: 4.15
Start Time: 2023-12-29T00:00:00Z
End Time: 2024-01-04T23:59:59Z
Success Rate: 83.33%
Successes: 15
Failures: 3
Flakes: 0
Base (historical) Release: 4.14
Start Time: 2023-10-04T00:00:00Z
End Time: 2023-10-31T23:59:59Z
Success Rate: 98.36%
Successes: 120
Failures: 2
Flakes: 0
Description of problem:
We need to update Console dynamic plugin build infra (@openshift-console/dynamic-plugin-sdk-webpack) to support sharing of PatternFly 5 dynamic modules between dynamic plugins, as per CONSOLE-3853.
This change is necessary for optimal performance of Console plugins that wish to migrate to PatternFly 5.
This is a clone of issue OCPBUGS-22910. The following is the description of the original issue:
—
Description of problem:
Files like ovs-if-br-ex.nmconnection.J1K8B2 break ovs-configuration.service. Deleting the file fixes the issue.
Version-Release number of selected component (if applicable):
4.12
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
If the user does not specify a rendezvousIP and instead leaves it to the installer to choose one of the configured static IPs, it always picks the lowest IP. If no roles are assigned, this host will become part of the control plane.
If the user assigns the lowest IP to a host to which they also assign a worker role, the install will fail.
It's not clear what will happen if the role is not explicitly set on the host with the lowest IP while there are already sufficient control plane nodes assigned from among the other hosts. In any event, this wouldn't be good.
We should select a static IP among only the hosts that are eligible to become part of the control plane.
A user can work around this by explicitly specifying the rendezvousIP.
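The workaround sketched as an agent-config.yaml fragment (the metadata name and IPs are illustrative): pin rendezvousIP explicitly to the static IP of a host that will be part of the control plane.
$ cat agent-config.yaml
apiVersion: v1alpha1
kind: AgentConfig
metadata:
  name: example-agent-config   # illustrative
# Pin the rendezvous host to a control-plane host's static IP instead of
# letting the installer pick the lowest configured IP.
rendezvousIP: 192.168.111.20   # illustrative: a master host's static IP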
Please review the following PR: https://github.com/openshift/cluster-api-provider-alibaba/pull/46
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/cluster-kube-storage-version-migrator-operator/pull/100
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-25441. The following is the description of the original issue:
—
Description of problem:
Oh no! Something went wrong" in Topology -> Observese Tab
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-12-14-115151
How reproducible:
Always
Steps to Reproduce:
1. Navigate to Topology, click one deployment, and go to the Observe tab
Actual results:
The page crashed. ErrorDescription: Component trace:
at te (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/vendor-plugins-shared~main-chunk-b3bd2b20c770a4e73b50.min.js:31:9773)
at j (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/vendor-plugins-shared~main-chunk-b3bd2b20c770a4e73b50.min.js:12:3324)
at div
at s (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/vendor-patternfly-5~main-chunk-c9c3c11a060d045a85da.min.js:60:70124)
at div
at g (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/vendor-patternfly-5~main-chunk-c9c3c11a060d045a85da.min.js:6:11163)
at div
at d (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/vendor-patternfly-5~main-chunk-c9c3c11a060d045a85da.min.js:1:174472)
at t.a (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/dev-console/code-refs/topology-chunk-769d28af48dd4b29136f.min.js:1:487478)
at t.a (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/dev-console/code-refs/topology-chunk-769d28af48dd4b29136f.min.js:1:486390)
at div
at l (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/vendor-patternfly-5~main-chunk-c9c3c11a060d045a85da.min.js:60:106304)
at div
Expected results:
The page should not crash.
Additional info:
Description of problem:
The installer supports pre-rendering of the PerformanceProfile-related manifests. However, the MCO render is executed after the PerformanceProfile render, so the master and worker MachineConfigPools are created too late. This causes the installation process to fail with:
Oct 18 18:05:25 localhost.localdomain bootkube.sh[537963]: I1018 18:05:25.968719 1 render.go:73] Rendering files into: /assets/node-tuning-bootstrap
Oct 18 18:05:26 localhost.localdomain bootkube.sh[537963]: I1018 18:05:26.008421 1 render.go:133] skipping "/assets/manifests/99_feature-gate.yaml" [1] manifest because of unhandled *v1.FeatureGate
Oct 18 18:05:26 localhost.localdomain bootkube.sh[537963]: I1018 18:05:26.013043 1 render.go:133] skipping "/assets/manifests/cluster-dns-02-config.yml" [1] manifest because of unhandled *v1.DNS
Oct 18 18:05:26 localhost.localdomain bootkube.sh[537963]: I1018 18:05:26.021978 1 render.go:133] skipping "/assets/manifests/cluster-ingress-02-config.yml" [1] manifest because of unhandled *v1.Ingress
Oct 18 18:05:26 localhost.localdomain bootkube.sh[537963]: I1018 18:05:26.023016 1 render.go:133] skipping "/assets/manifests/cluster-network-02-config.yml" [1] manifest because of unhandled *v1.Network
Oct 18 18:05:26 localhost.localdomain bootkube.sh[537963]: I1018 18:05:26.023160 1 render.go:133] skipping "/assets/manifests/cluster-proxy-01-config.yaml" [1] manifest because of unhandled *v1.Proxy
Oct 18 18:05:26 localhost.localdomain bootkube.sh[537963]: I1018 18:05:26.023445 1 render.go:133] skipping "/assets/manifests/cluster-scheduler-02-config.yml" [1] manifest because of unhandled *v1.Scheduler
Oct 18 18:05:26 localhost.localdomain bootkube.sh[537963]: I1018 18:05:26.024475 1 render.go:133] skipping "/assets/manifests/cvo-overrides.yaml" [1] manifest because of unhandled *v1.ClusterVersion
Oct 18 18:05:26 localhost.localdomain bootkube.sh[537963]: F1018 18:05:26.037467 1 cmd.go:53] no MCP found that matches performance profile node selector "node-role.kubernetes.io/master="
Version-Release number of selected component (if applicable):
4.14.0-rc.6
How reproducible:
Always
Steps to Reproduce:
1. Add an SNO PerformanceProfile to extra manifest in the installer. Node selector should be: "node-role.kubernetes.io/master=" 2. 3.
Actual results:
no MCP found that matches performance profile node selector "node-role.kubernetes.io/master="
Expected results:
Installation completes
Additional info:
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: openshift-node-workload-partitioning-sno
spec:
  cpu:
    isolated: 4-X   # must match the topology of the node
    reserved: 0-3
  nodeSelector:
    node-role.kubernetes.io/master: ""
Description of problem:
When the TestMTLSWithCRLs e2e test fails on a curl, it checks the stdout, but the stdout could be empty, so it panics:
--- FAIL: TestAll/parallel/TestMTLSWithCRLs (97.09s)
    --- FAIL: TestAll/parallel/TestMTLSWithCRLs/certificate-distributes-its-own-crl (97.09s)
panic: runtime error: slice bounds out of range [-3:] [recovered]
    panic: runtime error: slice bounds out of range [-3:]
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Experience a failure in the MTLS testing, such as seen in https://redhat-internal.slack.com/archives/CBWMXQJKD/p1688596054069399?thread_ts=1688596036.042119&cid=CBWMXQJKD
Search.ci shows two failures in the past two weeks: https://search.ci.openshift.org/?search=FAIL%3A+TestAll%2Fparallel%2FTestMTLSWithCRLs&maxAge=336h&context=1&type=bug%2Bissue%2Bjunit&name=cluster-ingress-operator&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
Steps to Reproduce:
1. N/A 2. 3.
Actual results:
Test panics when trying to report an error.
Expected results:
Test reports whatever error it can without panics.
Additional info:
stdout was empty, but https://github.com/openshift/cluster-ingress-operator/blob/4c92a6d1ee80b6b120dd750855a40145a530153c/test/e2e/client_tls_test.go#L1587 doesn't check that the value is empty before it tries to index it.
This is a clone of issue OCPBUGS-22293. The following is the description of the original issue:
—
Description of problem:
Upgrading from 4.13.5 to 4.13.17 fails at network operator upgrade
Version-Release number of selected component (if applicable):
How reproducible:
Not sure since we only had one cluster on 4.13.5.
Steps to Reproduce:
1. Have a cluster on version 4.13.5 witn ovn kubernetes 2. Set desired update image to quay.io/openshift-release-dev/ocp-release@sha256:c1f2fa2170c02869484a4e049132128e216a363634d38abf292eef181e93b692 3. Wait until it reaches network operator
Actual results:
Error message: Error while updating operator configuration: could not apply (apps/v1, Kind=DaemonSet) openshift-ovn-kubernetes/ovnkube-master: failed to apply / update (apps/v1, Kind=DaemonSet) openshift-ovn-kubernetes/ovnkube-master: DaemonSet.apps "ovnkube-master" is invalid: [spec.template.spec.containers[1].lifecycle.preStop: Required value: must specify a handler type, spec.template.spec.containers[3].lifecycle.preStop: Required value: must specify a handler type]
Expected results:
Network operator upgrades successfully
Additional info:
Since I'm not able to attach files, please gather all required debug data from https://access.redhat.com/support/cases/#/case/03645170
Please review the following PR: https://github.com/openshift/baremetal-runtimecfg/pull/289
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Cluster with 103 nodes failed to update in UI on Networking page
with Dialog error: "The service is down, undergoing maintenance, or experiencing another issue."
And error in UI:
"[10] Message Size Too Large: the server has a configurable maximum message size to avoid
unbounded memory allocation and the client attempted to produce a message larger than this maximum"
And in browser Debugger
PATCH https://api.stage.openshift.com/api/assisted-install/v2/clusters/674c7056-4db9-4ea6-9f1d-f976fc77897e 500 (Internal Server Error)
See attached screenshot
Steps to reproduce:
1. Create cluster, generate minimal ISO image, download to servers
2. Boot 103 nodes with ISO image
3. Wait all nodes finished discovering
4. Click Next , Next
5. Set API and Ingress VIP in Networking page
Actual results:
Raise error dialog: Unable to update cluster
The service is down, undergoing maintenance, or experiencing another issue.
and asks to Refresh, which returns the user to the Cluster details page
Expected results:
Should update the cluster and allow continuing with the installation
Seeing failures for SDN periodics running [sig-network][Feature:tuning] sysctl allowlist update should start a pod with custom sysctl only when the sysctl is added to whitelist [Suite:openshift/conformance/parallel] beginning with 4.16.0-0.nightly-2024-01-05-205447
Jan 5 23:14:22.066: INFO: At 2024-01-05 23:14:09 +0000 UTC - event for testpod: {kubelet ip-10-0-54-42.us-west-2.compute.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_testpod_e2e-test-tuning-bzspr_2a9ce6e0-726d-47a6-ac64-71d430926574_0(968a55c5afd81e077b1d15a4129084d5f15002ac3ae6aa9fe32648e841940fe2): error adding pod e2e-test-tuning-bzspr_testpod to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): timed out waiting for the condition
That payload contains OCPBUGS-26222 ("Adds a wait on unix socket readiness"); not sure that is the cause, but will investigate.
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
'Oh no! Something went wrong' is shown when the user goes to the MultiClusterEngine details -> YAML tab
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-20-215234
How reproducible:
Always
Steps to Reproduce:
1. Install 'multicluster engine for Kubernetes' operator in the cluster 2. Use the default value to create a new MultiClusterEngine 3. Navigate to the MultiClusterEngine details -> Yaml Tab
Actual results:
'Oh no! Something went wrong.' error is shown with the details below:
TypeErrorDescription: Cannot read properties of null (reading 'editor')
Expected results:
no error
Additional info:
This bug fix is in conjunction with https://issues.redhat.com/browse/OCPBUGS-22778
Description of problem:
When installing OpenShift on GCP in a Shared VPC (formerly XPN) configuration, the service account used must have permissions to create firewall rules on the host project's network in order to proceed. If the account does not have permissions, the installation will fail but the explicit reason is not listed.
Version-Release number of selected component (if applicable):
4.14-ec.1
How reproducible:
100% of the time when the service account creating the cluster does not have Owner permissions or `compute.firewalls.create` on the host project.
Steps to Reproduce:
1. Follow the instructions at https://docs.openshift.com/container-platform/4.13/installing/installing_gcp/installing-gcp-shared-vpc.html
2. As part of the prerequisites, make a service account with the permissions listed at https://docs.openshift.com/container-platform/4.13/installing/installing_gcp/installing-gcp-account.html#minimum-required-permissions-ipi-gcp-xpn
3. Create a cluster using an install-config.yaml similar to the one attached
Actual results:
The cluster fails to bootstrap. The bootstrap node will be present, as will the masters, but components will not be able to reach the api-int load balancer.
Expected results:
The log files would include an error message regarding the missing permissions, and possibly abort the installation early.
Additional info:
https://docs.openshift.com/container-platform/4.13/installing/installing_gcp/installing-gcp-account.html#minimum-required-permissions-ipi-gcp-xpn does not list the `compute.firewalls.create` permission, which is included in the code at https://github.com/openshift/installer/blob/4f59664588c4472b7aba2838159651e729908dff/pkg/asset/cluster/tfvars.go#L79. This is probably also a related docs improvement.
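A hedged sketch of granting just the missing permission on the host project via a custom role; the role ID and the service account email are placeholders, the project IDs are taken from the install-config below, and compute.networks.updatePolicy is an assumption about what firewall creation on a shared network additionally requires:
$ gcloud iam roles create xpnFirewallCreator \
    --project=openshift-dev-installer \
    --permissions=compute.firewalls.create,compute.networks.updatePolicy
$ gcloud projects add-iam-policy-binding openshift-dev-installer \
    --member="serviceAccount:installer@openshift-installer-shared-vpc.iam.gserviceaccount.com" \
    --role="projects/openshift-dev-installer/roles/xpnFirewallCreator"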
File attachment seems to have been disabled, so here is the text of the `install-config.yaml` that I was using:
additionalTrustBundlePolicy: Proxyonly
apiVersion: v1
baseDomain: installer.gcp.devcluster.openshift.com
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3
credentialsMode: Passthrough
featureSet: TechPreviewNoUpgrade
metadata:
  creationTimestamp: null
  name: nrbxpn
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  gcp:
    projectID: openshift-installer-shared-vpc
    region: us-central1
    network: installer-shared-vpc
    computeSubnet: installer-shared-vpc-subnet-1
    controlPlaneSubnet: installer-shared-vpc-subnet-2
    networkProjectID: openshift-dev-installer
publish: Internal
pullSecret: <omitted>
sshKey: <omitted>
This is a clone of issue OCPBUGS-27156. The following is the description of the original issue:
—
Description of problem:
Trying to create the second cluster using the same cluster name and base domain as the first cluster would fail, as expected, because of the dns record-sets conflicts. But deleting the second cluster leads to the first cluster inaccessible, which is unexpected.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2024-01-14-100410
How reproducible:
Always
Steps to Reproduce:
1. Create the first cluster and make sure it succeeds
2. Try to create the second cluster, with the same cluster name, base domain, and region, and make sure it fails
3. Destroy the second cluster, which failed due to the "Platform Provisioning Check"
4. Check whether the first cluster is still healthy
Actual results:
The first cluster turns unhealthy, because its dns record-sets are deleted by step 3.
Expected results:
The dns record-sets of the first cluster stay untouched during step 3, and the first cluster stays healthy after step 3.
Additional info:
(1) The first cluster is created by Flexy-install job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/257549/, and it's healthy initially:
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.nightly-2024-01-14-100410   True        False         54m     Cluster version is 4.15.0-0.nightly-2024-01-14-100410
$ oc get nodes
NAME                                                       STATUS   ROLES                  AGE   VERSION
jiwei-0115y-lgns8-master-0.c.openshift-qe.internal         Ready    control-plane,master   73m   v1.28.5+c84a6b8
jiwei-0115y-lgns8-master-1.c.openshift-qe.internal         Ready    control-plane,master   73m   v1.28.5+c84a6b8
jiwei-0115y-lgns8-master-2.c.openshift-qe.internal         Ready    control-plane,master   74m   v1.28.5+c84a6b8
jiwei-0115y-lgns8-worker-a-gqq96.c.openshift-qe.internal   Ready    worker                 62m   v1.28.5+c84a6b8
jiwei-0115y-lgns8-worker-b-2h9xd.c.openshift-qe.internal   Ready    worker                 63m   v1.28.5+c84a6b8
(2) Try to create the second cluster, expecting it to fail because the dns record already exists:
$ openshift-install version
openshift-install 4.15.0-0.nightly-2024-01-14-100410
built from commit b6f320ab7eeb491b2ef333a16643c140239de0e5
release image registry.ci.openshift.org/ocp/release@sha256:385d84c803c776b44ce77b80f132c1b6ed10bd590f868c97e3e63993b811cc2d
release architecture amd64
$ mkdir test1
$ cp install-config.yaml test1
$ yq-3.3.0 r test1/install-config.yaml baseDomain
qe.gcp.devcluster.openshift.com
$ yq-3.3.0 r test1/install-config.yaml metadata
creationTimestamp: null
name: jiwei-0115y
$ yq-3.3.0 r test1/install-config.yaml platform
gcp:
  projectID: openshift-qe
  region: us-central1
$ openshift-install create cluster --dir test1
INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json"
INFO Consuming Install Config from target directory
FATAL failed to fetch Terraform Variables: failed to fetch dependency of "Terraform Variables": failed to generate asset "Platform Provisioning Check": metadata.name: Invalid value: "jiwei-0115y": record(s) ["api.jiwei-0115y.qe.gcp.devcluster.openshift.com."] already exists in DNS Zone (openshift-qe/qe) and might be in use by another cluster, please remove it to continue
(3) Delete the second cluster:
$ openshift-install destroy cluster --dir test1
INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json"
INFO Deleted 2 recordset(s) in zone qe
INFO Deleted 3 recordset(s) in zone jiwei-0115y-lgns8-private-zone
WARNING Skipping deletion of DNS Zone jiwei-0115y-lgns8-private-zone, not created by installer
INFO Time elapsed: 37s
INFO Uninstallation complete!
(4) Check the first cluster status and the dns record-sets:
$ oc get clusterversion
Unable to connect to the server: dial tcp: lookup api.jiwei-0115y.qe.gcp.devcluster.openshift.com on 10.11.5.160:53: no such host
$ gcloud dns managed-zones describe jiwei-0115y-lgns8-private-zone
cloudLoggingConfig:
  kind: dns#managedZoneCloudLoggingConfig
creationTime: '2024-01-15T07:22:55.199Z'
description: Created By OpenShift Installer
dnsName: jiwei-0115y.qe.gcp.devcluster.openshift.com.
id: '9193862213315831261'
kind: dns#managedZone
labels:
  kubernetes-io-cluster-jiwei-0115y-lgns8: owned
name: jiwei-0115y-lgns8-private-zone
nameServers:
- ns-gcp-private.googledomains.com.
privateVisibilityConfig:
  kind: dns#managedZonePrivateVisibilityConfig
  networks:
  - kind: dns#managedZonePrivateVisibilityConfigNetwork
    networkUrl: https://www.googleapis.com/compute/v1/projects/openshift-qe/global/networks/jiwei-0115y-lgns8-network
visibility: private
$ gcloud dns record-sets list --zone jiwei-0115y-lgns8-private-zone
NAME                                          TYPE  TTL    DATA
jiwei-0115y.qe.gcp.devcluster.openshift.com.  NS    21600  ns-gcp-private.googledomains.com.
jiwei-0115y.qe.gcp.devcluster.openshift.com.  SOA   21600  ns-gcp-private.googledomains.com. cloud-dns-hostmaster.google.com. 1 21600 3600 259200 300
$ gcloud dns record-sets list --zone qe --filter='name~jiwei-0115y'
Listed 0 items.
Description of problem:
Version-Release number of selected component (if applicable):
4.14
How reproducible:
1. oc patch svc <svc> --type merge --patch '{"spec":{"sessionAffinity":"ClientIP"}}'
2. curl <svc>:<port>
3. oc scale --replicas=3 deploy/<deploy>
4. oc scale --replicas=0 deploy/<deploy>
5. oc scale --replicas=3 deploy/<deploy>
Actual results:
Hostname: tcp-668655b888-dcwlr, SourceIP: 10.128.1.205, SourcePort: 54850
Hostname: tcp-668655b888-dcwlr, SourceIP: 10.128.1.205, SourcePort: 46668
Hostname: tcp-668655b888-xxcn2, SourceIP: 10.128.1.205, SourcePort: 46682
Hostname: tcp-668655b888-dcwlr, SourceIP: 10.128.1.205, SourcePort: 60144
Hostname: tcp-668655b888-dcwlr, SourceIP: 10.128.1.205, SourcePort: 60150
Hostname: tcp-668655b888-xxcn2, SourceIP: 10.128.1.205, SourcePort: 60160
Hostname: tcp-668655b888-xxcn2, SourceIP: 10.128.1.205, SourcePort: 51720
Expected results:
Hostname: tcp-668655b888-9mmfc, SourceIP: 10.128.1.205, SourcePort: 46914
Hostname: tcp-668655b888-9mmfc, SourceIP: 10.128.1.205, SourcePort: 46928
Hostname: tcp-668655b888-9mmfc, SourceIP: 10.128.1.205, SourcePort: 46944
Hostname: tcp-668655b888-9mmfc, SourceIP: 10.128.1.205, SourcePort: 40510
Hostname: tcp-668655b888-9mmfc, SourceIP: 10.128.1.205, SourcePort: 40520
Additional info:
See the hostname in the server log output for each command.
$ oc patch svc <svc> --type merge --patch '{"spec":{"sessionAffinity":"ClientIP"}}'
Hostname: tcp-668655b888-9mmfc, SourceIP: 10.128.1.205, SourcePort: 46914
Hostname: tcp-668655b888-9mmfc, SourceIP: 10.128.1.205, SourcePort: 46928
Hostname: tcp-668655b888-9mmfc, SourceIP: 10.128.1.205, SourcePort: 46944
Hostname: tcp-668655b888-9mmfc, SourceIP: 10.128.1.205, SourcePort: 40510
Hostname: tcp-668655b888-9mmfc, SourceIP: 10.128.1.205, SourcePort: 40520
$ oc scale --replicas=1 deploy/<deploy>
Hostname: tcp-668655b888-xxcn2, SourceIP: 10.128.1.205, SourcePort: 47082
Hostname: tcp-668655b888-xxcn2, SourceIP: 10.128.1.205, SourcePort: 47088
Hostname: tcp-668655b888-xxcn2, SourceIP: 10.128.1.205, SourcePort: 54832
Hostname: tcp-668655b888-xxcn2, SourceIP: 10.128.1.205, SourcePort: 54848
$ oc scale --replicas=3 deploy/<deploy>
Hostname: tcp-668655b888-dcwlr, SourceIP: 10.128.1.205, SourcePort: 54850
Hostname: tcp-668655b888-dcwlr, SourceIP: 10.128.1.205, SourcePort: 46668
Hostname: tcp-668655b888-xxcn2, SourceIP: 10.128.1.205, SourcePort: 46682
Hostname: tcp-668655b888-dcwlr, SourceIP: 10.128.1.205, SourcePort: 60144
Hostname: tcp-668655b888-dcwlr, SourceIP: 10.128.1.205, SourcePort: 60150
Hostname: tcp-668655b888-xxcn2, SourceIP: 10.128.1.205, SourcePort: 60160
Hostname: tcp-668655b888-xxcn2, SourceIP: 10.128.1.205, SourcePort: 51720
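For reference while reproducing, a quick way to confirm the service's affinity configuration (the Kubernetes default ClientIP timeout is 10800 seconds; <svc> is whatever service is under test):
$ oc get svc <svc> -o jsonpath='{.spec.sessionAffinity}{"\n"}{.spec.sessionAffinityConfig}{"\n"}'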
This is a clone of issue OCPBUGS-17249. The following is the description of the original issue:
—
Description of problem:
When a project uses the openshift.io/node-selector annotation pointing to the default "node-role.kubernetes.io/worker=" selector, and a deployment with a running ReplicaSet/pods is set to use a different role than the default one, the scheduler enters an infinite loop of pod creation.
Version-Release number of selected component (if applicable):
Tested on 4.11
Steps to Reproduce:
1. Create a project via [oc create/apply] with the annotation: openshift.io/node-selector: node-role.kubernetes.io/worker=
2. Create a deployment that creates a running pod
3. Edit the deployment and add the nodeName: option pointing to a different role than the worker one
Actual results:
Infinite POD creation:
❯ oc get po
NAME                                     READY   STATUS         RESTARTS   AGE
infinite-pod-creation-7458cbbd88-98zpn   1/1     Running        0          8m40s
infinite-pod-creation-7688f685c7-2grmh   0/1     NodeAffinity   0          1s
infinite-pod-creation-7688f685c7-4g7dd   0/1     NodeAffinity   0          2s
infinite-pod-creation-7688f685c7-59zr6   0/1     NodeAffinity   0          1s
infinite-pod-creation-7688f685c7-5l5xl   0/1     NodeAffinity   0          2s
infinite-pod-creation-7688f685c7-5nw22   0/1     NodeAffinity   0          2s
infinite-pod-creation-7688f685c7-5qr7z   0/1     NodeAffinity   0          1s
infinite-pod-creation-7688f685c7-5wp2q   0/1     NodeAffinity   0          2s
infinite-pod-creation-7688f685c7-6kxjg   0/1     NodeAffinity   0          1s
infinite-pod-creation-7688f685c7-74d7m   0/1     NodeAffinity   0          2s
infinite-pod-creation-7688f685c7-78hzm   0/1     NodeAffinity   0          1s
...
Expected results:
The scheduler should be able to find a conflict and throw an error
Additional info:
apiVersion: project.openshift.io/v1
kind: Project
metadata:
  annotations:
    openshift.io/description: ""
    openshift.io/display-name: ""
    openshift.io/node-selector: node-role.kubernetes.io/worker=
    openshift.io/requester: kube:admin
    openshift.io/sa.scc.mcs: s0:c29,c19
    openshift.io/sa.scc.supplemental-groups: 1000850000/10000
    openshift.io/sa.scc.uid-range: 1000850000/10000
  labels:
    kubernetes.io/metadata.name: infinite-pod-creation
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: v1.24
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: v1.24
  name: infinite-pod-creation
spec:
  finalizers:
  - kubernetes
======================================================
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
  labels:
    app: infinite-pod-creation
    app.kubernetes.io/component: infinite-pod-creation
    app.kubernetes.io/instance: infinite-pod-creation
  name: infinite-pod-creation
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      deployment: infinite-pod-creation
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        deployment: infinite-pod-creation
        app: infinite-pod-creation
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: infinite-pod-creation
      containers:
      - image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a9db83f67aa4389811bad29af878d038e18bc39f63673fe77fe30f9bf1bd97de
        imagePullPolicy: IfNotPresent
        name: infinite-pod-creation
        ports:
        - containerPort: 8080
          protocol: TCP
        - containerPort: 8888
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
============================================================
$ oc patch deployment infinite-pod-creation -p '{"spec":{"template":{"spec":{"nodeName": "$NODE-NAME-DIFFERENT-FROM-WORKER-ROLE"}}}}'
Description of problem:
Creating a `BareMetalHost` for an HPE ProLiant DL360 Gen10 Plus doesn't create the corresponding `FirmwareSchema` object.
Version-Release number of selected component (if applicable):
Cluster version is 4.15.0; iLO firmware version is 2.63 (Jan 20 2022).
How reproducible:
Always
Steps to Reproduce:
1. Create a `BareMetalHost` (and BMC credentials secret) for the machine. 2. Wait for the `FirmwareSchema` to be created.
Actual results:
The `FirmwareSchema` object is never created.
Expected results:
The `FirmwareSchema` object should be created.
Additional info:
See the comments.
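A minimal way to observe the symptom, assuming the host lives in the usual openshift-machine-api namespace; the HostFirmwareSettings status.schema reference is my understanding of how the schema is linked and should be treated as an assumption:
$ oc get bmh -n openshift-machine-api
$ oc get firmwareschema -n openshift-machine-api
# For a working host, each HostFirmwareSettings should reference a schema:
$ oc get hostfirmwaresettings -n openshift-machine-api \
    -o jsonpath='{range .items[*]}{.metadata.name}{" -> "}{.status.schema.name}{"\n"}{end}'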
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
spec:
  configuration:
    featureGate:
      featureSet: TechPreviewNoUpgrade
$ oc get pod
NAME                                      READY   STATUS             RESTARTS        AGE
capi-provider-bd4858c47-sf5d5             0/2     Init:0/1           0               9m33s
cluster-api-85f69c8484-5n9ql              1/1     Running            0               9m33s
control-plane-operator-78c9478584-xnjmd   2/2     Running            0               9m33s
etcd-0                                    3/3     Running            0               9m10s
kube-apiserver-55bb575754-g4694           4/5     CrashLoopBackOff   6 (81s ago)     8m30s
$ oc logs kube-apiserver-55bb575754-g4694 -c kube-apiserver --tail=5
E0105 16:49:54.411837 1 controller.go:145] while syncing ConfigMap "kube-system/kube-apiserver-legacy-service-account-token-tracking", err: namespaces "kube-system" not found
I0105 16:49:54.415074 1 trace.go:236] Trace[236726897]: "Create" accept:application/vnd.kubernetes.protobuf, */*,audit-id:71496035-d1fe-4ee1-bc12-3b24022ea39c,client:::1,api-group:scheduling.k8s.io,api-version:v1,name:,subresource:,namespace:,protocol:HTTP/2.0,resource:priorityclasses,scope:resource,url:/apis/scheduling.k8s.io/v1/priorityclasses,user-agent:kube-apiserver/v1.29.0 (linux/amd64) kubernetes/9368fcd,verb:POST (05-Jan-2024 16:49:44.413) (total time: 10001ms):
Trace[236726897]: ---"Write to database call failed" len:174,err:priorityclasses.scheduling.k8s.io "system-node-critical" is forbidden: not yet ready to handle request 10001ms (16:49:54.415)
Trace[236726897]: [10.001615835s] [10.001615835s] END
F0105 16:49:54.415382 1 hooks.go:203] PostStartHook "scheduling/bootstrap-system-priority-classes" failed: unable to add default system priority classes: priorityclasses.scheduling.k8s.io "system-node-critical" is forbidden: not yet ready to handle request
This is a clone of issue OCPBUGS-27779. The following is the description of the original issue:
—
Description of problem:
When expanding a PVC of unit-less size (e.g., '2147483648'), the Expand PersistentVolumeClaim modal populates the spinner with a unit-less value (e.g., 2147483648) instead of a meaningful value.
Version-Release number of selected component (if applicable):
CNV - 4.14.3
How reproducible:
always
Steps to Reproduce:
1. Create a PVC and a pod using the following YAML:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pv-claim
spec:
  storageClassName: gp3-csi
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: "2147483648"
apiVersion: v1
kind: Pod
metadata:
  name: task-pv-pod
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - name: task-pv-storage
    persistentVolumeClaim:
      claimName: task-pv-claim
  containers:
  - name: task-pv-container
    image: nginx
    ports:
    - containerPort: 80
      name: "http-server"
    volumeMounts:
    - mountPath: "/usr/share/nginx/html"
      name: task-pv-storage
2. From the newly created PVC details page, click Actions > Expand PVC.
3. Note the value in the spinner input.
See https://drive.google.com/file/d/1toastX8rCBtUzx5M-83c9Xxe5iPA8fNQ/view for a demo
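For comparison, a request with an explicit unit avoids the unit-less spinner value; a minimal sketch (same claim, size expressed as 2Gi, claim name changed to avoid a clash):
$ oc apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pv-claim-units   # illustrative name
spec:
  storageClassName: gp3-csi
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi   # unit-bearing quantity; the Expand modal renders this correctly
EOF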
This is a clone of issue OCPBUGS-28661. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-28744. The following is the description of the original issue:
—
Description of problem:
$ oc adm upgrade
info: An upgrade is in progress. Working towards 4.15.0-rc.4: 701 of 873 done (80% complete), waiting on operator-lifecycle-manager
Upstream: https://api.openshift.com/api/upgrades_info/v1/graph
Channel: candidate-4.15 (available channels: candidate-4.15, candidate-4.16)
No updates available. You may still upgrade to a specific release image with --to-image or wait for new updates to be available.
$ oc get pods -n openshift-operator-lifecycle-manager
NAME                                      READY   STATUS             RESTARTS        AGE
catalog-operator-db86b7466-gdp4g          1/1     Running            0               9h
collect-profiles-28443465-9zzbk           0/1     Completed          0               34m
collect-profiles-28443480-kkgtk           0/1     Completed          0               19m
collect-profiles-28443495-shvs7           0/1     Completed          0               4m10s
olm-operator-56cb759d88-q2gr7             0/1     CrashLoopBackOff   8 (3m27s ago)   20m
package-server-manager-7cf46947f6-sgnlk   2/2     Running            0               9h
packageserver-7b795b79f-thxfw             1/1     Running            1               14d
packageserver-7b795b79f-w49jj             1/1     Running            0               4d17h
Version-Release number of selected component (if applicable):
How reproducible:
Unknown
Steps to Reproduce:
Upgrade from 4.15.0-rc.2 to 4.15.0-rc.4
Actual results:
The upgrade is unable to proceed
Expected results:
The upgrade can proceed
Additional info:
Please review the following PR: https://github.com/openshift/router/pull/512
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
ignition-server-proxy pods fail to start after a y-stream upgrade because the deployment is configured with a ServiceAccount that was set in 4.13 and deleted in 4.14 in PR https://github.com/openshift/hypershift/pull/2778. The 4.14 reconciliation does not unset the ServiceAccount that 4.13 set.
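A minimal sketch, with illustrative names rather than the actual hypershift code, of the missing reconciliation step: make the reconcile authoritative for the ServiceAccount by explicitly clearing the fields the 4.13 release set.

package reconcile

import appsv1 "k8s.io/api/apps/v1"

// reconcileIgnitionServerProxy clears any ServiceAccount left behind
// by an older control-plane version; 4.14 no longer uses one, so the
// reconcile must unset the field instead of leaving it untouched.
func reconcileIgnitionServerProxy(dep *appsv1.Deployment) {
	dep.Spec.Template.Spec.ServiceAccountName = ""
	dep.Spec.Template.Spec.DeprecatedServiceAccount = ""
}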
Description of the problem:
The MCE operator installation version is always 2.3. It should be dynamic and consider the OCP version:
ocp_mce_version_matrix:
'4.14': '2.4'
'4.13': '2.3'
'4.12': '2.2'
'4.11': '2.1'
'4.10': '2.0'
How reproducible:
100%
Steps to reproduce:
1. Create a 4.12 cluster
2. Select MCE operator to be installed on cluster
3. Install cluster
4. Verify OCP and MCE versions
Actual results:
OCP 4.12.26, MCE 2.3.0
It looks like the service installs 2.3 only and does not consider the OCP version:
https://github.com/openshift/assisted-service/blob/master/internal/operators/mce/config.go
const (
	MceMinOpenshiftVersion string = "4.10.0"
	MceChannel             string = "stable-2.3"
)
Expected results:
MCE 2.2
The MCE installation version should be dynamic and depend on the OCP version (see the lookup sketch after the matrix below):
ocp_mce_version_matrix:
'4.14': '2.4'
'4.13': '2.3'
'4.12': '2.2'
'4.11': '2.1'
'4.10': '2.0'
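A sketch of the dynamic lookup the expected results describe, assuming the matrix above; the function and map names are illustrative, not the assisted-service API.

package mce

import (
	"fmt"
	"strings"
)

// ocpToMCE mirrors the ocp_mce_version_matrix above.
var ocpToMCE = map[string]string{
	"4.14": "2.4",
	"4.13": "2.3",
	"4.12": "2.2",
	"4.11": "2.1",
	"4.10": "2.0",
}

// ChannelForOCP maps an OCP version such as "4.12.26" to the MCE
// channel that should be installed, e.g. "stable-2.2".
func ChannelForOCP(ocpVersion string) (string, error) {
	parts := strings.SplitN(ocpVersion, ".", 3)
	if len(parts) < 2 {
		return "", fmt.Errorf("unparsable OCP version %q", ocpVersion)
	}
	minor := parts[0] + "." + parts[1]
	mce, ok := ocpToMCE[minor]
	if !ok {
		return "", fmt.Errorf("no MCE version mapped for OCP %s", minor)
	}
	return "stable-" + mce, nil
}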
Description of problem:
With OCPBUGS-11099 our Pipeline Plugin supports the TektonConfig "embedded-status: minimal" option, which will be the default in OpenShift Pipelines 1.11+.
But since this change, the Pipeline pages load the TaskRuns for all Pipeline and PipelineRun rows. To reduce the risk of a performance issue, we should make this call only when status.tasks is not defined.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
Actual results:
The list page loads a list of TaskRuns for each Pipeline / PipelineRun even when the PipelineRun already contains the related data (status.tasks).
Expected results:
No unnecessary network calls. When the admin changes the TektonConfig "embedded-status" option to minimal, the UI should still work and load the TaskRuns as it does today.
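A minimal sketch of that guard (written in Go for illustration; the plugin itself is TypeScript, and the types here are stand-ins for the real PipelineRun model):

package pipelines

// taskRef is a stand-in for an entry of status.tasks.
type taskRef struct{ Name string }

// pipelineRunStatus mirrors the relevant part of a PipelineRun status;
// Tasks is empty when embedded-status is "minimal".
type pipelineRunStatus struct {
	Tasks []taskRef
}

// needsTaskRunFetch reports whether the list view must fetch TaskRuns
// for a row: only when the PipelineRun does not already embed them.
func needsTaskRunFetch(s pipelineRunStatus) bool {
	return len(s.Tasks) == 0
}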
Additional info:
None
Please review the following PR: https://github.com/openshift/bond-cni/pull/59
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/vmware-vsphere-csi-driver/pull/86
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
A recently introduced validation in the hypershift operator's webhook conflicts with the UI's ability to create HCP clusters. Previously the pull secret was not required to be posted before an HC or NP, but with this change the pull secret is required, because it is used to validate the release image payload. This issue is isolated to 4.15.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
100%. Attempt to post an HC before the pull secret is posted and the HC will be rejected. The expected outcome is that it should be possible to post the pull secret for an HC after the HC is posted, and the controller should be eventually consistent with this change.
This is a clone of issue OCPBUGS-45229. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-44873. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-44305. The following is the description of the original issue:
—
Description of problem:
The finally tasks do not get removed and remain in the pipeline.
Version-Release number of selected component (if applicable):
In all supported OCP versions
How reproducible:
Always
Steps to Reproduce:
1. Create a finally task in a pipeline in the pipeline builder
2. Save the pipeline
3. Edit the pipeline and remove the finally task in the pipeline builder
4. Save the pipeline
5. Observe that the finally task has not been removed
Actual results:
The finally tasks do not get removed and remain in the pipeline.
Expected results:
The finally task gets removed from the pipeline when the finally task is removed and the pipeline is saved in "pipeline builder" mode.
Additional info:
This is a clone of issue OCPBUGS-29341. The following is the description of the original issue:
—
Description of problem:
If these pods are evicted, they lose all knowledge of existing DHCP leases, and any pods using DHCP IPAM will fail to renew their lease, even after the pod is re-created.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Use a NAD with ipam: dhcp.
2. Delete the dhcp-daemon pod on the same node as your workload.
3. Observe the lease expire on the DHCP server / get reissued to a different pod, causing a network outage from duplicate addresses.
Actual results:
The dhcp-daemon pod gets evicted before workload pods.
Expected results:
The dhcp-daemon pod does not get evicted before workloads, because of system-node-critical priority.
Additional info:
All other multus components run with system-node-critical priority:
priority: 2000001000
priorityClassName: system-node-critical
This is a clone of issue OCPBUGS-34770. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-34359. The following is the description of the original issue:
—
The issue was observed during testing of the k8s 1.30 rebase, in which the webhook client started using http2 for loopback IPs: kubernetes/kubernetes#122558.
It looks like the issue is caused by how an http2 client handles this invalid address. I verified this change by setting up a cluster with openshift/kubernetes#1953 and this PR.
This is a clone of issue OCPBUGS-44179. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-44047. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42660. The following is the description of the original issue:
—
There were remaining issues from the original issue. A new bug has been opened to address this. This is a clone of issue OCPBUGS-32947. The following is the description of the original issue:
—
Description of problem:
[vSphere] network.devices, template and workspace are cleared when deleting the controlplanemachineset, and updating these fields does not trigger an update
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-04-23-032717
How reproducible:
Always
Steps to Reproduce:
1. Install a vSphere 4.16 cluster (we use the automated template ipi-on-vsphere/versioned-installer).

liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.nightly-2024-04-23-032717   True        False         24m     Cluster version is 4.16.0-0.nightly-2024-04-23-032717

2. Check the controlplanemachineset; network.devices, template and workspace have values.

liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset
NAME      DESIRED   CURRENT   READY   UPDATED   UNAVAILABLE   STATE    AGE
cluster   3         3         3       3                       Active   51m

liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset cluster -oyaml
apiVersion: machine.openshift.io/v1
kind: ControlPlaneMachineSet
metadata:
  creationTimestamp: "2024-04-25T02:52:11Z"
  finalizers:
  - controlplanemachineset.machine.openshift.io
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: huliu-vs425c-f5tfl
  name: cluster
  namespace: openshift-machine-api
  resourceVersion: "18273"
  uid: f340d9b4-cf57-4122-b4d4-0f45f20e4d79
spec:
  replicas: 3
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: huliu-vs425c-f5tfl
      machine.openshift.io/cluster-api-machine-role: master
      machine.openshift.io/cluster-api-machine-type: master
  state: Active
  strategy:
    type: RollingUpdate
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      failureDomains:
        platform: VSphere
        vsphere:
        - name: generated-failure-domain
      metadata:
        labels:
          machine.openshift.io/cluster-api-cluster: huliu-vs425c-f5tfl
          machine.openshift.io/cluster-api-machine-role: master
          machine.openshift.io/cluster-api-machine-type: master
      spec:
        lifecycleHooks: {}
        metadata: {}
        providerSpec:
          value:
            apiVersion: machine.openshift.io/v1beta1
            credentialsSecret:
              name: vsphere-cloud-credentials
            diskGiB: 120
            kind: VSphereMachineProviderSpec
            memoryMiB: 16384
            metadata:
              creationTimestamp: null
            network:
              devices:
              - networkName: devqe-segment-221
            numCPUs: 4
            numCoresPerSocket: 4
            snapshot: ""
            template: huliu-vs425c-f5tfl-rhcos-generated-region-generated-zone
            userDataSecret:
              name: master-user-data
            workspace:
              datacenter: DEVQEdatacenter
              datastore: /DEVQEdatacenter/datastore/vsanDatastore
              folder: /DEVQEdatacenter/vm/huliu-vs425c-f5tfl
              resourcePool: /DEVQEdatacenter/host/DEVQEcluster/Resources
              server: vcenter.devqe.ibmc.devcluster.openshift.com
status:
  conditions:
  - lastTransitionTime: "2024-04-25T02:59:37Z"
    message: ""
    observedGeneration: 1
    reason: AsExpected
    status: "False"
    type: Error
  - lastTransitionTime: "2024-04-25T03:03:45Z"
    message: ""
    observedGeneration: 1
    reason: AllReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2024-04-25T03:03:45Z"
    message: ""
    observedGeneration: 1
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2024-04-25T03:01:04Z"
    message: ""
    observedGeneration: 1
    reason: AllReplicasUpdated
    status: "False"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 3
  replicas: 3
  updatedReplicas: 3

3. Delete the controlplanemachineset. It recreates a new one, but the three fields that had values before are now cleared.

liuhuali@Lius-MacBook-Pro huali-test % oc delete controlplanemachineset cluster
controlplanemachineset.machine.openshift.io "cluster" deleted
liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset
NAME      DESIRED   CURRENT   READY   UPDATED   UNAVAILABLE   STATE      AGE
cluster   3         3         3       3                       Inactive   6s
liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset cluster -oyaml
apiVersion: machine.openshift.io/v1
kind: ControlPlaneMachineSet
metadata:
  creationTimestamp: "2024-04-25T03:45:51Z"
  finalizers:
  - controlplanemachineset.machine.openshift.io
  generation: 1
  name: cluster
  namespace: openshift-machine-api
  resourceVersion: "46172"
  uid: 45d966c9-ec95-42e1-b8b0-c4945ea58566
spec:
  replicas: 3
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: huliu-vs425c-f5tfl
      machine.openshift.io/cluster-api-machine-role: master
      machine.openshift.io/cluster-api-machine-type: master
  state: Inactive
  strategy:
    type: RollingUpdate
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      failureDomains:
        platform: VSphere
        vsphere:
        - name: generated-failure-domain
      metadata:
        labels:
          machine.openshift.io/cluster-api-cluster: huliu-vs425c-f5tfl
          machine.openshift.io/cluster-api-machine-role: master
          machine.openshift.io/cluster-api-machine-type: master
      spec:
        lifecycleHooks: {}
        metadata: {}
        providerSpec:
          value:
            apiVersion: machine.openshift.io/v1beta1
            credentialsSecret:
              name: vsphere-cloud-credentials
            diskGiB: 120
            kind: VSphereMachineProviderSpec
            memoryMiB: 16384
            metadata:
              creationTimestamp: null
            network:
              devices: null
            numCPUs: 4
            numCoresPerSocket: 4
            snapshot: ""
            template: ""
            userDataSecret:
              name: master-user-data
            workspace: {}
status:
  conditions:
  - lastTransitionTime: "2024-04-25T03:45:51Z"
    message: ""
    observedGeneration: 1
    reason: AsExpected
    status: "False"
    type: Error
  - lastTransitionTime: "2024-04-25T03:45:51Z"
    message: ""
    observedGeneration: 1
    reason: AllReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2024-04-25T03:45:51Z"
    message: ""
    observedGeneration: 1
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2024-04-25T03:45:51Z"
    message: ""
    observedGeneration: 1
    reason: AllReplicasUpdated
    status: "False"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 3
  replicas: 3
  updatedReplicas: 3

4. I activate the controlplanemachineset and it does not trigger an update. I add these field values back and it does not trigger an update. I edit these fields to add a second network device and it still does not trigger an update.

network:
  devices:
  - networkName: devqe-segment-221
  - networkName: devqe-segment-222

By the way, I can create worker machines with another network device, or with two network devices:

huliu-vs425c-f5tfl-worker-0a-ldbkh    Running   81m
huliu-vs425c-f5tfl-worker-0aa-r8q4d   Running   70m
Actual results:
network.devices, template and workspace are cleared when deleting the controlplanemachineset, and updating these fields does not trigger an update.
Expected results:
The field values should not be changed when deleting the controlplanemachineset, and updating these fields should trigger an update. Alternatively, if these fields are not meant to be modified, modifying them on the controlplanemachineset should not take effect; the current inconsistency is confusing.
Additional info:
Must gather: https://drive.google.com/file/d/1mHR31m8gaNohVMSFqYovkkY__t8-E30s/view?usp=sharing
This is a clone of issue OCPBUGS-26049. The following is the description of the original issue:
—
Description of problem:
Go to a PVC's "VolumeSnapshots" tab; it shows the error "Oh no! Something went wrong."
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2024-01-03-140457
How reproducible:
Always
Steps to Reproduce:
1. Create a PVC in a project.
2. Go to the PVC's "VolumeSnapshots" tab.
Actual results:
1. The error "Oh no! Something went wrong." shows up on the page.
Expected results:
1. Should show volumesnapshot related to the pvc without error.
Additional info:
screenshot: https://drive.google.com/file/d/1l0i0DCFh_q9mvFHxnftVJL0AM1LaKFOO/view?usp=sharing
This is a manual "clone" of issue OCPBUGS-27397. The following is the description of the original issue:
Description of problem:
After the update to OpenShift Container Platform 4.13, it was reported that the SRV query for _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net is failing. The query sent to CoreDNS does not match any configured forwardPlugin, and therefore the default is applied. When reverting the dns-default pod image back to OpenShift Container Platform 4.12 it works, and this is also the workaround that has been put in place, as production applications were affected. Testing shows that the problem is present in OpenShift Container Platform 4.13, 4.14 and even 4.15. Forcing TCP on the pod level does not change the behavior and the query still fails. But when configuring a specific forwardPlugin for the domain and enforcing DNS over TCP, it works again.

- Adjusting bufsize did/does not help, as the result was still the same (suspecting this because of https://issues.redhat.com/browse/OCPBUGS-21901 - but again, no effect)
- The only way to make it work is to force_tcp, either in the default ". /etc/resolv.conf" section or by configuring a forwardPlugin and forcing TCP

Checking upstream, I found https://github.com/coredns/coredns/issues/5953 respectively https://github.com/coredns/coredns/pull/6277, which I suspect are related. When building from the master CoreDNS branch it indeed starts to work again and resolving the SRV entry is possible.

---

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.27   True        False         24h     Cluster version is 4.13.27

$ oc get pod -o wide
NAME                  READY   STATUS    RESTARTS   AGE     IP             NODE                                           NOMINATED NODE   READINESS GATES
dns-default-626td     2/2     Running   0          3m15s   10.128.2.49    aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh   <none>           <none>
dns-default-74nnw     2/2     Running   0          87s     10.131.0.47    aro-cluster-h78zv-h94mh-worker-eastus1-99l7n   <none>           <none>
dns-default-8mggz     2/2     Running   0          2m31s   10.128.1.121   aro-cluster-h78zv-h94mh-master-0               <none>           <none>
dns-default-clgkg     2/2     Running   0          109s    10.129.2.187   aro-cluster-h78zv-h94mh-worker-eastus3-jhvff   <none>           <none>
dns-default-htdw2     2/2     Running   0          2m10s   10.129.0.43    aro-cluster-h78zv-h94mh-master-2               <none>           <none>
dns-default-wprln     2/2     Running   0          2m52s   10.130.1.70    aro-cluster-h78zv-h94mh-master-1               <none>           <none>
node-resolver-4dmgj   1/1     Running   0          17h     10.0.2.4       aro-cluster-h78zv-h94mh-worker-eastus3-jhvff   <none>           <none>
node-resolver-5c6tj   1/1     Running   0          17h     10.0.0.10      aro-cluster-h78zv-h94mh-master-0               <none>           <none>
node-resolver-chfr6   1/1     Running   0          17h     10.0.0.7       aro-cluster-h78zv-h94mh-master-2               <none>           <none>
node-resolver-mnhsp   1/1     Running   0          17h     10.0.2.6       aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh   <none>           <none>
node-resolver-snxsb   1/1     Running   0          17h     10.0.0.9       aro-cluster-h78zv-h94mh-master-1               <none>           <none>
node-resolver-sp7h8   1/1     Running   0          17h     10.0.2.5       aro-cluster-h78zv-h94mh-worker-eastus1-99l7n   <none>           <none>

$ oc get pod -o wide -n project-100
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE                                           NOMINATED NODE   READINESS GATES
tools-54f4d6844b-lr6z9   1/1     Running   0          17h   10.131.0.40   aro-cluster-h78zv-h94mh-worker-eastus1-99l7n   <none>           <none>

$ oc get dns.operator default -o yaml
apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
  creationTimestamp: "2024-01-11T09:14:03Z"
  finalizers:
  - dns.operator.openshift.io/dns-controller
  generation: 4
  name: default
  resourceVersion: "4216641"
  uid: c8f5c627-2010-4c4a-a5fe-ed87f320e427
spec:
  logLevel: Normal
  nodePlacement: {}
  operatorLogLevel: Normal
  servers:
  - forwardPlugin:
      policy: Random
      protocolStrategy: ""
      upstreams:
      - 10.0.0.9
    name: example
    zones:
    - example.xyz
  upstreamResolvers:
    policy: Sequential
    transportConfig: {}
    upstreams:
    - port: 53
      type: SystemResolvConf
status:
  clusterDomain: cluster.local
  clusterIP: 172.30.0.10
  conditions:
  - lastTransitionTime: "2024-01-19T07:54:18Z"
    message: Enough DNS pods are available, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2024-01-19T07:55:02Z"
    message: All DNS and node-resolver pods are available, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "False"
    type: Progressing
  - lastTransitionTime: "2024-01-18T13:29:59Z"
    message: The DNS daemonset has available pods, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "True"
    type: Available
  - lastTransitionTime: "2024-01-11T09:14:04Z"
    message: DNS Operator can be upgraded
    reason: AsExpected
    status: "True"
    type: Upgradeable

$ oc rsh -n project-100 tools-54f4d6844b-lr6z9
sh-4.4$ host -t srv _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net
Host _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net not found: 2(SERVFAIL)

$ oc logs dns-default-74nnw
Defaulted container "dns" out of: dns, kube-rbac-proxy
.:5353
hostname.bind.:5353
example.xyz.:5353
[INFO] plugin/reload: Running configuration SHA512 = 88c7c194d29d0a23b322aeee1eaa654ef385e6bd1affae3715028aba1d33cc8340e33184ba183f87e6c66a2014261c3e02edaea8e42ad01ec6a7c5edb34dfc6a
CoreDNS-1.10.1
linux/amd64, go1.19.13 X:strictfipsruntime,
[INFO] 10.131.0.40:39333 - 54228 "SRV IN _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. udp 76 false 512" - - 0 5.001868103s
[ERROR] plugin/errors: 2 _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. SRV: dns: overflowing header size
[INFO] 10.131.0.40:39333 - 54228 "SRV IN _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. udp 76 false 512" - - 0 5.003223099s
[ERROR] plugin/errors: 2 _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. SRV: dns: overflowing header size

---

https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/4.12.47/release.txt - using quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3c0de49c0e76f2ee23a107fc9397f2fd32e7a6a8a458906afd6df04ff5bb0f7b

$ oc get pod -o wide
NAME                  READY   STATUS    RESTARTS   AGE     IP             NODE                                           NOMINATED NODE   READINESS GATES
dns-default-8vrwd     2/2     Running   0          6m22s   10.129.0.45    aro-cluster-h78zv-h94mh-master-2               <none>           <none>
dns-default-fm59d     2/2     Running   0          7m4s    10.129.2.190   aro-cluster-h78zv-h94mh-worker-eastus3-jhvff   <none>           <none>
dns-default-grtqs     2/2     Running   0          7m48s   10.130.1.73    aro-cluster-h78zv-h94mh-master-1               <none>           <none>
dns-default-l8mp2     2/2     Running   0          6m43s   10.131.0.49    aro-cluster-h78zv-h94mh-worker-eastus1-99l7n   <none>           <none>
dns-default-slc4n     2/2     Running   0          8m11s   10.128.1.126   aro-cluster-h78zv-h94mh-master-0               <none>           <none>
dns-default-xgr7c     2/2     Running   0          7m25s   10.128.2.51    aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh   <none>           <none>
node-resolver-2nmpx   1/1     Running   0          10m     10.0.2.4       aro-cluster-h78zv-h94mh-worker-eastus3-jhvff   <none>           <none>
node-resolver-689j7   1/1     Running   0          10m     10.0.2.5       aro-cluster-h78zv-h94mh-worker-eastus1-99l7n   <none>           <none>
node-resolver-8qhls   1/1     Running   0          10m     10.0.0.7       aro-cluster-h78zv-h94mh-master-2               <none>           <none>
node-resolver-nv8mq   1/1     Running   0          10m     10.0.2.6       aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh   <none>           <none>
node-resolver-r52v7   1/1     Running   0          10m     10.0.0.10      aro-cluster-h78zv-h94mh-master-0               <none>           <none>
node-resolver-z8d4n   1/1     Running   0          10m     10.0.0.9       aro-cluster-h78zv-h94mh-master-1               <none>           <none>

$ oc get pod -n project-100 -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE                                           NOMINATED NODE   READINESS GATES
tools-54f4d6844b-lr6z9   1/1     Running   0          18h   10.131.0.40   aro-cluster-h78zv-h94mh-worker-eastus1-99l7n   <none>           <none>

$ oc rsh -n project-100 tools-54f4d6844b-lr6z9
sh-4.4$ host -t srv _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net
_example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1032 x1-9-foobar.bla.example.net.
_example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1039 x1-9-foobar.bla.example.net.
_example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1043 x1-9-foobar.bla.example.net.
_example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1048 x1-9-foobar.bla.example.net.
_example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1049 x1-9-foobar.bla.example.net.
_example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1050 x1-9-foobar.bla.example.net.

---

https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/4.15.0-rc.2/release.txt - using quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9e8ffba7854f3f02e8940ddcb2636ceb4773db77872ff639a447c4bab3a69ecc

$ oc get pod -o wide
NAME                  READY   STATUS    RESTARTS   AGE     IP             NODE                                           NOMINATED NODE   READINESS GATES
dns-default-gcs7s     2/2     Running   0          5m      10.128.2.52    aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh   <none>           <none>
dns-default-mnbh4     2/2     Running   0          4m37s   10.129.0.46    aro-cluster-h78zv-h94mh-master-2               <none>           <none>
dns-default-p2s6v     2/2     Running   0          3m55s   10.130.1.77    aro-cluster-h78zv-h94mh-master-1               <none>           <none>
dns-default-svccn     2/2     Running   0          3m13s   10.128.1.128   aro-cluster-h78zv-h94mh-master-0               <none>           <none>
dns-default-tgktg     2/2     Running   0          3m34s   10.131.0.50    aro-cluster-h78zv-h94mh-worker-eastus1-99l7n   <none>           <none>
dns-default-xd5vq     2/2     Running   0          4m16s   10.129.2.191   aro-cluster-h78zv-h94mh-worker-eastus3-jhvff   <none>           <none>
node-resolver-2nmpx   1/1     Running   0          18m     10.0.2.4       aro-cluster-h78zv-h94mh-worker-eastus3-jhvff   <none>           <none>
node-resolver-689j7   1/1     Running   0          18m     10.0.2.5       aro-cluster-h78zv-h94mh-worker-eastus1-99l7n   <none>           <none>
node-resolver-8qhls   1/1     Running   0          18m     10.0.0.7       aro-cluster-h78zv-h94mh-master-2               <none>           <none>
node-resolver-nv8mq   1/1     Running   0          18m     10.0.2.6       aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh   <none>           <none>
node-resolver-r52v7   1/1     Running   0          18m     10.0.0.10      aro-cluster-h78zv-h94mh-master-0               <none>           <none>
node-resolver-z8d4n   1/1     Running   0          18m     10.0.0.9       aro-cluster-h78zv-h94mh-master-1               <none>           <none>

$ oc get pod -n project-100 -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE                                           NOMINATED NODE   READINESS GATES
tools-54f4d6844b-lr6z9   1/1     Running   0          18h   10.131.0.40   aro-cluster-h78zv-h94mh-worker-eastus1-99l7n   <none>           <none>

$ oc rsh -n project-100 tools-54f4d6844b-lr6z9
sh-4.4$ host -t srv _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net
Host _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net not found: 2(SERVFAIL)

$ oc logs dns-default-tgktg
Defaulted container "dns" out of: dns, kube-rbac-proxy
.:5353
hostname.bind.:5353
example.net.:5353
[INFO] plugin/reload: Running configuration SHA512 = 8efa6675505d17551d17ca1e2ca45506a731dbab1f53dd687d37cb98dbaf4987a90622b6b030fe1643ba2cd17198a813ba9302b84ad729de4848f8998e768605
CoreDNS-1.11.1
linux/amd64, go1.20.10 X:strictfipsruntime,
[INFO] 10.131.0.40:35246 - 61734 "SRV IN _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. udp 76 false 512" - - 0 5.003577431s
[ERROR] plugin/errors: 2 _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. SRV: dns: overflowing header size
[INFO] 10.131.0.40:35246 - 61734 "SRV IN _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. udp 76 false 512" - - 0 5.000969251s
[ERROR] plugin/errors: 2 _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. SRV: dns: overflowing header size

---

quay.io/rhn_support_sreber/coredns:latest - based on the https://github.com/coredns/coredns master branch, built on January 19th 2024 (suspecting https://github.com/coredns/coredns/pull/6277 to be the fix)

$ oc get pod -o wide
NAME                  READY   STATUS    RESTARTS   AGE     IP             NODE                                           NOMINATED NODE   READINESS GATES
dns-default-bpjpn     2/2     Running   0          2m22s   10.130.1.78    aro-cluster-h78zv-h94mh-master-1               <none>           <none>
dns-default-c7wcz     2/2     Running   0          99s     10.131.0.51    aro-cluster-h78zv-h94mh-worker-eastus1-99l7n   <none>           <none>
dns-default-d7qjz     2/2     Running   0          3m6s    10.129.2.193   aro-cluster-h78zv-h94mh-worker-eastus3-jhvff   <none>           <none>
dns-default-dkvtp     2/2     Running   0          78s     10.128.1.131   aro-cluster-h78zv-h94mh-master-0               <none>           <none>
dns-default-t6sv7     2/2     Running   0          2m44s   10.129.0.47    aro-cluster-h78zv-h94mh-master-2               <none>           <none>
dns-default-vf9f6     2/2     Running   0          2m      10.128.2.53    aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh   <none>           <none>
node-resolver-2nmpx   1/1     Running   0          24m     10.0.2.4       aro-cluster-h78zv-h94mh-worker-eastus3-jhvff   <none>           <none>
node-resolver-689j7   1/1     Running   0          24m     10.0.2.5       aro-cluster-h78zv-h94mh-worker-eastus1-99l7n   <none>           <none>
node-resolver-8qhls   1/1     Running   0          24m     10.0.0.7       aro-cluster-h78zv-h94mh-master-2               <none>           <none>
node-resolver-nv8mq   1/1     Running   0          24m     10.0.2.6       aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh   <none>           <none>
node-resolver-r52v7   1/1     Running   0          24m     10.0.0.10      aro-cluster-h78zv-h94mh-master-0               <none>           <none>
node-resolver-z8d4n   1/1     Running   0          24m     10.0.0.9       aro-cluster-h78zv-h94mh-master-1               <none>           <none>

$ oc get pod -n project-100 -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE                                           NOMINATED NODE   READINESS GATES
tools-54f4d6844b-lr6z9   1/1     Running   0          18h   10.131.0.40   aro-cluster-h78zv-h94mh-worker-eastus1-99l7n   <none>           <none>

$ oc rsh -n project-100 tools-54f4d6844b-lr6z9
sh-4.4$ host -t srv _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net
_example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1032 x1-9-foobar.bla.example.net.
_example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1039 x1-9-foobar.bla.example.net.
_example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1043 x1-9-foobar.bla.example.net.
_example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1048 x1-9-foobar.bla.example.net.
_example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1049 x1-9-foobar.bla.example.net.
_example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1050 x1-9-foobar.bla.example.net.

---

Back with OpenShift Container Platform 4.13.27 but adjusting the CoreDNS configuration: defining a specific forwardPlugin and enforcing TCP.

$ oc get dns.operator default -o yaml
apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
  creationTimestamp: "2024-01-11T09:14:03Z"
  finalizers:
  - dns.operator.openshift.io/dns-controller
  generation: 7
  name: default
  resourceVersion: "4230436"
  uid: c8f5c627-2010-4c4a-a5fe-ed87f320e427
spec:
  logLevel: Normal
  nodePlacement: {}
  operatorLogLevel: Normal
  servers:
  - forwardPlugin:
      policy: Random
      protocolStrategy: TCP
      upstreams:
      - 10.0.0.9
    name: example
    zones:
    - example.net
  upstreamResolvers:
    policy: Sequential
    transportConfig: {}
    upstreams:
    - port: 53
      type: SystemResolvConf
status:
  clusterDomain: cluster.local
  clusterIP: 172.30.0.10
  conditions:
  - lastTransitionTime: "2024-01-19T08:27:21Z"
    message: Enough DNS pods are available, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2024-01-19T08:28:03Z"
    message: All DNS and node-resolver pods are available, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "False"
    type: Progressing
  - lastTransitionTime: "2024-01-19T08:00:02Z"
    message: The DNS daemonset has available pods, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "True"
    type: Available
  - lastTransitionTime: "2024-01-11T09:14:04Z"
    message: DNS Operator can be upgraded
    reason: AsExpected
    status: "True"
    type: Upgradeable

$ oc get pod -o wide
NAME                  READY   STATUS    RESTARTS   AGE     IP             NODE                                           NOMINATED NODE   READINESS GATES
dns-default-frdkm     2/2     Running   0          3m5s    10.131.0.52    aro-cluster-h78zv-h94mh-worker-eastus1-99l7n   <none>           <none>
dns-default-jsfkb     2/2     Running   0          99s     10.129.0.49    aro-cluster-h78zv-h94mh-master-2               <none>           <none>
dns-default-jzzqc     2/2     Running   0          2m21s   10.128.2.54    aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh   <none>           <none>
dns-default-sgf4h     2/2     Running   0          2m      10.130.1.79    aro-cluster-h78zv-h94mh-master-1               <none>           <none>
dns-default-t8nn7     2/2     Running   0          2m44s   10.129.2.194   aro-cluster-h78zv-h94mh-worker-eastus3-jhvff   <none>           <none>
dns-default-xmvqg     2/2     Running   0          3m27s   10.128.1.133   aro-cluster-h78zv-h94mh-master-0               <none>           <none>
node-resolver-2nmpx   1/1     Running   0          29m     10.0.2.4       aro-cluster-h78zv-h94mh-worker-eastus3-jhvff   <none>           <none>
node-resolver-689j7   1/1     Running   0          29m     10.0.2.5       aro-cluster-h78zv-h94mh-worker-eastus1-99l7n   <none>           <none>
node-resolver-8qhls   1/1     Running   0          29m     10.0.0.7       aro-cluster-h78zv-h94mh-master-2               <none>           <none>
node-resolver-nv8mq   1/1     Running   0          29m     10.0.2.6       aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh   <none>           <none>
node-resolver-r52v7   1/1     Running   0          29m     10.0.0.10      aro-cluster-h78zv-h94mh-master-0               <none>           <none>
node-resolver-z8d4n   1/1     Running   0          29m     10.0.0.9       aro-cluster-h78zv-h94mh-master-1               <none>           <none>

$ oc get pod -n project-100 -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE                                           NOMINATED NODE   READINESS GATES
tools-54f4d6844b-lr6z9   1/1     Running   0          18h   10.131.0.40   aro-cluster-h78zv-h94mh-worker-eastus1-99l7n   <none>           <none>

$ oc rsh -n project-100 tools-54f4d6844b-lr6z9
sh-4.4$ host -t srv _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net
_example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1032 x1-9-foobar.bla.example.net.
_example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1039 x1-9-foobar.bla.example.net.
_example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1043 x1-9-foobar.bla.example.net.
_example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1048 x1-9-foobar.bla.example.net.
_example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1049 x1-9-foobar.bla.example.net.
_example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1050 x1-9-foobar.bla.example.net.

---

Back with OpenShift Container Platform 4.13.27 but now forcing TCP on the pod level.

$ oc get deployment tools -n project-100 -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    alpha.image.policy.openshift.io/resolve-names: '*'
    app.openshift.io/route-disabled: "false"
    deployment.kubernetes.io/revision: "5"
    image.openshift.io/triggers: '[{"from":{"kind":"ImageStreamTag","name":"tools:latest","namespace":"project-100"},"fieldPath":"spec.template.spec.containers[?(@.name==\"tools\")].image","pause":"false"}]'
    openshift.io/generated-by: OpenShiftWebConsole
  creationTimestamp: "2024-01-17T11:22:05Z"
  generation: 5
  labels:
    app: tools
    app.kubernetes.io/component: tools
    app.kubernetes.io/instance: tools
    app.kubernetes.io/name: tools
    app.kubernetes.io/part-of: tools
    app.openshift.io/runtime: other-linux
    app.openshift.io/runtime-namespace: project-100
  name: tools
  namespace: project-100
  resourceVersion: "4232839"
  uid: a8157243-71e1-4597-9aa5-497afed5f722
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: tools
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        openshift.io/generated-by: OpenShiftWebConsole
      creationTimestamp: null
      labels:
        app: tools
        deployment: tools
    spec:
      containers:
      - command:
        - /bin/bash
        - -c
        - while true; do sleep 1;done
        image: image-registry.openshift-image-registry.svc:5000/project-100/tools@sha256:fba289d2ff20df2bfe38aa58fa3e491bbecf09e90e96b3c9b8c38f786dc2efb8
        imagePullPolicy: Always
        name: tools
        ports:
        - containerPort: 8080
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsConfig:
        options:
        - name: use-vc
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2024-01-17T11:23:56Z"
    lastUpdateTime: "2024-01-17T11:23:56Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2024-01-17T11:22:05Z"
    lastUpdateTime: "2024-01-19T08:33:28Z"
    message: ReplicaSet "tools-6749b4cf47" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 5
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

$ oc get pod -o wide
NAME                  READY   STATUS    RESTARTS   AGE     IP             NODE                                           NOMINATED NODE   READINESS GATES
dns-default-7kfzh     2/2     Running   0          2m25s   10.129.2.196   aro-cluster-h78zv-h94mh-worker-eastus3-jhvff   <none>           <none>
dns-default-g4mtd     2/2     Running   0          2m25s   10.128.2.55    aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh   <none>           <none>
dns-default-l4xkg     2/2     Running   0          2m26s   10.129.0.50    aro-cluster-h78zv-h94mh-master-2               <none>           <none>
dns-default-l7rq8     2/2     Running   0          2m25s   10.128.1.135   aro-cluster-h78zv-h94mh-master-0               <none>           <none>
dns-default-lt6zx     2/2     Running   0          2m26s   10.131.0.53    aro-cluster-h78zv-h94mh-worker-eastus1-99l7n   <none>           <none>
dns-default-t6bzl     2/2     Running   0          2m25s   10.130.1.82    aro-cluster-h78zv-h94mh-master-1               <none>           <none>
node-resolver-279mf   1/1     Running   0          2m24s   10.0.2.6       aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh   <none>           <none>
node-resolver-2bzfc   1/1     Running   0          2m24s   10.0.2.4       aro-cluster-h78zv-h94mh-worker-eastus3-jhvff   <none>           <none>
node-resolver-bdz4m   1/1     Running   0          2m24s   10.0.0.7       aro-cluster-h78zv-h94mh-master-2               <none>           <none>
node-resolver-jrv2w   1/1     Running   0          2m24s   10.0.0.9       aro-cluster-h78zv-h94mh-master-1               <none>           <none>
node-resolver-lbfg5   1/1     Running   0          2m23s   10.0.0.10      aro-cluster-h78zv-h94mh-master-0               <none>           <none>
node-resolver-qnm92   1/1     Running   0          2m24s   10.0.2.5       aro-cluster-h78zv-h94mh-worker-eastus1-99l7n   <none>           <none>

$ oc get pod -n project-100 -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE                                           NOMINATED NODE   READINESS GATES
tools-6749b4cf47-gmw9v   1/1     Running   0          50s   10.131.0.54   aro-cluster-h78zv-h94mh-worker-eastus1-99l7n   <none>           <none>

$ oc rsh -n project-100 tools-6749b4cf47-gmw9v
sh-4.4$ cat /etc/resolv.conf
search project-100.svc.cluster.local svc.cluster.local cluster.local khrmlwa2zp4e1oisi1qjtoxwrc.bx.internal.cloudapp.net
nameserver 172.30.0.10
options ndots:5 use-vc
sh-4.4$ host -t srv _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net
Host _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net not found: 2(SERVFAIL)

$ oc logs dns-default-lt6zx
Defaulted container "dns" out of: dns, kube-rbac-proxy
.:5353
hostname.bind.:5353
example.xyz.:5353
[INFO] plugin/reload: Running configuration SHA512 = 79d17b9fc0f61d2c6db13a0f7f3d0a873c4d86ab5cba90c3819a5b57a48fac2ef0fb644b55e959984cd51377bff0db04f399a341a584c466e540a0d7501340f7
CoreDNS-1.10.1
linux/amd64, go1.19.13 X:strictfipsruntime,
[INFO] 10.131.0.40:51367 - 22867 "SRV IN _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. udp 76 false 512" - - 0 5.00024781s
[ERROR] plugin/errors: 2 _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. SRV: dns: overflowing header size
[INFO] 10.131.0.40:51367 - 22867 "SRV IN _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. udp 76 false 512" - - 0 5.00096551s
[ERROR] plugin/errors: 2 _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. SRV: dns: overflowing header size
[INFO] 10.131.0.54:44935 - 3087 "SRV IN _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. udp 76 false 512" - - 0 5.000619524s
[ERROR] plugin/errors: 2 _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. SRV: dns: overflowing header size
[INFO] 10.131.0.54:44935 - 3087 "SRV IN _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. udp 76 false 512" - - 0 5.000369584s
[ERROR] plugin/errors: 2 _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. SRV: dns: overflowing header size
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.13, 4.14, 4.15
How reproducible:
Always
Steps to Reproduce:
1. Run "host -t srv _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net" inside a pod
Actual results:
The dns-default pod reports the error below when running the query:

[INFO] 10.131.0.40:39333 - 54228 "SRV IN _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. udp 76 false 512" - - 0 5.001868103s
[ERROR] plugin/errors: 2 _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. SRV: dns: overflowing header size
[INFO] 10.131.0.40:39333 - 54228 "SRV IN _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. udp 76 false 512" - - 0 5.003223099s
[ERROR] plugin/errors: 2 _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. SRV: dns: overflowing header size

And the command "host -t srv _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net" fails:

sh-4.4$ host -t srv _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net
Host _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net not found: 2(SERVFAIL)
Expected results:
No error reported in dns-default pod and query to actually return expected result
Additional info:
I suspect https://github.com/coredns/coredns/issues/5953 and https://github.com/coredns/coredns/pull/6277 are related. Hence I built CoreDNS from the master branch and created quay.io/rhn_support_sreber/coredns:latest. When running that image in the dns-default pod, resolving the host query works again.
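For reference, the client-side equivalent of the force_tcp workaround can be reproduced with a small Go program. This is an illustrative sketch of forcing SRV lookups over TCP, not part of the cluster configuration:

package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	// PreferGo plus a custom Dial routes lookups through this dialer;
	// dialing "tcp" regardless of the requested network makes every
	// query use TCP, avoiding the failing UDP path described above.
	r := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			return d.DialContext(ctx, "tcp", address)
		},
	}
	_, srvs, err := r.LookupSRV(context.Background(), "example", "tcp",
		"foo-bar-abc-123-xyz-456-foo-000.abcde.example.net")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	for _, s := range srvs {
		fmt.Printf("%d %d %d %s\n", s.Priority, s.Weight, s.Port, s.Target)
	}
}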
This is a clone of issue OCPBUGS-28539. The following is the description of the original issue:
—
Description of problem:
Pod capi-ibmcloud-controller-manager stuck in ContainerCreating on IBM cloud
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Build a cluster on IBM Cloud and enable TechPreviewNoUpgrade.
Actual results:
4.16 cluster:

$ oc get po
NAME                                                READY   STATUS              RESTARTS      AGE
capi-controller-manager-6bccdc844-jsm4s             1/1     Running             9 (24m ago)   175m
capi-ibmcloud-controller-manager-75d55bfd7d-6qfxh   0/2     ContainerCreating   0             175m
cluster-capi-operator-768c6bd965-5tjl5              1/1     Running             0             3h

Warning FailedMount 5m15s (x87 over 166m) kubelet MountVolume.SetUp failed for volume "credentials" : secret "capi-ibmcloud-manager-bootstrap-credentials" not found

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.nightly-2024-01-21-154905   True        False         156m    Cluster version is 4.16.0-0.nightly-2024-01-21-154905

4.15 cluster:

$ oc get po
NAME                                                READY   STATUS              RESTARTS        AGE
capi-controller-manager-6b67f7cff4-vxtpg            1/1     Running             6 (9m51s ago)   35m
capi-ibmcloud-controller-manager-54887589c6-6plt2   0/2     ContainerCreating   0               35m
cluster-capi-operator-7b7f48d898-9r6nn              1/1     Running             1 (17m ago)     39m

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.nightly-2024-01-22-160236   True        False         11m     Cluster version is 4.15.0-0.nightly-2024-01-22-160236
Expected results:
No pod is in ContainerCreating status
Additional info:
must-gather: https://drive.google.com/file/d/1F5xUVtW-vGizAYgeys0V5MMjp03zkSEH/view?usp=sharing
This is a clone of issue OCPBUGS-33172. The following is the description of the original issue:
—
Seeing this in hypershift e2e. I think it is a race with the Infrastructure status being populated while PlatformStatus is still nil.
I0501 00:13:11.951062 1 azurepathfixcontroller.go:324] Started AzurePathFixController
I0501 00:13:11.951056 1 base_controller.go:73] Caches are synced for LoggingSyncer
I0501 00:13:11.951072 1 imageregistrycertificates.go:214] Started ImageRegistryCertificatesController
I0501 00:13:11.951077 1 base_controller.go:110] Starting #1 worker of LoggingSyncer controller ...
E0501 00:13:11.951369 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 534 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x2d6bd00?, 0x57a60e0})
	/go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x3bcb370?})
	/go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x2d6bd00?, 0x57a60e0?})
	/usr/lib/golang/src/runtime/panic.go:914 +0x21f
github.com/openshift/cluster-image-registry-operator/pkg/operator.(*AzurePathFixController).sync(0xc000003d40)
	/go/src/github.com/openshift/cluster-image-registry-operator/pkg/operator/azurepathfixcontroller.go:171 +0x97
github.com/openshift/cluster-image-registry-operator/pkg/operator.(*AzurePathFixController).processNextWorkItem(0xc000003d40)
	/go/src/github.com/openshift/cluster-image-registry-operator/pkg/operator/azurepathfixcontroller.go:154 +0x292
github.com/openshift/cluster-image-registry-operator/pkg/operator.(*AzurePathFixController).runWorker(...)
	/go/src/github.com/openshift/cluster-image-registry-operator/pkg/operator/azurepathfixcontroller.go:133
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
	/go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc001186820?, {0x3bd1320, 0xc000cace40}, 0x1, 0xc000ca2540)
	/go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0011bac00?, 0x3b9aca00, 0x0, 0xd0?, 0x447f9c?)
	/go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0xc001385f68?, 0xc001385f78?)
	/go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x1e
created by github.com/openshift/cluster-image-registry-operator/pkg/operator.(*AzurePathFixController).Run in goroutine 248
	/go/src/github.com/openshift/cluster-image-registry-operator/pkg/operator/azurepathfixcontroller.go:322 +0x1a6
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x2966e97]
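A minimal sketch of the nil guard the sync loop appears to need, assuming the openshift/api config types; the function name is illustrative, not the operator's actual code.

package operator

import (
	configv1 "github.com/openshift/api/config/v1"
)

// platformType safely extracts the platform type: during HyperShift
// bootstrap the Infrastructure status may not be populated yet, so
// Status.PlatformStatus can still be nil and must be checked before
// dereferencing.
func platformType(infra *configv1.Infrastructure) (configv1.PlatformType, bool) {
	if infra == nil || infra.Status.PlatformStatus == nil {
		return "", false // not ready yet; caller should requeue
	}
	return infra.Status.PlatformStatus.Type, true
}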
The OpenShift Data Foundation installation wizard has an option to enter role ARN details on an AWS STS enabled OCP cluster. But this particular field does not let us enter any values: the moment we type anything, it gets auto-populated with [object Object], and after that we can't add or paste anything into it.
Tried to inspect the page and add the element; on pressing the install button it throws the error below:
Converting circular structure to JSON --> starting at object with constructor 'HTMLInputElement' | property '__reactFiber$rrh47yimfa' -> object with constructor 'Lu' — property 'stateNode' closes the circle
This is a clone of issue OCPBUGS-25313. The following is the description of the original issue:
—
Description of problem:
Unable to view the alerts and metrics pages; getting a blank page.
Version-Release number of selected component (if applicable):
4.15.0-nightly
How reproducible:
Always
Steps to Reproduce:
Click on any alert under "Notification Panel" to view more, and you will be redirected to the alert page.
Actual results:
User is unable to view any alerts, metrics.
Expected results:
User should be able to view all/individual alerts, metrics.
Additional info:
N.A
This is a clone of issue OCPBUGS-23480. The following is the description of the original issue:
—
Description of problem:
The PipelineRun list view contains a Task status column, which shows the overall task status of the PipelineRun. In order to render this column we fetch all the TaskRuns of that PipelineRun. Every PipelineRun row has to carry all the related TaskRun information, which is causing a performance issue in the PipelineRun list view.
The customer is facing UI slowness and rendering problems for large numbers of PipelineRuns, with and without results enabled. In both cases there is significant slowness, which is hampering their daily operations.
How reproducible:
Always
Steps to Reproduce:
1. Create few pipelineruns 2. Navigate to pipelineruns list view
Actual results:
All the TaskRuns are being fetched, and the PipelineRun list view renders this column asynchronously with a loading indicator.
Expected results:
TaskRuns should not be fetched at all; rather, the UI needs to parse the PipelineRun status message string to render this column.
Additional info:
Pipelinerun status message gets updated on every task completion.
pipelinerun.status.conditions:
- lastTransitionTime: '2023-11-15T07:51:42Z'
  message: 'Tasks Completed: 3 (Failed: 0, Cancelled 0), Skipped: 0'
  reason: Succeeded
  status: 'True'
  type: Succeeded
we can parse the above information to derive the following object and use this for rendering the column, this will increase the performance of this page hugely.
{
completed: 3, // 3 (total count) - 0 (failed count) - 0 (cancelled count),
failed: 0,
cancelled: 0,
skipped: 0,
pending: 0
}
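A sketch of that parsing, in Go for illustration (the console itself is TypeScript); it assumes the message keeps the 'Tasks Completed: N (Failed: N, Cancelled N), Skipped: N' shape shown above.

package main

import (
	"fmt"
	"regexp"
	"strconv"
)

type taskStatus struct {
	Completed, Failed, Cancelled, Skipped int
}

var msgRe = regexp.MustCompile(
	`Tasks Completed: (\d+) \(Failed: (\d+), Cancelled (\d+)\), Skipped: (\d+)`)

// parseStatusMessage derives the per-state counts from the PipelineRun
// condition message, avoiding any TaskRun fetch.
func parseStatusMessage(msg string) (taskStatus, error) {
	m := msgRe.FindStringSubmatch(msg)
	if m == nil {
		return taskStatus{}, fmt.Errorf("unrecognized status message: %q", msg)
	}
	total, _ := strconv.Atoi(m[1])
	failed, _ := strconv.Atoi(m[2])
	cancelled, _ := strconv.Atoi(m[3])
	skipped, _ := strconv.Atoi(m[4])
	return taskStatus{
		Completed: total - failed - cancelled, // matches the derivation above
		Failed:    failed,
		Cancelled: cancelled,
		Skipped:   skipped,
	}, nil
}

func main() {
	s, err := parseStatusMessage("Tasks Completed: 3 (Failed: 0, Cancelled 0), Skipped: 0")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", s) // {Completed:3 Failed:0 Cancelled:0 Skipped:0}
}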
Slack thread for more details - thread
Please review the following PR: https://github.com/openshift/images/pull/150
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/console-operator/pull/794
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Tracker issue for bootimage bump in 4.15. This issue should block issues which need a bootimage bump to fix.
The previous bump was OCPBUGS-18945.
This is a clone of issue OCPBUGS-35891. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-32348. The following is the description of the original issue:
—
After fixing https://issues.redhat.com/browse/OCPBUGS-29919 by merging https://github.com/openshift/baremetal-runtimecfg/pull/301, we have lost the ability to properly debug the node IP selection logic used in runtimecfg.
In order to preserve the debuggability of this component, it should be possible to selectively enable verbose logs.
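A sketch of the kind of leveled logging this asks for, assuming klog-style verbosity gating (whether runtimecfg uses klog is an assumption here, and the values are illustrative): chatty selection details go behind a verbosity level so they can be enabled with -v=2 without flooding default logs.

package main

import (
	"flag"

	"k8s.io/klog/v2"
)

func main() {
	klog.InitFlags(nil)
	flag.Parse() // e.g. pass -v=2 to selectively enable the debug lines

	nodeIP := "192.0.2.10" // illustrative value, not a real selection
	klog.Infof("selected node IP %s", nodeIP)
	klog.V(2).Infof("candidates considered before selecting %s: ...", nodeIP)
}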
This is a clone of issue OCPBUGS-42722. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42256. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-41631. The following is the description of the original issue:
—
Description of problem:
Panic seen in the CI job below when running the following command:
$ w3m -dump -cols 200 'https://search.dptools.openshift.org/?name=^periodic&type=junit&search=machine-config-controller.*Observed+a+panic' | grep 'failures match'
periodic-ci-openshift-insights-operator-stage-insights-operator-e2e-tests-periodic (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-insights-operator-release-4.17-insights-operator-e2e-tests-periodic (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
Panic observed:
E0910 09:00:04.283647 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 268 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x36c8b40, 0x5660c90})
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000ce8540?})
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x36c8b40?, 0x5660c90?})
	/usr/lib/golang/src/runtime/panic.go:770 +0x132
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).updateNode(0xc000d6e360, {0x3abd580?, 0xc00224a608}, {0x3abd580?, 0xc001bd2308})
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:585 +0x1f3
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:246
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:976 +0xea
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc001933f70, {0x3faaba0, 0xc000759710}, 0x1, 0xc00097bda0)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000750f70, 0x3b9aca00, 0x0, 0x1, 0xc00097bda0)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161
k8s.io/client-go/tools/cache.(*processorListener).run(0xc000dc2630)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:972 +0x69
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x52
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 261
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x33204b3]
Version-Release number of selected component (if applicable):
How reproducible:
Seen in this CI run: https://prow.ci.openshift.org/job-history/test-platform-results/logs/periodic-ci-openshift-insights-operator-stage-insights-operator-e2e-tests-periodic
Steps to Reproduce:
$ w3m -dump -cols 200 'https://search.dptools.openshift.org/?name=^periodic&type=junit&search=machine-config-controller.*Observed+a+panic' | grep 'failures match'
Actual results:
Expected results:
No panic is observed.
Additional info:
Please review the following PR: https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/302
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-12890. The following is the description of the original issue:
—
Description of problem:
Openshift Installer supports HTTP Proxy configuration in a restricted environment. However, it seems the bootstrap node doesn't use the given proxy when it grabs ignition assets.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-04-27-113605
How reproducible:
Always
Steps to Reproduce:
1. try IPI installation in a restricted/disconnected network with "publish: Internal", and without using Google Private Access
Actual results:
The installation failed because the bootstrap node failed to fetch its ignition config.
Expected results:
The installation should succeed.
Additional info:
We previously fixed a similar issue on AWS (and Alibabacloud) via https://bugzilla.redhat.com/show_bug.cgi?id=2090836.
This is a clone of issue OCPBUGS-44043. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-43917. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42987. The following is the description of the original issue:
—
It has been observed that the esp_offload kernel module might be loaded by libreswan even if bond ESP offloads have been correctly turned off.
This might be because the ipsec service and configure-ovs run at the same time, so it is possible that the ipsec service starts while bond offloads are not yet turned off, tricking libreswan into thinking they should be used.
The potential fix would be to run the ipsec service after configure-ovs.
Description of the problem:
A base domain containing a double `--`, like cat--rahul.com, is allowed by the UI and BE, but when the node is discovered, network validation fails.
The current domain is a particular case of using `--`; note that the UI and BE allow sending many `-` characters as part of a domain name.
from agent logs:
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Creating execution step for ntp-synchronizer ntp-synchronizer-70565cf4 args <[{\"ntp_source\":\"\"}]>" file="step_processor.go:123" request_id=5467e025-2683-4119-a55a-976bb7787279
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Creating execution step for domain-resolution domain-resolution-f3917dea args <[{\"domains\":[{\"domain_name\":\"api.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"api-int.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"console-openshift-console.apps.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com.\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"quay.io\"}]}]>" file="step_processor.go:123" request_id=5467e025-2683-4119-a55a-976bb7787279
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Validating domain resolution with args [{\"domains\":[{\"domain_name\":\"api.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"api-int.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"console-openshift-console.apps.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com.\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"quay.io\"}]}]" file="action.go:29"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Validating inventory with args [fea3d7b9-a990-48a6-9a46-4417915072b0]" file="action.go:29"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=error msg="Failed to validate domain resolution: data, {\"domains\":[{\"domain_name\":\"api.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"api-int.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"console-openshift-console.apps.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com.\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"quay.io\"}]}" file="action.go:42" error="validation failure list:\nvalidation failure list:\ndomains.0.domain_name in body should match '^([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*[.])+[a-zA-Z]{2,}[.]?$'"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Validating ntp synchronizer with args [{\"ntp_source\":\"\"}]" file="action.go:29"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Validating free addresses with args [[\"192.168.123.0/24\"]]" file="action.go:29"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Executing nsenter [--target 1 --cgroup --mount --ipc --net -- sh -c cp /etc/mtab /root/mtab-fea3d7b9-a990-48a6-9a46-4417915072b0 && podman run --privileged --pid=host --net=host --rm --quiet -v /var/log:/var/log -v /run/udev:/run/udev -v /dev/disk:/dev/disk -v /run/systemd/journal/socket:/run/systemd/journal/socket -v /var/log:/host/var/log:ro -v /proc/meminfo:/host/proc/meminfo:ro -v /sys/kernel/mm/hugepages:/host/sys/kernel/mm/hugepages:ro -v /proc/cpuinfo:/host/proc/cpuinfo:ro -v /root/mtab-fea3d7b9-a990-48a6-9a46-4417915072b0:/host/etc/mtab:ro -v /sys/block:/host/sys/block:ro -v /sys/devices:/host/sys/devices:ro -v /sys/bus:/host/sys/bus:ro -v /sys/class:/host/sys/class:ro -v /run/udev:/host/run/udev:ro -v /dev/disk:/host/dev/disk:ro registry-proxy.engineering.redhat.com/rh-osbs/openshift4-assisted-installer-agent-rhel8:v1.0.0-279 inventory]" file="execute.go:39"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=error msg="Unable to create runner for step <domain-resolution-f3917dea>, args <[{\"domains\":[{\"domain_name\":\"api.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"api-int.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"console-openshift-console.apps.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com.\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"quay.io\"}]}]>" file="step_processor.go:126" error="validation failure list:\nvalidation failure list:\ndomains.0.domain_name in body should match '^([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*[.])+[a-zA-Z]{2,}[.]?$'" request_id=5467e025-2683-4119-a55a-976bb7787279
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Executing nsenter [--target 1 --cgroup --mount --ipc --net -- findmnt --raw --noheadings --output SOURCE,TARGET --target /run/media/iso]" file="execute.go:39"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Executing nsenter [--target 1 --cgroup --mount --ipc --net -- sh -c podman ps --format '{{.Names}}' | grep -q '^free_addresses_scanner$' || podman run --privileged --net=host --rm --quiet --name free_addresses_scanner -v /var/log:/var/log -v /run/systemd/journal/socket:/run/systemd/journal/socket registry-proxy.engineering.redhat.com/rh-osbs/openshift4-assisted-installer-agent-rhel8:v1.0.0-279 free_addresses '[\"192.168.123.0/24\"]']" file="execute.go:39"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Executing nsenter [--target 1 --cgroup --mount --ipc --net -- timeout 30 chronyc -n sources]" file="execute.go:39"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=warning msg="Sending step <domain-resolution-f3917dea> reply output <> error <validation failure list:\nvalidation failure list:\ndomains.0.domain_name in body should match '^([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*[.])+[a-zA-Z]{2,}[.]?$'> exit-code <-1>" file="step_processor.go:76" request_id=5467e025-2683-4119-a55a-976bb7787279
How reproducible:
Create a cluster with the domain cat--rahul.com, using the UI fix that allows it.
Once the node is discovered, network validation fails with the domain-resolution error shown in the agent logs above.
Steps to reproduce:
see above
Actual results:
Unable to install cluster due to network validation failure
Expected results:
The domain should be accepted by the validation regex.
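For context, a minimal Go sketch (not the assisted-installer's actual code) of why the pattern in the error rejects this domain: the (-[a-zA-Z0-9]+)* group allows single hyphens between alphanumeric runs but never two in a row, so a label such as cat--rahul fails. The "relaxed" pattern is only an illustrative assumption of a fix, not the shipped one.

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Pattern taken verbatim from the validation error above.
	strict := regexp.MustCompile(`^([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*[.])+[a-zA-Z]{2,}[.]?$`)
	// Hypothetical relaxed pattern: tolerates consecutive hyphens inside a
	// label while still forbidding leading/trailing hyphens.
	relaxed := regexp.MustCompile(`^([a-zA-Z0-9]+(-+[a-zA-Z0-9]+)*[.])+[a-zA-Z]{2,}[.]?$`)

	for _, d := range []string{"api.dummy---dummy.cat--rahul.com", "quay.io"} {
		fmt.Println(d, "strict:", strict.MatchString(d), "relaxed:", relaxed.MatchString(d))
	}
	// Output: the strict pattern rejects the cat--rahul.com domain while the
	// relaxed one accepts it; both accept quay.io.
}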
Description of problem:
Authenticate using the gcloud CLI, so the GCP credentials no longer come from the osServiceAccount.json file. In this situation the installer should only allow installs to proceed when using Manual credentials mode.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Remove ~/.gcp/osServiceAccount.json
2. Ensure that the GOOGLE_APPLICATION_CREDENTIALS environment variable is not set.
3. gcloud auth application-default login
4. Run the installer
Actual results:
Install succeeds
Expected results:
Install should fail, noting that the credentials mode is not Manual.
Additional info:
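A hedged sketch (not the installer's real validation code) of the kind of check described here: detect that Application Default Credentials resolve even though no osServiceAccount.json was supplied, and refuse to proceed unless the credentials mode is Manual. The helper name, parameters, and error text are illustrative.

package main

import (
	"context"
	"fmt"

	"golang.org/x/oauth2/google"
)

// checkGCPCredentialsMode is a hypothetical helper: when credentials are only
// available via Application Default Credentials (for example after
// `gcloud auth application-default login`), installs should require Manual mode.
func checkGCPCredentialsMode(ctx context.Context, haveServiceAccountJSON bool, credentialsMode string) error {
	if haveServiceAccountJSON {
		return nil // an explicit service account file was provided
	}
	if _, err := google.FindDefaultCredentials(ctx); err != nil {
		return fmt.Errorf("no GCP credentials found: %w", err)
	}
	if credentialsMode != "Manual" {
		return fmt.Errorf("gcloud CLI (ADC) credentials detected; credentialsMode must be Manual")
	}
	return nil
}

func main() {
	// With ADC present and a non-Manual mode, this should return an error.
	fmt.Println(checkGCPCredentialsMode(context.Background(), false, "Mint"))
}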
Please review the following PR: https://github.com/openshift/ibm-vpc-node-label-updater/pull/25
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/telemeter/pull/496
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
The customer's cloud credentials operator generates millions of the below messages per day in the GCP cluster.
They want to reduce or stop these logs because they consume significant disk space. Their cloud-credential operator runs in manual mode.
time="2024-06-21T08:37:42Z" level=warning msg="read-only creds not found, using root creds client" actuator=gcp cr=openshift-cloud-credential-operator/openshift-gcp-ccm secret=openshift-cloud-credential-operator/cloud-credential-operator-gcp-ro-creds time="2024-06-21T08:37:42Z" level=error msg="error creating GCP client" error="Secret \"gcp-credentials\" not found" time="2024-06-21T08:37:42Z" level=error msg="error determining whether a credentials update is needed" actuator=gcp cr=openshift-cloud-credential-operator/openshift-gcp-ccm error="unable to check whether credentialsRequest needs update" time="2024-06-21T08:37:42Z" level=error msg="error syncing credentials: error determining whether a credentials update is needed" controller=credreq cr=openshift-cloud-credential-operator/openshift-gcp-ccm secret=openshift-cloud-controller-manager/gcp-ccm-cloud-credentials time="2024-06-21T08:37:42Z" level=error msg="errored with condition: CredentialsProvisionFailure" controller=credreq cr=openshift-cloud-credential-operator/openshift-gcp-ccm secret=openshift-cloud-controller-manager/gcp-ccm-cloud-credentials time="2024-06-21T08:37:42Z" level=info msg="reconciling clusteroperator status" time="2024-06-21T08:37:42Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-gcp-pd-csi-driver-operator time="2024-06-21T08:37:42Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-gcp-pd-csi-driver-operator time="2024-06-21T08:37:42Z" level=warning msg="read-only creds not found, using root creds client" actuator=gcp cr=openshift-cloud-credential-operator/openshift-gcp-pd-csi-driver-operator secret=openshift-cloud-credential-operator/cloud-credential-operator-gcp-ro-creds
This is a clone of issue OCPBUGS-23457. The following is the description of the original issue:
—
Description of problem:
During the control plane upgrade e2e test, it seems that the openshift apiserver becomes unavailable during the upgrade process. The test is run on an HA control plane, and this should not happen.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Often
Steps to Reproduce:
1. Create a hosted cluster with an HA control plane and wait for it to become available
2. Upgrade the hosted cluster to a newer release
3. While upgrading, monitor whether the openshift apiserver is available by querying either APIService resources or resources served by the openshift apiserver.
Actual results:
The openshift apiserver is unavailable at some point during the upgrade
Expected results:
The openshift apiserver is available throughout the upgrade
Additional info:
Currently, when creating an Azure cluster, only the first node of the nodePool becomes ready and joins the cluster; all other Azure machines are stuck in the `Creating` state.
Please review the following PR: https://github.com/openshift/images/pull/151
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-27965. The following is the description of the original issue:
—
Description of problem:
If a cluster is installed using a proxy and the username used for connecting to the proxy contains the characters "%40" (the encoding of "@", used when providing a domain), the installation fails. The failure happens because the proxy variables written to the file "/etc/systemd/system.conf.d/10-default-env.conf" on the bootstrap node are ignored by systemd. This issue was already fixed in the MCO (BZ 1882674, fixed in RHOCP 4.7), but it appears to affect the bootstrap process in 4.13 and 4.14, causing the installation to not start at all.
Version-Release number of selected component (if applicable):
4.14, 4.13
How reproducible:
100% always
Steps to Reproduce:
1. Create an install-config.yaml file with "%40" in the middle of the username used for the proxy.
2. Start the cluster installation.
3. Bootstrap will fail because the proxy variables are not used.
Actual results:
Installation fails because systemd fails to load the proxy variables if "%" is present in the username.
Expected results:
Installation to succeed using a username with "%40" for the proxy.
Additional info:
File "/etc/systemd/system.conf.d/10-default-env.conf" for the bootstrap should be generated in a way accepted by systemd.
Please review the following PR: https://github.com/openshift/kubernetes-kube-storage-version-migrator/pull/201
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/cluster-network-operator/pull/2133
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-36841. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-34901. The following is the description of the original issue:
—
Description of problem:
My CSV recently added a v1beta2 API version in addition to the existing v1beta1 version. When I create a v1beta2 CR and view it in the console, I see v1beta1 API fields and not the expected v1beta2 fields.
Version-Release number of selected component (if applicable):
4.15.14 (could affect other versions)
How reproducible:
Install 3.0.0 development version of Cryostat Operator
Steps to Reproduce:
1. operator-sdk run bundle quay.io/ebaron/cryostat-operator-bundle:ocpbugs-34901
2. cat << 'EOF' | oc create -f -
apiVersion: operator.cryostat.io/v1beta2
kind: Cryostat
metadata:
  name: cryostat-sample
spec:
  enableCertManager: false
EOF
3. Navigate to https://<openshift console>/k8s/ns/openshift-operators/clusterserviceversions/cryostat-operator.v3.0.0-dev/operator.cryostat.io~v1beta2~Cryostat/cryostat-sample
4. Observe that v1beta1 properties are rendered, including "Minimal Deployment"
5. Attempt to toggle "Minimal Deployment"; observe that this fails.
Actual results:
v1beta1 properties are rendered in the details page instead of v1beta2 properties
Expected results:
v1beta2 properties are rendered in the details page
Additional info:
Description of problem:
After installing ACM/MCE, there is a dropdown list for switching clusters in the top masthead. The items in the dropdown list are not marked for i18n, so there are no translations for other languages.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-25-000711
How reproducible:
Always
Steps to Reproduce:
1. From OperatorHub, install the MCE operator and install the required operand with defaults.
2. After refreshing the browser, check the translation for the clusters dropdown list: "All Clusters/local-cluster".
Actual results:
The items are not marked for i18n and have no translations for other languages.
Expected results:
They should have translations for other languages.
Additional info:
Description of problem:
To make the AWS Load Balancer Operator work on HyperShift, one of the requirements is that the ELB tag is set on subnets; see https://github.com/openshift/aws-load-balancer-operator/blob/main/docs/prerequisites.md#vpc-and-subnets. The value of `kubernetes.io/role/elb` or `kubernetes.io/role/internal-elb` should be 1 or ``, but as the code below shows, HyperShift uses "true": https://github.com/openshift/hypershift/blob/3e1db35d562d069797f9dec2b47227744f689684/cmd/infra/aws/ec2.go#L226
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Install a hypershift cluster
2. Check the subnet tags
Actual results:
value of `kubernetes.io/role/elb` is "true"
Expected results:
value of `kubernetes.io/role/elb` is 1 or ``
Additional info:
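A sketch of the expected tagging using aws-sdk-go types, since the linked HyperShift code tags subnets through that SDK; the surrounding wiring is illustrative. The point is the tag value: "1" (or an empty string), not "true".

package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	// Role tags expected by subnet discovery in the AWS cloud provider and
	// the AWS Load Balancer Operator: value "1" (or ""), not "true".
	publicSubnetTag := &ec2.Tag{
		Key:   aws.String("kubernetes.io/role/elb"),
		Value: aws.String("1"),
	}
	privateSubnetTag := &ec2.Tag{
		Key:   aws.String("kubernetes.io/role/internal-elb"),
		Value: aws.String("1"),
	}
	fmt.Println(publicSubnetTag, privateSubnetTag)
}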
Security Tracking Issue
Do not make this issue public.
NOTE THIS ISSUE IS CURRENTLY EMBARGOED, DO NOT MAKE PUBLIC COMMITS OR COMMENTS ABOUT THIS ISSUE.
WARNING: NOTICE THAT CHANGING THE SECURITY LEVEL FROM "SECURITY ISSUE" TO "RED HAT INTERNAL" MAY BREAK THE EMBARGO.
Flaw:
EMBARGOED CVE-2024-1139 cluster-monitoring-operator: credentials leak
https://bugzilla.redhat.com/show_bug.cgi?id=2262158
The below issue was reported to ProdSec by Simon Pasquier:
In OCP, the telemeter-client pod running in the openshift-monitoring namespace has an annotation containing the cluster's pull secret for the cloud.openshift.com and quay.io registries.
The cause of the bug is that we use the token string concatenated with the
hash [2] instead of writing the token string to the hash object and calling
Sum() with a nil slice.
The impact is that any user which can read the definition of the
telemeter-client pod and/or deployment gets access to the pull secret
token. Users with permissions from the cluster-reader clusterrole already
have access to the original pull secret because they can read the
"pull-secret" Secret in the openshift-config namespace.
The issue has been present since OCP 4.12 [3] [4].
[1] https://issues.redhat.com/browse/OCPBUGS-28650
[2]
https://github.com/openshift/cluster-monitoring-operator/blob/d45a3335c2bbada0948adef9fcba55c4e14fa1d7/pkg/manifests/manifests.go#L3135
[3] https://bugzilla.redhat.com/show_bug.cgi?id=2114721
[4] https://github.com/openshift/cluster-monitoring-operator/pull/1747
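In Go's hash package, Sum(b) appends the digest to b and returns the result, so passing the secret itself yields token||digest rather than just a digest. A minimal sketch of the wrong and right usage described above; it is illustrative, not the operator's exact code.

package main

import (
	"crypto/sha256"
	"fmt"
)

func main() {
	token := []byte("super-secret-pull-token")

	// Wrong: Sum appends the digest to its argument, so the resulting
	// "hash" still begins with the plaintext token and leaks it wherever
	// it is stored (here, a pod annotation).
	h := sha256.New()
	h.Write(token)
	leaky := h.Sum(token)

	// Right: write the token into the hash object and call Sum with a nil
	// slice, which returns only the digest.
	h2 := sha256.New()
	h2.Write(token)
	digest := h2.Sum(nil)

	fmt.Printf("leaky prefix: %q\n", leaky[:len(token)]) // the token itself
	fmt.Printf("digest:       %x\n", digest)
}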
This security tracking issue was filed based on manifesting data available to Product Security in https://deptopia.prodsec.redhat.com/ui/home. This data indicates that the component noted in the "pscomponent" label was found to be affected by this vulnerability. If you believe this issue is not actionable and was created erroneously, please fill out the following form and close this issue as Closed with a resolution of Obsolete. This will prompt Product Security to review what type of error caused this Jira issue to be created, and prevent further mistakes of this type in the future.
https://forms.gle/LnXaf5aCAHaV6g8T8
To better understand the distinction between a component being Affected vs Not Affected, please read the following article:
https://docs.engineering.redhat.com/pages/viewpage.action?spaceKey=PRODSEC&title=Understanding+Affected+and+Not+Affected
This is a clone of issue OCPBUGS-38942. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-38941. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-38925. The following is the description of the original issue:
—
Description of problem:
Periodic jobs are failing due to a change in CoreOS.
Version-Release number of selected component (if applicable):
4.15,4.16,4.17,4.18
How reproducible:
100%
Steps to Reproduce:
1. Check any periodic conformance job
Actual results:
Periodic conformance jobs fail during hostedcluster creation.
Expected results:
The periodic conformance tests succeed.
Additional info:
Description of problem:
When running a conformance suite against a hypershift cluster (for example, CNI conformance) the MonitorTests step fails because of missing files from the disruption monitor.
Version-Release number of selected component (if applicable):
4.15.13
How reproducible:
Consistent
Steps to Reproduce:
1. Create a hypershift cluster
2. Attempt to run an ose-tests suite. For example, the CNI conformance suite documented here: https://access.redhat.com/documentation/en-us/red_hat_software_certification/2024/html/red_hat_software_certification_workflow_guide/con_cni-certification_openshift-sw-cert-workflow-working-with-cloud-native-network-function#running-the-cni-tests_openshift-sw-cert-workflow-working-with-container-network-interface
3. Note errors in logs
Actual results:
found errors fetching in-cluster data: [failed to list files in disruption event folder on node ip-10-0-130-177.us-west-2.compute.internal: the server could not find the requested resource
failed to list files in disruption event folder on node ip-10-0-152-10.us-west-2.compute.internal: the server could not find the requested resource]
Failed to write events from in-cluster monitors, err: open /tmp/artifacts/junit/AdditionalEvents__in_cluster_disruption.json: no such file or directory
Expected results:
No errors
Additional info:
The first error can be avoided by creating the directory it's looking for on all nodes:
for node in $(oc get nodes -oname); do oc debug -n default $node -- chroot /host mkdir -p /var/log/disruption-data/monitor-events; done
However, I'm not sure whether this directory not being created is due to the disruption monitor working properly on hypershift, or whether this should be skipped on hypershift entirely.
The second error is related to the ARTIFACT_DIR env var not being set locally, and can be avoided by creating a directory, setting it as ARTIFACT_DIR, and then creating an empty "junit" dir inside of it. It looks like ARTIFACT_DIR defaults to a temporary directory if it's not set in the env, but the "junit" directory doesn't exist inside of it, so file creation in that non-existent directory fails.
Work has been done in Gophercloud; we now need to bump Gophercloud in Installer.
Description of problem:
When building images, items such as the /run/secrets/redhat.repo file from the build container are bind-mounted into the rootfs of the image being built for the benefit of RUN instructions. For a privileged build, the fact that the bind includes the nodev/noexec/nosuid flags doesn't cause any problems. When attempting the build without privileges, where the source file (itself mounted into the build container from the host) is not owned by the user the builder container is running as, this can fail because the kernel won't allow a bind mount that tries to remove any of these flags, and the logic which handled transient mounts when using chroot isolation wasn't taking enough care to avoid that possibility.
Version-Release number of selected component (if applicable):
buildah-1.32.0 and earlier
How reproducible:
Always
Steps to Reproduce:
1. On a single-node setup, `touch` /etc/yum.repos.d/redhat.repo, which is the target of a symbolic link in /usr/share/rhel/secrets, which /usr/share/containers/mounts.conf tells CRI-O should have its contents exposed in containers.
2. Attempt to build this spec:
apiVersion: build.openshift.io/v1
kind: Build
metadata:
  name: unprivileged
spec:
  source:
    type: Dockerfile
    dockerfile: |
      FROM registry.fedoraproject.org/fedora-minimal
      RUN find /run/secrets -ls
      RUN head /proc/self/uid_map /proc/self/gid_map /run/secrets/redhat.repo
  strategy:
    type: Docker
    dockerStrategy:
      env:
      - name: BUILD_PRIVILEGED
        value: "false"
Actual results:
error running subprocess: remounting "/tmp/buildahXXX/mnt/rootfs/run/secrets/redhat.repo" in mount namespace with expected flags: operation not permitted
Expected results:
No such mount error. Depending on the permissions on the file, the unprivileged build may still fail if it attempts to use the contents of that file, but that's not a bug in the builder so much as a consequence of access controls.
Additional info:
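For illustration, a hedged Go sketch of the kernel rule involved (not buildah's code): a MS_REMOUNT|MS_BIND must keep whatever nosuid/nodev/noexec flags the mount already has, because an unprivileged mount namespace is not allowed to clear them. The safe pattern is to read the current flags with Statfs and OR them back in; the function name and target path are assumptions for the sketch.

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// remountBindReadOnly re-mounts target read-only while preserving the
// nosuid/nodev/noexec flags already present on the mount; attempting to drop
// any of them without privileges makes the kernel return EPERM, which is the
// "operation not permitted" error seen above.
func remountBindReadOnly(target string) error {
	var st unix.Statfs_t
	if err := unix.Statfs(target, &st); err != nil {
		return err
	}
	flags := uintptr(unix.MS_REMOUNT | unix.MS_BIND | unix.MS_RDONLY)
	if st.Flags&unix.ST_NOSUID != 0 {
		flags |= unix.MS_NOSUID
	}
	if st.Flags&unix.ST_NODEV != 0 {
		flags |= unix.MS_NODEV
	}
	if st.Flags&unix.ST_NOEXEC != 0 {
		flags |= unix.MS_NOEXEC
	}
	return unix.Mount("", target, "", flags, "")
}

func main() {
	fmt.Println(remountBindReadOnly("/tmp/example-bind-target"))
}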
Issue 19 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624
Horizontal alignment is slightly off between text and icon
Screenshot: https://drive.google.com/file/d/1nzFHCeorlVIMbwlnjzEc1fCW0GXQa1KT/view
This is a clone of issue OCPBUGS-23518. The following is the description of the original issue:
—
When upgrading a HC from 4.13 to 4.14, after admin-acking the API deprecation check, the upgrade is still blocked by the ClusterVersionUpgradeable condition on the HC being Unknown. This is because the CVO in the guest cluster does not have an Upgradeable condition anymore.
This is a clone of issue OCPBUGS-29233. The following is the description of the original issue:
—
Description of problem:
Internal registry Pods will panic while deploying OCP on `ca-west-1` AWS Region
Version-Release number of selected component (if applicable):
4.14.2
How reproducible:
Every time
Steps to Reproduce:
1. Deploy OCP on `ca-west-1` AWS Region
Actual results:
$ oc logs image-registry-85b69cd9fc-b78sb -n openshift-image-registry
time="2024-02-08T11:43:09.287006584Z" level=info msg="start registry" distribution_version=v3.0.0+unknown go.version="go1.20.10 X:strictfipsruntime" openshift_version=4.14.0-202311021650.p0.g5e7788a.assembly.stream-5e7788a
time="2024-02-08T11:43:09.287365337Z" level=info msg="caching project quota objects with TTL 1m0s" go.version="go1.20.10 X:strictfipsruntime"
panic: invalid region provided: ca-west-1
goroutine 1 [running]:
github.com/distribution/distribution/v3/registry/handlers.NewApp({0x2873f40?, 0xc00005c088?}, 0xc000581800)
	/go/src/github.com/openshift/image-registry/vendor/github.com/distribution/distribution/v3/registry/handlers/app.go:130 +0x2bf1
github.com/openshift/image-registry/pkg/dockerregistry/server/supermiddleware.NewApp({0x2873f40, 0xc00005c088}, 0x0?, {0x2876820?, 0xc000676cf0})
	/go/src/github.com/openshift/image-registry/pkg/dockerregistry/server/supermiddleware/app.go:96 +0xb9
github.com/openshift/image-registry/pkg/dockerregistry/server.NewApp({0x2873f40?, 0xc00005c088}, {0x285ffd0?, 0xc000916070}, 0xc000581800, 0xc00095c000, {0x0?, 0x0})
	/go/src/github.com/openshift/image-registry/pkg/dockerregistry/server/app.go:138 +0x485
github.com/openshift/image-registry/pkg/cmd/dockerregistry.NewServer({0x2873f40, 0xc00005c088}, 0xc000581800, 0xc00095c000)
	/go/src/github.com/openshift/image-registry/pkg/cmd/dockerregistry/dockerregistry.go:212 +0x38a
github.com/openshift/image-registry/pkg/cmd/dockerregistry.Execute({0x2858b60, 0xc000916000})
	/go/src/github.com/openshift/image-registry/pkg/cmd/dockerregistry/dockerregistry.go:166 +0x86b
main.main()
	/go/src/github.com/openshift/image-registry/cmd/dockerregistry/main.go:93 +0x496
Expected results:
The internal registry is deployed with no issues
Additional info:
This is a new AWS Region we are adding support to. The support will be backported to 4.14.z
Description of problem:
Workload hints test cases get stuck when the existing profile is similar to changes proposed in some of the test cases
Version-Release number of selected component (if applicable):
4.16
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-37534. The following is the description of the original issue:
—
Description of problem:
Prow jobs upgrading from 4.9 to 4.16 are failing when they upgrade from 4.12 to 4.13. Nodes become NotReady when MCO tries to apply the new 4.13 configuration to the MCPs.
The failing job is: periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.9-azure-ipi-f28
We have reproduced the issue and found an ordering cycle error in the journal log:
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 systemd-journald.service[838]: Runtime Journal (/run/log/journal/960b04f10e4f44d98453ce5faae27e84) is 8.0M, max 641.9M, 633.9M free.
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found ordering cycle on network-online.target/start
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found dependency on node-valid-hostname.service/start
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found dependency on ovs-configuration.service/start
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found dependency on firstboot-osupdate.target/start
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found dependency on machine-config-daemon-firstboot.service/start
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found dependency on machine-config-daemon-pull.service/start
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Job network-online.target/start deleted to break ordering cycle starting with machine-config-daemon-pull.service/start
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: Queued start job for default target Graphical Interface.
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: systemd-journald.service: unit configures an IP firewall, but the local system does not support BPF/cgroup firewalling.
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: (This warning is only shown for the first unit using IP firewalling.)
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: systemd-journald.service: Deactivated successfully.
Version-Release number of selected component (if applicable):
Using IPI on Azure. These are the versions involved in the current issue upgrading from 4.9 to 4.13:
version: 4.13.0-0.nightly-2024-07-23-154444
version: 4.12.0-0.nightly-2024-07-23-230744
version: 4.11.59
version: 4.10.67
version: 4.9.59
How reproducible:
Always
Steps to Reproduce:
1. Upgrade an IPI on Azure cluster from 4.9 to 4.13. Theoretically, upgrading from 4.12 to 4.13 should be enough, but we reproduced it following the whole path.
Actual results:
Nodes become not ready:
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ci-op-g94jvswm-cc71e-998q8-master-0 Ready master 6h14m v1.25.16+306a47e
ci-op-g94jvswm-cc71e-998q8-master-1 Ready master 6h13m v1.25.16+306a47e
ci-op-g94jvswm-cc71e-998q8-master-2 NotReady,SchedulingDisabled master 6h13m v1.25.16+306a47e
ci-op-g94jvswm-cc71e-998q8-worker-centralus1-c7ngb NotReady,SchedulingDisabled worker 6h2m v1.25.16+306a47e
ci-op-g94jvswm-cc71e-998q8-worker-centralus2-2ppf6 Ready worker 6h4m v1.25.16+306a47e
ci-op-g94jvswm-cc71e-998q8-worker-centralus3-nqshj Ready worker 6h6m v1.25.16+306a47e
And in the NotReady nodes we can see the ordering cycle error mentioned in the description of this ticket.
Expected results:
No ordering cycle error should happen and the upgrade should be executed without problems.
Additional info:
Opened in order to backport this work to 4.15: https://github.com/openshift/cluster-node-tuning-operator/pull/936
This is needed since the MixedCPUs feature is part of the 4.15 payload, and we need to have the e2e test there as well in order to make sure the feature is in good shape and no regression happens.
The tests themselves do not affect the payload, though.
Description of problem:
Once the annotation or labels modals are opened, any changes to the underlying resources will not be reflected in the modal.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. Log into a cluster as kubeadmin via the CLI and the console
2. Create a project named test
3. Visit the namespaces list page in the console (Administration > Namespaces)
4. Click "Edit annotations" via the kebab menu for the namespace "test"
5. From the CLI, run the command: oc annotate namespace test foo=bar
6. Observe that the annotation modal did not update
7. Click cancel to close the annotation modal
8. Open the annotation modal again and observe that the annotation added from the CLI is now shown.
9. Repeat steps 5-8 using the labels modal and the command: oc label namespace test baz=qux
Actual results:
Annotation and labels modals do not update when the underlying resource labels or annotations change.
Expected results:
We should handle this case in some way
Additional info:
We can't necessarily just update the currently displayed data, as this could cause data loss or conflicts. The current behavior can also cause data loss in this situation:
- user opens the modal
- a background update to the annotations/labels occurs
- user makes their own change and saves
- the annotations/labels from the background update are lost/squashed
Please review the following PR: https://github.com/openshift/prometheus-operator/pull/258
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-27813. The following is the description of the original issue:
—
Description of problem:
This bug is created to enable merging of the 4.15 backport https://github.com/openshift/installer/pull/7935 which is based on https://github.com/openshift/installer/pull/7720.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Specify OpenShiftSDN in installconfig.networking.networkType
Actual results:
Expected results:
Installer should validate and not allow OpenShiftSDN.
Additional info:
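A minimal sketch of the kind of install-config validation this bug asks for; the function name, error message, and field wiring are illustrative, not the installer's actual implementation.

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/validation/field"
)

// validateNetworkType rejects the OpenShiftSDN plugin at install time.
func validateNetworkType(networkType string, fldPath *field.Path) field.ErrorList {
	allErrs := field.ErrorList{}
	if networkType == "OpenShiftSDN" {
		allErrs = append(allErrs, field.Invalid(fldPath, networkType,
			"OpenShiftSDN is no longer supported; use OVNKubernetes"))
	}
	return allErrs
}

func main() {
	errs := validateNetworkType("OpenShiftSDN", field.NewPath("networking", "networkType"))
	fmt.Println(errs.ToAggregate())
}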
Description of problem:
All files under the path /var/log/kube-apiserver/ should have 600 permission. The file /var/log/kube-apiserver/termination.log for kube-apiserver has 644 permission on some nodes.
$ for node in `oc get node -l node-role.kubernetes.io/control-plane= --no-headers|awk '{print $1}'`;do oc debug node/$node -- chroot /host ls -l /var/log/kube-apiserver/;done
Temporary namespace openshift-debug-gj262 is created for debugging node...
Starting pod/ip-x-us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
total 221752
-rw-------. 1 root root 209714718 Jul 12 05:47 audit-2023-07-12T05-47-16.625.log
-rw-------. 1 root root 13233368 Jul 12 05:54 audit.log
-rw-------. 1 root root 646569 Jul 12 04:19 termination.log
Removing debug pod ...
Temporary namespace openshift-debug-gj262 was removed.
Temporary namespace openshift-debug-cmdgm is created for debugging node...
Starting pod/ip-xus-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
total 49640
-rw-------. 1 root root 49826363 Jul 12 05:54 audit.log
-rw-------. 1 root root 826226 Jul 12 04:23 termination.log
Removing debug pod ...
Temporary namespace openshift-debug-cmdgm was removed.
Temporary namespace openshift-debug-fdqtv is created for debugging node...
Starting pod/ip-xus-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
total 270276
-rw-------. 1 root root 209714252 Jul 12 05:34 audit-2023-07-12T05-34-34.205.log
-rw-------. 1 root root 51250736 Jul 12 05:54 audit.log
-rw-r--r--. 1 root root 4 Jul 12 04:15 termination.log
Removing debug pod ...
Temporary namespace openshift-debug-fdqtv was removed.
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.14.0-0.nightly-2023-07-11-092038 True False 91m Cluster version is 4.14.0-0.nightly-2023-07-11-092038
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-11-092038
How reproducible:
Always
Steps to Reproduce:
1. $ for node in `oc get node -l node-role.kubernetes.io/control-plane= --no-headers|awk '{print $1}'`;do oc debug node/$node -- chroot /host ls -l /var/log/kube-apiserver/;done
Actual results:
File /var/log/kube-apiserver/termination.log for kube-apiserver on some nodes has 644 permission.
Expected results:
All files under path /var/log/kube-apiserver/ should have 600 permission.
Additional info:
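A small Go sketch of the two ways to ensure the 600 mode; the path is a stand-in, and where the fix lands in kube-apiserver's termination-log handling is up to the component.

package main

import (
	"log"
	"os"
)

func main() {
	const terminationLog = "/tmp/termination.log" // stand-in path for the sketch

	// Create (or open) the log with owner-only read/write permissions.
	f, err := os.OpenFile(terminationLog, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o600)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Tighten the mode if the file already exists with 0644; OpenFile's
	// mode argument only applies when the file is first created.
	if err := os.Chmod(terminationLog, 0o600); err != nil {
		log.Fatal(err)
	}
}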
Description of problem:
Looking at the vSphere connection configuration in the UI, we can see that the value for the vCenter cluster is populated with the "networks" value instead of the "computeCluster" one.
Additional info:
- https://github.com/openshift/console/blob/fdcd7738612cd5685c100b15d348134c96b2fa39[...]ackages/vsphere-plugin/src/components/VSphereConnectionForm.tsx
- https://github.com/openshift/console/blob/fdcd7738612cd5685c100b15d348134c96b2fa39/frontend/packages/vsphere-plugin/src/hooks/use-connection-form.ts#L69
From the form query it seems it is linked to the network:
vCenterCluster = domain?.topology?.networks?.[0] || '';
Our understanding is that it should pick up the cluster name:
topology.computeCluster
Please review the following PR: https://github.com/openshift/cluster-update-keys/pull/52
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
In RHOCP 4.14, the new Node dashboard feature is not showing the expected metric/dashboard data.
[hjaiswal@hjaiswal 4_14]$ oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-26-232.ap-southeast-1.compute.internal Ready control-plane,master 6h12m v1.27.6+1648878
ip-10-0-42-100.ap-southeast-1.compute.internal Ready control-plane,master 6h12m v1.27.6+1648878
ip-10-0-46-197.ap-southeast-1.compute.internal Ready worker 6h3m v1.27.6+1648878
ip-10-0-66-225.ap-southeast-1.compute.internal NotReady worker 6h3m v1.27.6+1648878
ip-10-0-8-20.ap-southeast-1.compute.internal Ready worker 6h5m v1.27.6+1648878
ip-10-0-80-84.ap-southeast-1.compute.internal Ready control-plane,master 6h12m v1.27.6+1648878
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Steps to Reproduce:
1. Check that all the nodes are in Ready state (cluster version 4.14).
2. ssh/debug into any worker node.
3. Stop the kubelet service.
4. Check that the node went into NotReady state.
5. Open the OpenShift console, go to Observe > Dashboards, then select the new "Node cluster" dashboard.
6. It shows "0" nodes in NotReady state, but it should display "1" node in NotReady state.
Actual results:
The Node cluster dashboard shows no count for the NotReady node.
Expected results:
The Node cluster dashboard should show 1 NotReady node.
Additional info:
Tested in AWS IPI cluster
Description of problem:
We would like to include the CEL IP and CIDR validations in 4.16. They have been merged upstream and can be backported into OpenShift to improve our validation downstream. Upstream PR: https://github.com/kubernetes/kubernetes/pull/121912
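Once backported, CRD authors can use the new CEL library functions in validation rules. A hedged sketch of what that looks like on a Go API type with kubebuilder markers; the type, fields, and messages are illustrative, while isIP/isCIDR/cidr are the functions added by the upstream IP/CIDR CEL library.

package v1

// NetworkEndpoint illustrates the new CEL IP/CIDR validations on CRD fields.
type NetworkEndpoint struct {
	// +kubebuilder:validation:XValidation:rule="isIP(self)",message="address must be a valid IP"
	Address string `json:"address"`

	// +kubebuilder:validation:XValidation:rule="isCIDR(self) && cidr(self).prefixLength() <= 24",message="subnet must be a CIDR no more specific than /24"
	Subnet string `json:"subnet"`
}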
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-41677. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-41549. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-35358. The following is the description of the original issue:
—
I'm working with the GitOps operator (1.7), and when there is a high number of CRs (38,000 Application objects in this case) the related install plan gets stuck with the following error:
- lastTransitionTime: "2024-06-11T14:28:40Z" lastUpdateTime: "2024-06-11T14:29:42Z" message: 'error validating existing CRs against new CRD''s schema for "applications.argoproj.io": error listing resources in GroupVersionResource schema.GroupVersionResource{Group:"argoproj.io", Version:"v1alpha1", Resource:"applications"}: the server was unable to return a response in the time allotted, but may still be processing the request'
Even after waiting for a long time, the operator is unable to move forward, and its components cannot be removed or reinstalled.
In a lab, the issue was not present until we started to add load to the cluster (applications.argoproj.io); once we hit 26,000 applications we were no longer able to upgrade or reinstall the operator.
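The timeout comes from listing every existing CR in a single request while validating them against the new CRD schema. A hedged sketch of the usual mitigation, paginating the list with Limit/Continue so no single call exceeds the server's time budget; this is illustrative, not OLM's actual code.

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// listAllPaged walks a large CR collection in pages instead of one giant
// LIST, which is what times out with tens of thousands of Applications.
func listAllPaged(ctx context.Context, client dynamic.Interface, gvr schema.GroupVersionResource, visit func(unstructured.Unstructured) error) error {
	opts := metav1.ListOptions{Limit: 500}
	for {
		list, err := client.Resource(gvr).List(ctx, opts)
		if err != nil {
			return err
		}
		for _, item := range list.Items {
			if err := visit(item); err != nil {
				return err
			}
		}
		if list.GetContinue() == "" {
			return nil
		}
		opts.Continue = list.GetContinue()
	}
}

func main() {
	gvr := schema.GroupVersionResource{Group: "argoproj.io", Version: "v1alpha1", Resource: "applications"}
	fmt.Println("would page through", gvr) // wiring up a real dynamic client is omitted here
}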
Description of problem:
When a cluster is using FIPS in an installation with the agent installer, the reboot in the machine-config-daemon-firstboot.service is not skipped. Since https://issues.redhat.com/browse/MCO-706 the agent installer should be able to skip the firstboot service reboot.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. We cause these prow jobs to install a cluster:
without fips (HA): periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-baremetal-pxe-ha-agent-ipv4-static-connected-f14
with fips (SNO): periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-baremetal-sno-agent-ipv4-static-connected-f7
2. We can find the firstboot service's logs in the must-gather.tar file.
Actual results:
In the machine-config-daemon-firstboot.service logs we can see that the reboot is not skipped when the installation is using fips=true. You can find the logs in the "additional info" section below.
Expected results:
The firstboot service should skip the reboot in the installation.
Additional info:
These are the machine-config-daemon-firstboot logs for a baremetal HA cluster with fips, installed using the agent installer (FIRST REBOOT NOT SKIPPED):
Nov 14 11:26:59 worker-00 systemd[1]: Starting Machine Config Daemon Firstboot...
Nov 14 11:26:59 worker-00 sh[4182]: sed: can't read /etc/yum.repos.d/*.repo: No such file or directory
Nov 14 11:26:59 worker-00 podman[4183]: W1114 11:26:59.393738 1 daemon.go:1673] Failed to persist NIC names: open /rootfs/etc/systemd/network: no such file or directory
Nov 14 11:26:59 worker-00 podman[4296]: I1114 11:26:59.866300 4348 daemon.go:457] container is rhel8, target is rhel9
Nov 14 11:26:59 worker-00 podman[4296]: I1114 11:26:59.896550 4348 daemon.go:525] Invoking re-exec /run/bin/machine-config-daemon
Nov 14 11:26:59 worker-00 podman[4296]: I1114 11:26:59.955660 4348 update.go:2120] Running: systemctl daemon-reload
Nov 14 11:27:00 worker-00 podman[4296]: I1114 11:27:00.537582 4348 rpm-ostree.go:88] Enabled workaround for bug 2111817
Nov 14 11:27:00 worker-00 podman[4296]: I1114 11:27:00.537944 4348 rpm-ostree.go:263] Linking ostree authfile to /etc/mco/internal-registry-pull-secret.json
Nov 14 11:27:00 worker-00 podman[4296]: I1114 11:27:00.833062 4348 daemon.go:270] Booted osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a9bdfdf95023b7aebbbc9d5d335c973832fceb795ed943f365fefea7db646b66 (415.92.202311130854-0) 67df227c04e9306ddcb78331654ecf0ebb2cb1433498f9c12e832c7d5e74c1d9
Nov 14 11:27:00 worker-00 podman[4296]: I1114 11:27:00.833303 4348 rpm-ostree.go:308] Running captured: rpm-ostree --version
Nov 14 11:27:00 worker-00 podman[4296]: I1114 11:27:00.893156 4348 daemon.go:1076] rpm-ostree has container feature
Nov 14 11:27:00 worker-00 podman[4296]: I1114 11:27:00.893582 4348 rpm-ostree.go:308] Running captured: rpm-ostree kargs
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.008588 4348 update.go:2157] Adding SIGTERM protection
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.008821 4348 update.go:599] Checking Reconcilable for config mco-empty-mc to rendered-worker-ef30fce69107b4fc38dc1020038ebd6a
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.009121 4348 update.go:1064] FIPS is configured and enabled
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.009345 4348 update.go:2135] Starting update from mco-empty-mc to rendered-worker-ef30fce69107b4fc38dc1020038ebd6a: &{osUpdate:true kargs:true fips:false passwd:false files:false units:false kernelType:false extensions:false}
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.055403 4348 update.go:1349] Updating files
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.055415 4348 update.go:1412] Deleting stale data
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.055419 4348 update.go:1818] updating the permission of the kubeconfig to: 0o600
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.055484 4348 update.go:1784] Checking if absent users need to be disconfigured
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.055610 4348 update.go:2210] Already in desired image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a9bdfdf95023b7aebbbc9d5d335c973832fceb795ed943f365fefea7db646b66
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.055616 4348 update.go:2120] Running: rpm-ostree cleanup -p
Nov 14 11:27:01 worker-00 podman[4296]: Deployments unchanged.
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.224788 4348 update.go:2135] Running rpm-ostree [kargs --append=systemd.unified_cgroup_hierarchy=1 --append=cgroup_no_v1="all" --append=psi=1]
Nov 14 11:27:01 worker-00 podman[4296]: I1114 11:27:01.271647 4348 update.go:2120] Running: rpm-ostree kargs --append=systemd.unified_cgroup_hierarchy=1 --append=cgroup_no_v1="all" --append=psi=1
Nov 14 11:27:03 worker-00 podman[4296]: Staging deployment...done
Nov 14 11:27:05 worker-00 podman[4296]: Changes queued for next boot. Run "systemctl reboot" to start a reboot
Nov 14 11:27:05 worker-00 podman[4296]: I1114 11:27:05.081854 4348 update.go:2135] Rebooting node
Nov 14 11:27:05 worker-00 podman[4296]: I1114 11:27:05.127794 4348 update.go:2165] Removing SIGTERM protection
Nov 14 11:27:05 worker-00 podman[4296]: I1114 11:27:05.127853 4348 update.go:2135] initiating reboot: Completing firstboot provisioning to rendered-worker-ef30fce69107b4fc38dc1020038ebd6a
Nov 14 11:27:05 worker-00 podman[4296]: I1114 11:27:05.235062 4348 update.go:2135] reboot successful
Nov 14 11:27:05 worker-00 systemd[1]: machine-config-daemon-firstboot.service: Main process exited, code=killed, status=15/TERM
Nov 14 11:27:05 worker-00 systemd[1]: machine-config-daemon-firstboot.service: Failed with result 'signal'.
Nov 14 11:27:05 worker-00 systemd[1]: Stopped Machine Config Daemon Firstboot.
-- Boot 2f510f83bdb047bb921fc429d67b8e6a --
These are the logs for a baremetal HA cluster without fips, installed using the agent installer (FIRST REBOOT SKIPPED):
Nov 08 14:27:30 worker-00 systemd[1]: Starting Machine Config Daemon Firstboot...
Nov 08 14:27:30 worker-00 sh[4171]: sed: can't read /etc/yum.repos.d/*.repo: No such file or directory
Nov 08 14:27:30 worker-00 podman[4172]: W1108 14:27:30.970986 1 daemon.go:1673] Failed to persist NIC names: open /rootfs/etc/systemd/network: no such file or directory
Nov 08 14:27:31 worker-00 podman[4273]: I1108 14:27:31.172975 4320 daemon.go:457] container is rhel8, target is rhel9
Nov 08 14:27:31 worker-00 podman[4273]: I1108 14:27:31.202238 4320 daemon.go:525] Invoking re-exec /run/bin/machine-config-daemon
Nov 08 14:27:31 worker-00 podman[4273]: I1108 14:27:31.237492 4320 update.go:2120] Running: systemctl daemon-reload
Nov 08 14:27:31 worker-00 podman[4273]: I1108 14:27:31.436217 4320 rpm-ostree.go:88] Enabled workaround for bug 2111817
Nov 08 14:27:31 worker-00 podman[4273]: E1108 14:27:31.436346 4320 rpm-ostree.go:285] Merged secret file could not be validated; defaulting to cluster pull secret <nil>
Nov 08 14:27:31 worker-00 podman[4273]: I1108 14:27:31.436375 4320 rpm-ostree.go:263] Linking ostree authfile to /var/lib/kubelet/config.json
Nov 08 14:27:31 worker-00 podman[4273]: I1108 14:27:31.555415 4320 daemon.go:270] Booted osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e03c9248f78a107efb8b12430d46304e8d93981d23fd932e159d518ed675bc92 (415.92.202311061558-0) b8e1dca18619a2e497edf5346d5018615a226da380989ef6720a1a8cdc27adeb
Nov 08 14:27:31 worker-00 podman[4273]: I1108 14:27:31.555920 4320 rpm-ostree.go:308] Running captured: rpm-ostree --version
Nov 08 14:27:31 worker-00 podman[4273]: I1108 14:27:31.571985 4320 daemon.go:1076] rpm-ostree has container feature
Nov 08 14:27:31 worker-00 podman[4273]: I1108 14:27:31.572484 4320 rpm-ostree.go:308] Running captured: rpm-ostree kargs
Nov 08 14:27:31 worker-00 podman[4273]: I1108 14:27:31.600313 4320 update.go:186] No changes from mco-empty-mc to rendered-worker-30da1eef7a5d361fc395f2726c8210d5
Nov 08 14:27:31 worker-00 systemd[1]: Finished Machine Config Daemon Firstboot.
This is a clone of issue OCPBUGS-31421. The following is the description of the original issue:
—
Description of problem:
When scaling from zero replicas, the cluster autoscaler can panic if there are taints on the machineset with no "value" field defined.
Version-Release number of selected component (if applicable):
4.16/master
How reproducible:
always
Steps to Reproduce:
1. Create a machineset with a taint that has no value field and 0 replicas
2. Enable the cluster autoscaler
3. Force a workload to scale the tainted machineset
Actual results:
A panic like this is observed:
I0325 15:36:38.314276 1 clusterapi_provider.go:68] discovered node group: MachineSet/openshift-machine-api/k8hmbsmz-c2483-9dnddr4sjc (min: 0, max: 2, replicas: 0)
panic: interface conversion: interface {} is nil, not string
goroutine 79 [running]:
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.unstructuredToTaint(...)
	/go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_unstructured.go:246
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.unstructuredScalableResource.Taints({0xc000103d40?, 0xc000121360?, 0xc002386f98?, 0x2?})
	/go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_unstructured.go:214 +0x8a5
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.(*nodegroup).TemplateNodeInfo(0xc002675930)
	/go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_nodegroup.go:266 +0x2ea
k8s.io/autoscaler/cluster-autoscaler/core/utils.GetNodeInfoFromTemplate({0x276b230, 0xc002675930}, {0xc001bf2c00, 0x10, 0x10}, {0xc0023ffe60?, 0xc0023ffe90?})
	/go/src/k8s.io/autoscaler/cluster-autoscaler/core/utils/utils.go:41 +0x9d
k8s.io/autoscaler/cluster-autoscaler/processors/nodeinfosprovider.(*MixedTemplateNodeInfoProvider).Process(0xc00084f848, 0xc0023f7680, {0xc001dcdb00, 0x3, 0x0?}, {0xc001bf2c00, 0x10, 0x10}, {0xc0023ffe60, 0xc0023ffe90}, ...)
	/go/src/k8s.io/autoscaler/cluster-autoscaler/processors/nodeinfosprovider/mixed_nodeinfos_processor.go:155 +0x599
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc000617550, {0x4?, 0x0?, 0x3a56f60?})
	/go/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:352 +0xcaa
main.run(0x0?, {0x2761b48, 0xc0004c04e0})
	/go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:529 +0x2cd
main.main.func2({0x0?, 0x0?})
	/go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:617 +0x25
created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
	/go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:213 +0x105
Expected results:
expect the machineset to scale up
Additional info:
I think the e2e test that exercises this only runs on periodic jobs, and as such we missed this error in OCPBUGS-27509.
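The panic is the classic unchecked type assertion on an unstructured map: when the taint has no "value" field, the interface{} is nil and .(string) panics. A minimal sketch of the failure and the comma-ok fix; this is illustrative, and the real code lives in clusterapi_unstructured.go.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// A taint from a MachineSet spec with no "value" field, as it appears
	// after unstructured decoding.
	raw := map[string]interface{}{
		"key":    "dedicated",
		"effect": "NoSchedule",
		// "value" is absent, so raw["value"] is a nil interface{}.
	}

	// Panics: interface conversion: interface {} is nil, not string
	// value := raw["value"].(string)

	// Safe: the comma-ok form yields "" when the field is absent.
	value, _ := raw["value"].(string)

	taint := corev1.Taint{
		Key:    raw["key"].(string),
		Value:  value,
		Effect: corev1.TaintEffect(raw["effect"].(string)),
	}
	fmt.Printf("%+v\n", taint)
}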
Please review the following PR: https://github.com/openshift/cluster-authentication-operator/pull/634
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Issue 49 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624
A form error is missing when importing a container image, while the import-from-Git form shows an error correctly.
Screenshot: https://drive.google.com/file/d/1aUfUefnF3IxVzNjn7D3Q05pK9z4prVtN/view?usp=drive_link
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
OLMv0 over-uses listers and consumes too much memory. Also, $GOMEMLIMIT is not used, so the Go runtime overcommits on RSS. See the following doc for more detail: https://docs.google.com/document/d/11J7lv1HtEq_c3l6fLTWfsom8v1-7guuG4DziNQDU6cY/edit#heading=h.ttj9tfltxgzt
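For the $GOMEMLIMIT part: since Go 1.19 the runtime's soft memory limit can be set via the GOMEMLIMIT environment variable or programmatically, which keeps RSS near the container limit instead of overcommitting. A hedged sketch of wiring it up; the MEMORY_LIMIT_BYTES variable and the 10% headroom are assumptions for illustration, not OLM's chosen policy.

package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"strconv"
)

func main() {
	// If GOMEMLIMIT is set (e.g. "3750MiB"), the runtime already honors it.
	if v := os.Getenv("GOMEMLIMIT"); v != "" {
		fmt.Println("runtime memory limit from env:", v)
		return
	}
	// Otherwise derive one from the pod's memory limit, here passed in as
	// bytes via a hypothetical MEMORY_LIMIT_BYTES downward-API env var.
	if v := os.Getenv("MEMORY_LIMIT_BYTES"); v != "" {
		limit, err := strconv.ParseInt(v, 10, 64)
		if err == nil && limit > 0 {
			soft := limit * 9 / 10 // leave ~10% headroom for non-Go memory
			debug.SetMemoryLimit(soft)
			fmt.Println("runtime memory limit set to", soft, "bytes")
		}
	}
}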
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/cluster-openshift-apiserver-operator/pull/548
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Bring the downstream operator-controller repo up-to-date with the v0.7.0 upstream release.
Please review the following PR: https://github.com/openshift/machine-config-operator/pull/3919
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
In 4.13.z releases, the request-serving label is not present in the ignition-server-proxy deployment. The network policy in place prevents egress from the private router to pods that do not have the label, resulting in the ignition-server endpoint not being available from the outside.
Version-Release number of selected component (if applicable):
4.13.12 OCP, 4.14 HO
How reproducible:
Always
Steps to Reproduce:
1. Install the latest HO
2. Create a HostedCluster with version 4.13.12
3. Wait for nodes to join
Actual results:
Nodes never join
Expected results:
Nodes join
Additional info:
Nodes are not joining because of the blocked egress from the router to the ignition-server-proxy
Description of problem:
In the description for ClusterOperatorDown, there is a stray $ before {{ $labels.reason }}:
$ oc -n openshift-cluster-version get prometheusrules cluster-version-operator -oyaml
....
- alert: ClusterOperatorDown
  annotations:
    description: The {{ $labels.name }} operator may be down or disabled because ${{ $labels.reason }}, and the components it manages may be unavailable or degraded. Cluster upgrades may not complete. For more information refer to 'oc get -o yaml clusteroperator {{ $labels.name }}'{{ with $console_url := "console_url" | query }}{{ if ne (len (label "url" (first $console_url ) ) ) 0}} or {{ label "url" (first $console_url ) }}/settings/cluster/{{ end }}{{ end }}.
    summary: Cluster operator has not been available for 10 minutes.
  expr: |
    max by (namespace, name, reason) (cluster_operator_up{job="cluster-version-operator"} == 0)
  for: 10m
  labels:
    severity: critical
The description looks like the following when the ClusterOperatorDown alert fires:
The insights operator may be down or disabled because $UploadFailed, and the components it manages....
If this is intended, we can close this bug.
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-02-27-101545
How reproducible:
always
Description of the problem:
The Agent CR can reference a Secret containing a token for pulling ignition. This is generally used by HyperShift. The agent controller takes the token from the referenced Secret and applies it to the host in the DB. However, if the token is rotated, the agent controller doesn't notice this, and the agent continues to pull ignition with the old token, which obviously fails. The agent controller must watch these Secrets so that it will reconcile when the Secret is updated.
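A hedged controller-runtime sketch of the requested fix: watch Secrets and map each Secret event back to Agents so that token rotation triggers a reconcile. The reconciler name, API import, and the "every Agent in the namespace" mapping rule are assumptions for illustration, not the agent controller's actual wiring; it assumes controller-runtime v0.15+ signatures.

package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	aiv1beta1 "github.com/openshift/assisted-service/api/v1beta1"
)

// AgentReconciler is a stand-in for the real agent controller.
type AgentReconciler struct {
	client.Client
}

func (r *AgentReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Reconciliation would re-read the referenced Secret and push the
	// current ignition token to the host in the DB.
	return ctrl.Result{}, nil
}

// SetupWithManager watches Secrets in addition to Agents, so rotating an
// ignition token Secret re-triggers reconciliation.
func (r *AgentReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&aiv1beta1.Agent{}).
		Watches(&corev1.Secret{}, handler.EnqueueRequestsFromMapFunc(
			func(ctx context.Context, obj client.Object) []reconcile.Request {
				// Illustrative mapping: requeue every Agent in the Secret's
				// namespace; the real controller would match only Agents whose
				// ignition token reference points at this Secret.
				var agents aiv1beta1.AgentList
				if err := r.List(ctx, &agents, client.InNamespace(obj.GetNamespace())); err != nil {
					return nil
				}
				reqs := make([]reconcile.Request, 0, len(agents.Items))
				for _, a := range agents.Items {
					reqs = append(reqs, reconcile.Request{NamespacedName: types.NamespacedName{
						Namespace: a.Namespace, Name: a.Name,
					}})
				}
				return reqs
			})).
		Complete(r)
}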
How reproducible:
100%
Steps to reproduce:
1. Create a hosted cluster and another host to be added
2. Wait for the token to be rotated in the Secret
3. Notice that the agent is still pulling with the old token
Actual results:
The agent is still pulling with the old token
Expected results:
The agent pulls with the new token.
Please review the following PR: https://github.com/openshift/cluster-capi-operator/pull/129
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-29519. The following is the description of the original issue:
—
Description of problem:
CAPI manifests have the TechPreviewNoUpgrade annotation but are missing the CustomNoUpgrade annotation
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Add flags to hide Pipeline list pages and details pages from static plugin. So that list and details pages from the Pipeline dynamic plugin is shown in the console
This bug has been seen during the analysis of another issue
If the Server Internal IP is not defined, CBO crashes as nil is not handled in https://github.com/openshift/cluster-baremetal-operator/blob/release-4.12/provisioning/utils.go#L99
I0809 17:33:09.683265 1 provisioning_controller.go:540] No Machines with cluster-api-machine-role=master found, set provisioningMacAddresses if the metal3 pod fails to start
I0809 17:33:09.690304 1 clusteroperator.go:217] "new CO status" reason=SyncingResources processMessage="Applying metal3 resources" message=""
I0809 17:33:10.488862 1 recorder_logging.go:37] &Event{ObjectMeta:{dummy.1779c769624884f4 dummy 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:Pod,Namespace:dummy,Name:dummy,UID:,APIVersion:v1,ResourceVersion:,FieldPath:,},Reason:ValidatingWebhookConfigurationUpdated,Message:Updated ValidatingWebhookConfiguration.admissionregistration.k8s.io/baremetal-operator-validating-webhook-configuration because it changed,Source:EventSource{Component:,Host:,},FirstTimestamp:2023-08-09 17:33:10.488745204 +0000 UTC m=+5.906952556,LastTimestamp:2023-08-09 17:33:10.488745204 +0000 UTC m=+5.906952556,Count:1,Type:Normal,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:,ReportingInstance:,}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1768fd4]
goroutine 574 [running]:
github.com/openshift/cluster-baremetal-operator/provisioning.getServerInternalIP({0x1e774d0?, 0xc0001e8fd0?})
	/go/src/github.com/openshift/cluster-baremetal-operator/provisioning/utils.go:75 +0x154
github.com/openshift/cluster-baremetal-operator/provisioning.GetIronicIP({0x1ea2378?, 0xc000856840?}, {0x1bc1f91, 0x15}, 0xc0004c4398, {0x1e774d0, 0xc0001e8fd0})
	/go/src/github.com/openshift/cluster-baremetal-operator/provisioning/utils.go:98 +0xfb
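A minimal Go sketch of the missing guard; it is illustrative, since the real getServerInternalIP parses the API server internal URL from the Infrastructure status in utils.go. The idea is simply to return an error instead of dereferencing when the field is not set.

package main

import (
	"fmt"

	configv1 "github.com/openshift/api/config/v1"
)

// getServerInternalIPSafe mirrors the idea of CBO's getServerInternalIP but
// handles absent status fields instead of panicking.
func getServerInternalIPSafe(infra *configv1.Infrastructure) (string, error) {
	if infra == nil || infra.Status.PlatformStatus == nil {
		return "", fmt.Errorf("infrastructure platform status is not defined")
	}
	if infra.Status.APIServerInternalURL == "" {
		return "", fmt.Errorf("server internal URL is not defined in infrastructure status")
	}
	return infra.Status.APIServerInternalURL, nil
}

func main() {
	_, err := getServerInternalIPSafe(&configv1.Infrastructure{})
	fmt.Println(err) // reports the missing field rather than crashing CBO
}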
Issue 51 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624
The detail page Actions dropdown now uses a bolder font that is not used on other action buttons.
Investigation findings: PF5 sets button elements to font-family: inherit, and since this button is inside an <h1> it gets the RedHatDisplay font-family instead of RedHatText. A quick fix would be to add font-family: var(--pf-v5-global--FontFamily--text) to .co-actions
Screenshots:
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-30200. The following is the description of the original issue:
—
Description of problem:
The package that we use for Power VS has recently been revealed to be unmaintained. We should remove it in favor of maintained solutions.
Version-Release number of selected component (if applicable):
4.13.0 onward
How reproducible:
It's always used
Steps to Reproduce:
1. Deploy with IPI on Power VS 2. Use bluemix-go 3.
Actual results:
bluemix-go is used
Expected results:
bluemix-go should be avoided
Additional info:
This is a clone of issue OCPBUGS-13114. The following is the description of the original issue:
—
Description of problem:
Topology links between VMs and non VMs (such as Pod or Deployment) don't show
Version-Release number of selected component (if applicable):
4.12.14
How reproducible:
every time, via UI or annotation
Steps to Reproduce:
1. Create VM 2. Create Pod/Deployment 3. Add annotation or link via UI
Actual results:
annotation is updated only
Expected results:
topology shows linkage
Additional info:
app.openshift.io/connects-to: >- [{"apiVersion":"kubevirt.io/v1","kind":"VirtualMachine","name":"es-master00"},{"apiVersion":"kubevirt.io/v1","kind":"VirtualMachine","name":"es-master01"},{"apiVersion":"kubevirt.io/v1","kind":"VirtualMachine","name":"es-master02"}]
Description of problem:
The OpenShift installer is generating a double "/" in the resourcepool definition for vSphere.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. Perform IPI vSphere install using interactive installer 2. Check the infrastructure / failure domain configurations and verify double /
Actual results:
ResourcePool has double slashes
Expected results:
ResourcePool should have no double slashes
Additional info:
Currently when attempting installs, our infrastructure "cluster" CR is containing resource with a path such as:
"Workspace.ResourcePool: /DEVQEdatacenter/host/DEVQEcluster//Resources
This is causing issue with CPMSO resulting in rollout of new masters after initial control plane is established
I0219 07:46:32.076950 1 updates.go:478] "msg"="Machine requires an update" "controller"="controlplanemachineset" "diff"=["Workspace.ResourcePool: /DEVQEdatacenter/host/DEVQEcluster//Resources != /DEVQEdatacenter/host/DEVQEcluster/Resources"] "index"=2 "name"="sgao-devqe-vblw8-master-2" "namespace"="openshift-machine-api" "reconcileID"="5f47f5a5-0a90-4168-bfcc-dae0fad9b953" "updateStrategy"="RollingUpdate"
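For illustration, Go's path.Clean collapses exactly this kind of duplicated separator, so normalizing the generated pool path before it is written into the infrastructure CR would avoid the mismatch (a sketch, not the installer's actual fix):

package main

import (
	"fmt"
	"path"
)

func main() {
	// Naively joining a path that already ends in "/" with "/Resources"
	// produces the double slash seen in the diff above.
	raw := "/DEVQEdatacenter/host/DEVQEcluster/" + "/Resources"
	fmt.Println(path.Clean(raw)) // /DEVQEdatacenter/host/DEVQEcluster/Resources
}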
Description of problem:
After a control plane release upgrade, when the controlPlaneRelease field is removed from the HostedCluster CR, only capi-provider, cluster-api and control-plane-operator are restarted and run the release image; the other components are not restarted and still run the control plane release image
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. create a cluster in 4.14-2023-09-06-180503 2. control plane release upgrade to 4.14-2023-09-07-180503 3. remove controlPlaneRelease in the HostedCluster CR 4. check all pods/containers images in the control plane namespace
Actual results:
Only capi-provider, cluster-api and control-plane-operator are restarted and run release image 4.14-2023-09-06-180503; other components are not restarted and still run control plane release image 4.14-2023-09-07-180503.

jiezhao-mac:hypershift jiezhao$ oc get hostedcluster -n clusters
NAME       VERSION                         KUBECONFIG                  PROGRESS    AVAILABLE   PROGRESSING   MESSAGE
jie-test   4.14.0-0.ci-2023-09-06-180503   jie-test-admin-kubeconfig   Completed   True        False         The hosted control plane is available
jiezhao-mac:hypershift jiezhao$

- lastTransitionTime: "2023-09-08T01:54:54Z"
  message: '[cluster-api deployment has 1 unavailable replicas, control-plane-operator deployment has 1 unavailable replicas]'
  observedGeneration: 5
  reason: UnavailableReplicas
  status: "True"
  type: Degraded
Expected results:
The control plane should return to release image 4.14-2023-09-06-180503 with all components in a healthy state.
Additional info:
Please review the following PR: https://github.com/openshift/machine-api-provider-azure/pull/75
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
1. Change the UI to a non en_US locale
2. Navigate to Home -> Projects -> Default -> Workloads -> Add Page
3. Click on 'Upload JAR file'
"Browse" and "Clear" are in English. Please see the reference screenshot.
Version-Release number of selected component (if applicable):
4.14.0-rc.2
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Content is in English
Expected results:
Content should be localized
Additional info:
Reference screenshot https://drive.google.com/file/d/1hgP_Rnkn4J4_gVC-T8pUUvAEiAWbfrJq/view?usp=drive_link
Description of problem:
The Deployment option is missing from the 'Click on the names to access advanced options' list on the Deploy image page, so the user can no longer set up the ENV-related function
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-11-20-205649
How reproducible:
Always
Steps to Reproduce:
1. Login to OCP, change to the Developer perspective, and navigate to the Deploy Image page (+Add -> Container image) /deploy-image/ns/default 2. Scroll down and check whether 'Deployment' is listed in the advanced options list 3.
Actual results:
The Deployment option is missing from the advanced options list; the user is not able to update the environment variables anymore
Expected results:
The Deployment option exists in the advanced options list
Additional info:
https://drive.google.com/file/d/1ixQ33DdGzZTAWgzrpp57OqHGFS4v1_3T/view?usp=drive_link https://drive.google.com/file/d/1dpgFtsr45IovSriwu0RPd0kq0DejRSAm/view?usp=drive_link
Please review the following PR: https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/278
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
After upgrading from OpenShift 4.13 to 4.14 with Kuryr network type, the network operator shows as Degraded and the cluster version reports that it's unable to apply the 4.14 update. The issue seems to be related to mtu settings, as indicated by the message: "Not applying unsafe configuration change: invalid configuration: [cannot change mtu for the Pods Network]."
Version-Release number of selected component (if applicable):
Upgrading from 4.13 to 4.14
4.14.0-0.nightly-2023-09-15-233408
Kuryr network type
RHOS-17.1-RHEL-9-20230907.n.1
How reproducible:
Consistently reproducible on attempting to upgrade from 4.13 to 4.14.
Steps to Reproduce:
1. Install OpenShift version 4.13 on OpenStack. 2. Initiate an upgrade to OpenShift version 4.14.
Actual results:
The network operator shows as Degraded with the message:

network 4.13.13 True False True 13h Not applying unsafe configuration change: invalid configuration: [cannot change mtu for the Pods Network]. Use 'oc edit network.operator.openshift.io cluster' to undo the change.

Additionally, "oc get clusterversions" shows:

Unable to apply 4.14.0-0.nightly-2023-09-15-233408: wait has exceeded 40 minutes for these operators: network
Expected results:
The upgrade should complete successfully without any operator being degraded.
Additional info:
Some components remain at version 4.13.13 despite the upgrade attempt. Specifically, the dns, machine-config, and network operators are still at version 4.13.13:

$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
baremetal                                  4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
cloud-controller-manager                   4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
cloud-credential                           4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
cluster-autoscaler                         4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
config-operator                            4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
console                                    4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
control-plane-machine-set                  4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
csi-snapshot-controller                    4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
dns                                        4.13.13                              True        False         False      13h
etcd                                       4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
image-registry                             4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
ingress                                    4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
insights                                   4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
kube-apiserver                             4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
kube-controller-manager                    4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
kube-scheduler                             4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
kube-storage-version-migrator              4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
machine-api                                4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
machine-approver                           4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
machine-config                             4.13.13                              True        False         False      13h
marketplace                                4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
monitoring                                 4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
network                                    4.13.13                              True        False         True       13h     Not applying unsafe configuration change: invalid configuration: [cannot change mtu for the Pods Network]. Use 'oc edit network.operator.openshift.io cluster' to undo the change.
node-tuning                                4.14.0-0.nightly-2023-09-15-233408   True        False         False      12h
openshift-apiserver                        4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
openshift-controller-manager               4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
openshift-samples                          4.14.0-0.nightly-2023-09-15-233408   True        False         False      12h
operator-lifecycle-manager                 4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
operator-lifecycle-manager-catalog         4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
operator-lifecycle-manager-packageserver   4.14.0-0.nightly-2023-09-15-233408   True        False         False      12h
service-ca                                 4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
storage                                    4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h
Description of problem:
A stack trace is output when creating a hosted cluster via the hypershift CLI
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Run hypershift create cluster aws ... to create a hosted cluster
Actual results:
The output will contain:

[controller-runtime] log.SetLogger(...) was never called; logs will not be displayed. Detected at:
> goroutine 1 [running]:
> runtime/debug.Stack()
> 	/opt/homebrew/Cellar/go/1.21.4/libexec/src/runtime/debug/stack.go:24 +0x64
> sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
> 	/Users/xinjiang/Codes/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/log/log.go:60 +0xa0
> sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).WithName(0x14000845480, {0x10321d605, 0x14})
> 	/Users/xinjiang/Codes/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go:147 +0x34
> github.com/go-logr/logr.Logger.WithName({{0x10490a710, 0x14000845480}, 0x0}, {0x10321d605, 0x14})
> 	/Users/xinjiang/Codes/hypershift/vendor/github.com/go-logr/logr/logr.go:336 +0x5c
> sigs.k8s.io/controller-runtime/pkg/client.newClient(0x1400097a900, {0x0, 0x140004a42a0, {0x0, 0x0}, 0x0, {0x0, 0x0}, 0x0})
> 	/Users/xinjiang/Codes/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/client/client.go:122 +0xf8
> sigs.k8s.io/controller-runtime/pkg/client.New(0x14000ef98c0, {0x0, 0x140004a42a0, {0x0, 0x0}, 0x0, {0x0, 0x0}, 0x0})
> 	/Users/xinjiang/Codes/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/client/client.go:103 +0x78
> github.com/openshift/hypershift/cmd/util.GetClient()
> 	/Users/xinjiang/Codes/hypershift/cmd/util/client.go:50 +0x4f4
> github.com/openshift/hypershift/cmd/cluster/core.apply({0x104906d88, 0x140008dfb80}, {{0x10490acf8, 0x140009dd410}, 0x0}, 0x14000ef9200, 0x0, 0x0)
> 	/Users/xinjiang/Codes/hypershift/cmd/cluster/core/create.go:324 +0xc8
> github.com/openshift/hypershift/cmd/cluster/core.CreateCluster({0x104906d88, 0x140008dfb80}, 0x1400056f600, 0x1048c4360)
> 	/Users/xinjiang/Codes/hypershift/cmd/cluster/core/create.go:461 +0x264
> github.com/openshift/hypershift/cmd/cluster/aws.CreateCluster({0x104906d88, 0x140008dfb80}, 0x1400056f600)
> 	/Users/xinjiang/Codes/hypershift/cmd/cluster/aws/create.go:79 +0x78
> github.com/openshift/hypershift/cmd/cluster/aws.NewCreateCommand.func1(0x14000d1ac00, {0x14000a6cf70, 0x0, 0xd})
> 	/Users/xinjiang/Codes/hypershift/cmd/cluster/aws/create.go:65 +0x148
> github.com/spf13/cobra.(*Command).execute(0x14000d1ac00, {0x1400014c040, 0xd, 0xe})
> 	/Users/xinjiang/Codes/hypershift/vendor/github.com/spf13/cobra/command.go:940 +0x90c
> github.com/spf13/cobra.(*Command).ExecuteC(0x14000c91800)
> 	/Users/xinjiang/Codes/hypershift/vendor/github.com/spf13/cobra/command.go:1068 +0x770
> github.com/spf13/cobra.(*Command).Execute(0x14000c91800)
> 	/Users/xinjiang/Codes/hypershift/vendor/github.com/spf13/cobra/command.go:992 +0x30
> github.com/spf13/cobra.(*Command).ExecuteContext(0x14000c91800, {0x104906d88, 0x140008dfb80})
> 	/Users/xinjiang/Codes/hypershift/vendor/github.com/spf13/cobra/command.go:985 +0x70
> main.main()
> 	/Users/xinjiang/Codes/hypershift/main.go:70 +0x46c
2023-11-15T18:24:26+08:00 INFO Applied Kube resource {"kind": "Namespace", "namespace": "", "name": "clusters"}
Expected results:
No stack trace is output
Additional info:
The function is not affected, the cluster still creates.
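The controller-runtime warning itself points at the fix: register a logger before any client is constructed. A minimal sketch of what the CLI entrypoint could do, assuming controller-runtime's zap helper (the actual hypershift change may differ):

package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
	// Set a logger early so controller-runtime never hits its
	// "log.SetLogger(...) was never called" fallback and dumps a stack trace.
	ctrl.SetLogger(zap.New(zap.UseDevMode(false)))

	// ... rest of the CLI setup ...
}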
Description of problem:
When trying to onboard an xFusion baremetal node using redfish virtual media (no provisioning network), it fails after the node registration with this error:

Normal InspectionError 60s metal3-baremetal-controller Failed to inspect hardware. Reason: unable to start inspection: The attribute Links/ManagedBy is missing from the resource /redfish/v1/Systems/1
Version-Release number of selected component (if applicable):
4.14.18
How reproducible:
Just add an xFusion baremetal node, specifying in the manifest:

Spec:
  Automated Cleaning Mode: metadata
  Bmc:
    Address: redfish-virtualmedia://w.z.x.y/redfish/v1/Systems/1
    Credentials Name: hu28-tovb-bmc-secret
    Disable Certificate Verification: true
  Boot MAC Address: <MAC>
  Boot Mode: UEFI
  Online: false
  Preprovisioning Network Data Name: openstack-hu28-tovb-network-config-secret
Steps to Reproduce:
1. 2. 3.
Actual results:
Inspection fails with the aforementioned error; no preprovisioning image is mounted on the host's virtual media
Expected results:
VirtualMedia gets mounted and inspection starts.
Additional info:
Please review the following PR: https://github.com/openshift/multus-admission-controller/pull/77
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-43476. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-43329. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-36236. The following is the description of the original issue:
—
Description of problem:
The installer for IBM Cloud currently only checks the first group of subnets (50) when searching for Subnet details by name. It should provide pagination support to search all subnets.
Version-Release number of selected component (if applicable):
4.17
How reproducible:
100%, dependent on order of subnets returned by IBM Cloud API's however
Steps to Reproduce:
1. Create 50+ IBM Cloud VPC Subnets 2. Use Bring Your Own Network (BYON) configuration (with Subnet names for CP and/or Compute) in install-config.yaml 3. Attempt to create manifests (openshift-install create manifests)
Actual results:
ERROR failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: [platform.ibmcloud.controlPlaneSubnets: Not found: "eu-de-subnet-paginate-1-cp-eu-de-1", platform.ibmcloud.controlPlaneSubnets: Not found: "eu-de-subnet-paginate-1-cp-eu-de-2", platform.ibmcloud.controlPlaneSubnets: Not found: "eu-de-subnet-paginate-1-cp-eu-de-3", platform.ibmcloud.controlPlaneSubnets: Invalid value: []string{"eu-de-subnet-paginate-1-cp-eu-de-1", "eu-de-subnet-paginate-1-cp-eu-de-2", "eu-de-subnet-paginate-1-cp-eu-de-3"}: number of zones (0) covered by controlPlaneSubnets does not match number of provided or default zones (3) for control plane in eu-de, platform.ibmcloud.computeSubnets: Not found: "eu-de-subnet-paginate-1-compute-eu-de-1", platform.ibmcloud.computeSubnets: Not found: "eu-de-subnet-paginate-1-compute-eu-de-2", platform.ibmcloud.computeSubnets: Not found: "eu-de-subnet-paginate-1-compute-eu-de-3", platform.ibmcloud.computeSubnets: Invalid value: []string{"eu-de-subnet-paginate-1-compute-eu-de-1", "eu-de-subnet-paginate-1-compute-eu-de-2", "eu-de-subnet-paginate-1-compute-eu-de-3"}: number of zones (0) covered by computeSubnets does not match number of provided or default zones (3) for compute[0] in eu-de]
Expected results:
Successful manifests and cluster creation
Additional info:
IBM Cloud is working on a fix
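A hedged sketch of the pagination loop the installer needs, with a hypothetical listSubnetsPage wrapper standing in for the IBM Cloud VPC API call (the real SDK call and field names may differ); it keeps following the "start" token until every page has been searched:

package ibmcloud

import "fmt"

// Subnet is a trimmed, illustrative view of an IBM Cloud VPC subnet.
type Subnet struct {
	Name string
	ID   string
}

// listSubnetsPage is a hypothetical wrapper around the VPC ListSubnets API:
// it returns one page of subnets plus the "start" token for the next page
// ("" once the last page has been returned).
func listSubnetsPage(start string) ([]Subnet, string, error) {
	// ... call the IBM Cloud VPC ListSubnets endpoint with ?start=<start> ...
	return nil, "", fmt.Errorf("not implemented in this sketch")
}

// findSubnetByName pages through all subnets instead of only the first 50.
func findSubnetByName(name string) (*Subnet, error) {
	start := ""
	for {
		page, next, err := listSubnetsPage(start)
		if err != nil {
			return nil, err
		}
		for i := range page {
			if page[i].Name == name {
				return &page[i], nil
			}
		}
		if next == "" {
			return nil, fmt.Errorf("subnet %q not found", name)
		}
		start = next
	}
}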
Please review the following PR: https://github.com/openshift/ovn-kubernetes/pull/1882
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/oc/pull/1546
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/cluster-machine-approver/pull/201
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/origin/pull/28264
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Due to the way the termination handler's unit tests are configured, it is possible in some cases for the counter of HTTP requests to the mock handler to cause the test to deadlock and time out. This happens randomly, as the ordering of the tests affects when the bug occurs.
Version-Release number of selected component (if applicable):
4.13+
How reproducible:
It happens randomly when run in CI, or when the full suite is run, but if the tests are focused it will happen every time. Focusing on "poll URL cannot be reached" will expose the bug in the unit test.
Steps to Reproduce:
1. add `-focus "poll URL cannot be reached"` to unit test ginkgo arguments 2. run `make unit`
Actual results:
test suite hangs after this output: "Handler Suite when running the handler when polling the termination endpoint and the poll URL cannot be reached should return an error /home/mike/dev/machine-api-provider-aws/pkg/termination/handler_test.go:197"
Expected results:
Tests pass
Additional info:
To fix this we need to isolate the test in its own context block; this patch should do the trick:

diff --git a/pkg/termination/handler_test.go b/pkg/termination/handler_test.go
index 2b98b08b..0f85feae 100644
--- a/pkg/termination/handler_test.go
+++ b/pkg/termination/handler_test.go
@@ -187,7 +187,9 @@ var _ = Describe("Handler Suite", func() {
 				Consistently(nodeMarkedForDeletion(testNode.Name)).Should(BeFalse())
 			})
 		})
+	})
+	Context("when the termination endpoint is not valid", func() {
 		Context("and the poll URL cannot be reached", func() {
 			BeforeEach(func() {
 				nonReachable := "abc#1://localhost"
Not sure which component this bug should be associated with.
I am not even sure if importing respects ImageTagMirrorSet.
We could not figure it out in the Slack conversation.
https://redhat-internal.slack.com/archives/C013VBYBJQH/p1709583648013199
Description of problem:
The expected behaviour of ImageTagMirrorSet, redirecting pulls from the proxy to quay.io, did not work.
Version-Release number of selected component (if applicable):
oc --context build02 get clusterversion version
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-ec.3   True        False         7d4h
Steps to Reproduce:
oc --context build02 get ImageTagMirrorSet quay-proxy -o yaml
apiVersion: config.openshift.io/v1
kind: ImageTagMirrorSet
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"config.openshift.io/v1","kind":"ImageTagMirrorSet","metadata":{"annotations":{},"name":"quay-proxy"},"spec":{"imageTagMirrors":[{"mirrors":["quay.io/openshift/ci"],"source":"quay-proxy.ci.openshift.org/openshift/ci"}]}}
  creationTimestamp: "2024-03-05T03:49:59Z"
  generation: 1
  name: quay-proxy
  resourceVersion: "4895378740"
  uid: 69fb479e-85bd-4a16-a38f-29b08f2636c3
spec:
  imageTagMirrors:
  - mirrors:
    - quay.io/openshift/ci
    source: quay-proxy.ci.openshift.org/openshift/ci

oc --context build02 tag --source docker quay-proxy.ci.openshift.org/openshift/ci:ci_ci-operator_latest hongkliu-test/proxy-test-2:011 --as system:admin
Tag proxy-test-2:011 set to quay-proxy.ci.openshift.org/openshift/ci:ci_ci-operator_latest.

oc --context build02 get is proxy-test-2 -o yaml
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  annotations:
    openshift.io/image.dockerRepositoryCheck: "2024-03-05T20:03:02Z"
  creationTimestamp: "2024-03-05T20:03:02Z"
  generation: 2
  name: proxy-test-2
  namespace: hongkliu-test
  resourceVersion: "4898915153"
  uid: f60b3142-1f5f-42ae-a936-a9595e794c05
spec:
  lookupPolicy:
    local: false
  tags:
  - annotations: null
    from:
      kind: DockerImage
      name: quay-proxy.ci.openshift.org/openshift/ci:ci_ci-operator_latest
    generation: 2
    importPolicy:
      importMode: Legacy
    name: "011"
    referencePolicy:
      type: Source
status:
  dockerImageRepository: image-registry.openshift-image-registry.svc:5000/hongkliu-test/proxy-test-2
  publicDockerImageRepository: registry.build02.ci.openshift.org/hongkliu-test/proxy-test-2
  tags:
  - conditions:
    - generation: 2
      lastTransitionTime: "2024-03-05T20:03:02Z"
      message: 'Internal error occurred: quay-proxy.ci.openshift.org/openshift/ci:ci_ci-operator_latest: Get "https://quay-proxy.ci.openshift.org/v2/": EOF'
      reason: InternalError
      status: "False"
      type: ImportSuccess
    items: null
    tag: "011"
Actual results:
The status of the stream shows that it still tries to connect to quay-proxy.
Expected results:
The request goes to quay.io directly.
Additional info:
The proxy has been shut down completely just to simplify the case. If it were on, there would be access logs showing the proxy receiving the requests for the image.

oc scale deployment qci-appci -n ci --replicas 0
deployment.apps/qci-appci scaled

I also checked the pull secret in the namespace and it has correct pull credentials for both the proxy and quay.io.
This is a clone of issue OCPBUGS-42719. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42200. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-41920. The following is the description of the original issue:
—
Description of problem:
When we move one node from one custom MCP to another custom MCP, the MCPs report a wrong number of nodes. For example, we reach this situation (the worker-perf MCP is not reporting the right number of nodes):

$ oc get mcp,nodes
NAME                                                                          CONFIG                                                          UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
machineconfigpool.machineconfiguration.openshift.io/master                    rendered-master-c8d23b071e1ccf6cf85c7f1b31c0def6                True      False      False      3              3                   3                     0                      142m
machineconfigpool.machineconfiguration.openshift.io/worker                    rendered-worker-36ee1fdc485685ac9c324769889c3348                True      False      False      1              1                   1                     0                      142m
machineconfigpool.machineconfiguration.openshift.io/worker-perf               rendered-worker-perf-6b5fbffac62c3d437e307e849c44b556           True      False      False      2              2                   2                     0                      24m
machineconfigpool.machineconfiguration.openshift.io/worker-perf-canary        rendered-worker-perf-canary-6b5fbffac62c3d437e307e849c44b556    True      False      False      1              1                   1                     0                      7m52s

NAME                                              STATUS   ROLES                       AGE    VERSION
node/ip-10-0-13-228.us-east-2.compute.internal    Ready    worker,worker-perf-canary   138m   v1.30.4
node/ip-10-0-2-250.us-east-2.compute.internal     Ready    control-plane,master        145m   v1.30.4
node/ip-10-0-34-223.us-east-2.compute.internal    Ready    control-plane,master        144m   v1.30.4
node/ip-10-0-35-61.us-east-2.compute.internal     Ready    worker,worker-perf          136m   v1.30.4
node/ip-10-0-79-232.us-east-2.compute.internal    Ready    control-plane,master        144m   v1.30.4
node/ip-10-0-86-124.us-east-2.compute.internal    Ready    worker                      139m   v1.30.4

After 20 minutes or half an hour the MCPs start reporting the right number of nodes.
Version-Release number of selected component (if applicable):
IPI on AWS version:
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.17.0-0.nightly-2024-09-13-040101 True False 124m Cluster version is 4.17.0-0.nightly-2024-09-13-040101
How reproducible:
Always
Steps to Reproduce:
1. Create a MCP

oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-perf
spec:
  machineConfigSelector:
    matchExpressions:
      - { key: machineconfiguration.openshift.io/role, operator: In, values: [worker,worker-perf] }
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-perf: ""
EOF

2. Add 2 nodes to the MCP

$ oc label node $(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}") node-role.kubernetes.io/worker-perf=
$ oc label node $(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[1].metadata.name}") node-role.kubernetes.io/worker-perf=

3. Create another MCP

oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-perf-canary
spec:
  machineConfigSelector:
    matchExpressions:
      - { key: machineconfiguration.openshift.io/role, operator: In, values: [worker,worker-perf,worker-perf-canary] }
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-perf-canary: ""
EOF

4. Move one node from the MCP created in step 1 to the MCP created in step 3

$ oc label node $(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}") node-role.kubernetes.io/worker-perf-canary=
$ oc label node $(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}") node-role.kubernetes.io/worker-perf-
Actual results:
The worker-perf pool is not reporting the right number of nodes. It continues reporting 2 nodes even though one of them was moved to the worker-perf-canary MCP.

$ oc get mcp,nodes
NAME                                                                          CONFIG                                                          UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
machineconfigpool.machineconfiguration.openshift.io/master                    rendered-master-c8d23b071e1ccf6cf85c7f1b31c0def6                True      False      False      3              3                   3                     0                      142m
machineconfigpool.machineconfiguration.openshift.io/worker                    rendered-worker-36ee1fdc485685ac9c324769889c3348                True      False      False      1              1                   1                     0                      142m
machineconfigpool.machineconfiguration.openshift.io/worker-perf               rendered-worker-perf-6b5fbffac62c3d437e307e849c44b556           True      False      False      2              2                   2                     0                      24m
machineconfigpool.machineconfiguration.openshift.io/worker-perf-canary        rendered-worker-perf-canary-6b5fbffac62c3d437e307e849c44b556    True      False      False      1              1                   1                     0                      7m52s

NAME                                              STATUS   ROLES                       AGE    VERSION
node/ip-10-0-13-228.us-east-2.compute.internal    Ready    worker,worker-perf-canary   138m   v1.30.4
node/ip-10-0-2-250.us-east-2.compute.internal     Ready    control-plane,master        145m   v1.30.4
node/ip-10-0-34-223.us-east-2.compute.internal    Ready    control-plane,master        144m   v1.30.4
node/ip-10-0-35-61.us-east-2.compute.internal     Ready    worker,worker-perf          136m   v1.30.4
node/ip-10-0-79-232.us-east-2.compute.internal    Ready    control-plane,master        144m   v1.30.4
node/ip-10-0-86-124.us-east-2.compute.internal    Ready    worker                      139m   v1.30.4
Expected results:
MCPs should always report the right number of nodes
Additional info:
It is very similar to this other issue https://bugzilla.redhat.com/show_bug.cgi?id=2090436 That was discussed in this slack conversation https://redhat-internal.slack.com/archives/C02CZNQHGN8/p1653479831004619
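A hedged sketch of the recount the expected results imply: recompute the pool's machine count directly from the nodes that currently match its nodeSelector, so a relabelled node stops being counted immediately (illustrative names, not the MCO's actual sync code):

package controller

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	corelisters "k8s.io/client-go/listers/core/v1"

	mcfgv1 "github.com/openshift/api/machineconfiguration/v1"
)

// currentMachineCount recounts the nodes matching the pool's nodeSelector, so
// a node relabelled into another pool is no longer counted here.
func currentMachineCount(pool *mcfgv1.MachineConfigPool, nodes corelisters.NodeLister) (int32, error) {
	sel, err := metav1.LabelSelectorAsSelector(pool.Spec.NodeSelector)
	if err != nil {
		return 0, err
	}
	matching, err := nodes.List(sel)
	if err != nil {
		return 0, err
	}
	return int32(len(matching)), nil
}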
Description of problem:
It was seen in downstream and upstream that ovn-controller was constantly restarting. This was due to ovnkube-node telling it to exit after it thought that the encap IP (the primary node IP) had changed. This has been mitigated by: https://github.com/ovn-org/ovn-kubernetes/pull/3711

But we still need to know why the c.nodePrimaryAddrChanged() function is returning true when nothing is really changing on the node. Example after the fix above:

ovn-control-plane/ovn-kubernetes/ovnkube.log:I0627 22:37:02.020612 1670 node_ip_handler_linux.go:212] Node primary address changed to 172.18.0.3. Updating OVN encap IP.
ovn-control-plane/ovn-kubernetes/ovnkube.log:I0627 22:37:02.037852 1670 node_ip_handler_linux.go:343] Will not update encap IP, value: 172.18.0.3 is the already configured
ovn-control-plane/ovn-kubernetes/ovnkube.log:I0627 23:03:03.115881 16698 node_ip_handler_linux.go:212] Node primary address changed to 172.18.0.3. Updating OVN encap IP.
ovn-control-plane/ovn-kubernetes/ovnkube.log:I0627 23:03:03.122365 16698 node_ip_handler_linux.go:343] Will not update encap IP, value: 172.18.0.3 is the already configured
ovn-control-plane/ovn-kubernetes/ovnkube.log:I0627 23:18:08.381694 27220 node_ip_handler_linux.go:212] Node primary address changed to 172.18.0.3. Updating OVN encap IP.
ovn-control-plane/ovn-kubernetes/ovnkube.log:I0627 23:18:08.389655 27220 node_ip_handler_linux.go:343] Will not update encap IP, value: 172.18.0.3 is the already configured
ovn-control-plane/ovn-kubernetes/ovnkube.log:I0627 23:19:26.638221 28746 node_ip_handler_linux.go:212] Node primary address changed to 172.18.0.3. Updating OVN encap IP.
ovn-control-plane/ovn-kubernetes/ovnkube.log:I0627 23:19:26.644217 28746 node_ip_handler_linux.go:343] Will not update encap IP, value: 172.18.0.3 is the already configured

This can be observed in kind deployments as well.
Version-Release number of selected component (if applicable):
Could affect versions earlier than 4.14
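The mitigation referenced above amounts to a guard like the following before ovnkube-node tells ovn-controller to exit; this is a simplified sketch of the upstream change, not the exact code:

package node

import (
	"net"

	"k8s.io/klog/v2"
)

type addressManager struct {
	encapIP net.IP // the encap IP ovn-controller is currently configured with
}

// shouldUpdateEncapIP returns true only when the node's primary address really
// differs from what is already configured, instead of restarting
// ovn-controller on every spurious "change" event.
func (c *addressManager) shouldUpdateEncapIP(newIP net.IP) bool {
	if newIP.Equal(c.encapIP) {
		klog.Infof("Will not update encap IP, value: %s is the already configured", newIP)
		return false
	}
	return true
}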
This issue has been updated to capture a larger ongoing issue around console 304 status responses for plugins. This has been observed for ODF, ACM, MCE, monitoring, and other plugins going back to 4.12. Related links:
Original report from this bug:
Description of problem:
find error logs under console pod logs
Version-Release number of selected component (if applicable):
% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.nightly-2023-09-27-073353   True        False         37m     Cluster version is 4.15.0-0.nightly-2023-09-27-073353
How reproducible:
100% on ipv6 clusters
Steps to Reproduce:
1. % oc -n openshift-console logs console-6fbf69cc49-7jq5b
...
E0928 00:35:24.098808 1 handlers.go:172] GET request for "monitoring-plugin" plugin failed with 304 status code
E0928 00:35:24.098822 1 utils.go:43] Failed sending HTTP response body: http: request method or response status code does not allow body
E0928 00:35:39.611569 1 handlers.go:172] GET request for "monitoring-plugin" plugin failed with 304 status code
E0928 00:35:39.611583 1 utils.go:43] Failed sending HTTP response body: http: request method or response status code does not allow body
E0928 00:35:54.442150 1 handlers.go:172] GET request for "monitoring-plugin" plugin failed with 304 status code
E0928 00:35:54.442167 1 utils.go:43] Failed sending HTTP response body: http: request method or response status code does not allow body
Actual results:
GET request for "monitoring-plugin" plugin failed with 304 status code
Expected results:
no monitoring-plugin related error logs
This is a tracker bug for issues discovered when working on https://issues.redhat.com/browse/METAL-940. No QA verification will be possible until the feature is implemented much later.
When looking at an ACM must-gather for a managed cluster, no information for the ConfigurationPolicies can be seen. It appears that this command in the must-gather script has an error:
oc adm inspect configurationpolicies.policy.open-cluster-management.io --all-namespaces --dest-dir=must-gather
The error (which is not logged in the must-gather itself...) looks like:
error: errors ocurred while gathering data:
skipping gathering due to error: the server doesn't have a resource type ""
ConfigurationPolicy YAML should be collected in the must-gather to help in debugging.
Description of problem:
hypershift_nodepools_available_replicas does not properly reflect the nodepool.

$ oc get nodepools -n ocm-production-12345678
NAME              CLUSTER   DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION   UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
re-test-workers   re-test   2               0               False         True         4.12.35                                      Minimum availability requires 2 replicas, current 0 available

Meanwhile, there are 3 hypershift_nodepools_available_replicas time series for the nodepools:
- re-test-worker2 reporting 1
- re-test-worker3 reporting 1
- re-test-workers reporting 0 (accurate)

The issue here is the two extra time series, which should not exist if the nodepool doesn't exist.
Version-Release number of selected component (if applicable):
4.12.35
How reproducible:
This particular cluster had its OIDC configuration along with other customer AWS account resources deleted, which might be connected to the misbehaviour of the metric.
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Adding must-gather and metric time series in the ticket
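With prometheus/client_golang, stale series for deleted nodepools can be dropped explicitly. A minimal sketch, assuming the metric is a GaugeVec keyed by namespace and nodepool name (label names are illustrative):

package metrics

import "github.com/prometheus/client_golang/prometheus"

var nodePoolsAvailableReplicas = prometheus.NewGaugeVec(prometheus.GaugeOpts{
	Name: "hypershift_nodepools_available_replicas",
	Help: "Available replicas per nodepool.",
}, []string{"namespace", "name"})

// forgetNodePool removes the series for a nodepool that no longer exists, so
// deleted pools (like re-test-worker2/3 above) stop being reported.
func forgetNodePool(namespace, name string) {
	nodePoolsAvailableReplicas.DeleteLabelValues(namespace, name)
}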
This is a clone of issue OCPBUGS-16814. The following is the description of the original issue:
—
Description of problem:
Starting with OpenShift 4.8 (https://docs.openshift.com/container-platform/4.8/release_notes/ocp-4-8-release-notes.html#ocp-4-8-notable-technical-changes), all pods get bound SA tokens. Currently, instead of expiring the token, we use the `service-account-extend-token-expiration` option, which extends a bound token's validity to 1 year and warns when a token is used that would otherwise have expired. We want to disable this behavior in a future OpenShift release, which would break the OpenShift web console.
Version-Release number of selected component (if applicable):
4.8 - 4.14
How reproducible:
100%
Steps to Reproduce:
1. install a fresh cluster 2. wait ~1hr since console pods were deployed for the token rotation to occur 3. log in to the console and click around 4. check the kube-apiserver audit logs events for the "authentication.k8s.io/stale-token" annotation
Actual results:
many occurrences (I doubt I'll be able to upload a text file, so I'll show a few audit events in the first comment)
Expected results:
The web-console re-reads the SA token regularly so that it never uses an expired token
Additional info:
In a theoretical case where a console pod lasts for a year, it's going to break and won't be able to authenticate to the kube-apiserver. We are planning on disallowing the use of stale tokens in a future release and we need to make sure that the core platform is not broken so that the metrics we collect from the clusters in the wild are not polluted.
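A minimal sketch of the behavior the expected results describe: re-read the projected ServiceAccount token from disk rather than caching it for the pod's lifetime. The path is the standard projection mount; the console's real plumbing differs:

package auth

import (
	"os"
	"strings"
)

const saTokenPath = "/var/run/secrets/kubernetes.io/serviceaccount/token"

// serviceAccountToken reads the token file on every call, so the kubelet's
// periodic refresh of the projected token is picked up without a restart and
// no request is ever sent with a stale (extended-expiration) token.
func serviceAccountToken() (string, error) {
	b, err := os.ReadFile(saTokenPath)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}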
This is a clone of issue OCPBUGS-31678. The following is the description of the original issue:
—
Description of problem:
The code requires the `s3:HeadBucket` permission (https://github.com/openshift/cloud-credential-operator/blob/master/pkg/aws/utils.go#L57) but that IAM action doesn't exist. The AWS docs say the permission needed is `s3:ListBucket`: https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadBucket.html
Version-Release number of selected component (if applicable):
4.16
How reproducible:
always
Steps to Reproduce:
1. Try to install cluster with minimal permissions without s3:HeadBucket 2. 3.
Actual results:
level=warning msg=Action not allowed with tested creds action=iam:DeleteUserPolicy
level=warning msg=Tested creds not able to perform all requested actions
level=warning msg=Action not allowed with tested creds action=s3:HeadBucket
level=warning msg=Tested creds not able to perform all requested actions
level=fatal msg=failed to fetch Cluster: failed to fetch dependency of "Cluster": failed to generate asset "Platform Permissions Check": validate AWS credentials: AWS credentials cannot be used to either create new creds or use as-is
Installer exit with code 1
Expected results:
Only `s3:ListBucket` should be checked.
Additional info:
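The fix on the linked line is essentially a one-entry change in the checked-permissions list. A hedged sketch of its shape (the variable name and surrounding entries are illustrative placeholders, not the operator's actual list):

package aws

// Actions the credentials check simulates for the S3 flow.
var checkedS3Actions = []string{
	// "s3:HeadBucket", // not a real IAM action; HeadBucket calls are authorized by s3:ListBucket
	"s3:ListBucket",
	"s3:CreateBucket",
	"s3:DeleteBucket",
}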
Description of problem:
There is an extra space in the Chinese translation text of 'Duplicate RoleBinding' in the kebab menu list. The change from PR https://github.com/openshift/console/pull/12099 was for some reason not included in the master/release-4.12-4.14 branches
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-10-08-220853
How reproducible:
Always
Steps to Reproduce:
1. Login OCP, update language to Chinese 2. Navigate to RoleBindings page, choose one rolebinding, click the kebab icon on the end, check the translation text of 'Duplicate RoleBinding' 3.
Actual results:
It's shown as '重复 角色绑定' and '重复 集群角色绑定' (with an extra space after 重复)
Expected results:
Remove the extra space; it should be shown as '重复角色绑定' and '重复集群角色绑定'
Additional info:
This is a clone of issue OCPBUGS-19303. The following is the description of the original issue:
—
Description of problem:
OKD/FCOS uses FCOS for its boot image, which lacks several tools and services, such as oc and crio, that the rendezvous host of the Agent-based Installer needs to set up a bootstrap control plane.
Version-Release number of selected component (if applicable):
4.13.0 4.14.0 4.15.0
The goal is to collect metrics about user page interaction to better understand how customers use the console, and in turn develop a better experience.
acm_console_page_count:sum represents a counter for page visits across the main product pages.
Labels
The cardinality of the metric is at most 7 (7 page labels listed above - PrometheusRule is implemented to sum the page visit counts across Pods).
Description of problem:
Observation from CISv1.4 pdf: 1.1.3 Ensure that the controller manager pod specification file

When I checked, I found the description of the controller manager pod specification file in the CIS v1.4 PDF is as follows:

"Ensure that the controller manager pod specification file has permissions of 600 or more restrictive. OpenShift 4 deploys two API servers: the OpenShift API server and the Kube API server. The OpenShift API server delegates requests for Kubernetes objects to the Kube API server. The OpenShift API server is managed as a deployment. The pod specification yaml for openshift-apiserver is stored in etcd. The Kube API Server is managed as a static pod. The pod specification file for the kube-apiserver is created on the control plane nodes at /etc/kubernetes/manifests/kube-apiserver-pod.yaml. The kube-apiserver is mounted via hostpath to the kube-apiserver pods via /etc/kubernetes/static-pod-resources/kube-apiserver-pod.yaml with permissions 600."

To conform with CIS benchmarks, the controller manager pod specification file should be updated to 600.

$ for i in $( oc get pods -n openshift-kube-controller-manager -o name -l app=kube-controller-manager)
do
  oc exec -n openshift-kube-controller-manager $i -- stat -c %a /etc/kubernetes/static-pod-resources/kube-controller-manager-pod.yaml
done
644
644
644
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-20-215234
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
The controller manager pod specification file permissions are 644.
Expected results:
The controller manager pod specification file permissions are 600 or more restrictive.
Additional info:
https://github.com/openshift/library-go/commit/19a42d2bae8ba68761cfad72bf764e10d275ad6e
Description of problem:
When using performancePlus in a StorageClass of the azure-disk-csi-driver, it requires a volume size larger than 512 GB, but the error message says "The performancePlus flag can only be set on disks at least 512 GB in size", which implies 512 is supported. This confuses users.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Create a sc with parameters: enablePerformancePlus: "true"
2. Create a pvc with 512Gi
3. Get a ProvisioningFailed message as below, which is a bit confusing:

Warning ProvisioningFailed <invalid> (x5 over <invalid>) disk.csi.azure.com_wduan0810manual-b5dng-master-1_d7a29bbf-3f49-4207-af33-056e0814f6e2 failed to provision volume with StorageClass "managed-csi-test-28-sssdlrs-enableperformanceplus": rpc error: code = Internal desc = Retriable: false, RetryAfter: 0s, HTTPStatusCode: 400, RawError: { "error": { "code": "BadRequest", "message": "The performancePlus flag can only be set on disks at least 512 GB in size." } }
Actual results:
Expected results:
The message should say larger than 512 GB, not "at least".
Additional info:
Description of problem:
Agent-based install on vSphere with multiple workers fails
Version-Release number of selected component (if applicable):
4.13.4
How reproducible:
Always
Steps to Reproduce:
1. Create agent-config, install-config for 3 master, 3+ worker cluster 2. Create Agent ISO image 3. Boot targets from Agent ISO
Actual results:
Deployment hangs waiting on cluster operators
Expected results:
Deployment completes
Additional info:
Multiple pods cannot start due to tainted nodes:"4 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}"
This is a clone of issue OCPBUGS-27429. The following is the description of the original issue:
—
A long-lived cluster updating into 4.16.0-ec.1 was bitten by the Engineering Candidate's month-or-more-old api-int CA rotation (details on early rotation in API-1687). After manually updating /var/lib/kubelet/kubeconfig to include the new CA (which OCPBUGS-25821 is working on automating), multus pods still complained about untrusted api-int:
$ oc -n openshift-multus logs multus-pz7zp | grep api-int | tail -n5
E0119 19:33:52.983918 3194 reflector.go:148] k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://api-int.build02.gcp.ci.openshift.org:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dbuild0-gstfj-m-2.c.openshift-ci-build-farm.internal&resourceVersion=4723865081": tls: failed to verify certificate: x509: certificate signed by unknown authority
2024-01-19T19:33:55Z [error] Multus: [openshift-machine-api/cluster-autoscaler-default-f8dd547c7-dg9t5/f79ff01a-71c2-4f02-b48b-8c23c9e875ce]: error waiting for pod: Get "https://api-int.build02.gcp.ci.openshift.org:6443/api/v1/namespaces/openshift-machine-api/pods/cluster-autoscaler-default-f8dd547c7-dg9t5?timeout=1m0s": tls: failed to verify certificate: x509: certificate signed by unknown authority
2024-01-19T19:33:55Z [verbose] ADD finished CNI request ContainerID:"b554f8edca8ea7672119c1aa71a69e0368fefeb5f8ae2c2659f822b7fa8d3f62" Netns:"/var/run/netns/36923fe0-e28d-422f-8213-233086527baa" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=openshift-machine-api;K8S_POD_NAME=cluster-autoscaler-default-f8dd547c7-dg9t5;K8S_POD_INFRA_CONTAINER_ID=b554f8edca8ea7672119c1aa71a69e0368fefeb5f8ae2c2659f822b7fa8d3f62;K8S_POD_UID=f79ff01a-71c2-4f02-b48b-8c23c9e875ce" Path:"", result: "", err: error configuring pod [openshift-machine-api/cluster-autoscaler-default-f8dd547c7-dg9t5] networking: Multus: [openshift-machine-api/cluster-autoscaler-default-f8dd547c7-dg9t5/f79ff01a-71c2-4f02-b48b-8c23c9e875ce]: error waiting for pod: Get "https://api-int.build02.gcp.ci.openshift.org:6443/api/v1/namespaces/openshift-machine-api/pods/cluster-autoscaler-default-f8dd547c7-dg9t5?timeout=1m0s": tls: failed to verify certificate: x509: certificate signed by unknown authority
2024-01-19T19:34:00Z [error] Multus: [openshift-kube-storage-version-migrator/migrator-558d4d48b9-ggjpj/769153af-350b-492b-9589-ede2574aea85]: error waiting for pod: Get "https://api-int.build02.gcp.ci.openshift.org:6443/api/v1/namespaces/openshift-kube-storage-version-migrator/pods/migrator-558d4d48b9-ggjpj?timeout=1m0s": tls: failed to verify certificate: x509: certificate signed by unknown authority
2024-01-19T19:34:00Z [verbose] ADD finished CNI request ContainerID:"cfd0b8ca596411f1e26ae058fc9f015d6edeac407668420c023ff459860423eb" Netns:"/var/run/netns/bc7fbf17-c049-4241-a7dc-7e27acd3c8af" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=openshift-kube-storage-version-migrator;K8S_POD_NAME=migrator-558d4d48b9-ggjpj;K8S_POD_INFRA_CONTAINER_ID=cfd0b8ca596411f1e26ae058fc9f015d6edeac407668420c023ff459860423eb;K8S_POD_UID=769153af-350b-492b-9589-ede2574aea85" Path:"", result: "", err: error configuring pod [openshift-kube-storage-version-migrator/migrator-558d4d48b9-ggjpj] networking: Multus: [openshift-kube-storage-version-migrator/migrator-558d4d48b9-ggjpj/769153af-350b-492b-9589-ede2574aea85]: error waiting for pod: Get "https://api-int.build02.gcp.ci.openshift.org:6443/api/v1/namespaces/openshift-kube-storage-version-migrator/pods/migrator-558d4d48b9-ggjpj?timeout=1m0s": tls: failed to verify certificate: x509: certificate signed by unknown authority
The multus pod needed a delete/replace, and after that it recovered:
$ oc --as system:admin -n openshift-multus delete pod multus-pz7zp
pod "multus-pz7zp" deleted
$ oc -n openshift-multus get -o wide pods | grep 'NAME\|build0-gstfj-m-2.c.openshift-ci-build-farm.internal'
NAME                                           READY   STATUS    RESTARTS   AGE   IP            NODE                                                  NOMINATED NODE   READINESS GATES
multus-additional-cni-plugins-wrdtt            1/1     Running   1          28h   10.0.0.3      build0-gstfj-m-2.c.openshift-ci-build-farm.internal   <none>           <none>
multus-admission-controller-74d794678b-9s7kl   2/2     Running   0          27h   10.129.0.36   build0-gstfj-m-2.c.openshift-ci-build-farm.internal   <none>           <none>
multus-hxmkz                                   1/1     Running   0          11s   10.0.0.3      build0-gstfj-m-2.c.openshift-ci-build-farm.internal   <none>           <none>
network-metrics-daemon-dczvs                   2/2     Running   2          28h   10.129.0.4    build0-gstfj-m-2.c.openshift-ci-build-farm.internal   <none>           <none>
$ oc -n openshift-multus logs multus-hxmkz | grep -c api-int
0
That need for multus-pod deletion should be automated, to reduce the number of things that need manual touches when the api-int CA rolls.
Seen in 4.16.0-ec.1.
Several multus pods on this cluster were bit. But others were not, including some on clusters with old kubeconfigs that did not contain the new CA. I'm not clear on what the trigger is; perhaps some clients escape immediate trouble by having existing api-int connections to servers from back when the servers used the old CA? But deleting the multus pod on a cluster whose /var/lib/kubelet/kubeconfig has not yet been updated will likely reproduce the breakage, at least until OCPBUGS-25821 is fixed.
Not entirely clear, but something like:
Multus still fails to trust api-int until the broken pod is deleted or the container otherwise restarts to notice the updated kubeconfig.
Multus pod automatically pulls in the updated kubeconfig.
One possible implementation would be a liveness probe failing on api-int trust issues, triggering the kubelet to roll the multus container, and the replacement multus container to come up and load the fresh kubeconfig.
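One shape such a check could take: a tiny exec probe binary that dials api-int with the CA bundle the container originally loaded and exits non-zero when the serving cert no longer verifies, letting the kubelet restart the container so it re-reads the kubeconfig. Everything here (paths, env var name) is hypothetical:

package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"
)

func main() {
	// Hypothetical locations; a real probe would use the multus container's
	// actual CA bundle and api-int endpoint.
	caPEM, err := os.ReadFile("/etc/kubernetes/kubeconfig-ca.crt")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		fmt.Fprintln(os.Stderr, "no CA certs parsed")
		os.Exit(1)
	}
	conn, err := tls.Dial("tcp", os.Getenv("APIINT_ENDPOINT"), &tls.Config{RootCAs: pool})
	if err != nil {
		// An x509 verification failure here means our CA view is stale:
		// fail the probe so the kubelet rolls the container.
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	conn.Close()
}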
Description of problem:
Warning: spec.template.spec.nodeSelector[beta.kubernetes.io/os]: deprecated since v1.14; use "kubernetes.io/os" instead
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-08-17-145803
How reproducible:
Always
Steps to Reproduce:
1. oc rollout restart ds/ovnkube-node 2. 3.
Actual results:
Warning: spec.template.spec.nodeSelector[beta.kubernetes.io/os]: deprecated since v1.14; use "kubernetes.io/os" instead
Expected results:
No warning
This came out of the https://bugzilla.redhat.com/show_bug.cgi?id=1943704.
Add a dashboard for iowait CPU on master nodes. This will help customers, customer support, and us identify problems that result in leader elections; we often see these caused by high iowait, aligning with large spikes in fsync and/or peer-to-peer latency.
Query:
(sum(irate(node_cpu_seconds_total{mode="iowait"}[2m])) without (cpu))
  / count(node_cpu_seconds_total) without (cpu) * 100
AND on (instance)
  label_replace(kube_node_role{role="master"}, "instance", "$1", "node", "(.+)")
When the ingress operator's clientca-configmap controller reconciles an IngressController, this controller attempts to add a finalizer to the IngressController if that finalizer is absent. This controller erroneously attempts to add the missing finalizer even if the IngressController is marked for deletion, which results in an error. This error causes the controller to retry the deletion and log the error multiple times.
I observed this in CI for OCP 4.14 and was able to reproduce it on 4.11.37, and it probably affects earlier versions as well. The problematic code was added in https://github.com/openshift/cluster-ingress-operator/pull/450/commits/0f36470250c3089769867ebd72e25c413a29cda2 in OCP 4.9 to implement NE-323.
Easily.
1. Create a configmap in the "openshift-config" namespace (to reproduce this issue, it is not necessary that the configmap have a valid TLS certificate and key):
oc -n openshift-config create configmap client-ca-cert
2. Create an IngressController that specifies spec.clientTLS.clientCA.name to point to the configmap from the previous step:
oc create -f - <<EOF
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: test-client-ca-configmap
  namespace: openshift-ingress-operator
spec:
  domain: example.xyz
  endpointPublishingStrategy:
    type: Private
  clientTLS:
    clientCA:
      name: client-ca-cert
    clientCertificatePolicy: Required
EOF
3. Delete the IngressController:
oc -n openshift-ingress-operator delete ingresscontrollers/test-client-ca-configmap
4. Check the ingress operator's logs:
oc -n openshift-ingress-operator logs -c ingress-operator deployments/ingress-operator
The ingress operator logs several attempts to add the finalizer to the IngressController after it has been marked for deletion:
2023-06-15T02:17:12.419Z ERROR operator.init controller/controller.go:273 Reconciler error {"controller": "clientca_configmap_controller", "object": {"name":"test-client-ca-configmap","namespace":"openshift-ingress-operator"}, "namespace": "openshift-ingress-operator", "name": "test-client-ca-configmap", "reconcileID": "2274f55e-e5bd-4fdb-973e-821a44cf2ebf", "error": "failed to add client-ca-configmap finalizer: IngressController.operator.openshift.io \"test-client-ca-configmap\" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{\"ingresscontroller.operator.openshift.io/finalizer-clientca-configmap\"}"}
The deletion does succeed, errors notwithstanding.
The ingress operator should succeed in deleting the IngressController without attempting to re-add the finalizer to the IngressController after it has been marked for deletion.
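The guard the expected results imply is small; a simplified sketch of the reconcile-time check (not the operator's exact code):

package controller

import (
	operatorv1 "github.com/openshift/api/operator/v1"
)

const clientCAFinalizer = "ingresscontroller.operator.openshift.io/finalizer-clientca-configmap"

// needsFinalizer reports whether the finalizer should be added: never once the
// IngressController is marked for deletion, since the API server forbids
// adding new finalizers to an object that is being deleted.
func needsFinalizer(ic *operatorv1.IngressController) bool {
	if ic.DeletionTimestamp != nil {
		return false // being deleted: proceed to cleanup, do not re-add
	}
	for _, f := range ic.Finalizers {
		if f == clientCAFinalizer {
			return false // already present
		}
	}
	return true // caller should add the finalizer and update the object
}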
This is a clone of issue OCPBUGS-37362. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-33758. The following is the description of the original issue:
—
Description of problem:
We have runbook for OVNKubernetesNorthdInactive: https://github.com/openshift/runbooks/blob/master/alerts/cluster-network-operator/OVNKubernetesNorthdInactive.md But the runbook url is not added for alert OVNKubernetesNorthdInactive: 4.12: https://github.com/openshift/cluster-network-operator/blob/c1a891129c310d01b8d6940f1eefd26058c0f5b6/bindata/network/ovn-kubernetes/managed/alert-rules-control-plane.yaml#L350 4.13: https://github.com/openshift/cluster-network-operator/blob/257435702312e418be694f4b98b8fe89557030c6/bindata/network/ovn-kubernetes/managed/alert-rules-control-plane.yaml#L350
Version-Release number of selected component (if applicable):
4.12.z, 4.13.z
How reproducible:
always
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-31745. The following is the description of the original issue:
—
Description of problem:
The TaskRun status is not displayed near the TaskRun name on the TaskRun details page
All temporal resources like PipelineRuns, Builds, Shipwright BuildRuns, etc show the status of the resource (succeeded, failed, etc) near the name on the resource details page.
This is a clone of issue OCPBUGS-36182. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-33486. The following is the description of the original issue:
—
Description of problem:
Build tests in OCP 4.14 reference Ruby images that are now EOL. The related code in our sample ruby build was deleted.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Run the build suite for OCP 4.14 against a 4.14 cluster
Actual results:
Test [sig-builds][Feature:Builds][Slow] builds with a context directory s2i context directory build should s2i build an application using a context directory [apigroup:build.openshift.io] fails:

2024-05-08T11:11:57.558298778Z I0508 11:11:57.558273 1 builder.go:400] Powered by buildah v1.31.0
2024-05-08T11:11:57.581578795Z I0508 11:11:57.581509 1 builder.go:473] effective capabilities: [audit_control=true audit_read=true audit_write=true block_suspend=true bpf=true checkpoint_restore=true chown=true dac_override=true dac_read_search=true fowner=true fsetid=true ipc_lock=true ipc_owner=true kill=true lease=true linux_immutable=true mac_admin=true mac_override=true mknod=true net_admin=true net_bind_service=true net_broadcast=true net_raw=true perfmon=true setfcap=true setgid=true setpcap=true setuid=true sys_admin=true sys_boot=true sys_chroot=true sys_module=true sys_nice=true sys_pacct=true sys_ptrace=true sys_rawio=true sys_resource=true sys_time=true sys_tty_config=true syslog=true wake_alarm=true]
2024-05-08T11:11:57.583755245Z I0508 11:11:57.583715 1 builder.go:401] redacted build: {"kind":"Build","apiVersion":"build.openshift.io/v1","metadata":{"name":"s2icontext-1","namespace":"e2e-test-contextdir-wpphk","uid":"c2db2893-06e5-4274-96ae-d8cd635a1f8d","resourceVersion":"51882","generation":1,"creationTimestamp":"2024-05-08T11:11:55Z","labels":{"buildconfig":"s2icontext","openshift.io/build-config.name":"s2icontext","openshift.io/build.start-policy":"Serial"},"annotations":{"openshift.io/build-config.name":"s2icontext","openshift.io/build.number":"1"},"ownerReferences":[{"apiVersion":"build.openshift.io/v1","kind":"BuildConfig","name":"s2icontext","uid":"b7dbb52b-ae66-4465-babc-728ae3ceed9a","controller":true}],"managedFields":[{"manager":"openshift-apiserver","operation":"Update","apiVersion":"build.openshift.io/v1","time":"2024-05-08T11:11:55Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:openshift.io/build-config.name":{},"f:openshift.io/build.number":{}},"f:labels":{".":{},"f:buildconfig":{},"f:openshift.io/build-config.name":{},"f:openshift.io/build.start-policy":{}},"f:ownerReferences":{".":{},"k:{\"uid\":\"b7dbb52b-ae66-4465-babc-728ae3ceed9a\"}":{}}},"f:spec":{"f:output":{"f:to":{}},"f:serviceAccount":{},"f:source":{"f:contextDir":{},"f:git":{".":{},"f:uri":{}},"f:type":{}},"f:strategy":{"f:sourceStrategy":{".":{},"f:env":{},"f:from":{},"f:pullSecret":{}},"f:type":{}},"f:triggeredBy":{}},"f:status":{"f:conditions":{".":{},"k:{\"type\":\"New\"}":{".":{},"f:lastTransitionTime":{},"f:lastUpdateTime":{},"f:status":{},"f:type":{}}},"f:config":{},"f:phase":{}}}}]},"spec":{"serviceAccount":"builder","source":{"type":"Git","git":{"uri":"https://github.com/sclorg/s2i-ruby-container"},"contextDir":"2.7/test/puma-test-app"},"strategy":{"type":"Source","sourceStrategy":{"from":{"kind":"DockerImage","name":"image-registry.openshift-image-registry.svc:5000/openshift/ruby:2.7-ubi8"},"pullSecret":{"name":"builder-dockercfg-v9xk2"},"env":[{"name":"BUILD_LOGLEVEL","value":"5"}]}},"output":{"to":{"kind":"DockerImage","name":"image-registry.openshift-image-registry.svc:5000/e2e-test-contextdir-wpphk/test:latest"},"pushSecret":{"name":"builder-dockercfg-v9xk2"}},"resources":{},"postCommit":{},"nodeSelector":null,"triggeredBy":[{"message":"Manually triggered"}]},"status":{"phase":"New","outputDockerImageReference":"image-registry.openshift-image-registry.svc:5000/e2e-test-contextdir-wpphk/test:latest","config":{"kind":"BuildConfig","namespace":"e2e-test-contextdir-wpphk","name":"s2icontext"},"output":{},"conditions":[{"type":"New","status":"True","lastUpdateTime":"2024-05-08T11:11:55Z","lastTransitionTime":"2024-05-08T11:11:55Z"}]}}
2024-05-08T11:11:57.584949442Z Cloning "https://github.com/sclorg/s2i-ruby-container" ...
2024-05-08T11:11:57.585044449Z I0508 11:11:57.585030 1 source.go:237] git ls-remote --heads https://github.com/sclorg/s2i-ruby-container
2024-05-08T11:11:57.585081852Z I0508 11:11:57.585072 1 repository.go:450] Executing git ls-remote --heads https://github.com/sclorg/s2i-ruby-container
2024-05-08T11:11:57.840621917Z I0508 11:11:57.840572 1 source.go:237] 663daf43b2abb5662504638d017c7175a6cff59d refs/heads/3.2-experimental
2024-05-08T11:11:57.840621917Z 88b4e684576b3fe0e06c82bd43265e41a8129c5d refs/heads/add_test_latest_imagestreams
2024-05-08T11:11:57.840621917Z 12a863ab4b050a1365d6d59970dddc6743e8bc8c refs/heads/master
2024-05-08T11:11:57.840730405Z I0508 11:11:57.840714 1 source.go:69] Cloning source from https://github.com/sclorg/s2i-ruby-container
2024-05-08T11:11:57.840793509Z I0508 11:11:57.840781 1 repository.go:450] Executing git clone --recursive --depth=1 https://github.com/sclorg/s2i-ruby-container /tmp/build/inputs
2024-05-08T11:11:59.073229755Z I0508 11:11:59.073183 1 repository.go:450] Executing git rev-parse --abbrev-ref HEAD
2024-05-08T11:11:59.080132731Z I0508 11:11:59.080079 1 repository.go:450] Executing git rev-parse --verify HEAD
2024-05-08T11:11:59.083626287Z I0508 11:11:59.083586 1 repository.go:450] Executing git --no-pager show -s --format=%an HEAD
2024-05-08T11:11:59.115407368Z I0508 11:11:59.115361 1 repository.go:450] Executing git --no-pager show -s --format=%ae HEAD
2024-05-08T11:11:59.195276873Z I0508 11:11:59.195231 1 repository.go:450] Executing git --no-pager show -s --format=%cn HEAD
2024-05-08T11:11:59.198916080Z I0508 11:11:59.198879 1 repository.go:450] Executing git --no-pager show -s --format=%ce HEAD
2024-05-08T11:11:59.204712375Z I0508 11:11:59.204663 1 repository.go:450] Executing git --no-pager show -s --format=%ad HEAD
2024-05-08T11:11:59.211098793Z I0508 11:11:59.211051 1 repository.go:450] Executing git --no-pager show -s --format=%<(80,trunc)%s HEAD
2024-05-08T11:11:59.216192627Z I0508 11:11:59.216149 1 repository.go:450] Executing git config --get remote.origin.url
2024-05-08T11:11:59.218615714Z Commit: 12a863ab4b050a1365d6d59970dddc6743e8bc8c (Bump common from `1f774c8` to `a957816` (#537))
2024-05-08T11:11:59.218661988Z Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-08T11:11:59.218683019Z Date: Tue Apr 9 15:24:11 2024 +0200
2024-05-08T11:11:59.218722882Z I0508 11:11:59.218711 1 repository.go:450] Executing git rev-parse --abbrev-ref HEAD
2024-05-08T11:11:59.234411732Z I0508 11:11:59.234366 1 repository.go:450] Executing git rev-parse --verify HEAD
2024-05-08T11:11:59.237729596Z I0508 11:11:59.237698 1 repository.go:450] Executing git --no-pager show -s --format=%an HEAD
2024-05-08T11:11:59.255304604Z I0508 11:11:59.255269 1 repository.go:450] Executing git --no-pager show -s --format=%ae HEAD
2024-05-08T11:11:59.261113560Z I0508 11:11:59.261074 1 repository.go:450] Executing git --no-pager show -s --format=%cn HEAD
2024-05-08T11:11:59.270006232Z I0508 11:11:59.269961 1 repository.go:450] Executing git --no-pager show -s --format=%ce HEAD
2024-05-08T11:11:59.278485984Z I0508 11:11:59.278443 1 repository.go:450] Executing git --no-pager show -s --format=%ad HEAD
2024-05-08T11:11:59.281940527Z I0508 11:11:59.281906 1 repository.go:450] Executing git --no-pager show -s --format=%<(80,trunc)%s HEAD
2024-05-08T11:11:59.299465312Z I0508 11:11:59.299423 1 repository.go:450] Executing git config --get remote.origin.url
2024-05-08T11:11:59.374652834Z error: provided context directory does not exist: 2.7/test/puma-test-app
Expected results:
Tests succeed
Additional info:
Ruby 2.7 is EOL and not searchable in the Red Hat container catalog. Failing test: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-openshift-controller-manager-operator/344/pull-ci-openshift-cluster-openshift-controller-manager-operator-release-4.14-openshift-e2e-aws-builds-techpreview/1788152058105303040
Description of problem:
Creating the installation manifests results in a bogus warning message about discarding existing manifests, even though none exist.
Version-Release number of selected component (if applicable):
Tested on 4.15 dev, but the problem appears to have been present since 4.2.
How reproducible:
100%
Steps to Reproduce:
1. Start with an empty dir containing only an install-config.yaml with platform: baremetal 2. Run "openshift-install create manifests" 3. There is no step 3
Actual results:
INFO Consuming Install Config from target directory WARNING Discarding the Openshift Manifests that was provided in the target directory because its dependencies are dirty and it needs to be regenerated INFO Manifests created in: test/manifests and test/openshift
Expected results:
INFO Consuming Install Config from target directory INFO Manifests created in: test/manifests and test/openshift
Additional info:
The issue is due to multiple assets referencing the same files.
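To make that failure mode concrete, here is a minimal, self-contained Go sketch using hypothetical asset types (not the installer's real asset store): when two assets both claim install-config.yaml, loading the first one consumes the file, so the second asset's dependency check no longer matches the directory and it is reported dirty even though the user never provided it.
```
package main

import "fmt"

type asset struct {
	name string
	deps []string // files the asset reads from / writes to the target directory
}

func main() {
	// The target directory initially contains only the user's install-config.yaml.
	dir := map[string]string{"install-config.yaml": "platform: baremetal"}

	installConfig := asset{"Install Config", []string{"install-config.yaml"}}
	manifests := asset{"Openshift Manifests", []string{"install-config.yaml"}}

	// Loading an asset "consumes" its files out of the directory.
	load := func(a asset) {
		for _, d := range a.deps {
			delete(dir, d)
		}
		fmt.Printf("INFO Consuming %s from target directory\n", a.name)
	}

	// An asset looks dirty if a file it depends on no longer matches the directory.
	dirtyDeps := func(a asset) bool {
		for _, d := range a.deps {
			if _, ok := dir[d]; !ok {
				return true
			}
		}
		return false
	}

	load(installConfig) // install-config.yaml disappears here
	if dirtyDeps(manifests) {
		fmt.Printf("WARNING Discarding the %s that was provided in the target directory because its dependencies are dirty\n", manifests.name)
	}
}
```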
This is a clone of issue OCPBUGS-28251. The following is the description of the original issue:
—
Description of problem:
Trying to define multiple receivers in a single user-defined AlertmanagerConfig
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
#### Monitoring for user-defined projects is enabled
```
oc -n openshift-monitoring get configmap cluster-monitoring-config -o yaml | head -4
```
```
apiVersion: v1
data:
  config.yaml: |
    enableUserWorkload: true
```
#### separate Alertmanager instance for user-defined alert routing is enabled and configured
```
oc -n openshift-user-workload-monitoring get configmap user-workload-monitoring-config -o yaml | head -6
```
```
apiVersion: v1
data:
  config.yaml: |
    alertmanager:
      enabled: true
      enableAlertmanagerConfig: true
```
#### create a testing namespace
```
oc new-project libor-alertmanager-testing
```
## TESTING - MULTIPLE RECEIVERS IN ALERTMANAGERCONFIG
Single AlertmanagerConfig `alertmanager_config_webhook_and_email_rootDefault.yaml`:
```
apiVersion: monitoring.coreos.com/v1beta1
kind: AlertmanagerConfig
metadata:
  name: libor-alertmanager-testing-email-webhook
  namespace: libor-alertmanager-testing
spec:
  receivers:
  - name: 'libor-alertmanager-testing-webhook'
    webhookConfigs:
    - url: 'http://prometheus-msteams.internal-monitoring.svc:2000/occ-alerts'
  - name: 'libor-alertmanager-testing-email'
    emailConfigs:
    - to: USER@USER.CO
      requireTLS: false
      sendResolved: true
  - name: Default
  route:
    groupBy:
    - namespace
    receiver: Default
    groupInterval: 60s
    groupWait: 60s
    repeatInterval: 12h
    routes:
    - matchers:
      - name: severity
        value: critical
        matchType: '='
      continue: true
      receiver: 'libor-alertmanager-testing-webhook'
    - matchers:
      - name: severity
        value: critical
        matchType: '='
      receiver: 'libor-alertmanager-testing-email'
```
Once saved, the continue statement is removed from the object. The configuration applied to Alertmanager contains `continue: false` statements:
```
oc exec -n openshift-user-workload-monitoring alertmanager-user-workload-0 -- amtool config show --alertmanager.url http://localhost:9093
```
```
route:
  receiver: Default
  group_by:
  - namespace
  continue: false
  routes:
  - receiver: libor-alertmanager-testing/libor-alertmanager-testing-email-webhook/Default
    group_by:
    - namespace
    matchers:
    - namespace="libor-alertmanager-testing"
    continue: true
    routes:
    - receiver: libor-alertmanager-testing/libor-alertmanager-testing-email-webhook/libor-alertmanager-testing-webhook
      matchers:
      - severity="critical"
      continue: false <----
    - receiver: libor-alertmanager-testing/libor-alertmanager-testing-email-webhook/libor-alertmanager-testing-email
      matchers:
      - severity="critical"
      continue: false <-----
```
If I update the statements to read `continue: true` and test here: https://prometheus.io/webtools/alerting/routing-tree-editor/ then I get the desired results. The workaround is to use 2 separate files; then the continue statement is added.
Actual results:
Once saved, the continue statement is removed from the object.
Expected results:
The continue: true statement should be retained and applied to Alertmanager.
Additional info:
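For context on why the dropped statement matters: Alertmanager evaluates sibling routes in order and stops at the first match unless that route sets continue: true. A minimal Go sketch of that semantic (a simplified model, not the actual prometheus/alertmanager dispatcher code):
```
package main

import "fmt"

type route struct {
	receiver string
	match    map[string]string
	cont     bool // Alertmanager's "continue"
}

// matchingReceivers walks sibling routes in order and stops at the first
// match unless that route sets continue: true.
func matchingReceivers(routes []route, labels map[string]string) []string {
	var out []string
	for _, r := range routes {
		matched := true
		for k, v := range r.match {
			if labels[k] != v {
				matched = false
				break
			}
		}
		if !matched {
			continue
		}
		out = append(out, r.receiver)
		if !r.cont {
			break // stop at first match unless continue: true
		}
	}
	return out
}

func main() {
	alert := map[string]string{"severity": "critical"}

	asSaved := []route{ // "continue" stripped, defaults to false
		{"libor-alertmanager-testing-webhook", map[string]string{"severity": "critical"}, false},
		{"libor-alertmanager-testing-email", map[string]string{"severity": "critical"}, false},
	}
	asIntended := []route{ // first route keeps continue: true
		{"libor-alertmanager-testing-webhook", map[string]string{"severity": "critical"}, true},
		{"libor-alertmanager-testing-email", map[string]string{"severity": "critical"}, false},
	}

	fmt.Println(matchingReceivers(asSaved, alert))    // webhook receiver only
	fmt.Println(matchingReceivers(asIntended, alert)) // both receivers
}
```
With the defaulted continue: false, only the webhook receiver ever fires for severity=critical, which matches the observed behavior.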
Description of problem:
OCP v4.14.1 installation is failing because the VIP is not being allocated to the bootstrap node.
Version-Release number of selected component (if applicable):
OCPv4.14.1
How reproducible:
100% --> https://access.redhat.com/support/cases/#/case/03668010
Steps to Reproduce:
1. 2. 3.
Actual results:
https://access.redhat.com/support/cases/#/case/03668010/discussion?commentId=a0a6R00000Vmdf3QAB
Expected results:
The OCP installation ends successfully.
Additional info:
The comment https://access.redhat.com/support/cases/#/case/03668010/discussion?commentId=a0a6R00000Vmdf3QAB describes the current state and the issue. If additional logs are required, I can arrange for them.
This is a clone of issue OCPBUGS-30805. The following is the description of the original issue:
—
Description of problem:
Nothing happens when the user clicks the 'Configure' button next to the AlertmanagerReceiversNotConfigured alert.
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-03-11-041450
How reproducible:
Always
Steps to Reproduce:
1. navigate to Home -> Overview, locate the AlertmanagerReceiversNotConfigured alert in 'Status' card 2. click the 'Configure' button next to AlertmanagerReceiversNotConfigured alert
Actual results:
nothing happens
Expected results:
The user should be taken to the Alertmanager configuration page at /monitoring/alertmanagerconfig.
Additional info:
This is a clone of issue OCPBUGS-36510. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-36424. The following is the description of the original issue:
—
Description of problem:
The DeploymentConfigs deprecation info alert is shown on the Edit Deployment form. It should be shown only on DeploymentConfig pages.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Create a deployment 2. Open Edit deployment form from the actions menu 3.
Actual results:
The DeploymentConfigs deprecation info alert is present on the Edit Deployment form.
Expected results:
DeploymentConfigs deprecation info alert should not be shown for the Deployment
Additional info:
Please review the following PR: https://github.com/openshift/vmware-vsphere-csi-driver-operator/pull/170
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/cluster-capi-operator/pull/147
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Users are experiencing an issue with NodePort traffic forwarding, where TCP traffic continues to be directed to pods that are in the terminating state, so connections cannot be established successfully. According to the customer, this issue is causing connection disruptions in business transactions.
Version-Release number of selected component (if applicable):
OpenShift 4.12.13 with RHEL 8.6 workers and OVN.
How reproducible:
Here is the relevant code:
https://github.com/openshift/ovn-kubernetes/blob/dd3c7ed8c1f41873168d3df26084ecbfd3d9a36b/go-controller/pkg/util/kube.go#L360;
—
func IsEndpointServing(endpoint discovery.Endpoint) bool {
	if endpoint.Conditions.Serving != nil {
		return *endpoint.Conditions.Serving
	} else {
		return IsEndpointReady(endpoint)
	}
}
// IsEndpointValid takes as input an endpoint from an endpoint slice and a boolean that indicates whether to include
// all terminating endpoints, as per the PublishNotReadyAddresses feature in kubernetes service spec. It always returns true
// if includeTerminating is true and falls back to IsEndpointServing otherwise.
func IsEndpointValid(endpoint discovery.Endpoint, includeTerminating bool) bool {
	return includeTerminating || IsEndpointServing(endpoint)
}
—
It looks like the 'IsEndpointValid' function returns endpoints with serving=true; it does not check for ready=true endpoints.
It appears the code in this section was recently changed (the Ready=true check was changed to Serving=true):
[Check the "Serving" field for endpoints]
https://github.com/openshift/ovn-kubernetes/commit/aceef010daf0697fe81dba91a39ed0fdb6563dea#diff-daf9de695e0ff81f9173caf83cb88efa138e92a9b35439bd7044aa012ff931c0
https://github.com/openshift/ovn-kubernetes/blob/release-4.12/go-controller/pkg/util/kube.go#L326-L386
—
out.Port = *port.Port
for _, endpoint := range slice.Endpoints {
	// Skip endpoint if it's not valid
	if !IsEndpointValid(endpoint, includeTerminating) {
		continue
	}
	for _, ip := range endpoint.Addresses {
		klog.V(4).Infof("Adding slice %s endpoint: %v, port: %d", slice.Name, endpoint.Addresses, *port.Port)
		ipStr := utilnet.ParseIPSloppy(ip).String()
		switch slice.AddressType {
		// ... (IPv4/IPv6 cases elided)
		}
	}
}
—
Steps to Reproduce:
Here are the customer's sample pods for reference:
mbgateway-st-8576f6f6f8-5jc75 1/1 Running 0 104m 172.30.195.124 appn01-100.app.paas.example.com <none> <none>
mbgateway-st-8576f6f6f8-q8j6k 1/1 Running 0 5m51s 172.31.2.97 appn01-202.app.paas.example.com <none> <none>
pod yaml:
    livenessProbe:
      failureThreshold: 3
      initialDelaySeconds: 40
      periodSeconds: 10
      successThreshold: 1
      tcpSocket:
        port: 9190
      timeoutSeconds: 5
    name: mbgateway-st
    ports:
    - containerPort: 9190
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      initialDelaySeconds: 40
      periodSeconds: 10
      successThreshold: 1
      tcpSocket:
        port: 9190
      timeoutSeconds: 5
    resources:
      limits:
        cpu: "2"
        ephemeral-storage: 10Gi
        memory: 2G
      requests:
        cpu: 50m
        ephemeral-storage: 100Mi
        memory: 1111M
When deleting the pod mbgateway-st-8576f6f6f8-5jc75, check the EndpointSlice status:
addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
Wait a moment and check the OVN service LB; the endpoint information has not been updated to the latest state:
9349d703-1f28-41fe-b505-282e8abf4c40 Service_lb59-10- tcp 172.35.0.185:31693 172.30.195.124:9190,172.31.2.97:9190
dca65745-fac4-4e73-b412-2c7530cf4a91 Service_lb59-10- tcp 172.35.0.170:31693 172.30.195.124:9190,172.31.2.97:9190
a5a65766-b0f2-4ac6-8f7c-cdebeea303e3 Service_lb59-10- tcp 172.35.0.89:31693 172.30.195.124:9190,172.31.2.97:9190
a36517c5-ecaa-4a41-b686-37c202478b98 Service_lb59-10- tcp 172.35.0.213:31693 172.30.195.124:9190,172.31.2.97:9190
16d997d1-27f0-41a3-8a9f-c63c8872d7b8 Service_lb59-10- tcp 172.35.0.92:31693 172.30.195.124:9190,172.31.2.97:9190
Wait a little longer and check the EndpointSlice status again:
addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
Check the OVN service LB; the deleted pod's endpoint information is still present:
fceeaf8f-e747-4290-864c-ba93fb565a8a Service_lb59-10- tcp 172.35.0.56:31693 172.30.132.78:9190,172.30.195.124:9190,172.31.2.97:9190
bef42efd-26db-4df3-b99d-370791988053 Service_lb59-10- tcp 172.35.1.26:31693 172.30.132.78:9190,172.30.195.124:9190,172.31.2.97:9190
84172e2c-081c-496a-afec-25ebcb83cc60 Service_lb59-10- tcp 172.35.0.118:31693 172.30.132.78:9190,172.30.195.124:9190,172.31.2.97:9190
34412ddd-ab5c-4b6b-95a3-6e718dd20a4f Service_lb59-10- tcp 172.35.1.14:31693 172.30.132.78:9190,172.30.195.124:9190,172.31.2.97:9190
Actual results:
Service LB endpoint membership is determined by the endpoint condition[type=Serving] status.
Expected results:
Service LB endpoint membership should be determined by the pod's status.condition[type=Ready] status.
Additional info:
The ovn-controller determines whether an endpoint should be added to the service load balancer (service LB) based on condition.serving. The current issue is that when a pod is in the terminating state, condition.serving remains true; its value reflects the pod's readiness but, unlike condition.ready, it is not forced to false while the pod is terminating.
However, when a pod is deleted, the EndpointSlice condition.serving state remains unchanged, and the backend pool of the service LB still includes the IP of the deleted pod. Why doesn't ovn-controller use the condition.ready status to decide whether the pod's IP should be added to the service LB backend pool?
Could the shift-networking experts confirm whether or not this is an OpenShift OVN service LB bug?
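To illustrate the distinction being discussed, a minimal sketch using the real k8s.io/api/discovery/v1 types: for a terminating pod the endpoint can report serving=true while ready=false, so a serving-based filter keeps it in the load balancer while a ready-based filter would drop it.
```
package main

import (
	"fmt"

	discovery "k8s.io/api/discovery/v1"
)

func boolPtr(b bool) *bool { return &b }

func main() {
	// Endpoint of a pod that is terminating but still passing readiness probes.
	terminating := discovery.Endpoint{
		Addresses: []string{"172.30.195.124"},
		Conditions: discovery.EndpointConditions{
			Ready:       boolPtr(false), // ready is always false for terminating endpoints
			Serving:     boolPtr(true),  // serving ignores the terminating state
			Terminating: boolPtr(true),
		},
	}

	fmt.Println("kept by serving-based filter:", *terminating.Conditions.Serving) // true
	fmt.Println("kept by ready-based filter:", *terminating.Conditions.Ready)     // false
}
```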
It was found that OpenShift Container Platform 4 - Nodes are missing certain settings applied via tuned. When investigating the problem, it was found that it can take 30 minutes or more for the tuned profile of a newly added OpenShift Container Platform 4 - Node to be created.
When increasing the log level of the cluster-node-tuning-operator pod, we can see the following events being recorded.
I1128 13:05:12.465193 1 controller.go:1121] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (add) I1128 13:05:12.465235 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:05:12.465247 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:05:12.465255 1 profilecalculator.go:131] Node's new-worker-X.example.com providerID=aws:///eu-central-1c/i-0874090641dd61eef I1128 13:05:12.465268 1 controller.go:300] sync(): Node new-worker-X.example.com label(s) changed I1128 13:05:12.465288 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:05:12.486200 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:05:12.486233 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:05:12.486242 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:05:12.486256 1 controller.go:300] sync(): Node new-worker-X.example.com label(s) changed I1128 13:05:12.486273 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:05:12.612063 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:05:12.612114 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:05:12.612127 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:05:12.612149 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:05:15.232435 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:05:15.232477 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:05:15.232541 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:05:15.232565 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:05:22.805108 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:05:22.805142 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:05:22.805151 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:05:22.805170 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:05:30.803481 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:05:30.803511 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:05:30.803519 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:05:30.803533 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:05:35.815894 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:05:35.815933 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:05:35.815942 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:05:35.815958 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:05:35.832338 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:05:35.832386 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:05:35.832395 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:05:35.832419 1 controller.go:209] event 
from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:05:35.851291 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:05:35.851337 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:05:35.851349 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:05:35.851369 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:05:40.855159 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:05:40.855192 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:05:40.855201 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:05:40.855221 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:05:48.004741 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:05:48.004783 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:05:48.004815 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:05:48.004835 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:05:48.011986 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:05:48.012035 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:05:48.012047 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:05:48.012067 1 controller.go:300] sync(): Node new-worker-X.example.com label(s) changed I1128 13:05:48.012090 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:05:53.475798 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:05:53.475842 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:05:53.475855 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:05:53.475876 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:05:56.097269 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:05:56.097299 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:05:56.097309 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:05:56.097329 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:05:58.497782 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:05:58.497838 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:05:58.497847 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:05:58.497864 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:06:06.117201 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:06:06.117235 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:06:06.117254 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:06:06.117271 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:06:08.008992 1 controller.go:1136] add event to workqueue due to *v1.Node, 
Name=new-worker-X.example.com (update) I1128 13:06:08.009031 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:06:08.009041 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:06:08.009059 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:06:09.685949 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:06:09.685988 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:06:09.685997 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:06:09.686015 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:06:11.163882 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:06:11.163929 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:06:11.163941 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:06:11.163965 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:06:19.730972 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:06:19.731005 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:06:19.731013 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:06:19.731028 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:06:23.713627 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:06:23.713665 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:06:23.713675 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:06:23.713693 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:07:52.133190 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:07:52.133227 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:07:52.133235 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:07:52.133268 1 controller.go:300] sync(): Node new-worker-X.example.com label(s) changed I1128 13:07:52.133285 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:07:55.779247 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:07:55.779278 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:07:55.779286 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:07:55.779324 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:07:55.799941 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:07:55.799975 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:07:55.799983 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:07:55.800021 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:07:56.062048 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:07:56.062081 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:07:56.062089 1 
controller.go:282] sync(): Node new-worker-X.example.com I1128 13:07:56.062126 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:09:58.224261 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:09:58.224294 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:09:58.224303 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:09:58.224333 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:10:08.146467 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:10:08.146504 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:10:08.146513 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:10:08.146549 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:10:29.293368 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:10:29.293402 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:10:29.293410 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:10:29.293440 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:11:38.765691 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:11:38.781424 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:11:38.781432 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:11:38.781471 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:15:35.022263 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:15:35.022303 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:15:35.022312 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:15:35.022349 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:20:41.252897 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:20:41.252942 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:20:41.252951 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:20:41.252988 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:21:38.768157 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:21:38.781098 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:21:38.781103 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:21:38.781133 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:25:47.684402 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:25:47.684445 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:25:47.684457 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:25:47.684494 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:25:53.336668 1 controller.go:1136] add event to workqueue due to 
*v1.Node, Name=new-worker-X.example.com (update) I1128 13:25:53.336700 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:25:53.336709 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:25:53.336738 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:25:57.754420 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:25:57.754453 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:25:57.754462 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:25:57.754491 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:26:03.987123 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:26:03.987188 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:26:03.987203 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:26:03.987258 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:26:38.231524 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:26:38.231558 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:26:38.231566 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:26:38.231602 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:27:08.845310 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:27:08.845349 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:27:08.845358 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:27:08.845398 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:27:49.797881 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:27:49.797919 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:27:49.797928 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:27:49.797958 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:27:49.856526 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:27:49.856566 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:27:49.856575 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:27:49.856612 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:27:49.904286 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:27:49.904341 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:27:49.904350 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:27:49.904400 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:30:02.351363 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:30:02.351398 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:30:02.351407 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:30:02.351440 1 
controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:30:03.719303 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:30:03.719338 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:30:03.719347 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:30:03.719380 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:30:33.316267 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:30:33.316297 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:30:33.316307 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:30:33.316336 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:30:33.330998 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:30:33.331030 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:30:33.331038 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:30:33.331066 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:31:31.688121 1 controller.go:221] sync(): Kind profile: openshift-cluster-node-tuning-operator/new-worker-X.example.com I1128 13:31:31.688136 1 controller.go:374] sync(): Profile new-worker-X.example.com I1128 13:31:31.688300 1 profilecalculator.go:164] calculateProfile(new-worker-X.example.com) I1128 13:31:31.688337 1 controller.go:677] syncProfile(): Profile new-worker-X.example.com not found, creating one [openshift-node] I1128 13:31:31.688396 1 request.go:1073] Request Body: {"kind":"Profile","apiVersion":"tuned.openshift.io/v1","metadata":{"name":"new-worker-X.example.com","namespace":"openshift-cluster-node-tuning-operator","creationTimestamp":null,"ownerReferences":[{"apiVersion":"tuned.openshift.io/v1","kind":"Tuned","name":"default","uid":"324f82ad-4475-4b49-ac29-57cb454314e7","controller":true,"blockOwnerDeletion":true}]},"spec":{"config":{"tunedProfile":"openshift-node","debug":false,"tunedConfig":{"reapply_sysctl":null}}},"status":{"bootcmdline":"","tunedProfile":"","conditions":[{"type":"Applied","status":"Unknown","lastTransitionTime":"2023-11-28T13:31:31Z"},{"type":"Degraded","status":"Unknown","lastTransitionTime":"2023-11-28T13:31:31Z"}]}} I1128 13:31:31.698807 1 request.go:1073] Response Body: {"apiVersion":"tuned.openshift.io/v1","kind":"Profile","metadata":{"creationTimestamp":"2023-11-28T13:31:31Z","generation":1,"managedFields":[{"apiVersion":"tuned.openshift.io/v1","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"324f82ad-4475-4b49-ac29-57cb454314e7\"}":{}}},"f:spec":{".":{},"f:config":{".":{},"f:debug":{},"f:tunedConfig":{},"f:tunedProfile":{}}}},"manager":"cluster-node-tuning-operator","operation":"Update","time":"2023-11-28T13:31:31Z"}],"name":"new-worker-X.example.com","namespace":"openshift-cluster-node-tuning-operator","ownerReferences":[{"apiVersion":"tuned.openshift.io/v1","blockOwnerDeletion":true,"controller":true,"kind":"Tuned","name":"default","uid":"324f82ad-4475-4b49-ac29-57cb454314e7"}],"resourceVersion":"9673729653","uid":"8607cf52-9a00-49d2-baff-8a97c73b809a"},"spec":{"config":{"debug":false,"tunedConfig":{},"tunedProfile":"openshift-node"}}} I1128 13:31:31.698915 1 controller.go:687] created 
profile new-worker-X.example.com [openshift-node] I1128 13:31:31.698925 1 controller.go:209] event from workqueue (profile/openshift-cluster-node-tuning-operator/new-worker-X.example.com) successfully processed I1128 13:31:31.702309 1 controller.go:1121] add event to workqueue due to *v1.Profile, Namespace=openshift-cluster-node-tuning-operator, Name=new-worker-X.example.com (add) I1128 13:31:31.702335 1 controller.go:221] sync(): Kind profile: openshift-cluster-node-tuning-operator/new-worker-X.example.com I1128 13:31:31.702358 1 controller.go:374] sync(): Profile new-worker-X.example.com I1128 13:31:31.702494 1 profilecalculator.go:164] calculateProfile(new-worker-X.example.com) I1128 13:31:31.713444 1 controller.go:752] syncProfile(): updating Profile new-worker-X.example.com [openshift-node] I1128 13:31:31.713543 1 request.go:1073] Request Body: {"kind":"Profile","apiVersion":"tuned.openshift.io/v1","metadata":{"name":"new-worker-X.example.com","namespace":"openshift-cluster-node-tuning-operator","uid":"8607cf52-9a00-49d2-baff-8a97c73b809a","resourceVersion":"9673729653","generation":1,"creationTimestamp":"2023-11-28T13:31:31Z","ownerReferences":[{"apiVersion":"tuned.openshift.io/v1","kind":"Tuned","name":"default","uid":"324f82ad-4475-4b49-ac29-57cb454314e7","controller":true,"blockOwnerDeletion":true}],"managedFields":[{"manager":"cluster-node-tuning-operator","operation":"Update","apiVersion":"tuned.openshift.io/v1","time":"2023-11-28T13:31:31Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"324f82ad-4475-4b49-ac29-57cb454314e7\"}":{}}},"f:spec":{".":{},"f:config":{".":{},"f:debug":{},"f:tunedConfig":{},"f:tunedProfile":{}}}}}]},"spec":{"config":{"tunedProfile":"openshift-node","debug":false,"tunedConfig":{"reapply_sysctl":null},"providerName":"aws"}},"status":{"bootcmdline":"","tunedProfile":"","conditions":[{"type":"Applied","status":"Unknown","lastTransitionTime":"2023-11-28T13:31:31Z"},{"type":"Degraded","status":"Unknown","lastTransitionTime":"2023-11-28T13:31:31Z"}]}} I1128 13:31:31.713611 1 round_trippers.go:466] curl -v -XPUT -H "User-Agent: cluster-node-tuning-operator/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Accept: application/json, */*" -H "Authorization: Bearer <masked>" -H "Content-Type: application/json" 'https://172.16.0.1:443/apis/tuned.openshift.io/v1/namespaces/openshift-cluster-node-tuning-operator/profiles/new-worker-X.example.com' I1128 13:31:31.720708 1 round_trippers.go:553] PUT https://172.16.0.1:443/apis/tuned.openshift.io/v1/namespaces/openshift-cluster-node-tuning-operator/profiles/new-worker-X.example.com 200 OK in 7 milliseconds I1128 13:31:31.720855 1 request.go:1073] Response Body: 
{"apiVersion":"tuned.openshift.io/v1","kind":"Profile","metadata":{"creationTimestamp":"2023-11-28T13:31:31Z","generation":2,"managedFields":[{"apiVersion":"tuned.openshift.io/v1","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"324f82ad-4475-4b49-ac29-57cb454314e7\"}":{}}},"f:spec":{".":{},"f:config":{".":{},"f:debug":{},"f:providerName":{},"f:tunedConfig":{},"f:tunedProfile":{}}}},"manager":"cluster-node-tuning-operator","operation":"Update","time":"2023-11-28T13:31:31Z"}],"name":"new-worker-X.example.com","namespace":"openshift-cluster-node-tuning-operator","ownerReferences":[{"apiVersion":"tuned.openshift.io/v1","blockOwnerDeletion":true,"controller":true,"kind":"Tuned","name":"default","uid":"324f82ad-4475-4b49-ac29-57cb454314e7"}],"resourceVersion":"9673729659","uid":"8607cf52-9a00-49d2-baff-8a97c73b809a"},"spec":{"config":{"debug":false,"providerName":"aws","tunedConfig":{},"tunedProfile":"openshift-node"}}} I1128 13:31:31.720946 1 controller.go:757] updated profile new-worker-X.example.com [openshift-node] I1128 13:31:31.720955 1 controller.go:209] event from workqueue (profile/openshift-cluster-node-tuning-operator/new-worker-X.example.com) successfully processed I1128 13:31:31.721160 1 controller.go:1136] add event to workqueue due to *v1.Profile, Namespace=openshift-cluster-node-tuning-operator, Name=new-worker-X.example.com (update) I1128 13:31:31.724833 1 controller.go:221] sync(): Kind profile: openshift-cluster-node-tuning-operator/new-worker-X.example.com I1128 13:31:31.724847 1 controller.go:374] sync(): Profile new-worker-X.example.com I1128 13:31:31.724971 1 profilecalculator.go:164] calculateProfile(new-worker-X.example.com) I1128 13:31:31.726987 1 controller.go:742] syncProfile(): no need to update Profile new-worker-X.example.com I1128 13:31:31.726993 1 controller.go:209] event from workqueue (profile/openshift-cluster-node-tuning-operator/new-worker-X.example.com) successfully processed I1128 13:31:32.273200 1 controller.go:1136] add event to workqueue due to *v1.Profile, Namespace=openshift-cluster-node-tuning-operator, Name=new-worker-X.example.com (update) I1128 13:31:32.273234 1 controller.go:221] sync(): Kind profile: openshift-cluster-node-tuning-operator/new-worker-X.example.com I1128 13:31:32.273246 1 controller.go:374] sync(): Profile new-worker-X.example.com I1128 13:31:32.273410 1 profilecalculator.go:164] calculateProfile(new-worker-X.example.com) I1128 13:31:32.284388 1 controller.go:742] syncProfile(): no need to update Profile new-worker-X.example.com I1128 13:31:32.284400 1 controller.go:209] event from workqueue (profile/openshift-cluster-node-tuning-operator/new-worker-X.example.com) successfully processed I1128 13:31:38.766803 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:31:38.769582 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:31:38.769588 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:31:38.769617 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully processed I1128 13:35:39.839137 1 controller.go:1136] add event to workqueue due to *v1.Node, Name=new-worker-X.example.com (update) I1128 13:35:39.839174 1 controller.go:221] sync(): Kind node: /new-worker-X.example.com I1128 13:35:39.839182 1 controller.go:282] sync(): Node new-worker-X.example.com I1128 13:35:39.839215 1 controller.go:209] event from workqueue (node//new-worker-X.example.com) successfully 
processed
So at 13:05:12 the OpenShift Container Platform 4 - Node called `new-worker-X.example.com` did indeed become available, but it took until 13:31:31 for the tuned profile to be created and therefore for the required settings to be applied on the node.
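The "add event to workqueue ... sync()" lines above follow the standard client-go workqueue pattern. As a minimal sketch of that loop (hypothetical keys and a single worker; whether queue contention is actually the root cause of the delay is exactly what needs investigating), items are processed strictly in FIFO order, so a profile item queued behind a stream of node events has to wait:
```
package main

import (
	"fmt"

	"k8s.io/client-go/util/workqueue"
)

func main() {
	q := workqueue.New()

	// A burst of distinct node events, then the profile item behind them.
	for i := 0; i < 3; i++ {
		q.Add(fmt.Sprintf("node//new-worker-%d.example.com", i))
	}
	q.Add("profile/openshift-cluster-node-tuning-operator/new-worker-X.example.com")
	q.ShutDown()

	// A single worker drains the queue strictly in order, so the profile
	// item waits behind every node event queued ahead of it.
	for {
		item, shutdown := q.Get()
		if shutdown {
			return
		}
		fmt.Println("sync():", item)
		q.Done(item)
	}
}
```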
This is a clone of issue OCPBUGS-27190. The following is the description of the original issue:
—
Description of problem:
When creating an ImageDigestMirrorSet with a conflicting mirrorSourcePolicy, no error is prompted.
Version-Release number of selected component (if applicable):
% oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.15.0-0.nightly-2024-01-14-100410 True False 27m Cluster version is 4.15.0-0.nightly-2024-01-14-100410
How reproducible:
always
Steps to Reproduce:
1. Create an ImageContentSourcePolicy.
ImageContentSourcePolicy.yaml:
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: ubi8repo
spec:
  repositoryDigestMirrors:
  - mirrors:
    - example.io/example/ubi-minimal
    - example.com/example/ubi-minimal
    source: registry.access.redhat.com/ubi6/ubi-minimal
  - mirrors:
    - mirror.example.net
    source: registry.example.com/example
2. After the MCP finishes updating, check that /etc/containers/registries.conf is updated as expected.
3. Create an ImageDigestMirrorSet with a conflicting mirrorSourcePolicy for the same source "registry.example.com/example".
ImageDigestMirrorSet-conflict.yaml:
apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
  name: digest-mirror
spec:
  imageDigestMirrors:
  - mirrors:
    - example.io/example/ubi-minimal
    - example.com/example/ubi-minimal
    source: registry.access.redhat.com/ubi8/ubi-minimal
    mirrorSourcePolicy: AllowContactingSource
  - mirrors:
    - mirror.example.net
    source: registry.example.com/example
    mirrorSourcePolicy: NeverContactSource
Actual results:
3. The ImageDigestMirrorSet is created successfully, but the MCP does not get updated and no relevant MachineConfig is generated. The machine-config-controller log shows: I0116 02:34:03.897335 1 container_runtime_config_controller.go:417] Error syncing image config openshift-config: could not Create/Update MachineConfig: could not update registries config with new changes: conflicting mirrorSourcePolicy is set for the same source "registry.example.com/example" in imagedigestmirrorsets and/or imagetagmirrorsets
Expected results:
3. It should prompt an error at creation time, e.g.: conflicting mirrorSourcePolicy exists for the same source "registry.example.com/example" in the ICSP.
Additional info:
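As a sketch of the validation the expected result asks for (a hypothetical helper, not the MCO's actual implementation), the check amounts to collecting the mirrorSourcePolicy per source across all ICSP/IDMS/ITMS entries and rejecting the create as soon as two values disagree, so the error surfaces at creation time rather than only in the machine-config-controller log:
```
package main

import "fmt"

type mirrorEntry struct {
	source string
	policy string // "AllowContactingSource" (the default) or "NeverContactSource"
}

// findConflicts reports the first source that is given two different
// mirrorSourcePolicy values across the collected entries.
func findConflicts(entries []mirrorEntry) error {
	seen := map[string]string{}
	for _, e := range entries {
		p := e.policy
		if p == "" {
			p = "AllowContactingSource"
		}
		if prev, ok := seen[e.source]; ok && prev != p {
			return fmt.Errorf("conflicting mirrorSourcePolicy is set for the same source %q", e.source)
		}
		seen[e.source] = p
	}
	return nil
}

func main() {
	entries := []mirrorEntry{
		{source: "registry.example.com/example"},                               // from the ICSP (defaults to AllowContactingSource)
		{source: "registry.example.com/example", policy: "NeverContactSource"}, // from the IDMS
		{source: "registry.access.redhat.com/ubi8/ubi-minimal", policy: "AllowContactingSource"},
	}
	fmt.Println(findConflicts(entries))
}
```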
This is a clone of issue OCPBUGS-29919. The following is the description of the original issue:
—
Description of problem:
Pods running in the namespace openshift-vsphere-infra are extremely verbose, printing as INFO many messages that should be DEBUG. This excess of verbosity has an impact on CRI-O, on the node, and on the Logging system. For instance, with 71 nodes, the number of log entries coming from this namespace in one month was 450,000,000, meaning 1 TB of logs written to disk on the nodes by CRI-O, read by the Red Hat log collector, and stored in the log store. Beyond the performance impact, there is a financial impact from the storage needed. Examples of messages better suited to DEBUG than INFO:
```
/// For keepalived, 4 messages are printed per node every 10 seconds; with 71 nodes this means 284 log entries every 10 seconds, i.e. 1704 log entries per minute per keepalived pod
$ oc logs keepalived-master.example-0 -c keepalived-monitor |grep master.example-0|grep 2024-02-15T08:20:21 |wc -l
$ oc logs keepalived-master-example-0 -c keepalived-monitor |grep worker-example-0|grep 2024-02-15T08:20:21
2024-02-15T08:20:21.671390814Z time="2024-02-15T08:20:21Z" level=info msg="Searching for Node IP of worker-example-0. Using 'x.x.x.x/24' as machine network. Filtering out VIPs '[x.x.x.x x.x.x.x]'."
2024-02-15T08:20:21.671390814Z time="2024-02-15T08:20:21Z" level=info msg="For node worker-example-0 selected peer address x.x.x.x using NodeInternalIP"
2024-02-15T08:20:21.733399279Z time="2024-02-15T08:20:21Z" level=info msg="Searching for Node IP of worker-example-0. Using 'x.x.x.x' as machine network. Filtering out VIPs '[x.x.x.x x.x.x.x]'."
2024-02-15T08:20:21.733421398Z time="2024-02-15T08:20:21Z" level=info msg="For node worker-example-0 selected peer address x.x.x.x using NodeInternalIP"

/// For haproxy, 2 log lines are printed every 6 seconds for each master; with 3 masters this means 6 messages in the same 6-second window, i.e. 60 messages per minute per pod
$ oc logs haproxy-master-0-example -c haproxy-monitor
...
2024-02-15T08:20:00.517159455Z time="2024-02-15T08:20:00Z" level=info msg="Searching for Node IP of master-example-0. Using 'x.x.x.x/24' as machine network. Filtering out VIPs '[x.x.x.x]'."
2024-02-15T08:20:00.517159455Z time="2024-02-15T08:20:00Z" level=info msg="For node master-example-0 selected peer address x.x.x.x using NodeInternalIP"
```
Version-Release number of selected component (if applicable):
OpenShift 4.14 VSphere IPI installation
How reproducible:
Always
Steps to Reproduce:
1. Install OpenShift 4.14 Vsphere IPI environment 2. Review the logs of the haproxy pods and keealived pods running in the namespace `openshift-vsphere-infra`
Actual results:
The haproxy-* and keepalived-* pods are very verbose, printing as INFO messages that should be DEBUG. Some of the messages are shown in the Description of problem of this bug.
Expected results:
Only relevant messages are printed as INFO, reducing the verbosity of the pods running in the namespace `openshift-vsphere-infra`.
Additional info:
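The quoted messages use logrus-style level=info formatting. Assuming the monitors log via logrus (an assumption about the implementation), the requested change amounts to demoting per-iteration chatter from Infof to Debugf, as in this minimal sketch:
```
package main

import log "github.com/sirupsen/logrus"

func main() {
	log.SetLevel(log.InfoLevel) // the default verbosity

	node, ip := "worker-example-0", "x.x.x.x"

	// Today: emitted every ~10s per node at INFO (see the examples above).
	// log.Infof("For node %s selected peer address %s using NodeInternalIP", node, ip)

	// Requested: the same message at DEBUG, invisible at the default level.
	log.Debugf("For node %s selected peer address %s using NodeInternalIP", node, ip)

	// Only genuinely relevant events would stay at INFO (illustrative message).
	log.Infof("keepalived state changed on node %s", node)
}
```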
This is a clone of issue OCPBUGS-36594. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-30950. The following is the description of the original issue:
—
Description of problem: the ovnkube-node and multus DaemonSets have hostPath volumes that prevent clean unmounting of CSI volumes because the volumeMount is missing the "mountPropagation: HostToContainer" parameter
Version-Release number of selected component (if applicable): OpenShift 4.14
How reproducible: Always
Steps to Reproduce:
1. On a node, mount a filesystem underneath /var/lib/kubelet/, simulating the mount of a CSI driver PersistentVolume.
2. Restart the ovnkube-node pod running on that node.
3. Unmount the filesystem from step 1. The mount is then removed from the host's list of mounted devices; however, a copy of the mount is still active in the mount namespace of the ovnkube-node pod.
This blocks some CSI drivers that rely on multipath from properly deleting a block device, since mounts are still registered on the block device.
Actual results:
The CSI volume mount is unmounted uncleanly; a copy of the mount remains in the ovnkube-node pod's mount namespace.
Expected results:
The CSI volume mount is cleanly unmounted.
Additional info:
The mountPropagation parameter is already implemented in the volumeMount for the host rootFS:
- name: host-slash
  readOnly: true
  mountPath: /host
  mountPropagation: HostToContainer
However, the same parameter is missing for the volumeMount of /var/lib/kubelet.
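A minimal sketch, using the real k8s.io/api/core/v1 types, of the volumeMount shape being requested for /var/lib/kubelet, mirroring the host-slash mount above:
```
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	propagation := corev1.MountPropagationHostToContainer

	// Without MountPropagation, unmounts on the host never propagate into
	// the container's mount namespace, leaving stale copies of the mount.
	kubeletMount := corev1.VolumeMount{
		Name:             "host-kubelet",
		MountPath:        "/var/lib/kubelet",
		ReadOnly:         true,
		MountPropagation: &propagation,
	}

	fmt.Printf("%+v\n", kubeletMount)
}
```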
It is possible to work around the issue with a kubectl patch command like this:
$ kubectl patch daemonset ovnkube-node --type='json' -p='[
{
"op": "replace",
"path": "/spec/template/spec/containers/7/volumeMounts/1",
"value": {
"name": "host-kubelet",
"mountPath": "/var/lib/kubelet",
"mountPropagation": "HostToContainer",
"readOnly": true
}
}
]'
Affected Platforms: Platform Agnostic UPI
Description of problem:
When using the command `oc-mirror list operators --catalog=registry.redhat.io/redhat/certified-operator-index:v4.12 -v 9`, the response code is 200 OK at the beginning; then the command hangs for a while and eventually gets response code 401.
Version-Release number of selected component (if applicable):
How reproducible:
sometimes
Steps to Reproduce:
Using the advanced cluster management package as an example. 1. oc-mirror list operators --catalog=registry.redhat.io/redhat/certified-operator-index:v4.12 -v 9
Actual results: After hanging for a while, a 401 code is returned; it seems that when the request times out and oc-mirror tries again, it forgets to read the credentials.
level=debug msg=fetch response received digest=sha256:a67257cfe913ad09242bf98c44f2330ec7e8261ca3a8db3431cb88158c3d4837 mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip response.header.accept-ranges=bytes response.header.age=714959 response.header.connection=keep-alive response.header.content-length=80847073 response.header.content-type=binary/octet-stream response.header.date=Mon, 06 Feb 2023 06:52:06 GMT response.header.etag="a428fafd37ee58f4bdeae1a7ff7235b5-1" response.header.last-modified=Fri, 16 Sep 2022 17:54:09 GMT response.header.server=AmazonS3 response.header.via=1.1 010c0731b9775a983eceaec0f5fa6a2e.cloudfront.net (CloudFront) response.header.x-amz-cf-id=rEfKWnJdasWIKnjWhYyqFn9eHY8v_3Y9WwSRnnkMTkPayHlBxWX1EQ== response.header.x-amz-cf-pop=HIO50-C1 response.header.x-amz-replication-status=COMPLETED response.header.x-amz-server-side-encryption=AES256 response.header.x-amz-storage-class=INTELLIGENT_TIERING response.header.x-amz-version-id=GfqTTjWbdqB0sreyjv3fyo1k6LQ9kZKC response.header.x-cache=Hit from cloudfront response.status=200 OK size=80847073 url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:a67257cfe913ad09242bf98c44f2330ec7e8261ca3a8db3431cb88158c3d4837 level=debug msg=fetch response received digest=sha256:d242c7b4380d3c9db3ac75680c35f5c23639a388ad9313f263d13af39a9c8b8b mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip response.header.accept-ranges=bytes response.header.age=595868 response.header.connection=keep-alive response.header.content-length=98028196 response.header.content-type=binary/octet-stream response.header.date=Tue, 07 Feb 2023 15:56:56 GMT response.header.etag="f702c84459b479088565e4048a890617-1" response.header.last-modified=Wed, 18 Jan 2023 06:55:12 GMT response.header.server=AmazonS3 response.header.via=1.1 7f5e0d3b9ea85d0d75063a66c0ebc840.cloudfront.net (CloudFront) response.header.x-amz-cf-id=Tw9cjJjYCy8idBiQ1PvljDkhAoEDEzuDCNnX6xJub4hGeh8V0CIP_A== response.header.x-amz-cf-pop=HIO50-C1 response.header.x-amz-replication-status=COMPLETED response.header.x-amz-server-side-encryption=AES256 response.header.x-amz-storage-class=INTELLIGENT_TIERING response.header.x-amz-version-id=nt7yY.YmjWF0pfAhzh_fH2xI_563GnPz response.header.x-cache=Hit from cloudfront response.status=200 OK size=98028196 url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:d242c7b4380d3c9db3ac75680c35f5c23639a388ad9313f263d13af39a9c8b8b level=debug msg=fetch response received digest=sha256:664a8226a152ea0f1078a417f2ec72d3a8f9971e8a374859b486b60049af9f18 mediatype=application/vnd.docker.container.image.v1+json response.header.accept-ranges=bytes response.header.age=17430 response.header.connection=keep-alive response.header.content-length=24828 response.header.content-type=binary/octet-stream response.header.date=Tue, 14 Feb 2023 08:37:35 GMT response.header.etag="57eb6fdca8ce82a837bdc2cebadc3c7b-1" response.header.last-modified=Mon, 13 Feb 2023 16:11:57 GMT response.header.server=AmazonS3 response.header.via=1.1 0c96ded7ff282d2dbcf47c918b6bb500.cloudfront.net (CloudFront) response.header.x-amz-cf-id=w9zLDWvPJ__xbTpI8ba5r9DRsFXbvZ9rSx5iksG7lFAjWIthuokOsA== response.header.x-amz-cf-pop=HIO50-C1 response.header.x-amz-replication-status=COMPLETED response.header.x-amz-server-side-encryption=AES256 response.header.x-amz-version-id=Enw8mLebn4.ShSajtLqdo4riTDHnVEFZ response.header.x-cache=Hit from cloudfront response.status=200 OK size=24828 
url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:664a8226a152ea0f1078a417f2ec72d3a8f9971e8a374859b486b60049af9f18 level=debug msg=fetch response received digest=sha256:130c9d0ca92e54f59b68c4debc5b463674ff9555be1f319f81ca2f23e22de16f mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip response.header.accept-ranges=bytes response.header.age=829779 response.header.connection=keep-alive response.header.content-length=26039246 response.header.content-type=binary/octet-stream response.header.date=Sat, 04 Feb 2023 22:58:25 GMT response.header.etag="a08688b701b31515c6861c69e4d87ebd-1" response.header.last-modified=Tue, 06 Dec 2022 20:50:51 GMT response.header.server=AmazonS3 response.header.via=1.1 000f4a2f631bace380a0afa747a82482.cloudfront.net (CloudFront) response.header.x-amz-cf-id=S-h31zheAEOhOs6uH52Rpq0ZnoRRdd5VfaqVbZWXzAX-Zym-0XtuKA== response.header.x-amz-cf-pop=HIO50-C1 response.header.x-amz-replication-status=COMPLETED response.header.x-amz-server-side-encryption=AES256 response.header.x-amz-storage-class=INTELLIGENT_TIERING response.header.x-amz-version-id=BQOjon.COXTTON_j20wZbWWoDEmGy1__ response.header.x-cache=Hit from cloudfront response.status=200 OK size=26039246 url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:130c9d0ca92e54f59b68c4debc5b463674ff9555be1f319f81ca2f23e22de16f level=debug msg=do request digest=sha256:db8e9d2f583af66157f383f9ec3628b05fa0adb0d837269bc9f89332c65939b9 mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip request.header.accept=application/vnd.docker.image.rootfs.diff.tar.gzip, */* request.header.range=bytes=13417268- request.header.user-agent=opm/alpha request.method=GET size=91700480 url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:db8e9d2f583af66157f383f9ec3628b05fa0adb0d837269bc9f89332c65939b9 level=debug msg=fetch response received digest=sha256:db8e9d2f583af66157f383f9ec3628b05fa0adb0d837269bc9f89332c65939b9 mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip response.header.cache-control=max-age=0, no-cache, no-store response.header.connection=keep-alive response.header.content-length=99 response.header.content-type=application/json response.header.date=Tue, 14 Feb 2023 13:34:06 GMT response.header.docker-distribution-api-version=registry/2.0 response.header.expires=Tue, 14 Feb 2023 13:34:06 GMT response.header.pragma=no-cache response.header.registry-proxy-request-id=0d7ea55f-e96d-4311-885a-125b32c8e965 response.header.www-authenticate=Bearer realm="https://registry.redhat.io/auth/realms/rhcc/protocol/redhat-docker-v2/auth",service="docker-registry",scope="repository:redhat/certified-operator-index:pull" response.status=401 Unauthorized size=91700480 url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:db8e9d2f583af66157f383f9ec3628b05fa0adb0d837269bc9f89332c65939b9.
Expected results:
The command should always read the registry credentials.
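A quick way to confirm whether credentials are being picked up at all (a minimal sketch; the index tag and the assumption that opm honors REGISTRY_AUTH_FILE are mine, not taken from this report):
$ podman login registry.redhat.io
$ REGISTRY_AUTH_FILE=${XDG_RUNTIME_DIR}/containers/auth.json opm render registry.redhat.io/redhat/certified-operator-index:v4.12 > /dev/null
If the same 401 appears with a valid auth file in place, the bug is in how opm resolves credentials rather than in the credentials themselves.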
Description of problem:
The configuration values of the CPMS generated by the installer on vSphere are not the same as the configuration values of the master machines. Although this does not trigger an update when installing the cluster with TechPreview, it is confusing for users. Additionally, if a cluster is installed without TechPreview and TechPreview is then enabled as a day-2 operation, the CPMS is Inactive by default; merely activating the CPMS, without changing any other configuration values, triggers an update, which is not expected. On other providers (AWS, GCP, Azure, Nutanix), the configuration values of the CPMS generated by the installer are the same as the configuration values of the master machines.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-12-04-223539
How reproducible:
Always
Steps to Reproduce:
1.Create a cluster on vSphere with TechPreview, we use flexy-template: ipi-on-vsphere/versioned-installer_techpreview, there is CPMS by default and it’s Active. liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.15.0-0.nightly-2023-12-04-223539 True False 6m1s Cluster version is 4.15.0-0.nightly-2023-12-04-223539 liuhuali@Lius-MacBook-Pro huali-test % oc project openshift-machine-api Already on project "openshift-machine-api" on server "https://api.huliu-vs07b.qe.devcluster.openshift.com:6443". liuhuali@Lius-MacBook-Pro huali-test % oc get machine NAME PHASE TYPE REGION ZONE AGE huliu-vs07b-pkm69-master-0 Running 27m huliu-vs07b-pkm69-master-1 Running 27m huliu-vs07b-pkm69-master-2 Running 27m huliu-vs07b-pkm69-worker-0-5cgd9 Running 21m huliu-vs07b-pkm69-worker-0-ql9zv Running 21m liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset NAME DESIRED CURRENT READY UPDATED UNAVAILABLE STATE AGE cluster 3 3 3 3 Active 27m 2.Check the configuration values in CPMS is not the same with the master machines. But they should be the same. liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset cluster -oyaml … providerSpec: value: apiVersion: machine.openshift.io/v1beta1 credentialsSecret: name: vsphere-cloud-credentials diskGiB: 120 kind: VSphereMachineProviderSpec memoryMiB: 16384 metadata: creationTimestamp: null network: devices: null numCPUs: 4 numCoresPerSocket: 4 snapshot: "" template: "" userDataSecret: name: master-user-data workspace: {} … liuhuali@Lius-MacBook-Pro huali-test % oc get machine huliu-vs07b-pkm69-master-2 -oyaml … providerSpec: value: apiVersion: machine.openshift.io/v1beta1 credentialsSecret: name: vsphere-cloud-credentials diskGiB: 120 kind: VSphereMachineProviderSpec memoryMiB: 16384 metadata: creationTimestamp: null network: devices: - networkName: devqe-segment-221 numCPUs: 4 numCoresPerSocket: 4 snapshot: "" template: huliu-vs07b-pkm69-rhcos-generated-region-generated-zone userDataSecret: name: master-user-data workspace: datacenter: DEVQEdatacenter datastore: /DEVQEdatacenter/datastore/vsanDatastore folder: /DEVQEdatacenter/vm/huliu-vs07b-pkm69 resourcePool: /DEVQEdatacenter/host/DEVQEcluster//Resources server: vcenter.devqe.ibmc.devcluster.openshift.com … Must-gather: https://drive.google.com/file/d/1KC4fwvQudRRebi9DyNOVtRT2AmenL5ek/view?usp=sharing 3.Install a cluster on vSphere without TechPreview, we use flexy-template: ipi-on-vsphere/versioned-installer, there is no CPMS by default. liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.15.0-0.nightly-2023-12-04-223539 True False 16m Cluster version is 4.15.0-0.nightly-2023-12-04-223539 liuhuali@Lius-MacBook-Pro huali-test % oc project openshift-machine-api Now using project "openshift-machine-api" on server "https://api.huliu-vs07c.qe.devcluster.openshift.com:6443". liuhuali@Lius-MacBook-Pro huali-test % oc get machine NAME PHASE TYPE REGION ZONE AGE huliu-vs07c-p6258-master-0 Running 41m huliu-vs07c-p6258-master-1 Running 41m huliu-vs07c-p6258-master-2 Running 41m huliu-vs07c-p6258-worker-0-78zxg Running 36m huliu-vs07c-p6258-worker-0-tv2rw Running 36m liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset No resources found in openshift-machine-api namespace. 
4.Enable TechPreview, there is CPMS now, and it’s Inactive liuhuali@Lius-MacBook-Pro huali-test % oc edit featuregate featuregate.config.openshift.io/cluster edited liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset NAME DESIRED CURRENT READY UPDATED UNAVAILABLE STATE AGE cluster 3 3 3 Inactive 13m liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset cluster -oyaml … providerSpec: value: apiVersion: machine.openshift.io/v1beta1 credentialsSecret: name: vsphere-cloud-credentials diskGiB: 120 kind: VSphereMachineProviderSpec memoryMiB: 16384 metadata: creationTimestamp: null network: devices: null numCPUs: 4 numCoresPerSocket: 4 snapshot: "" template: "" userDataSecret: name: master-user-data workspace: {} 5.Edit the CPMS, only change Inactive to Active. It triggers update, but it shouldn’t, because I didn’t change any configuration values. liuhuali@Lius-MacBook-Pro huali-test % oc edit controlplanemachineset controlplanemachineset.machine.openshift.io/cluster edited liuhuali@Lius-MacBook-Pro huali-test % oc get machine NAME PHASE TYPE REGION ZONE AGE huliu-vs07c-p6258-master-0 Running 59m huliu-vs07c-p6258-master-1 Running 59m huliu-vs07c-p6258-master-2 Running 59m huliu-vs07c-p6258-master-ccgth-0 Provisioning 8s huliu-vs07c-p6258-worker-0-78zxg Running 54m huliu-vs07c-p6258-worker-0-tv2rw Running 54m liuhuali@Lius-MacBook-Pro huali-test % oc logs control-plane-machine-set-operator-85595cdfdf-zh9bk … I1207 08:59:32.993680 1 updates.go:473] "msg"="Machine requires an update" "controller"="controlplanemachineset" "diff"=["Template: /DEVQEdatacenter/vm/huliu-vs07c-p6258/huliu-vs07c-p6258-rhcos-generated-region-generated-zone != huliu-vs07c-p6258-rhcos-generated-region-generated-zone"] "index"=2 "name"="huliu-vs07c-p6258-master-2" "namespace"="openshift-machine-api" "reconcileID"="da8e7371-8378-42df-8a41-bc3f4198fb20" "updateStrategy"="RollingUpdate" Must gather: https://drive.google.com/file/d/1LL_anFTNsH5O4cIJSUzioyHphS_avyTP/view?usp=sharing
Actual results:
The configuration values of the CPMS generated by the installer on vSphere are not the same as the configuration values of the master machines.
Expected results:
The configuration values of the CPMS generated by the installer on vSphere should be the same as the configuration values of the master machines.
Additional info:
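As a minimal sketch for confirming the mismatch (the machine name is taken from the report above; jq is assumed to be available), the two providerSpec blocks can be diffed directly:
$ oc -n openshift-machine-api get controlplanemachineset cluster -o json | jq -S '.spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value' > cpms.json
$ oc -n openshift-machine-api get machine huliu-vs07b-pkm69-master-2 -o json | jq -S '.spec.providerSpec.value' > machine.json
$ diff cpms.json machine.json
On the affected build, the diff shows empty network devices, template and workspace on the CPMS side.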
This fix contains the following changes coming from updated version of kubernetes up to v1.28.12:
Changelog:
v1.28.12: https://github.com/kubernetes/kubernetes/blob/release-1.28/CHANGELOG/CHANGELOG-1.28.md#changelog-since-v12811
Description of problem:
When we create a MC that declares the same kernel argument twice, the MCO adds it only once.
Version-Release number of selected component (if applicable):
$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.12.0-0.nightly-2023-09-22-181920 True False 5h18m Cluster version is 4.12.0-0.nightly-2023-09-22-181920 We have seen this behavior in 4.15 too 4.15.0-0.nightly-2023-09-22-224720
How reproducible:
Always
Steps to Reproduce:
1. Create a MC that declares 2 kernel arguments with the same value (z=4 is duplicated) apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: worker name: test-kernel-arguments-32-zparam spec: config: ignition: version: 3.2.0 kernelArguments: - y=0 - z=4 - y=1 - z=4
Actual results:
We get the following parameters $ oc debug -q node/sergio-v12-9vwrc-worker-c-tpbvh.c.openshift-qe.internal -- chroot /host cat /proc/cmdline BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-a594b3a14778ce39f2b42ddb90e933c1971268a746ef1678a3c6eedee5a21b00/vmlinuz-4.18.0-372.73.1.el8_6.x86_64 ostree=/ostree/boot.0/rhcos/a594b3a14778ce39f2b42ddb90e933c1971268a746ef1678a3c6eedee5a21b00/0 ignition.platform.id=gcp console=ttyS0,115200n8 root=UUID=e101e976-e029-411d-ad71-6856f3838c4f rw rootflags=prjquota boot=UUID=75598fe5-c10d-4e95-9747-1708d9fe6a10 console=tty0 y=0 z=4 y=1 There is only one "z=4" parameter. We should see "y=0 z=4 y=1 z=4" instead of "y=0 z=4 y=1"
Expected results:
In older versions we can see that the duplicated parameters are created. For example, this is the output in an IPI on AWS 4.9 cluster: $ oc debug -q node/ip-10-0-189-69.us-east-2.compute.internal -- chroot /host cat /proc/cmdline BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-e1eeff6ec1b9b70a3554779947906f4a7fb93e0d79fbefcb045da550b7d9227f/vmlinuz-4.18.0-305.97.1.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ostree=/ostree/boot.1/rhcos/e1eeff6ec1b9b70a3554779947906f4a7fb93e0d79fbefcb045da550b7d9227f/0 ignition.platform.id=aws root=UUID=ed307195-b5a9-4160-8a7a-df42aa734c28 rw rootflags=prjquota y=0 z=4 y=1 z=4 All the parameters are created, including the duplicated "z=4".
Additional info:
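A one-liner to check for the regression (a sketch; <worker-node> is a placeholder for any node in the affected pool):
$ oc debug -q node/<worker-node> -- chroot /host cat /proc/cmdline | tr ' ' '\n' | grep -c '^z=4'
A fixed cluster should print 2; an affected cluster prints 1.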
Description of problem:
Set custom security group IDs in the installconfig.platform.aws.defaultMachinePlatform.additionalSecurityGroupIDs field of install-config.yaml such as: apiVersion: v1 controlPlane: architecture: amd64 hyperthreading: Enabled name: master platform: {} replicas: 3 compute: - architecture: amd64 hyperthreading: Enabled name: worker platform: {} replicas: 3 metadata: name: gpei-test1013 platform: aws: region: us-east-2 subnets: - subnet-0bc86b64e7736479c - subnet-0addd33c410b52251 - subnet-093392f94a4099566 - subnet-0b915a53042b6dc61 defaultMachinePlatform: additionalSecurityGroupIDs: - sg-0fbc4c9733e6c18e7 - sg-0b46b502b575d30ba - sg-02a59f8662d10c6d3 After installation, check the Security Groups attached to master and worker, master doesn't have the specified custom security groups attached while workers have. For one of the masters: [root@preserve-gpei-worker k_files]# aws ec2 describe-instances --instance-ids i-08c0b0b6e4308be3b --query 'Reservations[*].Instances[*].SecurityGroups[*]' --output json [ [ [ { "GroupName": "terraform-20231013000602175000000002", "GroupId": "sg-04b104d07075afe96" } ] ] ] For one of the workers: [root@preserve-gpei-worker k_files]# aws ec2 describe-instances --instance-ids i-00643f07748ec75da --query 'Reservations[*].Instances[*].SecurityGroups[*]' --output json [ [ [ { "GroupName": "test-sg2", "GroupId": "sg-0b46b502b575d30ba" }, { "GroupName": "terraform-20231013000602174300000001", "GroupId": "sg-0d7cd50d4cb42e513" }, { "GroupName": "test-sg3", "GroupId": "sg-02a59f8662d10c6d3" }, { "GroupName": "test-sg1", "GroupId": "sg-0fbc4c9733e6c18e7" } ] ] ] Also checked the master's controlplanemachineset, it does have the custom security groups configured, but they're not attached to the master instance in the end. [root@preserve-gpei-worker k_files]# oc get controlplanemachineset -n openshift-machine-api cluster -o yaml |yq .spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value.securityGroups - filters: - name: tag:Name values: - gpei-test1013-8lwtb-master-sg - id: sg-02a59f8662d10c6d3 - id: sg-0b46b502b575d30ba - id: sg-0fbc4c9733e6c18e7
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-10-12-104602
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
It works well when setting the security groups in installconfig.controlPlane.platform.aws.additionalSecurityGroupIDs
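A sketch for comparing what the CPMS requests with what is actually attached (the instance ID is a placeholder; jq is assumed):
$ oc -n openshift-machine-api get controlplanemachineset cluster -o json | jq -r '.spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value.securityGroups[] | .id // empty'
$ aws ec2 describe-instances --instance-ids <master-instance-id> --query 'Reservations[].Instances[].SecurityGroups[].GroupId' --output text
On an affected cluster the custom sg-* IDs appear in the first list but not in the second.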
Same as CNF-9173
Opened as a bug in order to backport for 4.15
This is a clone of issue OCPBUGS-29773. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Install IPI cluster against 4.15 nightly build on Azure MAG and Azure Stack Hub or with Azure workload identity, image-registry co is degraded with different errors. On MAG: $ oc get co image-registry NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE image-registry 4.15.0-0.nightly-2024-02-16-235514 True False True 5h44m AzurePathFixControllerDegraded: Migration failed: panic: Get "https://imageregistryjima41xvvww.blob.core.windows.net/jima415a-hfxfh-image-registry-vbibdmawmsvqckhvmmiwisebryohfbtm?comp=list&prefix=docker&restype=container": dial tcp: lookup imageregistryjima41xvvww.blob.core.windows.net on 172.30.0.10:53: no such host... $ oc get pod -n openshift-image-registry NAME READY STATUS RESTARTS AGE azure-path-fix-ssn5w 0/1 Error 0 5h47m cluster-image-registry-operator-86cdf775c7-7brn6 1/1 Running 1 (5h50m ago) 5h58m image-registry-5c6796b86d-46lvx 1/1 Running 0 5h47m image-registry-5c6796b86d-9st5d 1/1 Running 0 5h47m node-ca-48lsh 1/1 Running 0 5h44m node-ca-5rrsl 1/1 Running 0 5h47m node-ca-8sc92 1/1 Running 0 5h47m node-ca-h6trz 1/1 Running 0 5h47m node-ca-hm7s2 1/1 Running 0 5h47m node-ca-z7tv8 1/1 Running 0 5h44m $ oc logs azure-path-fix-ssn5w -n openshift-image-registry panic: Get "https://imageregistryjima41xvvww.blob.core.windows.net/jima415a-hfxfh-image-registry-vbibdmawmsvqckhvmmiwisebryohfbtm?comp=list&prefix=docker&restype=container": dial tcp: lookup imageregistryjima41xvvww.blob.core.windows.net on 172.30.0.10:53: no such hostgoroutine 1 [running]: main.main() /go/src/github.com/openshift/cluster-image-registry-operator/cmd/move-blobs/main.go:49 +0x125 The blob storage endpoint seems not correct, should be: $ az storage account show -n imageregistryjima41xvvww -g jima415a-hfxfh-rg --query primaryEndpoints { "blob": "https://imageregistryjima41xvvww.blob.core.usgovcloudapi.net/", "dfs": "https://imageregistryjima41xvvww.dfs.core.usgovcloudapi.net/", "file": "https://imageregistryjima41xvvww.file.core.usgovcloudapi.net/", "internetEndpoints": null, "microsoftEndpoints": null, "queue": "https://imageregistryjima41xvvww.queue.core.usgovcloudapi.net/", "table": "https://imageregistryjima41xvvww.table.core.usgovcloudapi.net/", "web": "https://imageregistryjima41xvvww.z2.web.core.usgovcloudapi.net/" } On Azure Stack Hub: $ oc get co image-registry NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE image-registry 4.15.0-0.nightly-2024-02-16-235514 True False True 3h32m AzurePathFixControllerDegraded: Migration failed: panic: open : no such file or directory... $ oc get pod -n openshift-image-registry NAME READY STATUS RESTARTS AGE azure-path-fix-8jdg7 0/1 Error 0 3h35m cluster-image-registry-operator-86cdf775c7-jwnd4 1/1 Running 1 (3h38m ago) 3h54m image-registry-658669fbb4-llv8z 1/1 Running 0 3h35m image-registry-658669fbb4-lmfr6 1/1 Running 0 3h35m node-ca-2jkjx 1/1 Running 0 3h35m node-ca-dcg2v 1/1 Running 0 3h35m node-ca-q6xmn 1/1 Running 0 3h35m node-ca-r46r2 1/1 Running 0 3h35m node-ca-s8jkb 1/1 Running 0 3h35m node-ca-ww6ql 1/1 Running 0 3h35m $ oc logs azure-path-fix-8jdg7 -n openshift-image-registry panic: open : no such file or directorygoroutine 1 [running]: main.main() /go/src/github.com/openshift/cluster-image-registry-operator/cmd/move-blobs/main.go:36 +0x145 On cluster with Azure workload identity: Some operator's PROGRESSING is True image-registry 4.15.0-0.nightly-2024-02-16-235514 True True False 43m Progressing: The deployment has not completed... pod azure-path-fix is in CreateContainerConfigError status, and get error in its Event. 
"state": { "waiting": { "message": "couldn't find key REGISTRY_STORAGE_AZURE_ACCOUNTKEY in Secret openshift-image-registry/image-registry-private-configuration", "reason": "CreateContainerConfigError" } }
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2024-02-16-235514
How reproducible:
Always
Steps to Reproduce:
1. Install IPI cluster on MAG or Azure Stack Hub or config Azure workload identity 2. 3.
Actual results:
Installation failed and image-registry operator is degraded
Expected results:
Installation is successful.
Additional info:
Seems that issue is related with https://github.com/openshift/image-registry/pull/393
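Since the panic points at a blob URL with the public-cloud suffix (blob.core.windows.net) instead of the MAG suffix (blob.core.usgovcloudapi.net), a quick sanity check is whether the cluster itself knows which Azure cloud it runs in (a sketch):
$ oc get infrastructure cluster -o jsonpath='{.status.platformStatus.azure.cloudName}'
On MAG this should print AzureUSGovernmentCloud; the move-blobs code presumably needs to derive the storage endpoint from this rather than hardcoding the public suffix.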
Please review the following PR: https://github.com/openshift/cluster-kube-apiserver-operator/pull/1550
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-28708. The following is the description of the original issue:
—
Description of problem:
When running the 4.15 installer full-function test, one additional arm64 instance family was detected and verified, and needs to be appended to the installer doc [1]: - standardBpsv2Family [1] https://github.com/openshift/installer/blob/master/docs/user/azure/tested_instance_types_aarch64.md
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-24245. The following is the description of the original issue:
—
https://github.com/openshift/csi-operator/blob/master/assets/overlays/aws-ebs/base/csidriver.yaml
The "seLinuxMount: true" field is missing; it was merged in https://github.com/bertinatto/aws-ebs-csi-driver-operator-1/blob/0a9642cff6d2a7f9aea940ce89b65fc189cba6b6/assets/csidriver.yaml#L14
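A quick verification sketch for a live cluster:
$ oc get csidriver ebs.csi.aws.com -o jsonpath='{.spec.seLinuxMount}'
A fixed cluster should print true; on an affected build the field is unset.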
This is a clone of issue OCPBUGS-28664. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
CI is permafailing all the way down to 4.12 due to some breaking changes being side-loaded into old versions via a :latest tag for a fixture image. Longer version - we faced a few different issues: - we made a change to opm where it started to validate package names differently. This broke some of our tests because they had invalid package names. - opm switched to a different cache backend, which led to the operatorhubio image being updated with the new cache backend, but that same image broke CI for older versions whose opm did not support the new backend
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
In order to avoid possible issues with SDN during migration from SDN to OVNK, do not use port 9106 for ovnkube-control-plane metrics, since it's already used by SDN. Use a port that is not used by SDN, such as 9108.
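A sketch for verifying which port the deployment ended up with (deployment name as used in 4.14+; the grep pattern is illustrative):
$ oc -n openshift-ovn-kubernetes get deployment ovnkube-control-plane -o yaml | grep -E '910[68]'
After the fix, only 9108 should appear; 9106 remains reserved for SDN during the migration.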
This is a clone of issue OCPBUGS-25372. The following is the description of the original issue:
—
Description of problem:
Found in QE's CI (with the vsphere-agent profile), the storage CO is not available and the vsphere-problem-detector-operator pod is CrashLoopBackOff with a panic. (Find the must-gather here: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-vsphere-agent-disconnected-ha-f14/1734850632575094784/artifacts/vsphere-agent-disconnected-ha-f14/gather-must-gather/) The storage CO reports "unable to find VM by UUID": - lastTransitionTime: "2023-12-13T09:15:27Z" message: "VSphereCSIDriverOperatorCRAvailable: VMwareVSphereControllerAvailable: unable to find VM ci-op-782gwsbd-b3d4e-master-2 by UUID \nVSphereProblemDetectorDeploymentControllerAvailable: Waiting for Deployment" reason: VSphereCSIDriverOperatorCR_VMwareVSphereController_vcenter_api_error::VSphereProblemDetectorDeploymentController_Deploying status: "False" type: Available (But I did not see the "unable to find VM by UUID" from the vsphere-problem-detector-operator log in the must-gather) The vsphere-problem-detector-operator log: 2023-12-13T10:10:56.620216117Z I1213 10:10:56.620159 1 vsphere_check.go:149] Connected to vcenter.devqe.ibmc.devcluster.openshift.com as ci_user_01@devqe.ibmc.devcluster.openshift.com 2023-12-13T10:10:56.625161719Z I1213 10:10:56.625108 1 vsphere_check.go:271] CountVolumeTypes passed 2023-12-13T10:10:56.625291631Z I1213 10:10:56.625258 1 zones.go:124] Checking tags for multi-zone support. 2023-12-13T10:10:56.625449771Z I1213 10:10:56.625433 1 zones.go:202] No FailureDomains configured. Skipping check. 2023-12-13T10:10:56.625497726Z I1213 10:10:56.625487 1 vsphere_check.go:271] CheckZoneTags passed 2023-12-13T10:10:56.625531795Z I1213 10:10:56.625522 1 info.go:44] vCenter version is 8.0.2, apiVersion is 8.0.2.0 and build is 22617221 2023-12-13T10:10:56.625562833Z I1213 10:10:56.625555 1 vsphere_check.go:271] ClusterInfo passed 2023-12-13T10:10:56.625603236Z I1213 10:10:56.625594 1 datastore.go:312] checking datastore /DEVQEdatacenter/datastore/vsanDatastore for permissions 2023-12-13T10:10:56.669205822Z panic: runtime error: invalid memory address or nil pointer dereference 2023-12-13T10:10:56.669338411Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x23096cb] 2023-12-13T10:10:56.669565413Z 2023-12-13T10:10:56.669591144Z goroutine 550 [running]: 2023-12-13T10:10:56.669838383Z github.com/openshift/vsphere-problem-detector/pkg/operator.getVM(0xc0005da6c0, 0xc0002d3b80) 2023-12-13T10:10:56.669991749Z github.com/openshift/vsphere-problem-detector/pkg/operator/vsphere_check.go:319 +0x3eb 2023-12-13T10:10:56.670212441Z github.com/openshift/vsphere-problem-detector/pkg/operator.(*vSphereChecker).enqueueSingleNodeChecks.func1() 2023-12-13T10:10:56.670289644Z github.com/openshift/vsphere-problem-detector/pkg/operator/vsphere_check.go:238 +0x55 2023-12-13T10:10:56.670490453Z github.com/openshift/vsphere-problem-detector/pkg/operator.(*CheckThreadPool).worker.func1(0xc000c88760?, 0x0?) 2023-12-13T10:10:56.670702592Z github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:40 +0x55 2023-12-13T10:10:56.671142070Z github.com/openshift/vsphere-problem-detector/pkg/operator.(*CheckThreadPool).worker(0xc000c78660, 0xc000c887a0?)
2023-12-13T10:10:56.671331852Z github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:41 +0xe7 2023-12-13T10:10:56.671529761Z github.com/openshift/vsphere-problem-detector/pkg/operator.NewCheckThreadPool.func1() 2023-12-13T10:10:56.671589925Z github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:28 +0x25 2023-12-13T10:10:56.671776328Z created by github.com/openshift/vsphere-problem-detector/pkg/operator.NewCheckThreadPool 2023-12-13T10:10:56.671847478Z github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:27 +0x73
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-12-11-033133
How reproducible:
Steps to Reproduce:
1. See description 2. 3.
Actual results:
vpd is panic
Expected results:
vpd should not panic
Additional info:
I guess it is privileges issue, but our pod should not be panic.
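For triage, the panic and the check that preceded it can be pulled from the operator's previous container (a sketch; the operator runs in the storage operator namespace):
$ oc -n openshift-cluster-storage-operator logs deployment/vsphere-problem-detector-operator --previous | grep -B 3 -A 10 'panic:'
The stack above suggests getVM dereferences a nil VM result when the vCenter lookup fails, so the fix would be to handle the lookup error instead of panicking.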
Reproducer:
1. On a GCP cluster, create an ingress controller with internal load balancer scope, like this:
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: foo namespace: openshift-ingress-operator spec: domain: foo.<cluster-domain> endpointPublishingStrategy: type: LoadBalancerService loadBalancer: dnsManagementPolicy: Managed scope: Internal
2. Wait for load balancer service to complete rollout
$ oc -n openshift-ingress get service router-foo NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE router-foo LoadBalancer 172.30.101.233 10.0.128.5 80:32019/TCP,443:32729/TCP 81s
3. Edit ingress controller to set spec.endpointPublishingStrategy.loadBalancer.scope to External
the load balancer service (router-foo in this case) should get an external IP address, but currently it keeps the 10.x.x.x address that was already assigned.
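A sketch of step 3 plus a way to watch the rollout (names taken from the reproducer above):
$ oc -n openshift-ingress-operator patch ingresscontroller foo --type=merge -p '{"spec":{"endpointPublishingStrategy":{"loadBalancer":{"scope":"External"}}}}'
$ oc -n openshift-ingress get service router-foo -w
On an affected GCP cluster the EXTERNAL-IP column never changes from the internal 10.x.x.x address.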
This is a clone of issue OCPBUGS-39170. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-39029. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-38289. The following is the description of the original issue:
—
Description of problem:
The cluster-wide proxy is getting injected into the remote-write config automatically, but the noProxy URLs are not injected into the Prometheus k8s CR in the openshift-monitoring project, which would be expected. As a result, if the remote-write endpoint is in the noProxy region, metrics are not transferred.
Version-Release number of selected component (if applicable):
RHOCP 4.16.4
How reproducible:
100%
Steps to Reproduce:
1. Configure proxy custom resource in RHOCP 4.16.4 cluster 2. Create cluster-monitoring-config configmap in openshift-monitoring project 3. Inject remote-write config (without specifically configuring proxy for remote-write) 4. After saving the modification in cluster-monitoring-config configmap, check the remoteWrite config in Prometheus k8s CR. Now it contains the proxyUrl but NOT the noProxy URL(referenced from cluster proxy). Example snippet: ============== apiVersion: monitoring.coreos.com/v1 kind: Prometheus metadata: [...] name: k8s namespace: openshift-monitoring spec: [...] remoteWrite: - proxyUrl: http://proxy.abc.com:8080 <<<<<====== Injected Automatically but there is no noProxy URL. url: http://test-remotewrite.test.svc.cluster.local:9090
Actual results:
The proxy URL from proxy CR is getting injected in Prometheus k8s CR automatically when configuring remoteWrite but it doesn't have noProxy inherited from cluster proxy resource.
Expected results:
The noProxy URL should get injected in Prometheus k8s CR as well.
Additional info:
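A sketch for confirming what actually lands in the generated CR (jq assumed):
$ oc -n openshift-monitoring get prometheus k8s -o json | jq '.spec.remoteWrite'
On an affected cluster each entry carries the injected proxyUrl but no noProxy, so traffic to endpoints that should bypass the proxy is still forced through it.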
This is a clone of issue OCPBUGS-43964. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42386. The following is the description of the original issue:
—
Description of problem:
Usually, providing a cluster with an unaccepted update, such as an unsigned payload without force, is handled with ReleaseAccepted=False and Progressing=False. However, after scaling the CVO deployment down and up again, Progressing=True is observed, causing oc adm upgrade as well as oc adm upgrade status to display incorrect information, and the clusterversion object to display empty capabilities and a history item with version "".
Version-Release number of selected component (if applicable):
4.16.0-rc.4, but also observed as early as 4.10.67
How reproducible:
100%
Steps to Reproduce:
1. target the cluster at unsigned build without using force ❯ oc adm upgrade --allow-explicit-upgrade --to-image registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a 2. scale cvo down and up again ❯ oc scale --replicas 0 -n openshift-cluster-version deployments/cluster-version-operator deployment.apps/cluster-version-operator scaled ❯ oc scale --replicas 1 -n openshift-cluster-version deployments/cluster-version-operator deployment.apps/cluster-version-operator scaled
Actual results:
oc adm upgrade displays "info: An upgrade is in progress. Working towards..."
There is also a warning of "Architecture has not been configured":
❯ oc adm upgrade info: An upgrade is in progress. Working towards registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a ReleaseAccepted=False Reason: RetrievePayload Message: Retrieving payload failed version="" image="registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a" failure=The update cannot be verified: unable to verify sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a against keyrings: verifier-public-key-redhat Upstream is unset, so the cluster will use an appropriate default. Channel: stable-4.16 warning: Cannot display available updates: Reason: NoArchitecture Message: Architecture has not been configured.
The clusterversion object has Progressing True, "capabilities: {}", as well as a partial history item with version "":
❯ oc get clusterversion version -oyaml apiVersion: config.openshift.io/v1 kind: ClusterVersion metadata: creationTimestamp: "2024-06-10T11:36:51Z" generation: 3 name: version resourceVersion: "70199" uid: 9c80848b-9f3a-4f0d-8472-a2ccce1c4023 spec: channel: stable-4.16 clusterID: e74054ac-e0fe-4cf7-a457-4887ba96cff9 desiredUpdate: architecture: "" force: false image: registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a version: "" status: availableUpdates: null capabilities: {} conditions: - lastTransitionTime: "2024-06-10T11:37:17Z" message: Architecture has not been configured. reason: NoArchitecture status: "False" type: RetrievedUpdates - lastTransitionTime: "2024-06-10T11:37:17Z" message: Capabilities match configured spec reason: AsExpected status: "False" type: ImplicitlyEnabledCapabilities - lastTransitionTime: "2024-06-10T14:06:42Z" message: 'Retrieving payload failed version="" image="registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a" failure=The update cannot be verified: unable to verify sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a against keyrings: verifier-public-key-redhat' reason: RetrievePayload status: "False" type: ReleaseAccepted - lastTransitionTime: "2024-06-10T12:06:31Z" message: Done applying 4.16.0-rc.4 status: "True" type: Available - lastTransitionTime: "2024-06-10T12:06:31Z" status: "False" type: Failing - lastTransitionTime: "2024-06-10T14:07:30Z" message: Working towards registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a status: "True" type: Progressing desired: image: registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a version: "" history: - completionTime: null image: registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a startedTime: "2024-06-10T14:07:30Z" state: Partial verified: false version: "" - completionTime: "2024-06-10T12:06:31Z" image: quay.io/openshift-release-dev/ocp-release@sha256:6c236c400d3bad9b2b54d8a3b247c508f6f13511d37666de1eecca8e43bce0f6 startedTime: "2024-06-10T11:37:17Z" state: Completed verified: false version: 4.16.0-rc.4 observedGeneration: 3 versionHash: AjnKTa_3kbg=
In the upgrade status, the cluster reports Progressing to an empty target with Completion 0%:
= Control Plane = Assessment: Progressing Target Version: (from 4.16.0-rc.4) Completion: 0% Duration: 2m26.971091165s Operator Status: 33 Healthy
Expected results:
The clusterversion stays the same as before the scale toggle:
apiVersion: config.openshift.io/v1 kind: ClusterVersion metadata: creationTimestamp: "2024-06-10T11:36:51Z" generation: 3 name: version resourceVersion: "69881" uid: 9c80848b-9f3a-4f0d-8472-a2ccce1c4023 spec: channel: stable-4.16 clusterID: e74054ac-e0fe-4cf7-a457-4887ba96cff9 desiredUpdate: architecture: "" force: false image: registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a version: "" status: availableUpdates: null capabilities: enabledCapabilities: - Build - CSISnapshot - CloudControllerManager - CloudCredential - Console - DeploymentConfig - ImageRegistry - Ingress - Insights - MachineAPI - NodeTuning - OperatorLifecycleManager - Storage - baremetal - marketplace - openshift-samples knownCapabilities: - Build - CSISnapshot - CloudControllerManager - CloudCredential - Console - DeploymentConfig - ImageRegistry - Ingress - Insights - MachineAPI - NodeTuning - OperatorLifecycleManager - Storage - baremetal - marketplace - openshift-samples conditions: - lastTransitionTime: "2024-06-10T11:37:17Z" message: 'Unable to retrieve available updates: currently reconciling cluster version 4.16.0-rc.4 not found in the "stable-4.16" channel' reason: VersionNotFound status: "False" type: RetrievedUpdates - lastTransitionTime: "2024-06-10T11:37:17Z" message: Capabilities match configured spec reason: AsExpected status: "False" type: ImplicitlyEnabledCapabilities - lastTransitionTime: "2024-06-10T14:06:42Z" message: 'Retrieving payload failed version="" image="registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a" failure=The update cannot be verified: unable to verify sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a against keyrings: verifier-public-key-redhat' reason: RetrievePayload status: "False" type: ReleaseAccepted - lastTransitionTime: "2024-06-10T12:06:31Z" message: Done applying 4.16.0-rc.4 status: "True" type: Available - lastTransitionTime: "2024-06-10T12:06:31Z" status: "False" type: Failing - lastTransitionTime: "2024-06-10T12:06:31Z" message: Cluster version is 4.16.0-rc.4 status: "False" type: Progressing desired: image: quay.io/openshift-release-dev/ocp-release@sha256:6c236c400d3bad9b2b54d8a3b247c508f6f13511d37666de1eecca8e43bce0f6 url: https://access.redhat.com/errata/RHEA-2024:0041 version: 4.16.0-rc.4 history: - completionTime: "2024-06-10T12:06:31Z" image: quay.io/openshift-release-dev/ocp-release@sha256:6c236c400d3bad9b2b54d8a3b247c508f6f13511d37666de1eecca8e43bce0f6 startedTime: "2024-06-10T11:37:17Z" state: Completed verified: false version: 4.16.0-rc.4 observedGeneration: 2 versionHash: AjnKTa_3kbg=
There should be no "upgrade is in progress" message for a release that is not accepted:
❯ oc adm upgrade Cluster version is 4.16.0-rc.4 ReleaseAccepted=False Reason: RetrievePayload Message: Retrieving payload failed version="" image="registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a" failure=The update cannot be verified: unable to verify sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a against keyrings: verifier-public-key-redhat Upstream is unset, so the cluster will use an appropriate default. Channel: stable-4.16 warning: Cannot display available updates: Reason: VersionNotFound Message: Unable to retrieve available updates: currently reconciling cluster version 4.16.0-rc.4 not found in the "stable-4.16" channel
Additional info:
It is possible to kick the cluster out of this state by applying --clear, which causes the cluster to briefly progress into its original version, followed by 3 items appearing in the history.
❯ oc adm upgrade --clear Cleared the update field, still at registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a ❯ oc adm upgrade info: An upgrade is in progress. Working towards 4.16.0-rc.4: 116 of 894 done (12% complete) Upstream is unset, so the cluster will use an appropriate default. Channel: stable-4.16 warning: Cannot display available updates: Reason: VersionNotFound Message: Unable to retrieve available updates: currently reconciling cluster version 4.16.0-rc.4 not found in the "stable-4.16" channel
❯ oc get clusterversion version -oyaml apiVersion: config.openshift.io/v1 kind: ClusterVersion metadata: creationTimestamp: "2024-06-10T11:36:51Z" generation: 4 name: version resourceVersion: "72594" uid: 9c80848b-9f3a-4f0d-8472-a2ccce1c4023 spec: channel: stable-4.16 clusterID: e74054ac-e0fe-4cf7-a457-4887ba96cff9 status: availableUpdates: null capabilities: enabledCapabilities: - Build - CSISnapshot - CloudControllerManager - CloudCredential - Console - DeploymentConfig - ImageRegistry - Ingress - Insights - MachineAPI - NodeTuning - OperatorLifecycleManager - Storage - baremetal - marketplace - openshift-samples knownCapabilities: - Build - CSISnapshot - CloudControllerManager - CloudCredential - Console - DeploymentConfig - ImageRegistry - Ingress - Insights - MachineAPI - NodeTuning - OperatorLifecycleManager - Storage - baremetal - marketplace - openshift-samples conditions: - lastTransitionTime: "2024-06-10T11:37:17Z" message: 'Unable to retrieve available updates: currently reconciling cluster version 4.16.0-rc.4 not found in the "stable-4.16" channel' reason: VersionNotFound status: "False" type: RetrievedUpdates - lastTransitionTime: "2024-06-10T11:37:17Z" message: Capabilities match configured spec reason: AsExpected status: "False" type: ImplicitlyEnabledCapabilities - lastTransitionTime: "2024-06-10T14:13:07Z" message: Payload loaded version="4.16.0-rc.4" image="quay.io/openshift-release-dev/ocp-release@sha256:6c236c400d3bad9b2b54d8a3b247c508f6f13511d37666de1eecca8e43bce0f6" architecture="amd64" reason: PayloadLoaded status: "True" type: ReleaseAccepted - lastTransitionTime: "2024-06-10T12:06:31Z" message: Done applying 4.16.0-rc.4 status: "True" type: Available - lastTransitionTime: "2024-06-10T12:06:31Z" status: "False" type: Failing - lastTransitionTime: "2024-06-10T14:14:00Z" message: Cluster version is 4.16.0-rc.4 status: "False" type: Progressing desired: image: quay.io/openshift-release-dev/ocp-release@sha256:6c236c400d3bad9b2b54d8a3b247c508f6f13511d37666de1eecca8e43bce0f6 url: https://access.redhat.com/errata/RHEA-2024:0041 version: 4.16.0-rc.4 history: - completionTime: "2024-06-10T14:14:00Z" image: quay.io/openshift-release-dev/ocp-release@sha256:6c236c400d3bad9b2b54d8a3b247c508f6f13511d37666de1eecca8e43bce0f6 startedTime: "2024-06-10T14:13:07Z" state: Completed verified: false version: 4.16.0-rc.4 - completionTime: "2024-06-10T14:13:07Z" image: registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a startedTime: "2024-06-10T14:07:30Z" state: Partial verified: false version: "" - completionTime: "2024-06-10T12:06:31Z" image: quay.io/openshift-release-dev/ocp-release@sha256:6c236c400d3bad9b2b54d8a3b247c508f6f13511d37666de1eecca8e43bce0f6 startedTime: "2024-06-10T11:37:17Z" state: Completed verified: false version: 4.16.0-rc.4 observedGeneration: 4 versionHash: AjnKTa_3kbg=
Also, trying to apply a rollback in this state results in an invalid SemVer error:
❯ OC_ENABLE_CMD_UPGRADE_ROLLBACK=true oc adm upgrade rollback error: previous version "" invalid SemVer: Version string empty
The openshift/router repository vendors k8s.io/* v0.27.2. OpenShift 4.15 is based on Kubernetes 1.28.
4.15.
Always.
Check https://github.com/openshift/router/blob/release-4.15/go.mod.
The k8s.io/* packages are at v0.27.2.
The k8s.io/* packages are at v0.28.0 or newer.
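A sketch of the bump, assuming the usual vendored layout of openshift/router (the exact module list depends on what go.mod pins):
$ go get k8s.io/api@v0.28.0 k8s.io/apimachinery@v0.28.0 k8s.io/client-go@v0.28.0
$ go mod tidy
$ go mod vendor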
Description of problem:
A recent [PR](https://github.com/openshift/hypershift/commit/c030ab66d897815e16d15c987456deab8d0d6da0) updated the kube-apiserver service port to `6443`. That change causes a small outage when upgrading from a 4.13 cluster in IBMCloud. We need to keep the service port as 2040 for IBM Cloud Provider to avoid the outage.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-31263. The following is the description of the original issue:
—
hypershift is not creating this alert in HostedClusters
https://github.com/openshift/cluster-kube-apiserver-operator/blob/master/bindata/assets/alerts/podsecurity-violations.yaml
In standalone OCP, it is done by the KASO.
This is a clone of issue OCPBUGS-32042. The following is the description of the original issue:
—
Description of problem:
When the user configures the install-config.yaml additionalTrustBundle field (for example, in a disconnected installation using a local registry), the user-ca-bundle configmap gets populated with more content than strictly required
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Setup a local registry and mirror the content of an ocp release 2. Configure the install-config.yaml for a mirrored installation. In particular, configure the additionalTrustBundle field with the registry cert 3. Create the agent ISO, boot the nodes and wait for the installation to complete
Actual results:
The user-ca-bundle cm does not contain only the registry cert
Expected results:
user-ca-bundle configmap with just the content of the install-config additionalTrustBundle field
Additional info:
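A sketch for checking how much ended up in the configmap versus the single registry cert that was supplied (registry-cert.pem is a placeholder for the file passed via additionalTrustBundle):
$ oc -n openshift-config get configmap user-ca-bundle -o jsonpath='{.data.ca-bundle\.crt}' | grep -c 'BEGIN CERTIFICATE'
$ grep -c 'BEGIN CERTIFICATE' registry-cert.pem
On an affected cluster the first count is much larger than the second.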
Description of problem:
The storage team added CSI and ephemeral volumes in 4.12 and 4.13, but the affected SCCs are not being reconciled, leaving these capabilities out of reach of the expected end users.
Version-Release number of selected component (if applicable):
4.13+
How reproducible:
100%
Steps to Reproduce:
1. Check any of the "anyuid", "hostaccess", "hostmount-anyuid", "hostnetwork", "nonroot", "restricted" SCCs on a cluster upgraded from 4.11
Actual results:
no "csi" and "ephemeral" in .volumes
Expected results:
"csi" and "ephemeral" in .volumes
Additional info:
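A sketch for checking all six SCCs at once:
$ for scc in anyuid hostaccess hostmount-anyuid hostnetwork nonroot restricted; do echo -n "$scc: "; oc get scc "$scc" -o jsonpath='{.volumes}'; echo; done
On an affected upgraded cluster, none of the printed volume lists include csi or ephemeral.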
Description of problem:
OKD installer attempts to enable systemd-journal-gatewayd.socket, which is not present on FCOS
Version-Release number of selected component (if applicable):
4.13
this is case 2 from OCPBUGS-14673
Description of problem:
MHC for the control plane cannot work right in some cases. Case 2: stop the kubelet service on the master node; the new master gets Running, the old one is stuck in Deleting, and many COs are degraded. This is a regression bug: when I tested this on 4.12 around September 2022, case 2 and case 3 worked right. https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-54326
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-06-05-112833 4.13.0-0.nightly-2023-06-06-194351 4.12.0-0.nightly-2023-06-07-005319
How reproducible:
Always
Steps to Reproduce:
1.Create MHC for control plane apiVersion: machine.openshift.io/v1beta1 kind: MachineHealthCheck metadata: name: control-plane-health namespace: openshift-machine-api spec: maxUnhealthy: 1 selector: matchLabels: machine.openshift.io/cluster-api-machine-type: master unhealthyConditions: - status: "False" timeout: 300s type: Ready - status: "Unknown" timeout: 300s type: Ready liuhuali@Lius-MacBook-Pro huali-test % oc create -f mhc-master3.yaml machinehealthcheck.machine.openshift.io/control-plane-health created liuhuali@Lius-MacBook-Pro huali-test % oc get mhc NAME MAXUNHEALTHY EXPECTEDMACHINES CURRENTHEALTHY control-plane-health 1 3 3 machine-api-termination-handler 100% 0 0 Case 2.Stop the kubelet service on the master node, new master get Running, the old one stuck in Deleting, many co degraded. liuhuali@Lius-MacBook-Pro huali-test % oc debug node/huliu-az7c-svq9q-master-1 Starting pod/huliu-az7c-svq9q-master-1-debug ... To use host binaries, run `chroot /host` Pod IP: 10.0.0.6 If you don't see a command prompt, try pressing enter. sh-4.4# chroot /host sh-5.1# systemctl stop kubelet Removing debug pod ... liuhuali@Lius-MacBook-Pro huali-test % oc get node NAME STATUS ROLES AGE VERSION huliu-az7c-svq9q-master-1 Ready control-plane,master 95m v1.26.5+7a891f0 huliu-az7c-svq9q-master-2 Ready control-plane,master 95m v1.26.5+7a891f0 huliu-az7c-svq9q-master-c96k8-0 Ready control-plane,master 19m v1.26.5+7a891f0 huliu-az7c-svq9q-worker-westus-5r8jf Ready worker 34m v1.26.5+7a891f0 huliu-az7c-svq9q-worker-westus-k747l Ready worker 47m v1.26.5+7a891f0 huliu-az7c-svq9q-worker-westus-r2vdn Ready worker 83m v1.26.5+7a891f0 liuhuali@Lius-MacBook-Pro huali-test % oc get machine NAME PHASE TYPE REGION ZONE AGE huliu-az7c-svq9q-master-1 Running Standard_D8s_v3 westus 97m huliu-az7c-svq9q-master-2 Running Standard_D8s_v3 westus 97m huliu-az7c-svq9q-master-c96k8-0 Running Standard_D8s_v3 westus 23m huliu-az7c-svq9q-worker-westus-5r8jf Running Standard_D4s_v3 westus 39m huliu-az7c-svq9q-worker-westus-k747l Running Standard_D4s_v3 westus 53m huliu-az7c-svq9q-worker-westus-r2vdn Running Standard_D4s_v3 westus 91m liuhuali@Lius-MacBook-Pro huali-test % oc get node NAME STATUS ROLES AGE VERSION huliu-az7c-svq9q-master-1 NotReady control-plane,master 107m v1.26.5+7a891f0 huliu-az7c-svq9q-master-2 Ready control-plane,master 107m v1.26.5+7a891f0 huliu-az7c-svq9q-master-c96k8-0 Ready control-plane,master 32m v1.26.5+7a891f0 huliu-az7c-svq9q-master-jdhgg-1 Ready control-plane,master 2m10s v1.26.5+7a891f0 huliu-az7c-svq9q-worker-westus-5r8jf Ready worker 46m v1.26.5+7a891f0 huliu-az7c-svq9q-worker-westus-k747l Ready worker 59m v1.26.5+7a891f0 huliu-az7c-svq9q-worker-westus-r2vdn Ready worker 95m v1.26.5+7a891f0 liuhuali@Lius-MacBook-Pro huali-test % oc get machine NAME PHASE TYPE REGION ZONE AGE huliu-az7c-svq9q-master-1 Deleting Standard_D8s_v3 westus 110m huliu-az7c-svq9q-master-2 Running Standard_D8s_v3 westus 110m huliu-az7c-svq9q-master-c96k8-0 Running Standard_D8s_v3 westus 36m huliu-az7c-svq9q-master-jdhgg-1 Running Standard_D8s_v3 westus 5m55s huliu-az7c-svq9q-worker-westus-5r8jf Running Standard_D4s_v3 westus 52m huliu-az7c-svq9q-worker-westus-k747l Running Standard_D4s_v3 westus 65m huliu-az7c-svq9q-worker-westus-r2vdn Running Standard_D4s_v3 westus 103m liuhuali@Lius-MacBook-Pro huali-test % oc get machine NAME PHASE TYPE REGION ZONE AGE huliu-az7c-svq9q-master-1 Deleting Standard_D8s_v3 westus 3h huliu-az7c-svq9q-master-2 Running Standard_D8s_v3 westus 3h huliu-az7c-svq9q-master-c96k8-0 Running 
Standard_D8s_v3 westus 105m huliu-az7c-svq9q-master-jdhgg-1 Running Standard_D8s_v3 westus 75m huliu-az7c-svq9q-worker-westus-5r8jf Running Standard_D4s_v3 westus 122m huliu-az7c-svq9q-worker-westus-k747l Running Standard_D4s_v3 westus 135m huliu-az7c-svq9q-worker-westus-r2vdn Running Standard_D4s_v3 westus 173m liuhuali@Lius-MacBook-Pro huali-test % oc get node NAME STATUS ROLES AGE VERSION huliu-az7c-svq9q-master-1 NotReady control-plane,master 178m v1.26.5+7a891f0 huliu-az7c-svq9q-master-2 Ready control-plane,master 178m v1.26.5+7a891f0 huliu-az7c-svq9q-master-c96k8-0 Ready control-plane,master 102m v1.26.5+7a891f0 huliu-az7c-svq9q-master-jdhgg-1 Ready control-plane,master 72m v1.26.5+7a891f0 huliu-az7c-svq9q-worker-westus-5r8jf Ready worker 116m v1.26.5+7a891f0 huliu-az7c-svq9q-worker-westus-k747l Ready worker 129m v1.26.5+7a891f0 huliu-az7c-svq9q-worker-westus-r2vdn Ready worker 165m v1.26.5+7a891f0 liuhuali@Lius-MacBook-Pro huali-test % oc get co NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE authentication 4.13.0-0.nightly-2023-06-06-194351 True True True 107m APIServerDeploymentDegraded: 1 of 4 requested instances are unavailable for apiserver.openshift-oauth-apiserver ()... baremetal 4.13.0-0.nightly-2023-06-06-194351 True False False 174m cloud-controller-manager 4.13.0-0.nightly-2023-06-06-194351 True False False 176m cloud-credential 4.13.0-0.nightly-2023-06-06-194351 True False False 3h cluster-autoscaler 4.13.0-0.nightly-2023-06-06-194351 True False False 173m config-operator 4.13.0-0.nightly-2023-06-06-194351 True False False 175m console 4.13.0-0.nightly-2023-06-06-194351 True False False 136m control-plane-machine-set 4.13.0-0.nightly-2023-06-06-194351 True False False 71m csi-snapshot-controller 4.13.0-0.nightly-2023-06-06-194351 True False False 174m dns 4.13.0-0.nightly-2023-06-06-194351 True True False 173m DNS "default" reports Progressing=True: "Have 6 available node-resolver pods, want 7." etcd 4.13.0-0.nightly-2023-06-06-194351 True True True 173m NodeControllerDegraded: The master nodes not ready: node "huliu-az7c-svq9q-master-1" not ready since 2023-06-07 08:47:34 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.) image-registry 4.13.0-0.nightly-2023-06-06-194351 True True False 165m Progressing: The registry is ready... ingress 4.13.0-0.nightly-2023-06-06-194351 True False False 165m insights 4.13.0-0.nightly-2023-06-06-194351 True False False 168m kube-apiserver 4.13.0-0.nightly-2023-06-06-194351 True True True 171m NodeControllerDegraded: The master nodes not ready: node "huliu-az7c-svq9q-master-1" not ready since 2023-06-07 08:47:34 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.) kube-controller-manager 4.13.0-0.nightly-2023-06-06-194351 True False True 171m NodeControllerDegraded: The master nodes not ready: node "huliu-az7c-svq9q-master-1" not ready since 2023-06-07 08:47:34 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.) kube-scheduler 4.13.0-0.nightly-2023-06-06-194351 True False True 171m NodeControllerDegraded: The master nodes not ready: node "huliu-az7c-svq9q-master-1" not ready since 2023-06-07 08:47:34 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.) 
kube-storage-version-migrator 4.13.0-0.nightly-2023-06-06-194351 True False False 106m machine-api 4.13.0-0.nightly-2023-06-06-194351 True False False 167m machine-approver 4.13.0-0.nightly-2023-06-06-194351 True False False 174m machine-config 4.13.0-0.nightly-2023-06-06-194351 False False True 60m Cluster not available for [{operator 4.13.0-0.nightly-2023-06-06-194351}]: failed to apply machine config daemon manifests: error during waitForDaemonsetRollout: [timed out waiting for the condition, daemonset machine-config-daemon is not ready. status: (desired: 7, updated: 7, ready: 6, unavailable: 1)] marketplace 4.13.0-0.nightly-2023-06-06-194351 True False False 174m monitoring 4.13.0-0.nightly-2023-06-06-194351 True False False 106m network 4.13.0-0.nightly-2023-06-06-194351 True True False 177m DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes)... node-tuning 4.13.0-0.nightly-2023-06-06-194351 True False False 173m openshift-apiserver 4.13.0-0.nightly-2023-06-06-194351 True True True 107m APIServerDeploymentDegraded: 1 of 4 requested instances are unavailable for apiserver.openshift-apiserver () openshift-controller-manager 4.13.0-0.nightly-2023-06-06-194351 True False False 170m openshift-samples 4.13.0-0.nightly-2023-06-06-194351 True False False 167m operator-lifecycle-manager 4.13.0-0.nightly-2023-06-06-194351 True False False 174m operator-lifecycle-manager-catalog 4.13.0-0.nightly-2023-06-06-194351 True False False 174m operator-lifecycle-manager-packageserver 4.13.0-0.nightly-2023-06-06-194351 True False False 168m service-ca 4.13.0-0.nightly-2023-06-06-194351 True False False 175m storage 4.13.0-0.nightly-2023-06-06-194351 True True False 174m AzureDiskCSIDriverOperatorCRProgressing: AzureDiskDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods... liuhuali@Lius-MacBook-Pro huali-test % ----------------------- There might be an easier way by just rolling a revision in etcd, stopping kubelet and then observing the same issue.
Actual results:
CEO's member removal controller is getting stuck on the IsBootstrapComplete check that was introduced to fix another bug: https://github.com/openshift/cluster-etcd-operator/commit/c96150992a8aba3654835787be92188e947f557c#diff-d91047e39d2c1ab6b35e69359a24e83c19ad9b3e9ad4e44f9b1ac90e50f7b650R97 It turns out IsBootstrapComplete checks whether a revision is currently rolling out (makes sense), and that one NotReady node with kubelet gone still has a revision going (rev 7, target 9). More info: https://issues.redhat.com/browse/OCPBUGS-14673?focusedId=22726712&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-22726712 This causes the etcd member to not be removed, which in turn blocks the vertical scale-down procedure from removing the pre-drain hook, as the member is still present. Effectively you end up with a cluster of 4 CP machines, where one is stuck in the Deleting state.
Expected results:
The etcd member should be removed and the machine/node should be deleted
Additional info:
Removing the revision check does fix this issue reliably, but might not be desirable: https://github.com/openshift/cluster-etcd-operator/pull/1087
Description of problem:
IPI installation on Alibabacloud cannot succeed, with zero control-plane nodes ready.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-08-11-055332
How reproducible:
Always
Steps to Reproduce:
1. IPI installation on Alibabacloud, with "credentialsMode: Manual"
Actual results:
Bootstrap failed, with all control-plane nodes NotReady.
Expected results:
The installation should succeed.
Additional info:
The log bundle is available at https://drive.google.com/file/d/1eb1D6GeNyu1Bys6vDyf3ev9aFjzWW6lW/view?usp=drive_link. The installation of exactly the same scenario can succeed with 4.14.0-ec.4-x86_64.
This is a clone of issue OCPBUGS-29482. The following is the description of the original issue:
—
Description of problem:
A change to how Power VS Workspaces are queried is not compatible with the version of terraform-provider-ibm
Version-Release number of selected component (if applicable):
How reproducible:
Easily
Steps to Reproduce:
1. Try to deploy with Power VS 2. Fail with an error stating that [ERROR] Error retrieving service offering: ServiceDoesnotExist: Given service : "power-iaas" doesn't exist
Actual results:
Fail with [ERROR] Error retrieving service offering: ServiceDoesnotExist: Given service : "power-iaas" doesn't exist
Expected results:
Install should succeed.
Additional info:
Description of problem:
When we encounter the HostAlreadyClaimed issue, the error message is pointing to the wrong route name.
Version-Release number of selected component (if applicable):
OCP v4.12.z
How reproducible:
Frequently
Steps to Reproduce:
- Created three routes with similar hosts, one without a path and the others with paths defined. # oc get routes NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD route1 httpd-example-path-based-routes.apps.firstcluster.lab.upshift.rdu2.redhat.com httpd-example web edge None route2 httpd-example-path-based-routes.apps.firstcluster.lab.upshift.rdu2.redhat.com /path httpd-example web edge None route3 HostAlreadyClaimed /path httpd-example web edge None <--------------- - Got a 'HostAlreadyClaimed' error for the third route 'route3', which is expected because the path and the hostname of 'route2' & 'route3' are the same. - In the route description, we can see that the first route, 'route1', is reported to be the older route for the host, but we expect it to report 'route2' because the hostname and path are the same for route2 and route3. # oc describe route route3 Name: route3 Namespace: path-based-routes Created: 14 seconds ago Labels: app=httpd-example template=httpd-example Annotations: <none> Requested Host: httpd-example-path-based-routes.apps.firstcluster.lab.upshift.rdu2.redhat.com rejected by router default: (host router-default.apps.firstcluster.lab.upshift.rdu2.redhat.com)HostAlreadyClaimed (14 seconds ago) route route1 already exposes httpd-example-path-based-routes.apps.firstcluster.lab.upshift.rdu2.redhat.com and is older <---------------- Path: /path TLS Termination: edge Insecure Policy: <none> Endpoint Port: web Service: httpd-example Weight: 100 (100%) Endpoints: 10.1.2.3:8080 - However, deleting 'route2' resolves the issue.
Actual results:
Error messages for the 'HostAlreadyClaimed' issue should consider the route name to be reported on the basis of hostname and paths.
Expected results:
Only the hostname is taken into consideration, whereas the route's path should be checked as well, and then the appropriate route name should be reported in the error.
Please review the following PR: https://github.com/openshift/cluster-config-operator/pull/353
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-27094. The following is the description of the original issue:
—
Description of problem:
Based on this and this component readiness data that compares success rates for those two particular tests, we are regressing ~7-10% between the current 4.15 master and 4.14.z (iow. we made the product ~10% worse).
These jobs and their failures are all caused by increased etcd leader elections disrupting seemingly unrelated test cases across the VSphere AMD64 platform.
Since this particular platform's business significance is high, I'm setting this as "Critical" severity.
Please get in touch with me or Dean West if more teams need to be pulled into investigation and mitigation.
Version-Release number of selected component (if applicable):
4.15 / master
How reproducible:
Component Readiness Board
Actual results:
The etcd leader elections are elevated. Some jobs indicate it is due to disk i/o throughput OR network overload.
Expected results:
1. We NEED to understand what is causing this problem. 2. If we can mitigate this, we should. 3. If we cannot mitigate this, we need to document this or work with VSphere infrastructure provider to fix this problem. 4. We optionally need a way to measure how often this happens in our fleet so we can evaluate how bad it is.
Additional info:
The agent container image is currently ~770MB. On slow networks, this can take a long time to download, and users don't know why their host isn't being discovered.
Some suggestions from Omer Tuchfeld:
Elastic APM seems to be unused
In order to address OCPBUGS-30905, bump x/net to at least v0.24.0 to mitigate CVE-2023-45288.
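For reference, a typical way such a bump is applied in a Go module (module path and version as named above; the vendoring step is an assumption that only applies if the repository vendors its dependencies):
$ go get golang.org/x/net@v0.24.0
$ go mod tidy
$ go mod vendor   # only if the repository vendors dependencies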
Multus doesn't need to watch pods on other nodes. To save memory and CPU set MULTUS_NODE_NAME to filter pods that multus watches.
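For illustration, a minimal sketch of how MULTUS_NODE_NAME could be injected into the multus container via the Kubernetes downward API (the exact container and manifest it belongs in are assumptions):
env:
- name: MULTUS_NODE_NAME
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName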
This is a clone of issue OCPBUGS-30620. The following is the description of the original issue:
—
AWS EBS, Azure Disk and Azure File operators are now built from cmd/ and pkg/; there is no code used from the legacy/ dir and we should remove it.
There are still test manifests in the legacy/ directory that are in use! They need to be moved somewhere else, and Dockerfile.*.test and the CI steps must be updated!
Technically, this is a copy of STOR-1797, but we need a bug to be able to backport the aws-ebs changes to 4.15 and stop using the legacy/ directory there too.
This is a clone of issue OCPBUGS-36536. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-35300. The following is the description of the original issue:
—
Description of problem:
ARO cluster fails to install with disconnected networking. We see master nodes bootup hang on the service machine-config-daemon-pull.service. Logs from the service indicate it cannot reach the public IP of the image registry. In ARO, image registries need to go via a proxy. Dnsmasq is used to inject proxy DNS answers, but machine-config-daemon-pull is starting before ARO's dnsmasq.service starts.
Version-Release number of selected component (if applicable):
4.14.16
How reproducible:
Always
Steps to Reproduce:
For Fresh Install: 1. Create the required ARO vnet and subnets 2. Attach a route table to the subnets with a blackhole route 0.0.0.0/0 3. Create a 4.14 ARO cluster with --apiserver-visibility=Private --ingress-visibility=Private --outbound-type=UserDefinedRouting [OR] Post Upgrade to 4.14: 1. Create an ARO 4.13 UDR cluster. 2. Upgrade the cluster 4.13 -> 4.14; the upgrade is successful. 3. Create a new node (scale up); we run into the same issue.
Actual results:
For Fresh Install of 4.14: ERROR: (InternalServerError) Deployment failed. [OR] Post Upgrade to 4.14: Node doesn't come into a Ready State and Machine is stuck in Provisioned status.
Expected results:
Succeeded
Additional info:
We see in the node logs that machine-config-daemon-pull.service is unable to reach the image registry. ARO's dnsmasq was not yet started.
Previously, systemd ordering was set for ovs-configuration.service to start after (ARO's) dnsmasq.service. Perhaps that should have gone on machine-config-daemon-pull.service.
See https://issues.redhat.com/browse/OCPBUGS-25406.
Description of problem:
Run the command `oc adm ocp-certificates monitor-certificates` will panic.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
100%
Steps to Reproduce:
1. `oc adm ocp-certificates monitor-certificates`
Actual results:
panic:
Expected results:
no panic
Additional info:
Please review the following PR: https://github.com/openshift/machine-api-provider-azure/pull/86
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-27788. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Try to deploy in mad02 or mad04 with powervs 2. Cannot import boot image 3. fail
Actual results:
Fail
Expected results:
Cluster comes up
Additional info:
The host doesn't power off upon removal during scale down.
Version: 4.4.0-0.nightly-2020-01-09-013524
Steps to reproduce:
Starting with 3 workers:
[kni@worker-2 ~]$ oc get bmh -n openshift-machine-api
NAME STATUS PROVISIONING STATUS CONSUMER BMC HARDWARE PROFILE ONLINE ERROR
openshift-master-0 OK externally provisioned ocp-edge-cluster-master-0 ipmi://192.168.123.1:6230 true
openshift-master-1 OK externally provisioned ocp-edge-cluster-master-1 ipmi://192.168.123.1:6231 true
openshift-master-2 OK externally provisioned ocp-edge-cluster-master-2 ipmi://192.168.123.1:6232 true
openshift-worker-0 OK provisioned ocp-edge-cluster-worker-0-d2fvm ipmi://192.168.123.1:6233 unknown true
openshift-worker-5 OK provisioned ocp-edge-cluster-worker-0-ptklp ipmi://192.168.123.1:6245 unknown true
openshift-worker-9 OK provisioned ocp-edge-cluster-worker-0-jb2tm ipmi://192.168.123.1:6239 unknown true
[kni@worker-2 ~]$ oc get machine -n openshift-machine-api
NAME PHASE TYPE REGION ZONE AGE
ocp-edge-cluster-master-0 4d4h
ocp-edge-cluster-master-1 4d4h
ocp-edge-cluster-master-2 4d4h
ocp-edge-cluster-worker-0-d2fvm 146m
ocp-edge-cluster-worker-0-jb2tm 11m
ocp-edge-cluster-worker-0-ptklp 3h54m
[kni@worker-2 ~]$ oc get node
NAME STATUS ROLES AGE VERSION
master-0 Ready master 4d4h v0.0.0-master+$Format:%h$
master-1 Ready master 4d4h v0.0.0-master+$Format:%h$
master-2 Ready master 4d4h v0.0.0-master+$Format:%h$
worker-0 Ready worker 18m v0.0.0-master+$Format:%h$
worker-5 Ready worker 18m v0.0.0-master+$Format:%h$
worker-9 Ready worker 5m2s v0.0.0-master+$Format:%h$
adding annotation to mark the proper node for deletion:
oc annotate machine ocp-edge-cluster-worker-0-jb2tm machine.openshift.io/cluster-api-delete-machine=yes -n openshift-machine-api
machine.machine.openshift.io/ocp-edge-cluster-worker-0-jb2tm annotated
Deleting the bmh:
[kni@worker-2 ~]$ oc delete bmh openshift-worker-9 -n openshift-machine-api
baremetalhost.metal3.io "openshift-worker-9" deleted
Scaling down the replicas number:
[kni@worker-2 ~]$ oc scale machineset -n openshift-machine-api ocp-edge-cluster-worker-0 --replicas=2
machineset.machine.openshift.io/ocp-edge-cluster-worker-0 scaled
The entry (worker-9) got removed as expected:
[kni@worker-2 ~]$ oc get node
NAME STATUS ROLES AGE VERSION
master-0 Ready master 4d4h v0.0.0-master+$Format:%h$
master-1 Ready master 4d4h v0.0.0-master+$Format:%h$
master-2 Ready master 4d4h v0.0.0-master+$Format:%h$
worker-0 Ready worker 28m v0.0.0-master+$Format:%h$
worker-5 Ready worker 28m v0.0.0-master+$Format:%h$
[kni@worker-2 ~]$ oc get machine -n openshift-machine-api
NAME PHASE TYPE REGION ZONE AGE
ocp-edge-cluster-master-0 4d4h
ocp-edge-cluster-master-1 4d4h
ocp-edge-cluster-master-2 4d4h
ocp-edge-cluster-worker-0-d2fvm 156m
ocp-edge-cluster-worker-0-ptklp 4h5m
[kni@worker-2 ~]$ oc get bmh -n openshift-machine-api
NAME STATUS PROVISIONING STATUS CONSUMER BMC HARDWARE PROFILE ONLINE ERROR
openshift-master-0 OK externally provisioned ocp-edge-cluster-master-0 ipmi://192.168.123.1:6230 true
openshift-master-1 OK externally provisioned ocp-edge-cluster-master-1 ipmi://192.168.123.1:6231 true
openshift-master-2 OK externally provisioned ocp-edge-cluster-master-2 ipmi://192.168.123.1:6232 true
openshift-worker-0 OK provisioned ocp-edge-cluster-worker-0-d2fvm ipmi://192.168.123.1:6233 unknown true
openshift-worker-5 OK provisioned ocp-edge-cluster-worker-0-ptklp ipmi://192.168.123.1:6245 unknown true
Yet, if I try to connect to the node that got deleted - it's still UP and running.
Expected result:
The removed node should have been powered off automatically.
This is a clone of issue OCPBUGS-30073. The following is the description of the original issue:
—
Description of problem:
The Clear button in the Upload JAR File form is not working; the user needs to close the form in order to remove the previously selected JAR file.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Open Upload Jar File form from Add Page 2. Upload a JAR file 3. Remove the JAR the file by using clear button
Actual results:
The selected JAR file is not removed even after using the "Clear" button
Expected results:
The "Clear" button should remove the selected file from the form.
Additional info:
Description of problem:
Business Automation operands fail to load in the uninstall operator modal, with the alert message "Cannot load Operands. There was an error loading operands for this operator. Operands will need to be deleted manually...". The "Delete all operand instances for this operator__checkbox" element is not shown, so the test fails. https://search.ci.openshift.org/?search=Testing+uninstall+of+Business+Automation+Operator&maxAge=168h&context=1&type=junit&name=pull-ci-openshift-console-master-e2e-gcp-console&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
https://github.com/kubernetes/kubernetes/issues/118916
Version-Release number of selected component (if applicable):
4.14
How reproducible:
100%
Steps to Reproduce:
1. compare memory usage from v1 and v2 and notice differences with the same workloads 2. 3.
Actual results:
they slightly differ because of accounting differences
Expected results:
they should be largely the same
Additional info:
Description of problem:
Links for the CodeEditor component are returning 404. Check the links for the options and ref parameters: https://github.com/openshift/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#codeeditor
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
404
Expected results:
200
Additional info:
Description of problem:
The following presubmit jobs for Local Zones have been permanently failing since August: - e2e-aws-ovn-localzones: https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-installer-master-e2e-aws-ovn-localzones?buildId=1716457254460329984 - e2e-aws-ovn-shared-vpc-localzones: https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-installer-master-e2e-aws-ovn-shared-vpc-localzones Investigating, we can see common failures in the tests '[sig-network] can collect <poller_name> poller pod logs', causing most of the jobs to not complete correctly. Exploring the code, I can see it was added recently, near August, which matches when the failures started. It is required to tolerate the taint "node-role.kubernetes.io/edge" to run pods on instances located in a Local Zone ("edge nodes"). I am not sure if I am looking in the correct place, but it seems only master taints are tolerated: https://github.com/openshift/origin/blob/master/pkg/monitortests/network/disruptionpodnetwork/host-network-target-deployment.yaml#L42
Version-Release number of selected component (if applicable):
4.15.0
How reproducible:
always
Steps to Reproduce:
trigger the job: 1. open a PR on installer 2. run the job 3. check failed tests '[sig-network] can collect <poller_name> poller pod logs' Example of 4.15 blocked feature PR (Wavelength Zones): https://github.com/openshift/installer/pull/7369#issuecomment-1783699175
Actual results:
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/7590/pull-ci-openshift-installer-master-e2e-aws-ovn-localzones/1715075142427611136 { 1 pods lacked sampler output: [pod-network-to-pod-network-disruption-poller-d94fb55db-9qfpz]} E1018 22:06:34.773866 1 disruption_backend_sampler.go:496] not finished writing all samples (1 remaining), but we're told to close E1018 22:06:34.774669 1 disruption_backend_sampler.go:496] not finished writing all samples (1 remaining), but we're told to close
Expected results:
Monitor pods should be scheduled on edge nodes. How can we track job failures for new monitor tests?
Additional info:
Edge nodes have NoSchedule taints applied by default; to run monitor pods on those nodes you need to tolerate the taint "node-role.kubernetes.io/edge" (a sketch of such a toleration follows below). See the enhancement for more information: https://github.com/openshift/enhancements/blob/master/enhancements/installer/aws-custom-edge-machineset-local-zones.md#user-workload-deployments Looking at the must-gather of job 1716457254460329984, you can see the monitor pods not scheduled due to the missing tolerations:
$ grep -rni pod-network-to-pod-network-disruption-poller-7c97cd5d7-t2mn2 \
  1716457254460329984-must-gather/09abb0d6fc08ee340563e6e11f5ceafb42fb371e50ab6acee6764031062525b7/namespaces/openshift-kube-scheduler/pods/ \
  | awk -F'] "' '{print$2}' | sort | uniq -c
  215 Unable to schedule pod; no fit; waiting" pod="e2e-pod-network-disruption-test-59s5d/pod-network-to-pod-network-disruption-poller-7c97cd5d7-t2mn2" err="0/7 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/edge: }, 6 node(s) didn't match pod anti-affinity rules. preemption: 0/7 nodes are available: 1 Preemption is not helpful for scheduling, 6 No preemption victims found for incoming pod.."
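For illustration, the kind of toleration the monitor deployments would need; a minimal sketch assuming the NoSchedule effect shown in the scheduler output above:
tolerations:
- key: node-role.kubernetes.io/edge
  operator: Exists
  effect: NoSchedule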
Please review the following PR: https://github.com/openshift/console-operator/pull/818
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of the problem:
The reboot that happens after writing the RHCOS image to the disk fails with 4.15-ec.2 on KVM s390.
How reproducible:
I was not able to reproduce this in the qemu s390x emulator, but Amadeus Podvratnik hit the issue on real hardware.
Steps to reproduce:
1. Use assisted installer with version 4.15-ec.2 to install to a logical partition.
Actual results:
The installer writes the RHCOS image to the disk, but then fails to boot from it. Instead it boots to the emergency shell and writes this errors to the console:
Nov 27 12:49:49 localhost ostree-prepare-root[1130]: ostree-prepare-root: Couldn't find specified OSTree root '/sysroot//ostree/boot.1/rhcos/452f29cc74e701f4f3ff69e66657fe28788d6c490aa0032c138909b7b2ce429c7/0': No such file or directory Nov 27 12:49:49 localhost systemd[1]: ostree-prepare-root.service: Main process exited, code=exited, status=1/FAILURE Nov 27 12:49:49 localhost systemd[1]: ostree-prepare-root.service: Failed with result 'exit-code'. Nov 27 12:49:49 localhost systemd[1]: Failed to start OSTree Prepare OS/.
Expected results:
Should boot and continue the installation.
Description of problem:
Go to the Home -> Events page and type a string in the filter field; the events are not filtered. (The search mode is fuzzy search by default.)
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-08-28-154013
How reproducible:
Always
Steps to Reproduce:
1.Go to Home -> Events page, type string in filter field, 2. 3.
Actual results:
1. The events are not filtered.
Expected results:
1. The events should be filtered down to those containing the filter string.
Additional info:
The type filter does work on the Events page.
Description of problem:
The customer created hosted control plane (HCP, of type KubeVirt) clusters on a hub OCP cluster. For their workload to pull images on the HCP cluster, they added auth for their registries to a secret named "scale-rm-pull-secret" in the "clusters" namespace on the hub cluster, and then specified this secret "scale-rm-pull-secret" in the hostedcluster CR for the HCP in question, in the hub under namespace "clusters". They expect this change to reflect on the HCP cluster nodes and images to be pulled successfully. However, they keep getting an ImagePullBackOff error on the HCP cluster.
Pod ibm-spectrum-scale-controller-manager-5cb84655b4-dvnxk, namespace ibm-spectrum-scale-operator. Generated from kubelet on scale-41312-t7nml, 2 times in the last 0 minutes:
Failed to pull image "icr.io/cpopen/ibm-spectrum-scale-operator@sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8": rpc error: code = Unknown desc = (Mirrors also failed: [cp.stg.icr.io/cp/ibm-spectrum-scale-operator@sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8: Requesting bearer token: invalid status code from registry 400 (Bad Request)] [docker-na-public.artifactory.swg-devops.com/sys-spectrum-scale-team-cloud-native-docker-local/ibm-spectrum-scale-operator@sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8: unable to retrieve auth token: invalid username/password: authentication required]): icr.io/cpopen/ibm-spectrum-scale-operator@sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8: reading manifest sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8 in icr.io/cpopen/ibm-spectrum-scale-operator: manifest unknown
The customer is able to pull the image manually using the same credentials:
podman pull docker-na-public.artifactory.swg-devops.com/sys-spectrum-scale-team-cloud-native-docker-local/ibm-spectrum-scale-operator@sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Pulled the image manually on the nodes successfully after logging in to the registry with the same credentials, but the pod continues to say it cannot pull the image. Another thing to note is that the pod has imagePullPolicy "IfNotPresent", so why does it continue to throw the same error even after a manual pull on all three nodes?
podman pull docker-na-public.artifactory.swg-devops.com/sys-spectrum-scale-team-cloud-native-docker-local/ibm-spectrum-scale-operator@sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8
Trying to pull docker-na-public.artifactory.swg-devops.com/sys-spectrum-scale-team-cloud-native-docker-local/ibm-spectrum-scale-operator@sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8...
Getting image source signatures
Copying blob 1e3d9b7d1452 skipped: already exists
Copying blob fe5ca62666f0 skipped: already exists
Copying blob e8c73c638ae9 skipped: already exists
Copying blob fcb6f6d2c998 skipped: already exists
Copying blob b02a7525f878 skipped: already exists
Copying blob 4aa0ea1413d3 skipped: already exists
Copying blob 7c881f9ab25e skipped: already exists
Copying blob 5627a970d25e skipped: already exists
Copying blob c7e34367abae skipped: already exists
Copying blob f92848770344 skipped: already exists
Copying blob a7ca0d9ba68f skipped: already exists
Copying config 07120ff2fe done
Writing manifest to image destination
Storing signatures
07120ff2fe00d6335ef757b33546fc9ec9e3d799a500349343f09228bcdf73c0
sh-5.1#
Pod ibm-spectrum-scale-controller-manager-5cb84655b4-dvnxk, namespace ibm-spectrum-scale-operator, 21 Sept 2023, 17:58. Generated from kubelet on scale-41312-t7nml, 2 times in the last 0 minutes:
Failed to pull image "icr.io/cpopen/ibm-spectrum-scale-operator@sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8": rpc error: code = Unknown desc = (Mirrors also failed: [cp.stg.icr.io/cp/ibm-spectrum-scale-operator@sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8: Requesting bearer token: invalid status code from registry 400 (Bad Request)] [docker-na-public.artifactory.swg-devops.com/sys-spectrum-scale-team-cloud-native-docker-local/ibm-spectrum-scale-operator@sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8: unable to retrieve auth token: invalid username/password: authentication required]): icr.io/cpopen/ibm-spectrum-scale-operator@sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8: reading manifest sha256:f6138abb5493d7ef6405dcf0a6bb5afc697cca9f20be1a88b3214268b6382da8 in icr.io/cpopen/ibm-spectrum-scale-operator: manifest unknown
The Ceph storage plugin has moved to its own repository at https://github.com/red-hat-storage/odf-console
The static plugin has not been used for a few releases and now can be removed safely.
This is a clone of issue OCPBUGS-42109. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42108. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-38012. The following is the description of the original issue:
—
Description of problem:
Customers are unable to scale up OCP nodes when the initial setup was done with OCP 4.8/4.9 and the cluster was then upgraded to 4.15.22/4.15.23. At first the customer observed that the node scale-up failed and /etc/resolv.conf was empty on the nodes. As a workaround, the customer copied the resolv.conf content from a correct resolv.conf, after which setup of the new node continued. They then inspected the rendered MachineConfig assembled from 00-worker and suspected that something was wrong with the on-prem-resolv-prepender.service definition. As a workaround, the customer manually changed this service definition, which allowed them to scale up new nodes.
Version-Release number of selected component (if applicable):
4.15 , 4.16
How reproducible:
100%
Steps to Reproduce:
1. Install OCP vSphere IPI cluster version 4.8 or 4.9 2. Check "on-prem-resolv-prepender.service" service definition 3. Upgrade it to 4.15.22 or 4.15.23 4. Check if the node scaling is working 5. Check "on-prem-resolv-prepender.service" service definition
Actual results:
Unable to scale up a node with the default service definition. After manually making changes to the service definition, scaling works.
Expected results:
Node scaling should work without making any manual changes to the service definition.
Additional info:
on-prem-resolv-prepender.service content on clusters built with 4.8 / 4.9 and then upgraded to 4.15.22 / 4.15.23:
~~~
[Unit]
Description=Populates resolv.conf according to on-prem IPI needs
# Per https://issues.redhat.com/browse/OCPBUGS-27162 there is a problem if this is started before crio-wipe
After=crio-wipe.service
[Service]
Type=oneshot
Restart=on-failure
RestartSec=10
StartLimitIntervalSec=0
ExecStart=/usr/local/bin/resolv-prepender.sh
EnvironmentFile=/run/resolv-prepender/env
~~~
After manually correcting the service definition as below, scaling works on 4.15.22 / 4.15.23:
~~~
[Unit]
Description=Populates resolv.conf according to on-prem IPI needs
# Per https://issues.redhat.com/browse/OCPBUGS-27162 there is a problem if this is started before crio-wipe
After=crio-wipe.service
StartLimitIntervalSec=0 -----------> this
[Service]
Type=oneshot
#Restart=on-failure -----------> this
RestartSec=10
ExecStart=/usr/local/bin/resolv-prepender.sh
EnvironmentFile=/run/resolv-prepender/env
~~~
Below is the on-prem-resolv-prepender.service on a freshly installed 4.15.23 where scaling works fine:
~~~
[Unit]
Description=Populates resolv.conf according to on-prem IPI needs
# Per https://issues.redhat.com/browse/OCPBUGS-27162 there is a problem if this is started before crio-wipe
After=crio-wipe.service
StartLimitIntervalSec=0
[Service]
Type=oneshot
Restart=on-failure
RestartSec=10
ExecStart=/usr/local/bin/resolv-prepender.sh
EnvironmentFile=/run/resolv-prepender/env
~~~
Observed this in the rendered MachineConfig which is assembled from 00-worker
Description of problem:
From time to time the installation fails with something like the one below:
2022-01-03 16:33:27.936 | level=debug msg=Generating Terraform Variables...
2022-01-03 16:33:27.940 | level=info msg=Obtaining RHCOS image file from 'https://rhcos-redirector.apps.art.xq1c.p1.openshiftapps.com/art/storage/releases/rhcos-4.8/48.84.202109241901-0/x86_64/rhcos-48.84.202109241901-0-openstack.x86_64.qcow2.gz?sha256=e0a1d8a99c5869150a56b8de475ea7952ca2fa3aacad7ca48533d1176df503ab'
2022-01-03 16:33:27.943 | level=fatal msg=failed to fetch Terraform Variables: failed to generate asset "Terraform Variables": failed to get openstack Terraform variables: Get "https://rhcos-redirector.apps.art.xq1c.p1.openshiftapps.com/art/storage/releases/rhcos-4.8/48.84.202109241901-0/x86_64/rhcos-48.84.202109241901-0-openstack.x86_64.qcow2.gz": dial tcp: lookup rhcos-redirector.apps.art.xq1c.p1.openshiftapps.com on 10.46.0.31:53: read udp 172.16.40.23:38673->10.46.0.31:53: i/o timeout
2022-01-03 16:33:27.946 |
Version:
4.8.0-0.nightly-2021-12-23-010813 but we see it for other versions as well
IPI
I expect the installer to have some sort of retry mechanism.
Please review the following PR: https://github.com/openshift/cluster-api-operator/pull/24
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-25245. The following is the description of the original issue:
—
Description of problem:
When upgrading cluster from 4.13.23 to 4.14.3, machine-config CO gets stuck due to a content mismatch error on all nodes. Node node-xxx-xxx is reporting: "unexpected on-disk state validating against rendered-master-734521b50f69a1602a3a657419ed4971: content mismatch for file \"/etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt\""
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. perform a upgrade from 4.13.x to 4.14.x 2. 3.
Actual results:
machine-config stalls during upgrade
Expected results:
the "content mismatch" shouldn't happen anymore according to the MCO engineering team
Additional info:
This is a clone of issue OCPBUGS-36198. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-33694. The following is the description of the original issue:
—
Description of problem:
kubelet does not start after reboot due to dependency issue
Version-Release number of selected component (if applicable):
OCP 4.14.23
How reproducible:
Every time at customer end
Steps to Reproduce:
1. Upgrade Openshift cluster (OVN based) with kdump enabled to OCP 4.14.23 2. Check kubelet and crio status
Actual results:
kubelet and crio services are in a dead state and do not start automatically after reboot; manual intervention is needed.
$ cat sos_commands/crio/systemctl_status_crio
○ crio.service - Container Runtime Interface for OCI (CRI-O)
Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; preset: disabled)
Drop-In: /etc/systemd/system/crio.service.d
└─01-kubens.conf, 05-mco-ordering.conf, 10-mco-default-env.conf, 10-mco-default-madv.conf, 10-mco-profile-unix-socket.conf, 20-nodenet.conf
Active: inactive (dead)
Docs: https://github.com/cri-o/cri-o
$ cat sos_commands/openshift/systemctl_status_kubelet
○ kubelet.service - Kubernetes Kubelet
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─01-kubens.conf, 10-mco-default-env.conf, 10-mco-default-madv.conf, 20-logging.conf, 20-nodenet.conf
Active: inactive (dead)
Expected results:
kubelet and crio should start automatically.
Additional info:
I feel the recent patch to wait till kdump starts has introduced an ordering cycle: https://github.com/openshift/machine-config-operator/pull/4213/files
May 09 19:12:05 network01 systemd[1]: network-online.target: Found dependency on kdump.service/start
May 09 19:13:48 network01 systemd[1]: ovs-configuration.service: Found ordering cycle on kdump.service/start
May 09 19:13:48 network01 systemd[1]: ovs-configuration.service: Job kdump.service/start deleted to break ordering cycle starting with ovs-configuration.service/start
May 12 21:20:57 network01 systemd[1]: node-valid-hostname.service: Found dependency on kdump.service/start
May 12 21:21:00 network01 kdumpctl[1389]: kdump: kexec: loaded kdump kernel
May 12 21:21:00 network01 kdumpctl[1389]: kdump: Starting kdump: [OK]
May 12 21:25:28 network01 systemd[1]: kdump.service: Found ordering cycle on network-online.target/start
May 12 21:25:28 network01 systemd[1]: kdump.service: Found dependency on node-valid-hostname.service/start
May 12 21:25:28 network01 systemd[1]: kdump.service: Found dependency on ovs-configuration.service/start
May 12 21:25:28 network01 systemd[1]: kdump.service: Found dependency on kdump.service/start
May 12 21:25:28 network01 systemd[1]: kdump.service: Job network-online.target/start deleted to break ordering cycle starting with kdump.service/start
May 12 21:25:31 network01 kdumpctl[1284]: kdump: kexec: loaded kdump kernel
May 12 21:25:31 network01 kdumpctl[1284]: kdump: Starting kdump: [OK]
To break a cycle, systemd deletes a job that is part of the cycle, so the corresponding service is not started. Disabling kdump and rebooting the node helps; kubelet and crio then start automatically.
# systemctl disable kdump
# systemctl reboot
Make sure systemctl list-jobs does not show any pending jobs; once that is complete, we can check the status of kubelet.
# systemctl list-jobs
# systemctl status kubelet
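For anyone triaging a similar report, two generic systemd commands that surface the ordering edges involved; nothing OCP-specific is assumed here:
# show the units kdump.service is ordered after
$ systemctl list-dependencies --after kdump.service
# print the declared After=/Before= ordering properties
$ systemctl show -p After -p Before kdump.service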
Description of problem:
enable UWM and enable UWM alertmanager
$ oc -n openshift-monitoring get cm cluster-monitoring-config -oyaml apiVersion: v1 data: config.yaml: | enableUserWorkload: true kind: ConfigMap metadata: creationTimestamp: "2023-08-17T06:02:36Z" name: cluster-monitoring-config namespace: openshift-monitoring resourceVersion: "259151" uid: a9365c21-5c1d-4c91-98ee-f074b023dd31 $ oc -n openshift-user-workload-monitoring get cm user-workload-monitoring-config -oyaml apiVersion: v1 data: config.yaml: | alertmanager: enabled: true kind: ConfigMap metadata: creationTimestamp: "2023-08-17T06:02:44Z" labels: app.kubernetes.io/managed-by: cluster-monitoring-operator app.kubernetes.io/part-of: openshift-monitoring name: user-workload-monitoring-config namespace: openshift-user-workload-monitoring resourceVersion: "148193" uid: b3c6e5a6-ff7b-4ae4-85eb-28be683119e4 $ oc -n openshift-user-workload-monitoring get pod NAME READY STATUS RESTARTS AGE alertmanager-user-workload-0 6/6 Running 0 4h50m alertmanager-user-workload-1 6/6 Running 0 4h50m prometheus-operator-77bcdcbd9c-7nt6v 2/2 Running 0 6h14m prometheus-user-workload-0 6/6 Running 0 6h14m prometheus-user-workload-1 6/6 Running 0 6h14m thanos-ruler-user-workload-0 4/4 Running 0 4h50m thanos-ruler-user-workload-1 4/4 Running 0 4h50m
kubeadmin user create namespace and PrometheusRule, the alert could be fired
apiVersion: v1 kind: Namespace metadata: name: ns1 --- apiVersion: monitoring.coreos.com/v1 kind: PrometheusRule metadata: name: example-alert namespace: ns1 spec: groups: - name: example rules: - alert: TestAlert expr: vector(1) labels: severity: none annotations: message: This is an alert meant to ensure that the entire alerting pipeline is functional.
could see the alerts from UWM alertmanager
$ token=`oc create token prometheus-k8s -n openshift-monitoring` $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-user-workload.openshift-user-workload-monitoring.svc:9095/api/v2/alerts' | jq [ { "annotations": { "message": "This is an alert meant to ensure that the entire alerting pipeline is functional." }, "endsAt": "2023-08-17T12:08:41.558Z", "fingerprint": "348490d73f8513a0", "receivers": [ { "name": "Default" } ], "startsAt": "2023-08-17T12:04:11.558Z", "status": { "inhibitedBy": [], "silencedBy": [], "state": "active" }, "updatedAt": "2023-08-17T12:04:41.583Z", "generatorURL": "https://thanos-querier-openshift-monitoring.apps.***/api/graph?g0.expr=vector%281%29&g0.tab=1", "labels": { "alertname": "TestAlert", "namespace": "ns1", "severity": "none" } } ]
open another terminal, or another person execute following commands in his terminal
##### login with common user, deploy pod to project is only for we can use curl command # oc login https://${api_server}:6443 -u ${user} -p ${password} # oc new-project test # oc -n test new-app rails-postgresql-example # oc -n test get pod NAME READY STATUS RESTARTS AGE postgresql-1-deploy 0/1 Completed 0 13m postgresql-1-v4lz5 1/1 Running 0 13m rails-postgresql-example-1-build 0/1 Completed 0 13m rails-postgresql-example-1-crdbq 1/1 Running 0 9m20s rails-postgresql-example-1-deploy 0/1 Completed 0 9m42s rails-postgresql-example-1-hook-pre 0/1 Completed 0 9m39s # token=`oc whoami -t` # echo $token sha256~EJCVjflM6lbsl8plKkU7Hv0swkQMxySJr5BGXRJaKhU
user could see the alert from UWM alertmanager service
# oc -n test exec postgresql-1-v4lz5 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-user-workload.openshift-user-workload-monitoring.svc:9095/api/v2/alerts' | jq [ { "annotations": { "message": "This is an alert meant to ensure that the entire alerting pipeline is functional." }, "endsAt": "2023-08-17T12:16:56.558Z", "fingerprint": "348490d73f8513a0", "receivers": [ { "name": "Default" } ], "startsAt": "2023-08-17T12:04:11.558Z", "status": { "inhibitedBy": [], "silencedBy": [], "state": "active" }, "updatedAt": "2023-08-17T12:12:56.563Z", "generatorURL": "https://thanos-querier-openshift-monitoring.apps.***/api/graph?g0.expr=vector%281%29&g0.tab=1", "labels": { "alertname": "TestAlert", "namespace": "ns1", "severity": "none" } } ]
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-08-16-114741
How reproducible:
always
Steps to Reproduce:
1. see the description
Actual results:
common user can view UWM alertmanager alerts
Expected results:
Additional info:
if this is expected, we could close the bug
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
The node selector for the console deployment requires deploying it on the master nodes, while the replica count is determined by the infrastructureTopology, which primarily tracks the workers' setup. When an OpenShift cluster is installed with a single master node and multiple workers, this leads the console deployment to request 2 replicas because infrastructureTopology is set to HighlyAvailable, even though controlPlaneTopology is set to SingleReplica as expected.
Version-Release number of selected component (if applicable):
4.16
How reproducible:
Always
Steps to Reproduce:
1. Install an openshift cluster with 1 master and 2 workers
Actual results:
The installation fails as the replica count for the console deployment is set to 2.
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  creationTimestamp: "2024-01-18T08:34:47Z"
  generation: 1
  name: cluster
  resourceVersion: "517"
  uid: d89e60b4-2d9c-4867-a2f8-6e80207dc6b8
spec:
  cloudConfig:
    key: config
    name: cloud-provider-config
  platformSpec:
    aws: {}
    type: AWS
status:
  apiServerInternalURI: https://api-int.adstefa-a12.qe.devcluster.openshift.com:6443
  apiServerURL: https://api.adstefa-a12.qe.devcluster.openshift.com:6443
  controlPlaneTopology: SingleReplica
  cpuPartitioning: None
  etcdDiscoveryDomain: ""
  infrastructureName: adstefa-a12-6wlvm
  infrastructureTopology: HighlyAvailable
  platform: AWS
  platformStatus:
    aws:
      region: us-east-2
    type: AWS
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    ....
  creationTimestamp: "2024-01-18T08:54:23Z"
  generation: 3
  labels:
    app: console
    component: ui
  name: console
  namespace: openshift-console
spec:
  progressDeadlineSeconds: 600
  replicas: 2
Expected results:
The replica count is set to 1, tracking the controlPlaneTopology value instead of the infrastructureTopology.
Additional info:
We should remove all exceptions added over time to https://github.com/openshift/hypershift/blob/860064d33f4729c2db3c68722d0b5a633e6d1bcd/test/e2e/util/util.go#L414
Description of problem:
In the quick search, if you search for the word 'net' you see two options with the same name and description; one is for the source-to-image option and the other is for the sample option, but there is no way to differentiate them in the quick search.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Go to Topology or the Add page and select quick search 2. Search for 'net' or 'node'; you will see confusing options 3.
Actual results:
Similar options with no differentiation in the quick search menu
Expected results:
Some way to differentiate different options in the quick search menu
Additional info:
Please review the following PR: https://github.com/openshift/k8s-prometheus-adapter/pull/95
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
The security team will soon start having the code owners also address CWEs (Common Weakness Enumeration). Although a CWE is not a CVE per se, it may have security ramifications.
This issue addresses weak MD5 primitive usages in CMO.
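By way of illustration only, a minimal Go sketch of the usual remediation for this CWE class, swapping an MD5-based checksum for SHA-256 (the function and its call site are hypothetical, not actual CMO code):
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashForComparison returns a hex digest used only to detect content changes;
// SHA-256 serves the same non-cryptographic purpose as MD5 here without
// tripping weak-hash scanners.
func hashForComparison(data []byte) string {
	return fmt.Sprintf("%x", sha256.Sum256(data))
}

func main() {
	fmt.Println(hashForComparison([]byte("example config")))
}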
Description of problem:
An alert notification receiver created through the web console uses the deprecated field match instead of matchers, and simply renaming match to matchers (leaving the map form in place) causes the Alertmanager pods to enter a CrashLoopBackOff state, throwing the error (the two forms are sketched below): ~~~ ts=2023-11-14T08:42:39.694Z caller=coordinator.go:118 level=error component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config_out/alertmanager.env.yaml err="yaml: unmarshal errors:\n line 51: cannot unmarshal !!map into []string" ~~~
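For illustration, the two route forms side by side; a minimal sketch with a hypothetical receiver name and label, following the Alertmanager configuration schema:
# deprecated map form, as generated by the console:
route:
  routes:
  - receiver: my-receiver
    match:
      severity: critical
# current form: matchers is a list of match expressions ([]string):
route:
  routes:
  - receiver: my-receiver
    matchers:
    - severity = "critical"
Renaming the key without converting the map into this list form is exactly what produces the "cannot unmarshal !!map into []string" error above.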
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create an alert notification receiver through the web console: Administration --> Configuration --> Alertmanager --> Create Receiver --> Add Receiver. 2. Check the YAML created; it contains a route section with match rather than matchers. 3. Rename match to matchers without converting the map entries (such as severity or alertname) into the list form matchers expects. 4. Restart the Alertmanager pods, which leads to CrashLoopBackOff state.
Actual results:
Alert notification receiver uses match field
Expected results:
Alert notification receiver should use the matchers field
Additional info:
Description of problem:
The following binaries need to be extracted from the release payload for both rhel8 and rhel9: oc, ccoctl, opm, openshift-install, oc-mirror. The images that contain these should produce artifacts of both kinds in some location, and probably make the artifact matching the image's architecture available under the normal location in the path. Example:
/usr/share/<binary>.rhel8
/usr/share/<binary>.rhel9
/usr/bin/<binary>
This ticket is about getting "oc adm release extract" to do the right thing in a backwards-compatible way: if both binaries are available, get those; if not, get the binary from the old location. A sketch of the invocation this has to keep working follows below.
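For context, extraction today looks roughly like this; a sketch using the existing --command flag (the pullspec is a placeholder, not a real image):
$ oc adm release extract --command=oc --to=./out quay.io/openshift-release-dev/ocp-release:<version>-x86_64
Under the layout above, the same invocation would return /usr/share/<binary>.rhel8 or /usr/share/<binary>.rhel9 when both artifacts are present, and fall back to /usr/bin/<binary> when they are not.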
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This downstream PR is failing continuously on the Image Update test; the goal of this task is to identify the root cause and fix it.
This is a clone of issue OCPBUGS-37723. The following is the description of the original issue:
—
Backport owners file changes
Please review the following PR: https://github.com/openshift/cluster-monitoring-operator/pull/2084
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
The samples operator in OKD refers to docker.io/openshift/wildfly images, which are no longer available. Library sync should update the samples to use quay.io links.
This is a clone of issue OCPBUGS-29511. The following is the description of the original issue:
—
Description of problem:
When external TCP traffic is IP fragmented with no DF flag set and is targeted at a pod's external IP, the fragmented packets are answered with a RST and are not delivered to the pod's application socket.
Version-Release number of selected component (if applicable):
$ oc version
Client Version: 4.14.8
Kustomize Version: v5.0.1
Server Version: 4.14.7
Kubernetes Version: v1.27.8+4fab27b
How reproducible:
I built a reproducer for this issue on a KVM-hosted OCP cluster.
I can simulate the same traffic as can be seen in the customer's network.
So we do have a solid reproducer for the issue.
Details are in the JIRA updates.
Steps to Reproduce:
I wrote a simple C-based tcp_server/tcp_client application for testing.
The client simply sends a file towards the server from a networking namespace with
disabled pmtu. The server app runs in a pod and simply waits for connections then reads the data from the socket and stores the received file into /tmp .
Along the way from the client namespace there is a veth pair with MTU 1000, while the path MTU is 1500.
This is enough to get ip packets fragmented along the way from the client to the server.
Details of the setup and testing steps are in the JIRA comments.
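A rough sketch of that topology (interface and namespace names here are made up; the authoritative steps are in the JIRA comments):
# client namespace with PMTU discovery disabled, so TCP segments leave without DF
$ ip netns add client
$ ip link add veth-host type veth peer name veth-client
$ ip link set veth-client netns client
$ ip netns exec client sysctl -w net.ipv4.ip_no_pmtu_disc=1
# an MTU 1000 hop on an otherwise MTU 1500 path forces IP fragmentation
$ ip link set veth-host mtu 1000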
Actual results:
$ oc get network.operator -o yaml | grep routingViaHost
routingViaHost: false
All fragmented packets are answered with a TCP RST and are not delivered to the application socket in the pod.
Expected results:
Fragmented packets are delivered to the application socket running in a pod with
$ oc get network.operator -o yaml | grep routingViaHost
routingViaHost: false
Additional info:
There is a workaround that prevents the issue.
$ oc get network.operator -o yaml | grep routingViaHost
routingViaHost: true
Makes the fragmented traffic arrive at the application socket in the pod.
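For reference, the workaround can be applied with a merge patch against the standard OVN-Kubernetes gateway config path:
$ oc patch network.operator cluster --type=merge \
  -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"routingViaHost":true}}}}}'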
I can assist with the reproducer and testing on the test env.
Regards Michal Tesar
Please review the following PR: https://github.com/openshift/cluster-api-provider-aws/pull/477
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
https://redhat-internal.slack.com/archives/C061SJRTKDG/p1697798046548799
In some OCM environments the latest HO is stuck on reconciling the CAPI provider for some 4.12 HCs:
{"level":"error","ts":"2023-10-20T10:53:27Z","msg":"Reconciler error","controller":"hostedcluster","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedCluster","HostedCluster":{"name":"build08","namespace":"ocm-production-23qm3j1pkslelghufgs874g86ccn5sba"},"namespace":"ocm-production-23qm3j1pkslelghufgs874g86ccn5sba","name":"build08","reconcileID":"482f297f-8afb-407c-96d9-bc1de727ef78","error":"failed to reconcile capi provider: failed to reconcile capi provider deployment: Deployment.apps \"capi-provider\" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app\":\"capi-provider-controller-manager\", \"control-plane\":\"capi-provider-controller-manager\", \"hypershift.openshift.io/control-plane-component\":\"capi-provider-controller-manager\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/opt/app-root/src/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:326\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/opt/app-root/src/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/opt/app-root/src/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234"}
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
reconciliation is stuck
Expected results:
reconciliation succeeds
Additional info:
Description of problem:
From our initial investigation, it seems the network-node-identity component does not need management cluster access in HyperShift. We were looking at:
https://github.com/openshift/cluster-network-operator/blob/release-4.14/bindata/network/node-identity/managed/node-identity.yaml
For the webhook and approver container: https://github.com/openshift/ovn-kubernetes/blob/release-4.14/go-controller/cmd/ovnkube-identity/ovnkubeidentity.go
For the token minter container: https://github.com/openshift/hypershift/blob/release-4.14/token-minter/tokenminter.go
We also tested by disabling automountServiceAccountToken (sketched below) and things still seemed to be functioning.
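For illustration, the kind of change we tested; a minimal sketch of disabling token automount in a pod template (field placement only, not the actual CNO manifest):
spec:
  template:
    spec:
      # stop mounting the service account token into the containers
      automountServiceAccountToken: false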
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Deploy a 4.14 hosted cluster 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-26554. The following is the description of the original issue:
—
Chinese translation in topology was invalid, see https://github.com/openshift/console/pull/13458
This is a clone of issue OCPBUGS-30124. The following is the description of the original issue:
—
Description of problem:
In https://issues.redhat.com/browse/OCPBUGS-28625?focusedId=24056681&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-24056681 , Seth Jennings states "It is not required to set the oauthMetadata to enable external OIDC".
Today having a chance to try without setting oauthMetadata, hit oc login fails with the error:
$ oc login --exec-plugin=oc-oidc --client-id=$CLIENT_ID --client-secret=$CLIENT_SECRET_VALUE --extra-scopes=email --callback-port=8080 error: oidc authenticator error: oidc discovery error: Get "https://:0/.well-known/openid-configuration": dial tcp :0: connect: connection refused error: oidc authenticator error: oidc discovery error: Get "https://:0/.well-known/openid-configuration": dial tcp :0: connect: connection refused Unable to connect to the server: getting credentials: exec: executable oc failed with exit code 1
Console login can succeed, though.
Note, OCM QE also encounters this when using the ocm CLI to test ROSA HCP external OIDC. Whether the fix belongs in oc, in HCP, or elsewhere (as a tester I'm not sure, TBH), it is worth fixing; otherwise oc login is affected.
Version-Release number of selected component (if applicable):
[xxia@2024-03-01 21:03:30 CST my]$ oc version --client Client Version: 4.16.0-0.ci-2024-03-01-033249 Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3 [xxia@2024-03-01 21:03:50 CST my]$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.16.0-0.ci-2024-02-29-213249 True False 8h Cluster version is 4.16.0-0.ci-2024-02-29-213249
How reproducible:
Always
Steps to Reproduce:
1. Launch fresh HCP cluster. 2. Login to https://entra.microsoft.com. Register application and set properly. 3. Prepare variables. HC_NAME=hypershift-ci-267920 MGMT_KUBECONFIG=/home/xxia/my/env/xxia-hs416-2-267920-4.16/kubeconfig HOSTED_KUBECONFIG=/home/xxia/my/env/xxia-hs416-2-267920-4.16/hypershift-ci-267920.kubeconfig AUDIENCE=7686xxxxxx ISSUER_URL=https://login.microsoftonline.com/64dcxxxxxxxx/v2.0 CLIENT_ID=7686xxxxxx CLIENT_SECRET_VALUE="xxxxxxxx" CLIENT_SECRET_NAME=console-secret 4. Configure HC without oauthMetadata. [xxia@2024-03-01 20:29:21 CST my]$ oc create secret generic console-secret -n clusters --from-literal=clientSecret=$CLIENT_SECRET_VALUE --kubeconfig $MGMT_KUBECONFIG [xxia@2024-03-01 20:34:05 CST my]$ oc patch hc $HC_NAME -n clusters --kubeconfig $MGMT_KUBECONFIG --type=merge -p=" spec: configuration: authentication: oauthMetadata: name: '' oidcProviders: - claimMappings: groups: claim: groups prefix: 'oidc-groups-test:' username: claim: email prefixPolicy: Prefix prefix: prefixString: 'oidc-user-test:' issuer: audiences: - $AUDIENCE issuerURL: $ISSUER_URL name: microsoft-entra-id oidcClients: - clientID: $CLIENT_ID clientSecret: name: $CLIENT_SECRET_NAME componentName: console componentNamespace: openshift-console type: OIDC " Wait pods to renew: [xxia@2024-03-01 20:52:41 CST my]$ oc get po -n clusters-$HC_NAME --kubeconfig $MGMT_KUBECONFIG --sort-by metadata.creationTimestamp ... certified-operators-catalog-7ff9cffc8f-z5dlg 1/1 Running 0 5h44m kube-apiserver-6bd9f7ccbd-kqzm7 5/5 Running 0 17m kube-apiserver-6bd9f7ccbd-p2fw7 5/5 Running 0 15m kube-apiserver-6bd9f7ccbd-fmsgl 5/5 Running 0 13m openshift-apiserver-7ffc9fd764-qgd4z 3/3 Running 0 11m openshift-apiserver-7ffc9fd764-vh6x9 3/3 Running 0 10m openshift-apiserver-7ffc9fd764-b7znk 3/3 Running 0 10m konnectivity-agent-577944765c-qxq75 1/1 Running 0 9m42s hosted-cluster-config-operator-695c5854c-dlzwh 1/1 Running 0 9m42s cluster-version-operator-7c99cf68cd-22k84 1/1 Running 0 9m42s konnectivity-agent-577944765c-kqfpq 1/1 Running 0 9m40s konnectivity-agent-577944765c-7t5ds 1/1 Running 0 9m37s 5. Check console login and oc login. $ export KUBECONFIG=$HOSTED_KUBECONFIG $ curl -ksS $(oc whoami --show-server)/.well-known/oauth-authorization-server { "issuer": "https://:0", "authorization_endpoint": "https://:0/oauth/authorize", "token_endpoint": "https://:0/oauth/token", ... } Check console login, it succeeds, console upper right shows correctly user name oidc-user-test:xxia@redhat.com. Check oc login: $ rm -rf ~/.kube/cache/oc/ $ oc login --exec-plugin=oc-oidc --client-id=$CLIENT_ID --client-secret=$CLIENT_SECRET_VALUE --extra-scopes=email --callback-port=8080 error: oidc authenticator error: oidc discovery error: Get "https://:0/.well-known/openid-configuration": dial tcp :0: connect: connection refused error: oidc authenticator error: oidc discovery error: Get "https://:0/.well-known/openid-configuration": dial tcp :0: connect: connection refused Unable to connect to the server: getting credentials: exec: executable oc failed with exit code 1
Actual results:
Console login succeeds. oc login fails.
Expected results:
oc login should also succeed.
Additional info:
Please review the following PR: https://github.com/operator-framework/operator-marketplace/pull/553
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-35567. The following is the description of the original issue:
—
This is a clone of OCPBUGS-35335.
Description of problem:
The user.openshift.io and oauth.openshift.io APIs are not available in an external OIDC cluster, which causes all common pulling/pushing of blobs from/to the image registry to fail.
Version-Release number of selected component (if applicable):
4.15.15
How reproducible:
always
Steps to Reproduce:
1. Create a ROSA HCP cluster configured with external OIDC users 2. Push data to the image registry under a project: oc new-project wxj1; oc new-build httpd~https://github.com/openshift/httpd-ex.git 3.
Actual results:
$ oc logs -f build/httpd-ex-1 Cloning "https://github.com/openshift/httpd-ex.git" ... Commit: 1edee8f58c0889616304cf34659f074fda33678c (Update httpd.json) Author: Petr Hracek <phracek@redhat.com> Date: Wed Jun 5 13:00:09 2024 +0200time="2024-06-12T09:55:13Z" level=info msg="Not using native diff for overlay, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled"I0612 09:55:13.306937 1 defaults.go:112] Defaulting to storage driver "overlay" with options [mountopt=metacopy=on].Caching blobs under "/var/cache/blobs".Trying to pull image-registry.openshift-image-registry.svc:5000/openshift/httpd@sha256:765aa645587f34e310e49db7cdc97e82d34122adb0b604eea891e0f98050aa77...Warning: Pull failed, retrying in 5s ...Trying to pull image-registry.openshift-image-registry.svc:5000/openshift/httpd@sha256:765aa645587f34e310e49db7cdc97e82d34122adb0b604eea891e0f98050aa77...Warning: Pull failed, retrying in 5s ...Trying to pull image-registry.openshift-image-registry.svc:5000/openshift/httpd@sha256:765aa645587f34e310e49db7cdc97e82d34122adb0b604eea891e0f98050aa77...Warning: Pull failed, retrying in 5s ...error: build error: After retrying 2 times, Pull image still failed due to error: unauthorized: unable to validate token: NotFound oc logs -f deploy/image-registry -n openshift-image-registry time="2024-06-12T09:55:13.36003996Z" level=error msg="invalid token: the server could not find the requested resource (get users.user.openshift.io ~)" go.version="go1.20.12 X:strictfipsruntime" http.request.host="image-registry.openshift-image-registry.svc:5000" http.request.id=0c380b81-99d4-4118-8de3-407706e8767c http.request.method=GET http.request.remoteaddr="10.130.0.35:50550" http.request.uri="/openshift/token?account=serviceaccount&scope=repository%3Aopenshift%2Fhttpd%3Apull" http.request.useragent="containers/5.28.0 (github.com/containers/image)"
Expected results:
Should be able to pull/push blobs from/to the image registry on an external OIDC cluster
Additional info:
Description of the problem:
In staging (BE 2.23.0): after adding the API and Ingress VIPs manually and then changing the network to UMN, the BE responds with the error "User Managed Networking cannot be set with API VIP".
Had a talk with Nir Magnezi about this; we should add the ability for the BE to delete VIPs from the DB if the API gets such a request.
This is a follow-up to an earlier issue.
How reproducible:
Steps to reproduce:
1. add api and ingress VIPs manually
2. Change network to UMN
3.
Actual results:
Expected results:
Please review the following PR: https://github.com/openshift/openshift-apiserver/pull/390
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
There are some duplicated logs originating from calling addOrUpdateSubnet twice; this is misleading.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. Start it up 2. Check logs. 3.
Actual results:
Expected results:
Additional info:
This test triggers failures shortly after node reboot. Of course the node isn't ready; it rebooted.
: [sig-node] nodes should not go unready after being upgraded and go unready only once
{ 1 nodes violated upgrade expectations: Node ci-op-q38yw8yd-8aaeb-lsqxj-master-0 went unready multiple times: 2023-10-11T21:58:45Z, 2023-10-11T22:05:45Z Node ci-op-q38yw8yd-8aaeb-lsqxj-master-0 went ready multiple times: 2023-10-11T21:58:46Z, 2023-10-11T22:07:18Z }Both of those times, the master-0 was rebooted or being rebooted.
This is a clone of issue OCPBUGS-29676. The following is the description of the original issue:
—
Description of problem:
capi-based installer failing with missing openshift-cluster-api namespace
Version-Release number of selected component (if applicable):
How reproducible:
Always in CustomNoUpgrade
Steps to Reproduce:
1. 2. 3.
Actual results:
Install failure
Expected results:
The namespace is created and the install succeeds, or the installer does not error on the missing namespace.
Additional info:
This is a clone of issue OCPBUGS-30060. The following is the description of the original issue:
—
Description of problem:
The image registry CO is not progressing on Azure Hosted Control Planes
Version-Release number of selected component (if applicable):
How reproducible:
Every time
Steps to Reproduce:
1. Create an Azure HCP
2. Create a kubeconfig for the guest cluster
3. Check the image-registry CO
Actual results:
image-registry co's message is Progressing: The registry is ready...
Expected results:
image-registry finishes progressing
Additional info:
I let it go for about 34m.

% oc get co | grep -i image
image-registry   4.16.0-0.nightly-multi-2024-02-26-105325   True   True   False   34m   Progressing: The registry is ready...

% oc get co/image-registry -oyaml
...
  - lastTransitionTime: "2024-02-28T19:10:30Z"
    message: |-
      Progressing: The registry is ready
      NodeCADaemonProgressing: The daemon set node-ca is deployed
      AzurePathFixProgressing: The job does not exist
    reason: AzurePathFixNotFound::Ready
    status: "True"
    type: Progressing
This is a clone of issue OCPBUGS-33523. The following is the description of the original issue:
—
With the change from PatternFly's `PageHeader` to `Masthead`, there is no longer a max-height of 60px restricting the size of the masthead logo. As a result, logos that are larger than 60px high display at their native size and cause the masthead to get taller (see https://drive.google.com/file/d/11enMtMU1cfzXQqRfd0eTdsKFkBVPWoFc/view?usp=sharing). This went unnoticed in the change because the OpenShift and OKD logos are sized appropriately for the masthead and do not need the restriction. Further, the docs state a custom logo "is constrained to a max-width of 200px and a max-height of 68px.", which is a separate bug that needs to be addressed (it should read "is constrained to a max-height of 60px").
This is a clone of issue OCPBUGS-20061. The following is the description of the original issue:
—
Possibly reviving OCPBUGS-10771, the control-plane-machine-set ClusterOperator occasionally goes Available=False with reason=UnavailableReplicas. For example, this run includes:
: [bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Available  1h34m30s
{ 3 unexpected clusteroperator state transitions during e2e test run. These did not match any known exceptions, so they cause this test-case to fail:
Oct 03 22:03:29.822 - 106s E clusteroperator/control-plane-machine-set condition/Available reason/UnavailableReplicas status/False Missing 1 available replica(s)
Oct 03 22:08:34.162 - 98s  E clusteroperator/control-plane-machine-set condition/Available reason/UnavailableReplicas status/False Missing 1 available replica(s)
Oct 03 22:13:01.645 - 118s E clusteroperator/control-plane-machine-set condition/Available reason/UnavailableReplicas status/False Missing 1 available replica(s)
But those are the nodes rebooting into newer RHCOS, and they do not warrant immediate admin intervention. Teaching the CPMS operator to stay Available=True for this kind of brief hiccup, while still going Available=False for issues where at least part of the component is non-functional and immediate administrator intervention is required, would make it easier for admins and SREs operating clusters to identify when intervention is required.
4.15. Possibly all supported versions of the CPMS operator have this exposure.
Looks like many (all?) 4.15 update jobs have near 100% reproducibility for some kind of issue with CPMS going Available=False; see Actual results below. These are likely for reasons that do not require admin intervention, although figuring that out is tricky today; feel free to push back if you feel that some of these do warrant immediate admin intervention.
w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=48h&type=junit&search=clusteroperator/control-plane-machine-set+should+not+change+condition/Available' | grep '^periodic-.*4[.]15.*failures match' | sort
periodic-ci-openshift-cluster-etcd-operator-release-4.15-periodics-e2e-aws-etcd-recovery (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-aws-ovn-heterogeneous-upgrade (all) - 19 runs, 42% failed, 225% of failures match = 95% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-upgrade-aws-ovn-arm64 (all) - 18 runs, 61% failed, 127% of failures match = 78% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-e2e-aws-sdn-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-aws-ovn-heterogeneous-upgrade (all) - 19 runs, 47% failed, 200% of failures match = 95% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-aws-sdn-arm64 (all) - 9 runs, 78% failed, 114% of failures match = 89% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-aws-ovn-upgrade (all) - 11 runs, 64% failed, 143% of failures match = 91% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade (all) - 70 runs, 41% failed, 207% of failures match = 86% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-azure-sdn-upgrade (all) - 7 runs, 43% failed, 200% of failures match = 86% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-ovn (all) - 6 runs, 50% failed, 33% of failures match = 17% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-ovn-upgrade (all) - 71 runs, 24% failed, 382% of failures match = 92% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-ovn-upgrade (all) - 70 runs, 30% failed, 281% of failures match = 84% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-sdn-upgrade (all) - 8 runs, 50% failed, 175% of failures match = 88% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade (all) - 71 runs, 38% failed, 233% of failures match = 89% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-rt-upgrade (all) - 69 runs, 49% failed, 171% of failures match = 84% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-upgrade (all) - 7 runs, 57% failed, 175% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-sdn-upgrade (all) - 6 runs, 33% failed, 250% of failures match = 83% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-sdn-upgrade (all) - 63 runs, 37% failed, 222% of failures match = 81% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-gcp-sdn-upgrade (all) - 6 runs, 33% failed, 250% of failures match = 83% impact
periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-aws-sdn-upgrade (all) - 7 runs, 43% failed, 233% of failures match = 100% impact
periodic-ci-openshift-release-master-okd-4.15-e2e-aws-ovn-upgrade (all) - 13 runs, 54% failed, 100% of failures match = 54% impact
periodic-ci-openshift-release-master-okd-scos-4.15-e2e-aws-ovn-upgrade (all) - 16 runs, 63% failed, 90% of failures match = 56% impact
CPMS goes Available=False if and only if immediate admin intervention is appropriate.
Backport to 4.15 of OCPBUGS-35007 specifically for the cluster-openshift-apiserver-operator.
All workloads of the following namespaces need SCC pinning:
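The namespace list is elided here. For context, a hedged sketch of what pinning a single workload looks like, assuming the openshift.io/required-scc pod annotation mechanism (the namespace and deployment name are illustrative):

# Pin the workload's pods to the SCC they are expected to run with
oc -n some-namespace patch deployment/some-workload --type=merge \
  -p '{"spec":{"template":{"metadata":{"annotations":{"openshift.io/required-scc":"restricted-v2"}}}}}'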
Description of problem:
When we merged https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/229, it changed the way failure domains were injected for Azure so that additional fields could be accounted for. However, the CPMS failure domains have Azure zones as a string (which they should be) and the machine v1beta1 spec has them as a string pointer. This means the CPMS is now detecting the difference between a nil zone and an empty string, even though every other piece of code in OpenShift treats them the same. We should update the machine v1beta1 type to remove the pointer. This will be a no-op in terms of the data stored in etcd since the type is unstructured anyway. It will then require updates to the MAPZ, CPMS, MAO and installer repositories to update their generation.
Version-Release number of selected component (if applicable):
4.14 nightlies from the merge of 229 onwards
How reproducible:
This is only affecting regions in Azure where there are no zones, currently in CI it's affecting about 20% of events.
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
The script rh-manifest.sh in Openshift/Thanos stops working, generating an empty dependency list.
Version-Release number of selected component (if applicable):
How reproducible:
Run script/rh-manifest.sh in Openshift/Thanos and check rh-manifest.txt.
Steps to Reproduce:
1. 2. 3.
Actual results:
The generated rh-manifest.txt is empty.
Expected results:
The generated rh-manifest.txt should list Javascript dependencies.
Additional info:
Description of problem:
GCP preemptible VM termination is not being handled correctly by machine-api-termination-handler.
Version-Release number of selected component (if applicable):
Tested on both 4.10.22 and 4.11.2
How reproducible:
To reproduce the issue: create a spot instance machine in GCP and stop the instance. Notice that the machine-api-termination-handler pod shows no signal signifying it was terminated, although the machines list does show the TERMINATED status. The result is that pods are not gracefully moved off in the 90sec window before the node is turned off. We would expect a terminated node to wait for pods to move off (up to 90sec) and then shut down, instead of an immediate shutdown of the node.
Steps to Reproduce:
1. Create a spot instance machine in GCP.
2. Stop the instance.
3. Notice the machine-api-termination-handler pod shows no signal signifying it was terminated.
4. Note we do see the TERMINATED status in the machines list.
5. Result: pods are not gracefully moved off in the 90sec window before the node is turned off.
Actual results:
The machine-api-termination-handler logs don't show any message such as "Instance marked for termination, marking Node for deletion" but instead no signal is received from GCP.
Expected results:
A terminated node should wait for pods to move off (up to 90sec) and then shutdown, instead of an immediate shutdown of the node.
Additional info:
Here is the code:
https://github.com/openshift/machine-api-provider-gcp/blob/main/pkg/termination/termination.go#L96-L127
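A hedged diagnostic, assuming the handler polls GCP's documented instance preemption flag (which the linked code suggests): query the metadata server directly from the affected node to see whether the signal ever arrives.

# Run on the spot/preemptible node; prints TRUE once GCP marks it for termination
curl -s -H "Metadata-Flavor: Google" \
  http://metadata.google.internal/computeMetadata/v1/instance/preempted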
#forum-cloud slack thread:
https://coreos.slack.com/archives/CBZHF4DHC/p1656524730323259
#forum-node slack thread:
https://coreos.slack.com/archives/CK1AE4ZCK/p1656619821630479
This is a clone of issue OCPBUGS-28643. The following is the description of the original issue:
—
Description of problem:
There is a new zone in PowerVS called dal12. We need to add this zone to the list of supported zones in the installer.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Deploy OpenShift cluster to the zone 2. 3.
Actual results:
Fails
Expected results:
Works
Additional info:
The code https://github.com/openshift/cluster-monitoring-operator/blob/91d735bd8662965037aae60c846c53baa79752ac/pkg/tasks/controlplane.go#L79-L93 makes sure CMO deletes the resources it used to manage.
The code was temporarily added in https://github.com/openshift/cluster-monitoring-operator/pull/2039/files
This is a clone of issue OCPBUGS-25841. The following is the description of the original issue:
—
Description of problem:
After running ./openshift-install destroy cluster, the TagCategory still exists.

# ./openshift-install destroy cluster --dir cluster --log-level debug
DEBUG OpenShift Installer 4.15.0-0.nightly-2023-12-18-220750
DEBUG Built from commit 2b894776f1653ab818e368fa625019a6de82a8c7
DEBUG Power Off Virtual Machines
DEBUG Powered off VirtualMachine=sgao-devqe-spn2w-master-2
DEBUG Powered off VirtualMachine=sgao-devqe-spn2w-master-1
DEBUG Powered off VirtualMachine=sgao-devqe-spn2w-master-0
DEBUG Powered off VirtualMachine=sgao-devqe-spn2w-worker-0-kpg46
DEBUG Powered off VirtualMachine=sgao-devqe-spn2w-worker-0-w5rrn
DEBUG Delete Virtual Machines
INFO Destroyed VirtualMachine=sgao-devqe-spn2w-rhcos-generated-region-generated-zone
INFO Destroyed VirtualMachine=sgao-devqe-spn2w-master-2
INFO Destroyed VirtualMachine=sgao-devqe-spn2w-master-1
INFO Destroyed VirtualMachine=sgao-devqe-spn2w-master-0
INFO Destroyed VirtualMachine=sgao-devqe-spn2w-worker-0-kpg46
INFO Destroyed VirtualMachine=sgao-devqe-spn2w-worker-0-w5rrn
DEBUG Delete Folder
INFO Destroyed Folder=sgao-devqe-spn2w
DEBUG Delete StoragePolicy=openshift-storage-policy-sgao-devqe-spn2w
INFO Destroyed StoragePolicy=openshift-storage-policy-sgao-devqe-spn2w
DEBUG Delete Tag=sgao-devqe-spn2w
INFO Deleted Tag=sgao-devqe-spn2w
DEBUG Delete TagCategory=openshift-sgao-devqe-spn2w
INFO Deleted TagCategory=openshift-sgao-devqe-spn2w
DEBUG Purging asset "Metadata" from disk
DEBUG Purging asset "Master Ignition Customization Check" from disk
DEBUG Purging asset "Worker Ignition Customization Check" from disk
DEBUG Purging asset "Terraform Variables" from disk
DEBUG Purging asset "Kubeconfig Admin Client" from disk
DEBUG Purging asset "Kubeadmin Password" from disk
DEBUG Purging asset "Certificate (journal-gatewayd)" from disk
DEBUG Purging asset "Cluster" from disk
INFO Time elapsed: 29s
INFO Uninstallation complete!

# govc tags.category.ls | grep sgao
openshift-sgao-devqe-spn2w
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-12-18-220750
How reproducible:
always
Steps to Reproduce:
1. IPI install OCP on vSphere 2. Destroy cluster installed, check TagCategory
Actual results:
TagCategory still exist
Expected results:
TagCategory should be deleted
Additional info:
Also reproduced in openshift-install-linux-4.14.0-0.nightly-2023-12-20-184526 and 4.13.0-0.nightly-2023-12-21-194724, while 4.12.0-0.nightly-2023-12-21-162946 does not have this issue.
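A hedged manual cleanup sketch using the same govc CLI as above; the category name is taken from the log, so substitute your own cluster's infra ID:

# List leftover categories, then delete the leaked one
govc tags.category.ls | grep openshift-
govc tags.category.rm openshift-sgao-devqe-spn2w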
Description of problem:
Backport of live migration suite in origin to 4.15
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
OpenShift Assisted Installer reports the Dell PowerEdge C6615 node's four 960GB SATA solid state disks as removable and subsequently refuses to continue installing OpenShift onto any of them. The Linux kernel reports:

sd 4:0:0:0 [sdb] Attached SCSI removable disk
sd 5:0:0:0 [sdc] Attached SCSI removable disk
sd 6:0:0:0 [sdd] Attached SCSI removable disk
sd 3:0:0:0 [sda] Attached SCSI removable disk

Each removable disk is clean: 894.3GiB free space, no partitions, etc. However, the host is marked Insufficient:

This host does not meet the minimum hardware or networking requirements and will not be included in the cluster.
Hardware: Failed
Warning alert: Insufficient
Minimum disks of required size: No eligible disks were found, please check specific disks to see why they are not eligible.
Version-Release number of selected component (if applicable):
4.15.z
How reproducible:
100 %
Steps to Reproduce:
1. Install with Assisted Installer.
2. Generate the ISO using the option in the console.
3. Boot the ISO on the Dell HW mentioned in the description.
4. Observe journal logs for disk validations (a hedged diagnostic sketch follows).
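A hedged diagnostic for step 4, run from the discovery ISO shell on the affected host (the device name is illustrative):

# RM=1 means the kernel flags the disk as removable, which the validation trips on
lsblk -d -o NAME,RM,SIZE,MODEL
# The same flag via sysfs for a single disk
cat /sys/block/sda/removable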
Actual results:
Installation fails at disk validation
Expected results:
Installation should complete
Additional info:
prepare Hypershift for the CAPI bump to v1.5.2 https://github.com/openshift/cluster-api/pull/181 so that hypershift-e2e can pass.
OCP 4.14
Logging 5.8
Always
The user is redirected to Observe -> metrics, and the chart does not display any metrics as they are not stored in prometheus
The user should be redirected to Observe -> Logs, and the metric should be displayed instead of the log list: see OU-267
Description of problem:
The way CCM is deployed, it gets the kubeconfig configuration from the environment it runs on, which is the Management cluster. Thus, it communicates with the Kubernetes Api Server (KAS) of the Management Cluster (MC) instead of the KAS of the Hosted Cluster it is part of.
Version-Release number of selected component (if applicable):
4.15.0
How reproducible:
100%
Steps to Reproduce:
1. Deploy a hosted cluster
2. oc debug to the node running the HC CCM
3. crictl ps -a to list all the containers
4. crictl inspect X  # where X is the container id of the CCM container
5. nsenter -n -t pid_of_ccm_container
6. tcpdump (see the sketch below)
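For step 6, a minimal sketch of a capture that shows which API server the CCM is talking to, assuming the KAS endpoints listen on 6443:

# Run inside the CCM container's network namespace (after the nsenter above)
tcpdump -nn -i any 'tcp port 6443'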
Actual results:
Communication goes to MC KAS
Expected results:
Communication goes to HC KAS
Additional info:
When creating a cluster on an existing vnet on MAG and ASH, the installer failed and threw the error:
11-27 13:42:03.944 level=info msg=Creating infrastructure resources...
11-27 13:42:04.502 level=fatal msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to get the virtual network "jima27maga-vnet": GET https://management.azure.com/subscriptions/8fe0c1b4-8b05-4ef7-8129-7cf5680f27e7/resourceGroups/jima27maga-rg/providers/Microsoft.Network/virtualNetworks/jima27maga-vnet
11-27 13:42:04.503 level=fatal msg=--------------------------------------------------------------------------------
11-27 13:42:04.503 level=fatal msg=RESPONSE 404: 404 Not Found
11-27 13:42:04.503 level=fatal msg=ERROR CODE: SubscriptionNotFound
11-27 13:42:04.503 level=fatal msg=--------------------------------------------------------------------------------
11-27 13:42:04.503 level=fatal msg={
11-27 13:42:04.503 level=fatal msg=  "error": {
11-27 13:42:04.503 level=fatal msg=    "code": "SubscriptionNotFound",
11-27 13:42:04.503 level=fatal msg=    "message": "The subscription '8fe0c1b4-8b05-4ef7-8129-7cf5680f27e7' could not be found."
11-27 13:42:04.504 level=fatal msg=  }
11-27 13:42:04.504 level=fatal msg=}
11-27 13:42:04.504 level=fatal msg=--------------------------------------------------------------------------------
11-27 13:42:04.504 level=fatal
While destroying the cluster, the following error occurred when removing shared tags.
$ ./openshift-install destroy cluster --dir ipi --log-level debug
DEBUG OpenShift Installer 4.15.0-0.nightly-2023-11-25-110147
DEBUG Built from commit 1ea1a54a197501cdbda71196c7fac744f835217f
INFO Credentials loaded from file "/home/fedora/.azure/osServicePrincipal_gov.json"
DEBUG deleting public records
WARNING no DNS records found: either they were already deleted or the service principal lacks permissions to list them
DEBUG deleting resource group
INFO deleted resource group=jima761122c-264bb-rg
DEBUG deleting application registrations
DEBUG failed to query resources with shared tag: POST https://management.azure.com/providers/Microsoft.ResourceGraph/resources
DEBUG --------------------------------------------------------------------------------
DEBUG RESPONSE 400: 400 Bad Request
DEBUG ERROR CODE: BadRequest
DEBUG --------------------------------------------------------------------------------
DEBUG {
DEBUG   "error": {
DEBUG     "code": "BadRequest",
DEBUG     "message": "Please provide below info when asking for support: timestamp = 2023-11-27T06:25:26.3355852Z, correlationId = b4dfd555-86b0-4e68-aec7-f75cd7307c69.",
DEBUG     "details": [
DEBUG       {
DEBUG         "code": "NoValidSubscriptionsInQueryRequest",
DEBUG         "message": "There must be at least one subscription that is eligible to contain resources. Given: '8fe0c1b4-8b05-4ef7-8129-7cf5680f27e7'."
DEBUG       }
DEBUG     ]
DEBUG   }
DEBUG }
DEBUG --------------------------------------------------------------------------------
FATAL Failed to destroy cluster: failed to remove shared tags: failed to query resources with shared tag: POST https://management.azure.com/providers/Microsoft.ResourceGraph/resources
FATAL --------------------------------------------------------------------------------
FATAL RESPONSE 400: 400 Bad Request
FATAL ERROR CODE: BadRequest
FATAL --------------------------------------------------------------------------------
FATAL {
FATAL   "error": {
FATAL     "code": "BadRequest",
FATAL     "message": "Please provide below info when asking for support: timestamp = 2023-11-27T06:25:26.3355852Z, correlationId = b4dfd555-86b0-4e68-aec7-f75cd7307c69.",
FATAL     "details": [
FATAL       {
FATAL         "code": "NoValidSubscriptionsInQueryRequest",
FATAL         "message": "There must be at least one subscription that is eligible to contain resources. Given: '8fe0c1b4-8b05-4ef7-8129-7cf5680f27e7'."
FATAL       }
FATAL     ]
FATAL   }
FATAL }
FATAL --------------------------------------------------------------------------------
FATAL
The issue was likely introduced by https://github.com/openshift/installer/pull/7611/. Since all accepted 4.15 nightly builds contain PR#7611, it cannot be verified on previous payloads, but Prow CI jobs show that installation succeeded with 4.15.0-0.nightly-2023-11-20-045323.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-11-25-110147
How reproducible:
always
Steps to Reproduce:
1. Install cluster on existing vnet on MAG and ASH
Actual results:
Installation failed.
Expected results:
Installation succeeded.
Please review the following PR: https://github.com/openshift/cluster-kube-controller-manager-operator/pull/772
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-45806. The following is the description of the original issue:
—
Description of problem:
During the OpenShift Container Platform 4.16 upgrade, when openshift-sdn pods are rolling out, the OVS table are being flushed, causing all ports for existing pods to be re-created.
The OVS table flush is happening because of a flowVersion change that is required for some effort around the Limited Live Migration (similar changes were done in the past).
Apparently for some customers, this flush is causing massive service disruption, impacting production services for multiple minutes until they recover and are back to a fully functional state.
Such an impact in production is not acceptable and needs to be investigated to provide guidance on how disruption can be lowered/minimized.
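A hedged way to observe the flush from an affected node, assuming openshift-sdn's br0 bridge and OpenFlow13; the flow count dropping to near zero during the rollout would confirm the table flush:

# Watch the OpenFlow table size while the openshift-sdn pods roll out
watch -n1 "ovs-ofctl -O OpenFlow13 dump-flows br0 | wc -l"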
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.16
How reproducible:
Random
Steps to Reproduce:
1. Upgrade OpenShift Container Platform 4 with OpenShiftSDN from OpenShift Container Platform 4.15 to 4.16
2. Observe application for failing probes and how long it takes them to recover (probes should rely on other services)
Actual results:
Many services are reporting probe failures for multiple minutes until they recover or are being forefully re-created.
Dec 04 13:26:22 worker-01 kubenswrapper[1980]: I1204 13:26:22.859373 1980 prober.go:107] "Probe failed" probeType="Liveness" pod="namespace/pod-abcde" podUID=4cbc08e5-16e3-491e-a1db-ce44ca0410c5 containerName="container" probeResult=failure output="Get \"http://10.1.1.25:5001/healthz\": dial tcp 10.1.1.25:5001: connect: no route to host"
Dec 04 13:26:22 worker-01 kubenswrapper[1980]: I1204 13:26:22.859378 1980 prober.go:107] "Probe failed" probeType="Readiness" pod="namespace/pod-abcde" podUID=4cbc08e5-16e3-491e-a1db-ce44ca0410c5 containerName="container" probeResult=failure output="Get \"http://10.1.1.25:5001/healthz\": dial tcp 10.1.1.25:5001: connect: no route to host"
Dec 04 13:26:25 worker-01 kubenswrapper[1980]: I1204 13:26:25.931213 1980 prober.go:107] "Probe failed" probeType="Liveness" pod="namespace/pod-abcde" podUID=4cbc08e5-16e3-491e-a1db-ce44ca0410c5 containerName="container" probeResult=failure output="Get \"http://10.1.1.25:5001/healthz\": dial tcp 10.1.1.25:5001: connect: no route to host"
Dec 04 13:26:25 worker-01 kubenswrapper[1980]: I1204 13:26:25.931265 1980 prober.go:107] "Probe failed" probeType="Readiness" pod="namespace/pod-abcde" podUID=4cbc08e5-16e3-491e-a1db-ce44ca0410c5 containerName="container" probeResult=failure output="Get \"http://10.1.1.25:5001/healthz\": dial tcp 10.1.1.25:5001: connect: no route to host"
Expected results:
No disruption of service at all respectively it should go unnoticed and recover within a matter of seconds and not minutes.
Additional info:
Affected Platforms:
The effect was seen on multiple large OpenShift Container Platform 4 - Clusters with +80 OpenShift Container Platform 4 - Node(s). The OpenShift Container Platform 4 - Clusters are running on Microsoft Azure and AWS and are showing the same effect.
Description of problem:
The storage operator degraded because it couldn't locate the node by UUID. I noticed that the providerID was present for node 0 but blank for the other nodes. A successful installation can be achieved on day 2 by executing step 4 after step 7 from this document: https://access.redhat.com/solutions/6677901. Additionally, if we provide credentials from the install-config, it's necessary to add the uninitialized taint to the node (oc adm taint node "$NODE" node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule) after the bootstrap completes.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
100%
Steps to Reproduce:
1. Create an agent ISO image 2. Boot the created ISO on vSphere VM
Actual results:
Installation is failing due to storage operator unable to find the node by UUID.
Expected results:
Storage operator should be installed without any issue.
Additional info:
Slack discussion: https://redhat-internal.slack.com/archives/C02SPBZ4GPR/p1702893456002729
This is a clone of issue OCPBUGS-29110. The following is the description of the original issue:
—
Description of problem:
HO uses the ICSP/IDMS from the mgmt cluster to extract the OCP release metadata to be used in the HostedCluster. But they are extracted only once in main.go (https://github.com/jparrill/hypershift/blob/9bf1403ae09c0f262ebfe006267e3b442cc70149/hypershift-operator/main.go#L287-L293), before starting the HC and NP controllers; they are never refreshed when ICSP/IDMS changes on the management cluster, nor when a new HostedCluster is created.
Version-Release number of selected component (if applicable):
4.14 4.15 4.16
How reproducible:
100%
Steps to Reproduce:
1. Ensure that HO is already running.
2. Create an ICSP or an IDMS on the management cluster.
3. Try to create a hosted cluster.
Actual results:
The imageRegistryOverrides setting for the new hosted cluster ignores the ICSP/IDMS created while the HO was already running. Killing the HO operator pod and waiting for it to restart yields a different result.
Expected results:
HO consistently consumes ICSP/IDMS info at runtime without needing to be restarted.
Additional info:
It affects disconnected deployments
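A hedged interim workaround, assuming the operator runs as deployment/operator in the hypershift namespace (names may differ per install):

# Force the HO to re-read ICSP/IDMS by restarting it before creating the HostedCluster
oc -n hypershift rollout restart deployment/operator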
This is a clone of issue OCPBUGS-34773. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-34734. The following is the description of the original issue:
—
Description of problem:
For the fix of OCPBUGS-29494, only the hosted cluster was fixed, and changes to the node pool were ignored. The node pool encountered the following error:
- lastTransitionTime: "2024-05-31T09:11:40Z"
  message: 'failed to check if we manage haproxy ignition config: failed to look up image metadata for registry.ci.openshift.org/ocp/4.14-2024-05-29-171450@sha256:9b88c6e3f7802b06e5de7cd3300aaf768e85d785d0847a70b35857e6d1000d51: failed to obtain root manifest for registry.ci.openshift.org/ocp/4.14-2024-05-29-171450@sha256:9b88c6e3f7802b06e5de7cd3300aaf768e85d785d0847a70b35857e6d1000d51: unauthorized: authentication required'
  observedGeneration: 1
  reason: ValidationFailed
  status: "False"
  type: ValidMachineConfig
Version-Release number of selected component (if applicable):
4.14, 4.15, 4.16, 4.17
How reproducible:
100%
Steps to Reproduce:
1. Try to deploy a HostedCluster on a disconnected environment without explicitly setting the hypershift.openshift.io/control-plane-operator-image annotation. 2. 3.
Expected results:
Without setting the hypershift.openshift.io/control-plane-operator-image annotation, the NodePool can become ready.
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
All jobs that run seem to hit the same quota problem we saw recently:
failed to grant creds: error syncing creds in mint-mode: error creating custom role: rpc error: code = ResourceExhausted desc = Maximum number of roles reached. Maximum is: 300\nerror details: retry in 24h0m1s
This time it seems to be surfacing on a new credentials request from storage: openshift-gcp-pd-csi-driver-operator, which was just moved from predefined roles to fine-grained permissions in https://github.com/openshift/cluster-storage-operator/pull/410, which is likely why we're now tripping over this limit.
We're going to revert and buy time for CCO team to investigate.
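A hedged way to check how close a project is to the 300-custom-role quota (the project ID is illustrative):

# Count the project's custom roles
gcloud iam roles list --project my-project --format='value(name)' | wc -l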
MCO installs the resolve-prepender NetworkManager script on the nodes. In order to find out node details it needs to pull baremetalRuntimeCfgImage. However, this image needs to be pulled just the first time; in follow-up attempts the script merely verifies that the image is available.
This is not desirable in situations where the mirror / quay are unavailable or having a temporary problem; these kinds of issues should not prevent the node from starting kubelet. During certificate rotation testing I noticed that a node with a significant time skew won't start kubelet, as it tries to pull baremetalRuntimeCfgImage for kubelet to start - but the image is already on the node and doesn't need refreshing.
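A minimal sketch of the skip-pull-if-present behavior the script could adopt ($RUNTIMECFG_IMAGE is illustrative, not the script's actual variable name):

# Only reach out to the registry when the image is genuinely missing locally
if ! podman image exists "$RUNTIMECFG_IMAGE"; then
  podman pull "$RUNTIMECFG_IMAGE"
fi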
Please review the following PR: https://github.com/openshift/machine-api-provider-powervs/pull/51
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-29425. The following is the description of the original issue:
—
Description of problem:
In accounts with a large amount of resources, the destroy code will fail to list all resources. This has revealed some changes that need to be made to the destroy code to handle these situations.
Version-Release number of selected component (if applicable):
How reproducible:
Difficult - but we have an account where we can reproduce it consistently
Steps to Reproduce:
1. Try to destroy a cluster in an account with a large amount of resources. 2. Fail. 3.
Actual results:
Fail to destroy
Expected results:
Destroy succeeds
Additional info:
This is a clone of issue OCPBUGS-41910. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-41908. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-39246. The following is the description of the original issue:
—
Description of problem:
Alerts with non-standard severity labels are sent to Telemeter.
Version-Release number of selected component (if applicable):
All supported versions
How reproducible:
Always
Steps to Reproduce:
1. Create an always-firing alerting rule with severity=foo (a sketch follows).
2. Make sure that telemetry is enabled for the cluster.
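A minimal sketch of step 1, assuming the in-cluster monitoring stack picks up PrometheusRule objects in openshift-monitoring:

oc apply -f - <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: severity-test
  namespace: openshift-monitoring
spec:
  groups:
  - name: severity-test
    rules:
    - alert: AlwaysFiring
      expr: vector(1)     # always evaluates to a firing series
      labels:
        severity: foo     # the non-standard severity under test
EOF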
Actual results:
The alert can be seen on the telemeter server side.
Expected results:
The alert is dropped by the telemeter allow-list.
Additional info:
Red Hat operators should use standard severities: https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#style-guide Looking at the current data, it looks like ~2% of the alerts reported to Telemeter have an invalid severity.
This is a clone of issue OCPBUGS-31813. The following is the description of the original issue:
—
Description of problem:
Installer requires the `s3:HeadBucket` even though such permission does not exist. The correct permission for the `HeadBucket` action is `s3:ListBucket` https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadBucket.html
Version-Release number of selected component (if applicable):
4.16
How reproducible:
always
Steps to Reproduce:
1. Install a cluster using a role with limited permissions 2. 3.
Actual results:
level=warning msg=Action not allowed with tested creds action=iam:DeleteUserPolicy
level=warning msg=Tested creds not able to perform all requested actions
level=warning msg=Action not allowed with tested creds action=s3:HeadBucket
level=warning msg=Tested creds not able to perform all requested actions
level=fatal msg=failed to fetch Cluster: failed to fetch dependency of "Cluster": failed to generate asset "Platform Permissions Check": validate AWS credentials: AWS credentials cannot be used to either create new creds or use as-is
Installer exit with code 1
Expected results:
Installer should check only for s3:ListBucket
Additional info:
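For reference, a minimal policy-statement sketch granting the permission the installer actually needs (the bucket ARN is illustrative):

{
  "Effect": "Allow",
  "Action": ["s3:ListBucket"],
  "Resource": "arn:aws:s3:::my-cluster-bucket"
}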
We are going to be using request serving isolation mode in ROSA. We need an e2e test that helps us to not break that function as we continue HyperShift development.
Description of problem:
In https://github.com/openshift/installer/pull/7182 support was added to include AdditionalTrustBundle in the installconfigOverride for assisted-service in order to support Proxy with AdditionalTrustBundle. With the recent change to assisted-service https://github.com/openshift/assisted-service/pull/5357 to add it to the API we can remove setting this in installconfigOverride.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Deploying a compact 3-node cluster on GCP, by setting mastersSchedulable to true and removing the worker machineset YAMLs, results in a panic.
Version-Release number of selected component (if applicable):
$ openshift-install version
openshift-install 4.13.0-0.nightly-2022-12-04-194803
built from commit cc689a21044a76020b82902056c55d2002e454bd
release image registry.ci.openshift.org/ocp/release@sha256:9e61cdf7bd13b758343a3ba762cdea301f9b687737d77ef912c6788cbd6a67ea
release architecture amd64
How reproducible:
Always
Steps to Reproduce:
1. Create manifests.
2. Set 'spec.mastersSchedulable' to 'true' in <installation dir>/manifests/cluster-scheduler-02-config.yml.
3. Remove the worker machineset YAML files from the <installation dir>/openshift directory.
4. Create the cluster.
Actual results:
Got "panic: runtime error: index out of range [0] with length 0".
Expected results:
The installation should succeed, or giving clear error messages.
Additional info:
$ openshift-install version
openshift-install 4.13.0-0.nightly-2022-12-04-194803
built from commit cc689a21044a76020b82902056c55d2002e454bd
release image registry.ci.openshift.org/ocp/release@sha256:9e61cdf7bd13b758343a3ba762cdea301f9b687737d77ef912c6788cbd6a67ea
release architecture amd64
$
$ openshift-install create manifests --dir test1
? SSH Public Key /home/fedora/.ssh/openshift-qe.pub
? Platform gcp
INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json"
? Project ID OpenShift QE (openshift-qe)
? Region us-central1
? Base Domain qe.gcp.devcluster.openshift.com
? Cluster Name jiwei-1205a
? Pull Secret [? for help] ******
INFO Manifests created in: test1/manifests and test1/openshift
$
$ vim test1/manifests/cluster-scheduler-02-config.yml
$ yq-3.3.0 r test1/manifests/cluster-scheduler-02-config.yml spec.mastersSchedulable
true
$
$ rm -f test1/openshift/99_openshift-cluster-api_worker-machineset-?.yaml
$
$ tree test1
test1
├── manifests
│   ├── cloud-controller-uid-config.yml
│   ├── cloud-provider-config.yaml
│   ├── cluster-config.yaml
│   ├── cluster-dns-02-config.yml
│   ├── cluster-infrastructure-02-config.yml
│   ├── cluster-ingress-02-config.yml
│   ├── cluster-network-01-crd.yml
│   ├── cluster-network-02-config.yml
│   ├── cluster-proxy-01-config.yaml
│   ├── cluster-scheduler-02-config.yml
│   ├── cvo-overrides.yaml
│   ├── kube-cloud-config.yaml
│   ├── kube-system-configmap-root-ca.yaml
│   ├── machine-config-server-tls-secret.yaml
│   └── openshift-config-secret-pull-secret.yaml
└── openshift
    ├── 99_cloud-creds-secret.yaml
    ├── 99_kubeadmin-password-secret.yaml
    ├── 99_openshift-cluster-api_master-machines-0.yaml
    ├── 99_openshift-cluster-api_master-machines-1.yaml
    ├── 99_openshift-cluster-api_master-machines-2.yaml
    ├── 99_openshift-cluster-api_master-user-data-secret.yaml
    ├── 99_openshift-cluster-api_worker-user-data-secret.yaml
    ├── 99_openshift-machineconfig_99-master-ssh.yaml
    ├── 99_openshift-machineconfig_99-worker-ssh.yaml
    ├── 99_role-cloud-creds-secret-reader.yaml
    └── openshift-install-manifests.yaml

2 directories, 26 files
$
$ openshift-install create cluster --dir test1
INFO Consuming Openshift Manifests from target directory
INFO Consuming Master Machines from target directory
INFO Consuming Worker Machines from target directory
INFO Consuming OpenShift Install (Manifests) from target directory
INFO Consuming Common Manifests from target directory
INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json"
panic: runtime error: index out of range [0] with length 0

goroutine 1 [running]:
github.com/openshift/installer/pkg/tfvars/gcp.TFVars({{{0xc000cf6a40, 0xc}, {0x0, 0x0}, {0xc0011d4a80, 0x91d}}, 0x1, 0x1, {0xc0010abda0, 0x58}, ...})
	/go/src/github.com/openshift/installer/pkg/tfvars/gcp/gcp.go:70 +0x66f
github.com/openshift/installer/pkg/asset/cluster.(*TerraformVariables).Generate(0x1daff070, 0xc000cef530?)
	/go/src/github.com/openshift/installer/pkg/asset/cluster/tfvars.go:479 +0x6bf8
github.com/openshift/installer/pkg/asset/store.(*storeImpl).fetch(0xc000c78870, {0x1a777f40, 0x1daff070}, {0x0, 0x0})
	/go/src/github.com/openshift/installer/pkg/asset/store/store.go:226 +0x5fa
github.com/openshift/installer/pkg/asset/store.(*storeImpl).Fetch(0x7ffc4c21413b?, {0x1a777f40, 0x1daff070}, {0x1dadc7e0, 0x8, 0x8})
	/go/src/github.com/openshift/installer/pkg/asset/store/store.go:76 +0x48
main.runTargetCmd.func1({0x7ffc4c21413b, 0x5})
	/go/src/github.com/openshift/installer/cmd/openshift-install/create.go:259 +0x125
main.runTargetCmd.func2(0x1dae27a0?, {0xc000c702c0?, 0x2?, 0x2?})
	/go/src/github.com/openshift/installer/cmd/openshift-install/create.go:289 +0xe7
github.com/spf13/cobra.(*Command).execute(0x1dae27a0, {0xc000c70280, 0x2, 0x2})
	/go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:876 +0x67b
github.com/spf13/cobra.(*Command).ExecuteC(0xc000c3a500)
	/go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:990 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
	/go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:918
main.installerMain()
	/go/src/github.com/openshift/installer/cmd/openshift-install/main.go:61 +0x2b0
main.main()
	/go/src/github.com/openshift/installer/cmd/openshift-install/main.go:38 +0xff
$
Please review the following PR: https://github.com/openshift/openshift-state-metrics/pull/102
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/configmap-reload/pull/56
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/machine-api-provider-nutanix/pull/51
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
As part of this slack thread.
Description of problem:
When SRE collects data using `oc adm inspect`, the collection reports an error on 'secrets' (see below). This is because of the way SRE manages our hosted platforms: the SRE users (service accounts) are not 'true admins' and must impersonate admins to perform operations.
$ oc adm inspect --dest-dir=must-gather ns/openshift-sdn
Gathering data for ns/openshift-sdn...
...
Wrote inspect data to must-gather.
error: errors occurred while gathering data:
secrets is forbidden: User "system:serviceaccount:openshift-backplane-srep:f2b5cf795ef1fc5289490411d49ab042" cannot list resource "secrets" in API group "" in the namespace "openshift-sdn"
At the end of the day, the 'error' here is erroneous (not a true error); it is more of a warning, telling the user that a specific object wasn't collected.
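A hedged workaround until the message is downgraded to a warning: run the inspect with impersonation so the secrets listing succeeds (the impersonated identity is illustrative):

# oc's global --as flag applies to adm inspect as well
oc adm inspect --dest-dir=must-gather ns/openshift-sdn --as=backplane-cluster-admin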
Description of problem:
Cannot install singlenamespace operator using web console
Version-Release number of selected component (if applicable):
zhaoxia@xzha-mac doc_add_operator % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-10-08-220853   True        False         168m    Cluster version is 4.14.0-0.nightly-2023-10-08-220853
How reproducible:
always
Steps to Reproduce:
1. Install the catsrc:

zhaoxia@xzha-mac doc_add_operator % cat catsrc-singlenamespace.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: nginx-index
  namespace: openshift-marketplace
spec:
  displayName: Test
  publisher: OLM-QE
  sourceType: grpc
  image: quay.io/olmqe/nginxolm-operator-index:v1-singlenamespace
  updateStrategy:
    registryPoll:
      interval: 10m

oc apply -f catsrc-singlenamespace.yaml

zhaoxia@xzha-mac doc_add_operator % oc get packagemanifests nginx-operator -o yaml
  installModes:
  - supported: false
    type: OwnNamespace
  - supported: true
    type: SingleNamespace
  - supported: false
    type: MultiNamespace
  - supported: false
    type: AllNamespaces

2. Install nginx-operator using the web console.
Actual results:
nginxolm can't be installed with error message: "nginxolm can't be installed The operator does not support single namespace or global installation modes." The error message confused me, nginx-operator does support SingleNamespace, but the error message said "The operator does not support single namespace or global installation modes."
Expected results:
nginxolm can be installed
Additional info:
The error message confused me, nginx-operator does support SingleNamespace, but the error message said "The operator does not support single namespace or global installation modes."
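For comparison, a minimal sketch of the SingleNamespace install the console should be driving, expressed as OLM objects directly (the namespace and channel are illustrative assumptions):

oc apply -f - <<'EOF'
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: nginx-og
  namespace: test-ns
spec:
  targetNamespaces:
  - test-ns              # the single watched namespace
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: nginx-operator
  namespace: test-ns
spec:
  channel: alpha         # illustrative channel name
  name: nginx-operator
  source: nginx-index
  sourceNamespace: openshift-marketplace
EOF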
This is a clone of issue OCPBUGS-22995. The following is the description of the original issue:
—
Description of problem:
Rule ocp4-cis-file-permissions-cni-conf returned false negative result
From the CIS benchmark v1.4.0, the following command is used to check the multus config on nodes:
$ for i in $(oc get pods -n openshift-multus -l app=multus -oname); do oc exec -n openshift-multus $i -- /bin/bash -c "stat -c \"%a %n\" /host/etc/cni/net.d/*.conf"; done
600 /host/etc/cni/net.d/00-multus.conf
600 /host/etc/cni/net.d/00-multus.conf
600 /host/etc/cni/net.d/00-multus.conf
600 /host/etc/cni/net.d/00-multus.conf
600 /host/etc/cni/net.d/00-multus.conf
600 /host/etc/cni/net.d/00-multus.conf
Per the rule instructions, it is checking /etc/cni/net.d/ on the node.
However, the multus config on nodes is in path /etc/kubernetes/cni/net.d/, not /etc/cni/net.d/:
$ oc debug node/hongli-az-8pzqq-master-0 -- chroot /host ls -ltr /etc/cni/net.d/
Starting pod/hongli-az-8pzqq-master-0-debug ...
To use host binaries, run `chroot /host`
total 8
-rw-r--r--. 1 root root 129 Nov 7 02:18 200-loopback.conflist
-rw-r--r--. 1 root root 469 Nov 7 02:18 100-crio-bridge.conflist
Removing debug pod ...

$ oc debug node/hongli-az-8pzqq-master-0 -- chroot /host ls -ltr /etc/kubernetes/cni/net.d/
Starting pod/hongli-az-8pzqq-master-0-debug ...
To use host binaries, run `chroot /host`
total 4
drwxr-xr-x. 2 root root 60 Nov 7 02:23 whereabouts.d
-rw-------. 1 root root 352 Nov 7 02:23 00-multus.conf
Removing debug pod ...

$ for node in `oc get node --no-headers|awk '{print $1}'`; do oc debug node/$node -- chroot /host ls -l /etc/kubernetes/cni/net.d/; done
Starting pod/hongli-az-8pzqq-master-0-debug ...
To use host binaries, run `chroot /host`
total 4
-rw-------. 1 root root 352 Nov 7 02:23 00-multus.conf
drwxr-xr-x. 2 root root 60 Nov 7 02:23 whereabouts.d
Removing debug pod ...
Starting pod/hongli-az-8pzqq-master-1-debug ...
To use host binaries, run `chroot /host`
total 4
-rw-------. 1 root root 352 Nov 7 02:23 00-multus.conf
drwxr-xr-x. 2 root root 60 Nov 7 02:23 whereabouts.d
Removing debug pod ...
Starting pod/hongli-az-8pzqq-master-2-debug ...
To use host binaries, run `chroot /host`
total 4
-rw-------. 1 root root 352 Nov 7 02:23 00-multus.conf
drwxr-xr-x. 2 root root 60 Nov 7 02:23 whereabouts.d
Removing debug pod ...
Starting pod/hongli-az-8pzqq-worker-westus-2mx6t-debug ...
To use host binaries, run `chroot /host`
total 4
-rw-------. 1 root root 352 Nov 7 02:38 00-multus.conf
drwxr-xr-x. 2 root root 60 Nov 7 02:38 whereabouts.d
Removing debug pod ...
Starting pod/hongli-az-8pzqq-worker-westus-9qhf5-debug ...
To use host binaries, run `chroot /host`
total 4
-rw-------. 1 root root 352 Nov 7 02:38 00-multus.conf
drwxr-xr-x. 2 root root 60 Nov 7 02:38 whereabouts.d
Removing debug pod ...
Starting pod/hongli-az-8pzqq-worker-westus-bcdpd-debug ...
To use host binaries, run `chroot /host`
total 4
-rw-------. 1 root root 352 Nov 7 02:38 00-multus.conf
drwxr-xr-x. 2 root root 60 Nov 7 02:38 whereabouts.d
Removing debug pod ...
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-11-05-194730
How reproducible:
Always
Steps to Reproduce:
1. $ for i in $(oc get pods -n openshift-multus -l app=multus -oname); do oc exec -n openshift-multus $i -- /bin/bash -c "stat -c \"%a %n\" /host/etc/cni/net.d/*.conf"; done
2. $ for node in `oc get node --no-headers|awk '{print $1}'`; do oc debug node/$node -- chroot /host ls -l /etc/kubernetes/cni/net.d/; done
Actual results:
The rule checks the wrong path and returns FAIL.
Expected results:
The rule should check the right path and return PASS
Additional info:
It is also applicable for both SDN and OVN.
Description of problem:
According to https://docs.openshift.com/container-platform/4.11/release_notes/ocp-4-11-release-notes.html#ocp-4-11-deprecated-features-crio-parameters and Red Hat Insights, logSizeMax is deprecated in ContainerRuntimeConfig and shall instead be set via containerLogMaxSize in KubeletConfig. When starting that transition though, it was noticed that a ContainerRuntimeConfig as shown below would still add logSizeMax and even overlaySize to the ContainerRuntimeConfig spec.

$ bat /tmp/crio.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: pidlimit
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ''
  containerRuntimeConfig:
    pidsLimit: 4096
    logLevel: debug

$ oc get containerruntimeconfig pidlimit -o json | jq '.spec.containerRuntimeConfig'
{
  "logLevel": "debug",
  "logSizeMax": "0",
  "overlaySize": "0",
  "pidsLimit": 4096
}

When checking on the OpenShift Container Platform 4 - Node using crio config, we can see that the values are not applied. Yet it's disturbing to see those options added in the specification when in fact Red Hat is recommending to move them into KubeletConfig and remove them from ContainerRuntimeConfig. Further, having them still set in ContainerRuntimeConfig will trigger a false positive alert in Red Hat Insights, as the customer may have followed the recommendation yet the system does not comply with the changes made :-) Also interesting: a similar problem was reported a while ago in https://bugzilla.redhat.com/show_bug.cgi?id=1941936 and fixed. Hence it's interesting that this is coming back again.
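For reference, a hedged sketch of the recommended KubeletConfig replacement (the resource name and size value are illustrative):

oc apply -f - <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: worker-log-size
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  kubeletConfig:
    containerLogMaxSize: 50Mi   # replaces the deprecated logSizeMax
EOF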
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.13.4
How reproducible:
Always
Steps to Reproduce:
1. Install OpenShift Container Platform 4.13.4.
2. Create the ContainerRuntimeConfig shown above and validate the actual object created.
3. Run oc get containerruntimeconfig pidlimit -o json | jq '.spec.containerRuntimeConfig' to inspect the spec of the object created.
Actual results:
$ oc get containerruntimeconfig pidlimit -o json | jq '.spec.containerRuntimeConfig'
{
  "logLevel": "debug",
  "logSizeMax": "0",
  "overlaySize": "0",
  "pidsLimit": 4096
}
Expected results:
$ oc get containerruntimeconfig pidlimit -o json | jq '.spec.containerRuntimeConfig'
{
  "logLevel": "debug",
  "pidsLimit": 4096
}
Additional info:
Please review the following PR: https://github.com/openshift/vmware-vsphere-csi-driver/pull/100
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a regression due to the fix for https://issues.redhat.com/browse/OCPBUGS-23069.
When using dual-stack networks with network types other than OVN or SDN, a validation failure results. For example, when using this networking config:
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 25
  - cidr: fd01::/48
    hostPrefix: 64
  networkType: Calico
The following error will be returned:
{ "id": "network-prefix-valid", "status": "failure", "message": "Unexpected status ValidationError" },
When the clusterNetwork prefixes are removed the following error will result:
{ "id": "network-prefix-valid", "status": "failure", "message": "Invalid Cluster Network prefix: Host prefix, now 0, must be a positive integer." },
Description of problem:
As the live migration process may take hours for a large cluster, the workload in the cluster may trigger cluster extension by adding new nodes. We need to support adding new nodes while an SDN live migration is in progress. We need to backport this to 4.15.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-44629. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-43312. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-36871. The following is the description of the original issue:
—
Description of problem:
Customer has a cluster in AWS that was born on an old OCP version (4.7) and was upgraded all the way through 4.15. During the lifetime of the cluster they changed the DHCP option in AWS to "domain name". During node provisioning triggered by MachineSet scaling, the Machine is successfully created at the cloud provider but the Node is never added to the cluster. The CSRs remain pending and do not get auto-approved. This issue is eventually related or similar to the bug fixed via https://issues.redhat.com/browse/OCPBUGS-29290
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
CSRs don't get auto-approved. New nodes have a different domain name when the CSR is approved manually.
Expected results:
CSRs should get approved automatically, and the domain name scheme should not change.
Additional info:
Description of problem:
During the testing of the NE1264 epic, I configured both the syslog and container destination types of logging on the same default ingress controller. In the ingress controller spec we can see it is taking both destination types, but this is not reflected in the ROUTER_LOG_MAX_LENGTH env or the haproxy.config file.

melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator get ingresscontroller/default -oyaml
apiVersion: operator.openshift.io/v1
kind: IngressController
<-----snip--->
spec:
  clientTLS:
    clientCA:
      name: ""
    clientCertificatePolicy: ""
  httpCompression: {}
  httpEmptyRequestsPolicy: Respond
  httpErrorCodePages:
    name: ""
  logging:
    access:
      destination:
        container:
          maxLength: 1024
        syslog:
          address: 1.2.3.4
          maxLength: 1024
          port: 514
        type: Container
      logEmptyRequests: Log
  replicas: 2
  tuningOptions:
    reloadInterval: 0s
  unsupportedConfigOverrides: null

melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress exec router-default-6c86ff75d9-g24q5 -- env | grep ROUTER_LOG_MAX_LENGTH
Defaulted container "router" out of: router, logs
ROUTER_LOG_MAX_LENGTH=1024
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress exec router-default-6c86ff75d9-l9rjv -- cat haproxy.config | grep 1024
Defaulted container "router" out of: router, logs
  log /var/lib/rsyslog/rsyslog.sock len 1024 local1 info

When we patch changes to the log length, it is not reflected as expected for one destination:

melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator patch ingresscontroller/default -p '{"spec":{"logging":{"access":{"destination":{"container":{"maxLength":480}}}}}}' --type=merge
ingresscontroller.operator.openshift.io/default patched
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress exec router-default-6476d6c69d-tlhqd -- env | grep ROUTER_LOG_MAX_LENGTH
Defaulted container "router" out of: router, logs
ROUTER_LOG_MAX_LENGTH=480
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator patch ingresscontroller/default -p '{"spec":{"logging":{"access":{"destination":{"syslog":{"maxLength":4096}}}}}}' --type=merge
ingresscontroller.operator.openshift.io/default patched
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator get ingresscontroller/default -oyaml
apiVersion: operator.openshift.io/v1
kind: IngressController
<----snip---->
spec:
  clientTLS:
    clientCA:
      name: ""
    clientCertificatePolicy: ""
  httpCompression: {}
  httpEmptyRequestsPolicy: Respond
  httpErrorCodePages:
    name: ""
  logging:
    access:
      destination:
        container:
          maxLength: 480
        syslog:
          address: 1.2.3.4
          maxLength: 4096
          port: 514
        type: Container
      logEmptyRequests: Log
  replicas: 2
  tuningOptions:
    reloadInterval: 0s
  unsupportedConfigOverrides: null

melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress exec router-default-59cf55666d-shq98 -- env | grep ROUTER_LOG_MAX_LENGTH
Defaulted container "router" out of: router, logs
ROUTER_LOG_MAX_LENGTH=480

In another round of testing I can see only the syslog destination type reflected in the env, not the container destination type. I am also not sure whether it is a valid situation to use both destination types on the default ingress controller.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Steps to Reproduce:
1. Edit the default ingress controller and add both destination type configs 2. 3.
Actual results:
Only one destination type's value is reflected in the haproxy.config file.
Expected results:
Both types should be reflected.
Additional info:
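For reference, the observed behavior may follow from the API shape: spec.logging.access.destination.type appears to act as a discriminator, so only the sub-section matching type is honored even if both are set. A sketch of switching the active destination to syslog (field names taken from the spec dumps above):
~~~
oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge \
  -p '{"spec":{"logging":{"access":{"destination":{"type":"Syslog","syslog":{"address":"1.2.3.4","port":514,"maxLength":4096}}}}}}'
~~~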
Description of problem:
If the Build and DeploymentConfig capabilities are not installed, running `oc new-app registry.redhat.io/<namespace>/<image>:<tag>` creates a deployment with an empty spec.containers[0].image. The deployment then fails to start its pod.
Version-Release number of selected component (if applicable):
oc version
Client Version: 4.14.0-0.nightly-2023-08-22-221456
Kustomize Version: v5.0.1
Server Version: 4.14.0-0.nightly-2023-09-02-132842
Kubernetes Version: v1.27.4+2c83a9f
How reproducible:
Always
Steps to Reproduce:
1. Install a cluster without the Build/DeploymentConfig functions by setting "baselineCapabilitySet: None" in the install-config 2. Create a deployment using the 'new-app' command: oc new-app registry.redhat.io/ubi8/httpd-24:latest 3.
Actual results:
2. $ oc new-app registry.redhat.io/ubi8/httpd-24:latest
--> Found container image c412709 (11 days old) from registry.redhat.io for "registry.redhat.io/ubi8/httpd-24:latest"

    Apache httpd 2.4
    ----------------
    Apache httpd 2.4 available as container, is a powerful, efficient, and extensible web server. Apache supports a variety of features, many implemented as compiled modules which extend the core functionality. These can range from server-side programming language support to authentication schemes. Virtual hosting allows one Apache installation to serve many different Web sites.

    Tags: builder, httpd, httpd-24

    * An image stream tag will be created as "httpd-24:latest" that will track this image

--> Creating resources ...
    imagestream.image.openshift.io "httpd-24" created
    deployment.apps "httpd-24" created
    service "httpd-24" created
--> Success
    Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
     'oc expose service/httpd-24'
    Run 'oc status' to view your app

3. oc get deploy -o yaml
apiVersion: v1
items:
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    annotations:
      deployment.kubernetes.io/revision: "1"
      image.openshift.io/triggers: '[{"from":{"kind":"ImageStreamTag","name":"httpd-24:latest"},"fieldPath":"spec.template.spec.containers[?(@.name==\"httpd-24\")].image"}]'
      openshift.io/generated-by: OpenShiftNewApp
    creationTimestamp: "2023-09-04T07:44:01Z"
    generation: 1
    labels:
      app: httpd-24
      app.kubernetes.io/component: httpd-24
      app.kubernetes.io/instance: httpd-24
    name: httpd-24
    namespace: wxg
    resourceVersion: "115441"
    uid: 909d0c4e-180c-4f88-8fb5-93c927839903
  spec:
    progressDeadlineSeconds: 600
    replicas: 1
    revisionHistoryLimit: 10
    selector:
      matchLabels:
        deployment: httpd-24
    strategy:
      rollingUpdate:
        maxSurge: 25%
        maxUnavailable: 25%
      type: RollingUpdate
    template:
      metadata:
        annotations:
          openshift.io/generated-by: OpenShiftNewApp
        creationTimestamp: null
        labels:
          deployment: httpd-24
      spec:
        containers:
        - image: ' '
          imagePullPolicy: IfNotPresent
          name: httpd-24
          ports:
          - containerPort: 8080
            protocol: TCP
          - containerPort: 8443
            protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        terminationGracePeriodSeconds: 30
  status:
    conditions:
    - lastTransitionTime: "2023-09-04T07:44:01Z"
      lastUpdateTime: "2023-09-04T07:44:01Z"
      message: Created new replica set "httpd-24-7f6b55cc85"
      reason: NewReplicaSetCreated
      status: "True"
      type: Progressing
    - lastTransitionTime: "2023-09-04T07:44:01Z"
      lastUpdateTime: "2023-09-04T07:44:01Z"
      message: Deployment does not have minimum availability.
      reason: MinimumReplicasUnavailable
      status: "False"
      type: Available
    - lastTransitionTime: "2023-09-04T07:44:01Z"
      lastUpdateTime: "2023-09-04T07:44:01Z"
      message: 'Pod "httpd-24-7f6b55cc85-pvvgt" is invalid: spec.containers[0].image: Invalid value: " ": must not have leading or trailing whitespace'
      reason: FailedCreate
      status: "True"
      type: ReplicaFailure
    observedGeneration: 1
    unavailableReplicas: 1
kind: List
metadata:
Expected results:
Should set spec.containers[0].image to registry.redhat.io/ubi8/httpd-24:latest
Additional info:
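A quick way to confirm which capabilities the cluster was installed with (enabledCapabilities is a real ClusterVersion status field; the jsonpath expression is just one way to read it):
~~~
oc get clusterversion version -o jsonpath='{.status.capabilities.enabledCapabilities}'
~~~
If Build and DeploymentConfig are absent from the list, new-app appears to fall into the code path that produces the empty image field described above.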
This is a clone of issue OCPBUGS-33237. The following is the description of the original issue:
—
Description of problem:
Looks like we are facing a bug when trying to spin up a hosted control plane cluster while using proxy settings to connect to the internet. For example, on our worker node, the static pod kube-apiserver-proxy.yaml doesn't contain the noProxy settings, which seems to cause the failure of deploying the hosted cluster.
~~~
[root@ocpugbo2cogswo03 manifests]# cat kube-apiserver-proxy.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    k8s-app: kube-apiserver-proxy
  name: kube-apiserver-proxy
  namespace: kube-system
spec:
  containers:
  - command:
    - control-plane-operator
    - kubernetes-default-proxy
    - --listen-addr=<IP-Addr>:6443
    - --proxy-addr=<Proxy-Addr>:<Proxy-port>
    - --apiserver-addr=<API-IP-Addr>:6443
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7ca95b9a71e41157c70378896758618b993ad90e6d80a23c46170da5c11f441f
    name: kubernetes-default-proxy
    resources:
      requests:
        cpu: 13m
        memory: 16Mi
    securityContext:
      runAsUser: 1001
  hostNetwork: true
  priorityClassName: system-node-critical
status: {}
~~~
Can you please check this issue?
Steps to Reproduce:
1. Install a cluster with ACM and HCP 2. Try to create a hosted cluster using proxy configuration 3. kube-apiserver-proxy is using proxy to reach API.
Actual results:
The kube-apiserver-proxy is using proxy to reach API. Worker nodes are unable to reach a Hosted Control Plane's API when a cluster-wide http proxy is configured.
Expected results:
kube-apiserver-proxy should not use proxy to reach API
Additional info:
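For context, the cluster-wide proxy is configured through the Proxy resource, and noProxy is where the hosted API endpoint would need to be excluded; a minimal sketch with placeholder values matching the redacted report:
~~~
apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  name: cluster
spec:
  httpProxy: http://<Proxy-Addr>:<Proxy-port>
  httpsProxy: http://<Proxy-Addr>:<Proxy-port>
  noProxy: <API-IP-Addr>
~~~
The bug is that the generated kube-apiserver-proxy static pod does not honor such an exclusion for the hosted API address.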
This is a clone of issue OCPBUGS-27817. The following is the description of the original issue:
—
Description of problem:
When performing upgrades on ROSA HCP clusters with a large number of worker nodes (> 51), the Kube APIServer pods of the cluster use up memory exceeding the capacity of their nodes, resulting in OOMKills.
Version-Release number of selected component (if applicable):
4.14, 4.15
How reproducible:
always
Steps to Reproduce:
1. Create ROSA HCP Cluster 2. Add 100 workers to Cluster 3. Upgrade the cluster
Actual results:
Kube APIServer pods are OOMKilled
Expected results:
Upgrade completes successfully
Additional info:
Description of problem:
The customer has a custom apiserver certificate.
This error can be found while trying to uninstall any operator via the console:
openshift-console/pods/console-56494b7977-d7r76/console/console/logs/current.log:
2023-10-24T14:13:21.797447921+07:00 E1024 07:13:21.797400 1 operands_handler.go:67] Failed to get new client for listing operands: Get "https://api.<cluster>.<domain>:6443/api?timeout=32s": x509: certificate signed by unknown authority
When trying the same request from the console pod, we see no issue.
We see the root CA that signs the apiserver certificate, and this CA is trusted in the pod.
It seems the code that provokes this issue is:
https://github.com/openshift/console/blob/master/pkg/server/operands_handler.go#L62-L70
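The linked code builds a fresh client whose rest.Config apparently lacks the custom CA. A sketch of how a client-go config can be pointed at an extra CA bundle (the helper and caPath are hypothetical illustrations, not the console's actual fix):
~~~
package server

import (
	"os"

	"k8s.io/client-go/rest"
)

// withCustomCA returns a copy of cfg that trusts the CA bundle at caPath,
// e.g. the custom CA that signed the API server certificate.
func withCustomCA(cfg *rest.Config, caPath string) (*rest.Config, error) {
	caData, err := os.ReadFile(caPath)
	if err != nil {
		return nil, err
	}
	out := rest.CopyConfig(cfg)
	out.TLSClientConfig.CAData = caData
	return out, nil
}
~~~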
This is a clone of issue OCPBUGS-29858. The following is the description of the original issue:
—
The convention is a format like node-role.kubernetes.io/role: "", not node-role.kubernetes.io: role; however, ROSA uses the latter format to indicate the infra role. This changes the node watch code to ignore it, as well as other potential variations like node-role.kubernetes.io/.
The current code panics when run against a ROSA cluster:
E0209 18:10:55.533265 78 runtime.go:79] Observed a panic: runtime.boundsError{x:24, y:23, signed:true, code:0x3} (runtime error: slice bounds out of range [24:23])
goroutine 233 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x7a71840?, 0xc0018e2f48})
k8s.io/apimachinery@v0.27.2/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x1000251f9fe?})
k8s.io/apimachinery@v0.27.2/pkg/util/runtime/runtime.go:49 +0x75
panic({0x7a71840, 0xc0018e2f48})
runtime/panic.go:884 +0x213
github.com/openshift/origin/pkg/monitortests/node/watchnodes.nodeRoles(0x7ecd7b3?)
github.com/openshift/origin/pkg/monitortests/node/watchnodes/node.go:187 +0x1e5
github.com/openshift/origin/pkg/monitortests/node/watchnodes.startNodeMonitoring.func1(0
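A minimal sketch of parsing node-role label keys without slicing by a fixed offset, so bare or malformed keys cannot trigger a bounds panic (function names are illustrative, not the origin repo's actual code):
~~~
package main

import (
	"fmt"
	"strings"
)

const rolePrefix = "node-role.kubernetes.io"

// roleFromLabelKey extracts the role from a node-role label key.
func roleFromLabelKey(key string) (string, bool) {
	if key == rolePrefix {
		return "", false // bare key, e.g. ROSA's infra marker; ignore
	}
	role, found := strings.CutPrefix(key, rolePrefix+"/")
	if !found || role == "" {
		return "", false // unrelated label, or "node-role.kubernetes.io/"
	}
	return role, true
}

func main() {
	for _, k := range []string{
		"node-role.kubernetes.io/worker",
		"node-role.kubernetes.io",
		"node-role.kubernetes.io/",
	} {
		fmt.Println(roleFromLabelKey(k))
	}
}
~~~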
Please review the following PR: https://github.com/openshift/cluster-api-provider-baremetal/pull/196
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
We are missing the new logic for handling v6-primary in the VSphere UPI nodeip-configuration service: https://github.com/openshift/machine-config-operator/blob/ea88304dd6de521d55a9d3413a764f618af2425a/templates/common/vsphere/units/nodeip-configuration-vsphere-upi.service.yaml#L40
https://github.com/openshift/machine-config-operator/pull/3670 addresses that, but unfortunately did not make 4.14 so we will need to backport it.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Default security settings for new Azure Storage accounts have been updated. Using ccoctl to create Azure Workload Identity resources in region eastus does not work. I tested several commonly used regions; the results are as follows.

List of regions not working properly: eastus
$ az storage account list -g mihuangtt0947-rg-oidc --query "[].[name,allowBlobPublicAccess]" -o tsv
mihuangtt0947rgoidc    False

List of regions working properly: westus, australiacentral, australiaeast, centralus, australiasoutheast, southindia…
$ az storage account list -g mihuangdispri0929-rg-oidc --query "[].[name,allowBlobPublicAccess]" -o tsv
mihuangdispri0929rgoidc    True
Version-Release number of selected component (if applicable):
4.14/4.15
How reproducible:
Always
Steps to Reproduce:
1. Run the ccoctl azure create-all command to create Azure workload identity resources in region eastus.
[huangmingxia@fedora CCO-bugs]$ ./ccoctl azure create-all --name 'mihuangp1' --region 'eastus' --subscription-id {SUBSCRIPTION-ID} --tenant-id {TENANT-ID} --credentials-requests-dir=./credrequests --dnszone-resource-group-name 'os4-common' --storage-account-name='mihuangp1oidc' --output-dir test
Actual results:
[huangmingxia@fedora CCO-bugs]$ ./ccoctl azure create-all --name 'mihuangp1' --region 'eastus' --subscription-id {SUBSCRIPTION-ID} --tenant-id {TENANT-ID} --credentials-requests-dir=./credrequests --dnszone-resource-group-name 'os4-common' --storage-account-name='mihuangp1oidc' --output-dir test
2023/10/25 11:14:36 Using existing RSA keypair found at test/serviceaccount-signer.private
2023/10/25 11:14:36 Copying signing key for use by installer
2023/10/25 11:14:36 No --oidc-resource-group-name provided, defaulting OIDC resource group name to mihuangp1-oidc
2023/10/25 11:14:36 No --installation-resource-group-name provided, defaulting installation resource group name to mihuangp1
2023/10/25 11:14:36 No --blob-container-name provided, defaulting blob container name to mihuangp1
2023/10/25 11:14:39 Created resource group /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/mihuangp1-oidc
2023/10/25 11:15:01 Created storage account /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/mihuangp1-oidc/providers/Microsoft.Storage/storageAccounts/mihuangp1oidc
2023/10/25 11:15:03 failed to create blob container: PUT https://management.azure.com/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/mihuangp1-oidc/providers/Microsoft.Storage/storageAccounts/mihuangp1oidc/blobServices/default/containers/mihuangp1
--------------------------------------------------------------------------------
RESPONSE 409: 409 Conflict
ERROR CODE: PublicAccessNotPermitted
--------------------------------------------------------------------------------
{
  "error": {
    "code": "PublicAccessNotPermitted",
    "message": "Public access is not permitted on this storage account.\nRequestId:415c51f1-c01e-0017-7ef1-06ec0c000000\nTime: 2023-10-25T03:15:02.7928767Z"
  }
}
--------------------------------------------------------------------------------
$ az storage account list -g mihuangtt0947-rg-oidc --query "[].[name,allowBlobPublicAccess]" -o tsv
mihuangtt0947rgoidc    False
Expected results:
Resources created successfully.
$ az storage account list -g mihuangtt0947-rg-oidc --query "[].[name,allowBlobPublicAccess]" -o tsv
mihuangtt0947rgoidc    True
Additional info:
Google email: Important notice: Default security settings for new Azure Storage accounts will be updated
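If needed, the failing default can be flipped on an existing account; a sketch using the account names from this report (the az flag is real, but whether ccoctl should do this automatically is the open question):
~~~
az storage account update \
  --name mihuangp1oidc \
  --resource-group mihuangp1-oidc \
  --allow-blob-public-access true
~~~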
This is a clone of issue OCPBUGS-43797. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-43564. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-43508. The following is the description of the original issue:
—
Description of problem:
These two tests have been flaking more often lately. The TestLeaderElection flake is partially (but not solely) connected to OCPBUGS-41903. TestOperandProxyConfiguration seems to fail in the teardown while waiting for other cluster operators to become available. Although these flakes aren't customer facing, they considerably slow development cycles (due to retests) and also consume more resources than they should (every retest runs on a new cluster), so we want to backport the fixes.
Version-Release number of selected component (if applicable):
4.18, 4.17, 4.16, 4.15, 4.14
How reproducible:
Sometimes
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-47775. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-47712. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-45321. The following is the description of the original issue:
—
Description of problem:
Unit tests for openshift/builder are permanently failing for v4.18.
Version-Release number of selected component (if applicable):
4.18
How reproducible:
Always
Steps to Reproduce:
1. Run PR against openshift/builder
Actual results:
Test fails:
--- FAIL: TestUnqualifiedClone (0.20s)
    source_test.go:171: unable to add submodule: "Cloning into '/tmp/test-unqualified335202210/sub'...\nfatal: transport 'file' not allowed\nfatal: clone of 'file:///tmp/test-submodule643317239' into submodule path '/tmp/test-unqualified335202210/sub' failed\n"
    source_test.go:195: unable to find submodule dir
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
Expected results:
Tests pass
Additional info:
Example: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_builder/401/pull-ci-openshift-builder-master-unit/1853816128913018880
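The "transport 'file' not allowed" error comes from newer Git defaults (protocol.file.allow defaults to "user" since the Git 2.38-era security fixes). A sketch of loosening it for the test environment only (an assumption about where the harness runs git, not necessarily the fix merged in the repo):
~~~
# allow file:// submodule clones, e.g. in the CI job environment
git config --global protocol.file.allow always
~~~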
Please review the following PR: https://github.com/openshift/alibaba-cloud-csi-driver/pull/33
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/105
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Cluster-scoped resources do not need (or want) metadata.namespace defined. Currently the platform-operators-aggregated ClusterOperator manifest requests a namespace, but that request should be dropped to avoid confusing human and robot readers.
At least 4.15. I haven't dug back to count previous 4.y.
100%
$ oc adm release extract --to manifests quay.io/openshift-release-dev/ocp-release:4.15.0-ec.1-x86_64
$ grep -r5 platform-operators-aggregated manifests/ | grep namespace:
manifests/0000_50_cluster-platform-operator-manager_07-aggregated-clusteroperator.yaml- namespace: openshift-platform-operators
No hits.
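For illustration, the corrected manifest metadata would simply drop the namespace field; a sketch (only metadata shown, other fields as in the shipped manifest):
~~~
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  name: platform-operators-aggregated
  # no metadata.namespace: ClusterOperator is cluster-scoped
~~~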
Description of problem:
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Easily
Steps to Reproduce:
1. Deploy in wdc with 4.15 2. Observe that workers don't launch 3. Installer fails
Actual results:
worker nodes will not launch
Expected results:
install completes
Additional info:
This is a clone of issue OCPBUGS-385. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-29494. The following is the description of the original issue:
—
Description of problem:
The hypershift operator ignores RegistryOverrides (from ICSP/IDMS) when inspecting the control-plane-operator image, so on a disconnected cluster the user must explicitly set the hypershift.openshift.io/control-plane-operator-image annotation pointing to the mirrored image on the internal registry (virthost.ostest.test.metalkube.org:5000/localimages/local-release-image in this example).

Example: the correct match is in the IDMS:
# oc get imagedigestmirrorset -oyaml | grep -B2 registry.ci.openshift.org/ocp/4.14-2024-02-14-135111
...
    - mirrors:
      - virthost.ostest.test.metalkube.org:5000/localimages/local-release-image
      source: registry.ci.openshift.org/ocp/4.14-2024-02-14-135111

Creating a hosted cluster with:
hcp create cluster kubevirt --image-content-sources /home/mgmt_iscp.yaml --additional-trust-bundle /etc/pki/ca-trust/source/anchors/registry.2.crt --name simone3 --node-pool-replicas 2 --memory 16Gi --cores 4 --root-volume-size 64 --namespace local-cluster --release-image virthost.ostest.test.metalkube.org:5000/localimages/local-release-image@sha256:66c6a46013cda0ad4e2291be3da432fdd03b4a47bf13067e0c7b91fb79eb4539 --pull-secret /tmp/.dockerconfigjson --generate-ssh

on the hostedCluster object we see:
status:
  conditions:
  - lastTransitionTime: "2024-02-14T22:01:30Z"
    message: 'failed to look up image metadata for registry.ci.openshift.org/ocp/4.14-2024-02-14-135111@sha256:84c74cc05250d0e51fe115274cc67ffcf0a4ac86c831b7fea97e484e646072a6: failed to obtain root manifest for registry.ci.openshift.org/ocp/4.14-2024-02-14-135111@sha256:84c74cc05250d0e51fe115274cc67ffcf0a4ac86c831b7fea97e484e646072a6: unauthorized: authentication required'
    observedGeneration: 3
    reason: ReconciliationError
    status: "False"
    type: ReconciliationSucceeded

and in the logs of the hypershift operator:
{"level":"info","ts":"2024-02-14T22:18:11Z","msg":"registry override coincidence not found","controller":"hostedcluster","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedCluster","HostedCluster":{"name":"simone3","namespace":"local-cluster"},"namespace":"local-cluster","name":"simone3","reconcileID":"6d6a2479-3d54-42e3-9204-8d0ab1013745","image":"4.14-2024-02-14-135111"}
{"level":"error","ts":"2024-02-14T22:18:12Z","msg":"Reconciler error","controller":"hostedcluster","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedCluster","HostedCluster":{"name":"simone3","namespace":"local-cluster"},"namespace":"local-cluster","name":"simone3","reconcileID":"6d6a2479-3d54-42e3-9204-8d0ab1013745","error":"failed to look up image metadata for registry.ci.openshift.org/ocp/4.14-2024-02-14-135111@sha256:84c74cc05250d0e51fe115274cc67ffcf0a4ac86c831b7fea97e484e646072a6: failed to obtain root manifest for registry.ci.openshift.org/ocp/4.14-2024-02-14-135111@sha256:84c74cc05250d0e51fe115274cc67ffcf0a4ac86c831b7fea97e484e646072a6: unauthorized: authentication required","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:326\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234"}

So the hypershift operator is not using the RegistryOverrides mechanism to inspect the image from the internal registry. Explicitly setting the annotation
hypershift.openshift.io/control-plane-operator-image: virthost.ostest.test.metalkube.org:5000/localimages/local-release-image@sha256:84c74cc05250d0e51fe115274cc67ffcf0a4ac86c831b7fea97e484e646072a6
on the hosted cluster, directly pointing to the mirrored control-plane-operator image, is required to proceed on disconnected environments.
Version-Release number of selected component (if applicable):
4.14, 4.15, 4.16
How reproducible:
100%
Steps to Reproduce:
1. try to deploy an hostedCluster on a disconnected environment without explicitly set hypershift.openshift.io/control-plane-operator-image annotation. 2. 3.
Actual results:
A reconciliation error is reported on the hostedCluster object:
status:
  conditions:
  - lastTransitionTime: "2024-02-14T22:01:30Z"
    message: 'failed to look up image metadata for registry.ci.openshift.org/ocp/4.14-2024-02-14-135111@sha256:84c74cc05250d0e51fe115274cc67ffcf0a4ac86c831b7fea97e484e646072a6: failed to obtain root manifest for registry.ci.openshift.org/ocp/4.14-2024-02-14-135111@sha256:84c74cc05250d0e51fe115274cc67ffcf0a4ac86c831b7fea97e484e646072a6: unauthorized: authentication required'
    observedGeneration: 3
    reason: ReconciliationError
    status: "False"
    type: ReconciliationSucceeded
The hostedCluster is not spawned.
Expected results:
The hypershift operator also uses the RegistryOverrides mechanism for the control-plane-operator image. Explicitly setting the hypershift.openshift.io/control-plane-operator-image annotation is not required.
Additional info:
- Maybe related to OCPBUGS-29110 - Explicitly setting hypershift.openshift.io/control-plane-operator-image annotation pointing to the mirrored image on the internal registry is a valid workaround.
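The workaround can be applied in one command; a sketch using the names and digest from this report:
~~~
oc annotate hostedcluster simone3 -n local-cluster \
  hypershift.openshift.io/control-plane-operator-image=virthost.ostest.test.metalkube.org:5000/localimages/local-release-image@sha256:84c74cc05250d0e51fe115274cc67ffcf0a4ac86c831b7fea97e484e646072a6
~~~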
This is a clone of issue OCPBUGS-29469. The following is the description of the original issue:
—
Description of problem:
Hello Team,

We have observed weird, inconsistent behavior while executing any of the below commands to proceed with installation on the Azure platform for OCP 4.15.0-rc.5:
$ ./openshift-install create install-config --dir=<installation_dir>
or
$ ./openshift-install create manifests --dir=<installation_dir>
or
$ ./openshift-install create cluster --dir=<installation_dir>

Attempt 1: it fails
~~~
$ ./openshift-install create install-config --dir=.
? SSH Public Key /root/.ssh/id_rsa.pub
? Platform azure
INFO Credentials loaded from file "/root/.azure/osServicePrincipal.json"
? Region eastus
? Base Domain india.az.cee.support
? Cluster Name ocpdummyocp415
? Pull Secret [? for help] *******************
FATAL failed to fetch Install Config: failed to generate asset "Install Config": [controlPlane.platform.azure.type: Invalid value: "Standard_D8s_v3": not found in region eastus, controlPlane.platform.azure.type: Invalid value: "Standard_D8s_v3": unable to determine HyperVGeneration version]
~~~
Attempt 2: it succeeds
~~~
$ ./openshift-install create install-config --dir=.
? SSH Public Key /root/.ssh/id_rsa.pub
? Platform azure
INFO Credentials loaded from file "/root/.azure/osServicePrincipal.json"
? Region eastus
? Base Domain india.az.cee.support
? Cluster Name ocpdummyocp415
? Pull Secret [? for help] *******************
INFO Install-Config created in: .
~~~
This is very erratic even though I have full permission and access to Azure, and capacity is available in every region/location as well:
~~~
$ az vm list-sizes --location "eastus" | grep -i Standard_D8s_v3
    "name": "Standard_D8s_v3",
~~~
Attempt 3: fails - now creating the manifests fails.
~~~
./openshift-install create manifests --dir=.
INFO Credentials loaded from file "/root/.azure/osServicePrincipal.json"
ERROR failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: [controlPlane.platform.azure.type: Invalid value: "Standard_D8s_v3": not found in region eastus, controlPlane.platform.azure.type: Invalid value: "Standard_D8s_v3": unable to determine HyperVGeneration version, compute[0].platform.azure.type: Invalid value: "Standard_D4s_v3": not found in region eastus]
~~~
Version-Release number of selected component (if applicable):
How reproducible:
Intermittent; sometimes it works and sometimes it does not.
Steps to Reproduce:
With any of the below commands we encounter the issue:
$ ./openshift-install create install-config --dir=<installation_dir>
or
$ ./openshift-install create manifests --dir=<installation_dir>
or
$ ./openshift-install create cluster --dir=<installation_dir>

FATAL failed to fetch Install Config: failed to generate asset "Install Config": [controlPlane.platform.azure.type: Invalid value: "Standard_D8s_v3": not found in region eastus, controlPlane.platform.azure.type: Invalid value: "Standard_D8s_v3": unable to determine HyperVGeneration version]
Actual results:
The reverse of the expected results:
FATAL failed to fetch Install Config: failed to generate asset "Install Config": [controlPlane.platform.azure.type: Invalid value: "Standard_D8s_v3": not found in region eastus, controlPlane.platform.azure.type: Invalid value: "Standard_D8s_v3": unable to determine HyperVGeneration version]
Expected results:
It should generate the install-config file and manifests, and create the cluster.
Additional info:
This shows "tls: bad certificate" from the kube-apiserver operator; see, for example, https://reportportal-openshift.apps.ocp-c1.prod.psi.redhat.com/ui/#prow/launches/all/470214. Checked its must-gather: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-aws-ipi-imdsv2-fips-f14/1726036030588456960/artifacts/aws-ipi-imdsv2-fips-f14/gather-must-gather/artifacts/
MacBook-Pro:~ jianzhang$ omg logs prometheus-operator-admission-webhook-6bbdbc47df-jd5mb | grep "TLS handshake"
2023-11-27 10:11:50.687 | WARNING | omg.utils.load_yaml:<module>:10 - yaml.CSafeLoader failed to load, using SafeLoader
2023-11-19T00:57:08.318983249Z ts=2023-11-19T00:57:08.318923708Z caller=stdlib.go:105 caller=server.go:3215 msg="http: TLS handshake error from 10.129.0.35:48334: remote error: tls: bad certificate"
2023-11-19T00:57:10.336569986Z ts=2023-11-19T00:57:10.336505695Z caller=stdlib.go:105 caller=server.go:3215 msg="http: TLS handshake error from 10.129.0.35:48342: remote error: tls: bad certificate"
...
MacBook-Pro:~ jianzhang$ omg get pods -A -o wide | grep "10.129.0.35"
2023-11-27 10:12:16.382 | WARNING | omg.utils.load_yaml:<module>:10 - yaml.CSafeLoader failed to load, using SafeLoader
openshift-kube-apiserver-operator    kube-apiserver-operator-f78c754f9-rbhw9    1/1    Running    2    5h27m    10.129.0.35    ip-10-0-107-238.ec2.internal
For more information, see Slack: https://redhat-internal.slack.com/archives/CC3CZCQHM/p1700473278471309
This is a clone of issue OCPBUGS-29209. The following is the description of the original issue:
—
Description of problem:
The HyperShift operator is applying control-plane-pki-operator RBAC resources regardless of whether PKI reconciliation is disabled for the HostedCluster.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
100%
Steps to Reproduce:
1. Create a 4.15 HostedCluster with PKI reconciliation disabled 2. Unused RBAC resources for control-plane-pki-operator are created
Actual results:
Unused RBAC resources for control-plane-pki-operator are created.
Expected results:
RBAC resources for control-plane-pki-operator should not be created if the deployment for control-plane-pki-operator itself is not created.
Additional info:
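For reference, a sketch of a HostedCluster with PKI reconciliation disabled via annotation (assuming the hypershift.openshift.io/disable-pki-reconciliation annotation, which the HyperShift API defines for this purpose); the bug is that the RBAC is created even in this case:
~~~
apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: example
  namespace: clusters
  annotations:
    hypershift.openshift.io/disable-pki-reconciliation: "true"
~~~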
This is a clone of issue OCPBUGS-34986. The following is the description of the original issue:
—
Description of problem:
A non-existent oauth.config.openshift.io resource is listed on the Global Configuration page.
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-06-05-082646
How reproducible:
Always
Steps to Reproduce:
1. visit global configuration page /settings/cluster/globalconfig 2. check listed items on the page 3.
Actual results:
2. There are two OAuth.config.openshift.io entries; one links to /k8s/cluster/config.openshift.io~v1~OAuth/oauth-config, which returns 404: Not Found.
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.nightly-2024-06-05-082646   True        False         171m    Cluster version is 4.16.0-0.nightly-2024-06-05-082646
$ oc get oauth.config.openshift.io
NAME      AGE
cluster   3h26m
Expected results:
From the CLI output we can see there is only one oauth.config.openshift.io resource, but the console shows an extra 'oauth-config' entry. Only one oauth.config.openshift.io resource should be listed.
Additional info:
Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/115
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/cluster-api-provider-vsphere/pull/26
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
All Projects' dropdown test is failing in CI
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/cluster-storage-operator/pull/424
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-25396. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-31546. The following is the description of the original issue:
—
Description of problem:
When running an Azure install, the installer noticeably hangs for a long time when running create manifests or create cluster. It will sit unresponsive for almost 2 minutes at:
DEBUG OpenShift Installer unreleased-master-9741-gbc9836aa9bd3a4f10d229bb6f87981dddf2adc92
DEBUG Built from commit bc9836aa9bd3a4f10d229bb6f87981dddf2adc92
DEBUG Fetching Metadata...
DEBUG Loading Metadata...
DEBUG Loading Cluster ID...
DEBUG Loading Install Config...
DEBUG Loading SSH Key...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Cluster Name...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Pull Secret...
DEBUG Loading Platform...
INFO Credentials loaded from file "/root/.azure/osServicePrincipal.json"

This could also be related to failures we see in CI such as this:
https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_installer/8123/pull-ci-openshift-installer-master-e2e-azure-ovn/1773611162923962368
level=info msg=Consuming Worker Machines from target directory
level=info msg=Credentials loaded from file "/var/run/secrets/ci.openshift.io/cluster-profile/osServicePrincipal.json"
level=fatal msg=failed to fetch Terraform Variables: failed to generate asset "Terraform Variables": error connecting to Azure client: failed to list SKUs: compute.ResourceSkusClient#List: Failure responding to request: StatusCode=200 -- Original Error: Error occurred reading http.Response#Body - Error = 'read tcp 10.128.117.2:43870->4.150.240.10:443: read: connection reset by peer'

If the call takes too long and the context timeout is canceled, we might potentially see this error.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Run azure install 2. 3.
Actual results:
Expected results:
Additional info:
https://github.com/openshift/installer/pull/8134 has a partial fix
Description of problem:
The cloud-controller-manager operator can show garbage in its status:
# oc get co cloud-controller-manager
NAME                       VERSION                                    AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
cloud-controller-manager   4.14.0-0.nightly-arm64-2023-06-07-071657   True        False         True       58m     Failed to resync for operator: 4.14.0-0.nightly-arm64-2023-06-07-071657 because &{%!e(string=failed to apply resources because TrustedCABundleControllerControllerDegraded condition is set to True)}
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-arm64-2023-06-07-071657
How reproducible:
always
Steps to Reproduce:
1. oc delete project openshift-cloud-controller-manager 2. wait a couple of minutes 3. oc get co openshift-cloud-controller-manager
Actual results:
Failed to resync for operator: 4.14.0-0.nightly-arm64-2023-06-07-071657 because &{%!e(string=failed to apply resources because TrustedCABundleControllerControllerDegraded condition is set to True)}
Expected results:
A helpful error message
Additional info:
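The &{%!e(string=...)} fragment is the Go fmt package reporting a bad formatting verb: %e is a floating-point verb, and applying it to an error value that is a pointer to a struct makes fmt fall back to struct printing with %!e(...) markers. A minimal standalone sketch reproducing the symptom (the opError type is hypothetical, not the operator's actual error type):
~~~
package main

import "fmt"

// opError stands in for the operator's error type; hypothetical.
type opError struct{ msg string }

func (e *opError) Error() string { return e.msg }

func main() {
	err := &opError{msg: "failed to apply resources because TrustedCABundleControllerControllerDegraded condition is set to True"}
	// %e does not consult Error(), so fmt prints the struct fields:
	// &{%!e(string=failed to apply resources because ...)}
	fmt.Printf("because %e\n", err)
	// %v calls Error() and renders the message cleanly.
	fmt.Printf("because %v\n", err)
}
~~~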
Description of problem:
In the newly added Revisions and Routes tabs on the service details page, details from other services are also displayed. The tabs should filter for the particular service.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. install serverless operator 2. Create serving instance 3. create multiple service in a namespace 4. Click on any service and go to Revisions, Routes and Pods page
Actual results:
Revisions and routes from other services are also displayed.
Expected results:
Revisions and routes for that particular service should be displayed
Additional info:
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Install Pipeline operator and setup tekton-results on the cluster 2. Create a PAC repository and trigger a PLR 3. open network tab and visit Repository list page
Actual results:
Infinite, repeated API calls are visible in the network tab.
Expected results:
The API call should not be made continuously.
Additional info:
Description of problem:
0000_90_kube-apiserver-operator_04_servicemonitor-apiserver lists the PrometheusRule `kube-apiserver`, which is meant to be deleted by CVO (it has the `release.openshift.io/delete: "true"` annotation). This manifest is no longer needed, as the `cluster:apiserver_current_inflight_requests:sum:max_over_time:2m` recording rule is already provided by other PrometheusRules. Since it is meant to be removed in 4.13, it is safe to remove the manifest in 4.14: we don't allow skipping 4.13, so by the time users start a 4.14 update, CVO will already have removed this resource from their clusters.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
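For reference, a sketch of how a CVO-managed manifest marks a resource for deletion (the annotation is quoted above; the group/namespace shown here are assumptions for illustration):
~~~
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kube-apiserver
  namespace: openshift-kube-apiserver
  annotations:
    release.openshift.io/delete: "true"
~~~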
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-30551. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-46022. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42558. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-36705. The following is the description of the original issue:
—
Description of problem:
CSS overrides in the OpenShift console are applied to the ACM dropdown menu.
Version-Release number of selected component (if applicable):
4.14, 4.15
How reproducible:
Always
Steps to Reproduce:
View ACM, Governance > Policies. Actions dropdown
Actual results:
Actions are indented and preceded by bullets
Expected results:
Dropdown menu style should not be affected
Additional info:
Description of problem:
The current version of openshift/coredns vendors Kubernetes 1.26 packages. OpenShift 4.14 is based on Kubernetes 1.27.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Check https://github.com/openshift/coredns/blob/release-4.14/go.mod
Actual results:
Kubernetes packages (k8s.io/api, k8s.io/apimachinery, and k8s.io/client-go) are at version v0.26
Expected results:
Kubernetes packages are at version v0.27.0 or later.
Additional info:
Using old Kubernetes API and client packages brings a risk of API compatibility issues.
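A sketch of the usual bump (module paths from go.mod; the exact patch release is whatever 1.27-based version the branch settles on):
~~~
go get k8s.io/api@v0.27.0 k8s.io/apimachinery@v0.27.0 k8s.io/client-go@v0.27.0
go mod tidy && go mod vendor
~~~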
Please review the following PR: https://github.com/openshift/cluster-baremetal-operator/pull/362
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-33682. The following is the description of the original issue:
—
Description of problem:
The cloud provider feature of NTO doesn't work as expected
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Create a cloud-provider profile such as:
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: provider-aws
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=GCE Cloud provider-specific profile
      # Your tuning for GCE Cloud provider goes here.
      [sysctl]
      vm.admin_reserve_kbytes=16386
    name: provider-aws
2.
3.
Actual results:
The value of vm.admin_reserve_kbytes is still the default.
Expected results:
The value of vm.admin_reserve_kbytes should change to 16386.
Additional info:
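A quick way to verify whether the profile was applied on a node (generic oc debug pattern; the node name is cluster-specific):
~~~
oc debug node/<node-name> -- chroot /host sysctl vm.admin_reserve_kbytes
~~~
With the bug present, this still prints the default rather than 16386.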
Description of problem:
The HostedCluster name is not currently validated against RFC1123.
Version-Release number of selected component (if applicable):
How reproducible:
Every time
Steps to Reproduce:
1. 2. 3.
Actual results:
Any HostedCluster name is allowed
Expected results:
Only HostedCluster names meeting RFC1123 validation should be allowed.
Additional info:
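A minimal sketch of the missing validation using the upstream apimachinery helper (where exactly HyperShift should call it is not specified here):
~~~
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/validation"
)

// validateHostedClusterName rejects names that are not valid RFC 1123 labels.
func validateHostedClusterName(name string) error {
	if errs := validation.IsDNS1123Label(name); len(errs) > 0 {
		return fmt.Errorf("invalid HostedCluster name %q: %v", name, errs)
	}
	return nil
}

func main() {
	fmt.Println(validateHostedClusterName("my-cluster")) // <nil>
	fmt.Println(validateHostedClusterName("My_Cluster")) // error
}
~~~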
This is a clone of issue OCPBUGS-32631. The following is the description of the original issue:
—
Description of problem:
1. With respect to the changes done in PR https://github.com/openshift/console/pull/13676, TaskRuns are fetched for Failed and Cancelled PipelineRuns. To further improve the performance of the PLR list page, use pipelinerun.status.conditions.message for Failed TaskRuns as well; and for any PLR, if the pipelinerun.status.conditions.message string contains data about task status, use that string instead of fetching TaskRuns. Example string: 'Tasks Completed: 2 (Failed: 1, Cancelled 0), Skipped: 1'
2. For a Failed PLR, to show the log snippet, make the API call on click of the Failed status column in the list page.
Please review the following PR: https://github.com/openshift/route-override-cni/pull/48
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
The agent-tui interface for editing the network config for the Agent ISO at boot time only runs on the graphical console (tty1). It's difficult to run two copies, so this gives the most value for now when there is a graphical console available.
However, when the host has only a serial console, there are two consequences:
Both situations could be resolved by allowing agent-tui to run on the serial console instead of the graphical console when there is no graphical console.
This is a clone of issue OCPBUGS-29713. The following is the description of the original issue:
—
Description of problem:
OCPBUGS-29424 revealed that setting the node status update frequency in kubelet (introduced with OCPBUGS-15583) causes high control plane CPU usage. The reason is that the increased frequency of kubelet node status updates triggers second-order effects in all control plane operators that usually act on node changes (API server, etcd, PDB guard pod controllers, or any other static-pod-based machinery). Reverting the code in OCPBUGS-15583, or manually setting the report/status frequency to 0s, causes the CPU usage to drop immediately.
Version-Release number of selected component (if applicable):
any version that OCPBUGS-15583 was backported to, 4.16 down to 4.11 AFAIU
How reproducible:
always
Steps to Reproduce:
1. create a cluster that contains a fix for OCPBUGS-15583 2. observe the apiserver metrics (e.g. rate(apiserver_request_total[5m])); these should show abnormal values for pod/configmap GETs. Alternatively, the rate of node updates is increased: rate(apiserver_request_total{resource="nodes", subresource="status", verb="PATCH"}[1m])
Actual results:
the node status updates every 10s, which causes high CPU usage on control plane operators and apiserver
Expected results:
the node status should not update that frequently, meaning the control plane CPU usage should go down again
Additional info:
slack thread with the node team: https://redhat-internal.slack.com/archives/C02CZNQHGN8/p1708429189987849
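For experimentation, the frequencies can be tuned through an MCO KubeletConfig; a sketch, assuming the MCO accepts these KubeletConfiguration fields and that 0s restores the default behavior mentioned above:
~~~
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: node-status-frequency
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  kubeletConfig:
    nodeStatusUpdateFrequency: 0s
    nodeStatusReportFrequency: 0s
~~~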
Description of problem:
I noticed this in the logs at https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-dns-operator/373/pull-ci-openshift-cluster-dns-operator-master-e2e-aws-ovn-operator/1704287854600916992/build-log.txt:

=== RUN TestCoreDNSDaemonSetReconciliation
[controller-runtime] log.SetLogger(...) was never called, logs will not be displayed:
goroutine 205 [running]:
runtime/debug.Stack()
	/usr/lib/golang/src/runtime/debug/stack.go:24 +0x65
sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
	/go/src/github.com/openshift/cluster-dns-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/log.go:59 +0xbd
sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).WithName(0xc000061340, {0x182213b, 0x14})
	/go/src/github.com/openshift/cluster-dns-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go:147 +0x4c
github.com/go-logr/logr.Logger.WithName({{0x1aa8468, 0xc000061340}, 0x0}, {0x182213b?, 0x0?})
	/go/src/github.com/openshift/cluster-dns-operator/vendor/github.com/go-logr/logr/logr.go:336 +0x46
sigs.k8s.io/controller-runtime/pkg/client.newClient(0xc000789200, {0x0, 0xc0001a7730, {0x1aa9d90, 0xc00011c700}, 0x0, {0x0, 0x0}, 0x0})
	/go/src/github.com/openshift/cluster-dns-operator/vendor/sigs.k8s.io/controller-runtime/pkg/client/client.go:115 +0xb4
sigs.k8s.io/controller-runtime/pkg/client.New(0xc000789200?, {0x0, 0xc0001a7730, {0x1aa9d90, 0xc00011c700}, 0x0, {0x0, 0x0}, 0x0})
	/go/src/github.com/openshift/cluster-dns-operator/vendor/sigs.k8s.io/controller-runtime/pkg/client/client.go:101 +0x85
github.com/openshift/cluster-dns-operator/pkg/operator/client.NewClient(0x0?)
	/go/src/github.com/openshift/cluster-dns-operator/pkg/operator/client/client.go:52 +0x145
github.com/openshift/cluster-dns-operator/test/e2e.getClient()
	/go/src/github.com/openshift/cluster-dns-operator/test/e2e/utils.go:451 +0x77
github.com/openshift/cluster-dns-operator/test/e2e.TestCoreDNSDaemonSetReconciliation(0xc000501520)
	/go/src/github.com/openshift/cluster-dns-operator/test/e2e/operator_test.go:330 +0x45
testing.tRunner(0xc000501520, 0x193c038)
	/usr/lib/golang/src/testing/testing.go:1576 +0x10b
created by testing.(*T).Run
	/usr/lib/golang/src/testing/testing.go:1629 +0x3ea
operator_test.go:374: found "foo" node selector on daemonset openshift-dns/dns-default: <nil>
operator_test.go:378: observed absence of "foo" node selector on daemonset openshift-dns/dns-default: <nil>
--- PASS: TestCoreDNSDaemonSetReconciliation (1.63s)
We need to make a minor change in https://github.com/openshift/cluster-dns-operator/blob/7d2a16c0abf80d09fdcbeef8464994b78aa0589d/test/e2e/operator_test.go#L374-L375
Version-Release number of selected component (if applicable):
4.15 and earlier
How reproducible:
Be unlucky in CI testing
Steps to Reproduce:
1. 2. 3.
Actual results:
Stack trace, and prints a <nil>:
operator_test.go:374: found "foo" node selector on daemonset openshift-dns/dns-default: <nil>
operator_test.go:378: observed absence of "foo" node selector on daemonset openshift-dns/dns-default: <nil>
Expected results:
No stack trace and no print of <nil>
Additional info:
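The stack trace points at the fix for the logging half: call log.SetLogger once before any controller-runtime client is built. A minimal sketch for the e2e package (TestMain placement is an assumption):
~~~
package e2e

import (
	"os"
	"testing"

	ctrllog "sigs.k8s.io/controller-runtime/pkg/log"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func TestMain(m *testing.M) {
	// Silence the "log.SetLogger(...) was never called" warning and
	// actually surface controller-runtime logs in test output.
	ctrllog.SetLogger(zap.New())
	os.Exit(m.Run())
}
~~~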
This is a clone of issue OCPBUGS-26434. The following is the description of the original issue:
—
When platform-specific passwords are included in the install-config.yaml, they are stored in the generated agent-cluster-install.yaml, which is included in the output of the agent-gather command. These passwords should be redacted.
This is a clone of issue OCPBUGS-28836. The following is the description of the original issue:
—
Description of problem:
Usernames can contain all kinds of characters that are not allowed in resource names. Hash the name instead and use the hex representation of the result to get a usable identifier.
Version-Release number of selected component (if applicable):
4.16
How reproducible:
Always
Steps to Reproduce:
1. log in to the web console configured with a login to a 3rd party OIDC provider 2. go to the User Preferences page / check the logs in the javascript console
Actual results:
The User Preferences page shows empty values instead of defaults. The javascript console reports things like:
```
consoleFetch failed for url /api/kubernetes/api/v1/namespaces/openshift-console-user-settings/configmaps/user-settings-kubeadmin
r: configmaps "user-settings-kubeadmin" not found
```
Expected results:
I am able to persist my user preferences.
Additional info:
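A minimal sketch of the proposed scheme (the user-settings- prefix follows the configmap naming visible in the error above; the exact format the console adopts may differ):
~~~
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// userSettingsName maps an arbitrary username to a resource-name-safe
// identifier by hashing it and hex-encoding the digest.
func userSettingsName(username string) string {
	sum := sha256.Sum256([]byte(username))
	return "user-settings-" + hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(userSettingsName("kube:admin"))
}
~~~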
Description of the problem:
A non-lowercase hostname provided via DHCP breaks assisted installation.
How reproducible:
100%
Steps to reproduce:
Actual results:
bootkube fails
Expected results:
bootkube should succeed
Description of problem:
Our e2e setup runs `go install` for a few packages with the `@latest` tag. `go install` does not take `go.mod` into consideration, so on older branches we can pull package versions not compatible with the system Go version.
Version-Release number of selected component (if applicable):
All branches using Go < 1.23
How reproducible:
always on branch <= 4.18
Steps to Reproduce:
1. 2. 3.
Actual results:
./test/e2e/e2e-simple.sh ././bin/oc-mirror
/go/src/github.com/openshift/oc-mirror/test/e2e/operator-test.17343 /go/src/github.com/openshift/oc-mirror
go: downloading github.com/google/go-containerregistry v0.20.3
go: github.com/google/go-containerregistry/cmd/crane@latest: github.com/google/go-containerregistry@v0.20.3 requires go >= 1.23.0 (running go 1.22.9; GOTOOLCHAIN=local)
/go/src/github.com/openshift/oc-mirror/test/e2e/lib/util.sh: line 17: PID_DISCONN: unbound variable
https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_oc-mirror/1006/pull-ci-openshift-oc-mirror-release-4.18-e2e/1879913390239911936
Expected results:
The package version selected is compatible with the system Go version.
Additional info:
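One sketch of a fix is pinning a version instead of using @latest, so the selected release's go directive matches the branch toolchain (v0.20.2 here is an assumption about the newest tag that still builds with Go 1.22):
~~~
go install github.com/google/go-containerregistry/cmd/crane@v0.20.2
~~~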
Please review the following PR: https://github.com/openshift/csi-driver-nfs/pull/129
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
When there is an error on HTTP listen, the webhook does not handle the error in a way that makes recovery possible; instead it hangs without printing anything useful to the logs.
This was seen after https://issues.redhat.com//browse/OCPBUGS-20104, where the webhook was re-configured to run as non-root: listen would fail on upgrade because the old webhook instance was still running as root, which causes an error due to the SOREUSE socket option.
The webhook should crashloop instead, which would provide a chance of recovery, although the recovery itself might still be racy depending on whether k8s is able to kill the old webhook instance before noticing the crash of the new instance.
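A minimal sketch of the fail-fast behavior described above (the server wiring is illustrative, not the webhook's actual code):
~~~
package main

import (
	"log"
	"net/http"
)

func main() {
	mux := http.NewServeMux()
	// ... register webhook handlers on mux ...

	// ListenAndServeTLS blocks; if binding the socket fails (e.g. the old
	// root-owned instance still holds it), exit non-zero so the kubelet
	// restarts the container instead of leaving it hanging silently.
	if err := http.ListenAndServeTLS(":9443", "tls.crt", "tls.key", mux); err != nil {
		log.Fatalf("webhook server exited: %v", err)
	}
}
~~~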
Description of problem:
From CLBO ovnkube-node logs:
Upgrade hack: Timed out waiting for the remote ovnkube-controller to be ready even after 5 minutes, err : context deadline exceeded, unable to fetch node-subnet annotation for node ip-10-0-133-201.us-east-2.compute.internal: err, could not find "k8s.ovn.org/node-subnets" annotation
ovnkube-controller not being ready implies the absence of the node-subnets annotation.
The CNO upgrade is stuck at: DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2023-08-30T10:06:44Z
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-08-11-055332
How reproducible:
Always
Steps to Reproduce:
1. Install an OCP cluster with RHCOS and Windows nodes on 4.13 2. Perform an upgrade to 4.14 3.
Actual results:
Upgrade fails on CNO.
Expected results:
Upgrade should pass
Additional info:
must-gather: http://shell.lab.bos.redhat.com/~anusaxen/must-gather.local.1473221474492991466/
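A quick check for the missing annotation on the affected node (one way to read it with jsonpath; the dots in the annotation key must be escaped):
~~~
oc get node ip-10-0-133-201.us-east-2.compute.internal \
  -o jsonpath='{.metadata.annotations.k8s\.ovn\.org/node-subnets}'
~~~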
Description of problem:
Image pulls fail with HTTP status 504 (gateway timeout) until the image registry pods are restarted.
Version-Release number of selected component (if applicable):
4.13.12
How reproducible:
Intermittent
Steps to Reproduce:
1. 2. 3.
Actual results:
Images can't be pulled:
podman pull registry.ci.openshift.org/ci/applyconfig:latest
Trying to pull registry.ci.openshift.org/ci/applyconfig:latest...
Getting image source signatures
Error: reading signatures: downloading signatures for sha256:83c1b636069c3302f5ba5075ceeca5c4a271767900fee06b919efc3c8fa14984 in registry.ci.openshift.org/ci/applyconfig: received unexpected HTTP status: 504 Gateway Time-out

Image registry pods contain errors:
time="2023-09-01T02:25:39.596485238Z" level=warning msg="error authorizing context: access denied" go.version="go1.19.10 X:strictfipsruntime" http.request.host=registry.ci.openshift.org http.request.id=3e805818-515d-443f-8d9b-04667986611d http.request.method=GET http.request.remoteaddr=18.218.67.82 http.request.uri="/v2/ocp/4-dev-preview/manifests/sha256:caf073ce29232978c331d421c06ca5c2736ce5461962775fdd760b05fb2496a0" http.request.useragent="containers/5.24.1 (github.com/containers/image)" vars.name=ocp/4-dev-preview vars.reference="sha256:caf073ce29232978c331d421c06ca5c2736ce5461962775fdd760b05fb2496a0"
Expected results:
Image registry does not return gateway timeouts
Additional info:
Must gather(s) attached, additional information in linked OHSS ticket.
This is a clone of issue OCPBUGS-29777. The following is the description of the original issue:
—
Description of problem:
The RWOP accessMode is a tech preview feature starting from OCP 4.14 and GA in 4.16, but in the OCP console UI there is no option available for creating a PVC with the RWOP accessMode.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Login to OCP console in Administrator mode (4.14/4.15/4.16) 2. Go to 'Storage -> PersistentVolumeClaim -> Click on Create PersistentVolumeClaim' 3. Check under 'Access Mode*', RWOP option is not present
Actual results:
RWOP accessMode option is not present
Expected results:
RWOP accessMode option is present
Additional info:
Storage feature: https://issues.redhat.com/browse/STOR-1171
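For comparison, creating an RWOP claim from YAML works today; a minimal sketch (the storage size is arbitrary):
~~~
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwop-claim
spec:
  accessModes:
  - ReadWriteOncePod
  resources:
    requests:
      storage: 1Gi
~~~
The console's Create PersistentVolumeClaim form just offers no radio button for this mode.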
See log:
[controller-runtime] log.SetLogger(...) was never called, logs will not be displayed:
goroutine 422 [running]:
runtime/debug.Stack()
runtime/debug/stack.go:24 +0x65
sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
sigs.k8s.io/controller-runtime@v0.15.0/pkg/log/log.go:59 +0xbd
sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).WithName(0xc000521140, {0x2d0b2ef, 0x14})
sigs.k8s.io/controller-runtime@v0.15.0/pkg/log/deleg.go:147 +0x4c
github.com/go-logr/logr.Logger.WithName({{0x31b3e78, 0xc000521140}, 0x0}, {0x2d0b2ef?, 0x40?})
github.com/go-logr/logr@v1.2.4/logr.go:336 +0x46
sigs.k8s.io/controller-runtime/pkg/client.newClient(0xc000471440, {0x0, 0x0, {0x31b5c00, 0xc000eb3100}, 0x0, {0x0, 0x0}, 0x0})
sigs.k8s.io/controller-runtime@v0.15.0/pkg/client/client.go:115 +0xb4
sigs.k8s.io/controller-runtime/pkg/client.New(0x319b2b0?, {0x0, 0x0, {0x31b5c00, 0xc000eb3100}, 0x0, {0x0, 0x0}, 0x0})
sigs.k8s.io/controller-runtime@v0.15.0/pkg/client/client.go:101 +0x85
github.com/openshift/cluster-network-operator/pkg/client.NewClusterClient(0xc000471440, 0xc000499b00)
github.com/openshift/cluster-network-operator/pkg/client/client.go:188 +0x2b0
github.com/openshift/cluster-network-operator/pkg/client.NewClient(0x0?, 0x0?, {0x2cecdf7, 0x7}, 0x0?)
github.com/openshift/cluster-network-operator/pkg/client/client.go:100 +0xa5
github.com/openshift/cluster-network-operator/pkg/operator.RunOperator({0x31ace70, 0xc0009a0b90}, 0xc000318a40, {0x2cecdf7, 0x7}, 0x0?)
github.com/openshift/cluster-network-operator/pkg/operator/operator.go:46 +0xbd
main.newNetworkOperatorCommand.func2({0x31ace70?, 0xc0009a0b90?}, 0x31acee0?)
github.com/openshift/cluster-network-operator/cmd/cluster-network-operator/main.go:49 +0x3b
github.com/openshift/library-go/pkg/controller/controllercmd.ControllerBuilder.getOnStartedLeadingFunc.func1.1()
github.com/openshift/library-go@v0.0.0-20230503144409-4cb26a344c37/pkg/controller/controllercmd/builder.go:351 +0x74
created by github.com/openshift/library-go/pkg/controller/controllercmd.ControllerBuilder.getOnStartedLeadingFunc.func1
github.com/openshift/library-go@v0.0.0-20230503144409-4cb26a344c37/pkg/controller/controllercmd/builder.go:349 +0x10a
I think we do want these logs; log.SetLogger should be called so they are displayed.
Description of problem:
OCP 4.15 nightly deployment on bare-metal servers without using the provisioning network is stuck during deployment.
Job history: https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-telco5g
Deployment stuck similar to this. Upstream job logs: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-telco5g/1732520780954079232/artifacts/e2e-telco5g/telco5g-cluster-setup/artifacts/cloud-init-output.log
~~~
level=debug msg=ironic_node_v1.openshift-master-host[2]: Creating...
level=debug msg=ironic_node_v1.openshift-master-host[0]: Creating...
level=debug msg=ironic_node_v1.openshift-master-host[1]: Creating...
level=debug msg=ironic_node_v1.openshift-master-host[0]: Still creating... [10s elapsed]
...
level=debug msg=ironic_node_v1.openshift-master-host[0]: Still creating... [2h28m51s elapsed]
level=debug msg=ironic_node_v1.openshift-master-host[1]: Still creating... [2h28m51s elapsed]
~~~
Ironic logs from bootstrap node:
~~~
Dec 07 13:10:13 localhost.localdomain start-provisioning-nic.sh[3942]: Error: failed to modify ipv4.addresses: invalid IP address: Invalid IPv4 address ''.
Dec 07 13:10:13 localhost.localdomain systemd[1]: provisioning-interface.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Dec 07 13:10:13 localhost.localdomain systemd[1]: provisioning-interface.service: Failed with result 'exit-code'.
Dec 07 13:10:13 localhost.localdomain systemd[1]: Failed to start Provisioning interface.
Dec 07 13:10:13 localhost.localdomain systemd[1]: Dependency failed for DHCP Service for Provisioning Network.
Dec 07 13:10:13 localhost.localdomain systemd[1]: ironic-dnsmasq.service: Job ironic-dnsmasq.service/start failed with result 'dependency'
~~~
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Every time
Steps to Reproduce:
1. Deploy OCP. More information about our setup: in our environment we have 3 virtual master nodes, 1 virtual worker, and 1 bare-metal worker. We use the KCLI tool to create the virtual environment and to run the deployment workflow using IPI; in our setup we don't use a provisioning network. (The same setup is used for other OCP versions up to 4.14 and works fine.) We have attached our install-config.yaml (for RH employees) and logs from the bootstrap node.
Actual results:
Deployment is failing Dec 07 13:10:13 localhost.localdomain start-provisioning-nic.sh[3942]: Error: failed to modify ipv4.addresses: invalid IP address: Invalid IPv4 address ''.
Expected results:
Deployment should pass
Additional info:
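One debugging sketch, once the API is reachable: inspect the rendered provisioning configuration, since the failing script is assigning an empty IPv4 address to the provisioning NIC (the CR name below is the standard one created by the installer):
$ oc get provisioning provisioning-configuration -o yaml
# For setups without a provisioning network, expect provisioningNetwork: Disabled;
# an empty provisioningIP with the network not disabled would match the
# "Invalid IPv4 address ''" error above.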
As a cluster-admin, I want to configure the router default connect timeout, so that HAProxy doesn't time out when connecting to a slow backend server, or so that I can set a shorter timeout to mitigate DoS attacks. This setting is controlled by the ROUTER_DEFAULT_CONNECT_TIMEOUT environment variable on the router deployment, which OpenShift router uses to set HAProxy's timeout connect setting.
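For illustration, a hedged probe of the knob in question, set directly on the router deployment (the ingress operator will normally revert manual edits, and the haproxy.config path is the usual default, so verify both):
$ oc -n openshift-ingress set env deployment/router-default ROUTER_DEFAULT_CONNECT_TIMEOUT=10s
$ oc -n openshift-ingress rsh deployment/router-default grep 'timeout connect' /var/lib/haproxy/conf/haproxy.config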
This issue is being tracked as a defect rather than a new feature request because configurability of ROUTER_DEFAULT_CONNECT_TIMEOUT / timeout connect was an explicit requirement for RFE-403 (which was accepted) and NE-412 (which was completed and closed), but somehow the option got dropped during the design and implementation of the feature. This should have been caught during review of the pull request for the enhancement proposal, and the EP should have either included the option or justified its exclusion. As it stands, the feature is incomplete, and therefore this is a defect, not an RFE.
4.9 through 4.16. The feature that was supposed to include this option shipped in OCP 4.9.
100%.
1. Check the supported router tuning options: oc explain IngressController.spec.tuningOptions
There is no option to configure the server connect timeout:
% oc explain IngressController.spec.tuningOptions KIND: IngressController VERSION: operator.openshift.io/v1 RESOURCE: tuningOptions <Object> DESCRIPTION: tuningOptions defines parameters for adjusting the performance of ingress controller pods. All fields are optional and will use their respective defaults if not set. See specific tuningOptions fields for more details. Setting fields within tuningOptions is generally not recommended. The default values are suitable for most configurations. FIELDS: clientFinTimeout <string> clientFinTimeout defines how long a connection will be held open while waiting for the client response to the server/backend closing the connection. If unset, the default timeout is 1s clientTimeout <string> clientTimeout defines how long a connection will be held open while waiting for a client response. If unset, the default timeout is 30s headerBufferBytes <integer> headerBufferBytes describes how much memory should be reserved (in bytes) for IngressController connection sessions. Note that this value must be at least 16384 if HTTP/2 is enabled for the IngressController (https://tools.ietf.org/html/rfc7540). If this field is empty, the IngressController will use a default value of 32768 bytes. Setting this field is generally not recommended as headerBufferBytes values that are too small may break the IngressController and headerBufferBytes values that are too large could cause the IngressController to use significantly more memory than necessary. headerBufferMaxRewriteBytes <integer> headerBufferMaxRewriteBytes describes how much memory should be reserved (in bytes) from headerBufferBytes for HTTP header rewriting and appending for IngressController connection sessions. Note that incoming HTTP requests will be limited to (headerBufferBytes - headerBufferMaxRewriteBytes) bytes, meaning headerBufferBytes must be greater than headerBufferMaxRewriteBytes. If this field is empty, the IngressController will use a default value of 8192 bytes. Setting this field is generally not recommended as headerBufferMaxRewriteBytes values that are too small may break the IngressController and headerBufferMaxRewriteBytes values that are too large could cause the IngressController to use significantly more memory than necessary. healthCheckInterval <string> healthCheckInterval defines how long the router waits between two consecutive health checks on its configured backends. This value is applied globally as a default for all routes, but may be overridden per-route by the route annotation "router.openshift.io/haproxy.health.check.interval". Expects an unsigned duration string of decimal numbers, each with optional fraction and a unit suffix, eg "300ms", "1.5h" or "2h45m". Valid time units are "ns", "us" (or "µs" U+00B5 or "μs" U+03BC), "ms", "s", "m", "h". Setting this to less than 5s can cause excess traffic due to too frequent TCP health checks and accompanying SYN packet storms. Alternatively, setting this too high can result in increased latency, due to backend servers that are no longer available, but haven't yet been detected as such. An empty or zero healthCheckInterval means no opinion and IngressController chooses a default, which is subject to change over time. Currently the default healthCheckInterval value is 5s. Currently the minimum allowed value is 1s and the maximum allowed value is 2147483647ms (24.85 days). Both are subject to change over time. 
maxConnections <integer> maxConnections defines the maximum number of simultaneous connections that can be established per HAProxy process. Increasing this value allows each ingress controller pod to handle more connections but at the cost of additional system resources being consumed. Permitted values are: empty, 0, -1, and the range 2000-2000000. If this field is empty or 0, the IngressController will use the default value of 50000, but the default is subject to change in future releases. If the value is -1 then HAProxy will dynamically compute a maximum value based on the available ulimits in the running container. Selecting -1 (i.e., auto) will result in a large value being computed (~520000 on OpenShift >=4.10 clusters) and therefore each HAProxy process will incur significant memory usage compared to the current default of 50000. Setting a value that is greater than the current operating system limit will prevent the HAProxy process from starting. If you choose a discrete value (e.g., 750000) and the router pod is migrated to a new node, there's no guarantee that that new node has identical ulimits configured. In such a scenario the pod would fail to start. If you have nodes with different ulimits configured (e.g., different tuned profiles) and you choose a discrete value then the guidance is to use -1 and let the value be computed dynamically at runtime. You can monitor memory usage for router containers with the following metric: 'container_memory_working_set_bytes{container="router",namespace="openshift-ingress"}'. You can monitor memory usage of individual HAProxy processes in router containers with the following metric: 'container_memory_working_set_bytes{container="router",namespace="openshift-ingress"}/container_processes{container="router",namespace="openshift-ingress"}'. reloadInterval <string> reloadInterval defines the minimum interval at which the router is allowed to reload to accept new changes. Increasing this value can prevent the accumulation of HAProxy processes, depending on the scenario. Increasing this interval can also lessen load imbalance on a backend's servers when using the roundrobin balancing algorithm. Alternatively, decreasing this value may decrease latency since updates to HAProxy's configuration can take effect more quickly. The value must be a time duration value; see <https://pkg.go.dev/time#ParseDuration>. Currently, the minimum value allowed is 1s, and the maximum allowed value is 120s. Minimum and maximum allowed values may change in future versions of OpenShift. Note that if a duration outside of these bounds is provided, the value of reloadInterval will be capped/floored and not rejected (e.g. a duration of over 120s will be capped to 120s; the IngressController will not reject and replace this disallowed value with the default). A zero value for reloadInterval tells the IngressController to choose the default, which is currently 5s and subject to change without notice. This field expects an unsigned duration string of decimal numbers, each with optional fraction and a unit suffix, e.g. "300ms", "1.5h" or "2h45m". Valid time units are "ns", "us" (or "µs" U+00B5 or "μs" U+03BC), "ms", "s", "m", "h". Note: Setting a value significantly larger than the default of 5s can cause latency in observing updates to routes and their endpoints. HAProxy's configuration will be reloaded less frequently, and newly created routes will not be served until the subsequent reload. 
serverFinTimeout <string> serverFinTimeout defines how long a connection will be held open while waiting for the server/backend response to the client closing the connection. If unset, the default timeout is 1s serverTimeout <string> serverTimeout defines how long a connection will be held open while waiting for a server/backend response. If unset, the default timeout is 30s threadCount <integer> threadCount defines the number of threads created per HAProxy process. Creating more threads allows each ingress controller pod to handle more connections, at the cost of more system resources being used. HAProxy currently supports up to 64 threads. If this field is empty, the IngressController will use the default value. The current default is 4 threads, but this may change in future releases. Setting this field is generally not recommended. Increasing the number of HAProxy threads allows ingress controller pods to utilize more CPU time under load, potentially starving other pods if set too high. Reducing the number of threads may cause the ingress controller to perform poorly. tlsInspectDelay <string> tlsInspectDelay defines how long the router can hold data to find a matching route. Setting this too short can cause the router to fall back to the default certificate for edge-terminated or reencrypt routes even when a better matching certificate could be used. If unset, the default inspect delay is 5s tunnelTimeout <string> tunnelTimeout defines how long a tunnel connection (including websockets) will be held open while the tunnel is idle. If unset, the default timeout is 1h % oc version Client Version: 4.13.0-0.ci-2022-11-11-144318 Kustomize Version: v4.5.7 Server Version: 4.12.0-0.nightly-2024-02-06-121927 Kubernetes Version: v1.25.16+6df2177 %
spec.tuningOptions should have an option to configure the connect timeout. For example:
connectTimeout <string> connectTimeout defines how long the router will wait for a response when establishing a connection to a backend server. If unset, the default timeout is 5s.
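If the field were added, an IngressController could carry it as in this sketch; connectTimeout is the proposed field from this report and does not exist in the API today, so the manifest is written to a file rather than applied:
$ cat > proposed-connect-timeout.yaml <<'EOF'
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  tuningOptions:
    connectTimeout: 10s   # proposed field; not yet part of the API
EOF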
This is a clone of issue OCPBUGS-36138. The following is the description of the original issue:
—
Cluster operator status showing `Unavailable`:
ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: APIServiceResourceIssue, message: found the CA cert is not active
The script below is used to check the validity of the certificates and recreate them:
# Check Cluster Existing Certificates : echo -e "NAMESPACE\tNAME\tEXPIRY" && oc get secrets -A -o go-template='{{range .items}}{{if eq .type "kubernetes.io/tls"}}{{.metadata.namespace}}{{" "}}{{.metadata.name}}{{" "}}{{index .data "tls.crt"}}{{"\n"}}{{end}}{{end}}' | while read namespace name cert; do echo -en "$namespace\t$name\t"; echo $cert | base64 -d | openssl x509 -noout -enddate; done | column -t # Manually Update Cluster Certificates : az aro update -n xxxx -g xxxx --refresh-credentials --debug # Check again Cluster Existing Certificates : echo -e "NAMESPACE\tNAME\tEXPIRY" && oc get secrets -A -o go-template='{{range .items}}{{if eq .type "kubernetes.io/tls"}}{{.metadata.namespace}}{{" "}}{{.metadata.name}}{{" "}}{{index .data "tls.crt"}}{{"\n"}}{{end}}{{end}}' | while read namespace name cert; do echo -en "$namespace\t$name\t"; echo $cert | base64 -d | openssl x509 -noout -enddate; done | column -t #Renew Secret/Certificate for OLM : # Check Secret Expiration : oc get secret packageserver-service-cert -o json -n openshift-operator-lifecycle-manager | jq -r '.data | .["tls.crt"]' | base64 -d | openssl x509 -noout -dates # Backup the current secret : oc get secret packageserver-service-cert -o json -n openshift-operator-lifecycle-manager > packageserver-service-cert.yaml # Delete the Secret : oc delete secret packageserver-service-cert -n openshift-operator-lifecycle-manager # Check Secret Expiration again : oc get secret packageserver-service-cert -o json -n openshift-operator-lifecycle-manager | jq -r '.data | .["tls.crt"]' | base64 -d | openssl x509 -noout -dates # Get Cluster Operator : oc get co oc get co operator-lifecycle-manager oc get co operator-lifecycle-manager-catalog oc get co operator-lifecycle-manager-packageserver # Go to the kube-system namespace and take the backup of extension-apiserver-authentication configmap: oc project kube-system oc get cm extension-apiserver-authentication -oyaml >> extcm_backup.yaml # delete the extension-apiserver-authentication configmap to : oc delete cm extension-apiserver-authentication -n kube-system oc get cm -n kube-system |grep extension-apiserver-authentication oc get apiservice v1.packages.operators.coreos.com -o jsonpath='{.spec.caBundle}' | base64 -d | openssl x509 -noout -text
We have checked the certificate details as below:
$ oc get apiservice v1.packages.operators.coreos.com -o jsonpath='{.spec.caBundle}' | base64 -d | openssl x509 -text E1213 10:24:41.606151 3802053 memcache.go:255] couldn't get resource list for packages.operators.coreos.com/v1: the server is currently unable to handle the request E1213 10:24:41.639144 3802053 memcache.go:106] couldn't get resource list for packages.operators.coreos.com/v1: the server is currently unable to handle the request E1213 10:24:41.651532 3802053 memcache.go:106] couldn't get resource list for packages.operators.coreos.com/v1: the server is currently unable to handle the request E1213 10:24:41.660851 3802053 memcache.go:106] couldn't get resource list for packages.operators.coreos.com/v1: the server is currently unable to handle the request Certificate: Data: Version: 3 (0x2) Serial Number: 5319897470906267024 (0x49d4129052ddf590) Signature Algorithm: ecdsa-with-SHA256 Issuer: O = "Red Hat, Inc." Validity Not Before: Nov 29 18:41:35 2021 GMT Not After : Nov 29 18:41:35 2023 GMT Subject: O = "Red Hat, Inc." Subject Public Key Info: Public Key Algorithm: id-ecPublicKey Public-Key: (256 bit) pub: 04:ea:c0:af:d3:af:e6:0e:61:82:c8:f4:fe:ec:22: 8d:c5:c1:08:6f:91:92:8b:09:05:e9:72:ca:d4:68: fb:aa:e1:ec:e2:e8:ca:32:4c:1f:e7:fc:3a:eb:61: 0b:df:9c:b4:13:62:f4:67:6c:d2:8f:97:a0:a8:a8: 69:08:22:4d:62 ASN1 OID: prime256v1 NIST CURVE: P-256 X509v3 extensions: X509v3 Key Usage: critical Digital Signature, Certificate Sign X509v3 Extended Key Usage: TLS Web Client Authentication, TLS Web Server Authentication X509v3 Basic Constraints: critical CA:TRUE X509v3 Subject Key Identifier: 53:A4:1D:22:F8:0F:8E:C5:74:8C:C6:F4:90:F0:2D:29:B0:65:89:19 Signature Algorithm: ecdsa-with-SHA256 30:45:02:21:00:f5:32:98:3d:34:b6:fd:65:47:3b:31:0d:88: fc:fe:35:cd:4f:51:75:a0:89:16:1a:9e:56:d5:f7:49:e6:3a: a3:02:20:43:fa:81:78:56:f4:1f:9b:3a:5b:7f:28:7e:a8:5b: b7:7a:3e:0a:99:67:88:0e:66:e4:c9:d5:9d:2f:79:80:3e ----BEGIN CERTIFICATE---- MIIBhzCCAS2gAwIBAgIISdQSkFLd9ZAwCgYIKoZIzj0EAwIwGDEWMBQGA1UEChMN UmVkIEhhdCwgSW5jLjAeFw0yMTExMjkxODQxMzVaFw0yMzExMjkxODQxMzVaMBgx FjAUBgNVBAoTDVJlZCBIYXQsIEluYy4wWTATBgcqhkjOPQIBBggqhkjOPQMBBwNC AATqwK/Tr+YOYYLI9P7sIo3FwQhvkZKLCQXpcsrUaPuq4ezi6MoyTB/n/DrrYQvf nLQTYvRnbNKPl6CoqGkIIk1io2EwXzAOBgNVHQ8BAf8EBAMCAoQwHQYDVR0lBBYw FAYIKwYBBQUHAwIGCCsGAQUFBwMBMA8GA1UdEwEB/wQFMAMBAf8wHQYDVR0OBBYE FFOkHSL4D47FdIzG9JDwLSmwZYkZMAoGCCqGSM49BAMCA0gAMEUCIQD1Mpg9NLb9 ZUc7MQ2I/P41zU9RdaCJFhqeVtX3SeY6owIgQ/qBeFb0H5s6W38ofqhbt3o+Cpln iA5m5MnVnS95gD4=
Please review the following PR: https://github.com/openshift/cloud-provider-aws/pull/48
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/cloud-provider-azure/pull/97
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-25989. The following is the description of the original issue:
—
Description of problem:
Since OCP 4.15 we see an issue with an OLM-deployed operator being unable to operate in multiple watched namespaces. It works fine with a single watched namespace (subscription). The same test also passes if we deploy the operator from files instead of using OLM. Based on the operator log it looks like a permission issue. The same test works fine on OCP 4.14 and older.
Version-Release number of selected component (if applicable):
Server Version: 4.15.0-ec.3 Kubernetes Version: v1.28.3+20a5764
How reproducible:
Always
Steps to Reproduce:
0. oc login OCP4.15 1. git clone https://gitlab.cee.redhat.com/amq-broker/claire 2. make -f Makefile.downstream build ARTEMIS_VERSION=7.11.4 RELEASE_TYPE=released 3. make -f Makefile.downstream operator_test OLM_IIB=registry-proxy.engineering.redhat.com/rh-osbs/iib:636350 OLM_CHANNEL=7.11.x TESTS=ClusteredOperatorSmokeTests TEST_LOG_LEVEL=debug DISABLE_RANDOM_NAMESPACES=true
Actual results:
Can't deploy artemis broker custom resource in given namespace (permission issue - see details below)
Expected results:
Successfully deployed broker on watched namespaces
Additional info:
Log from AMQ Broker operator - seems like some permission issues since 4.15
E0103 10:04:54.425202 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1beta1.ActiveMQArtemis: failed to list *v1beta1.ActiveMQArtemis: activemqartemises.broker.amq.io is forbidden: User "system:serviceaccount:cluster-tests:amq-broker-controller-manager" cannot list resource "activemqartemises" in API group "broker.amq.io" in the namespace "cluster-testsa" E0103 10:04:54.425207 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1beta1.ActiveMQArtemisSecurity: failed to list *v1beta1.ActiveMQArtemisSecurity: activemqartemissecurities.broker.amq.io is forbidden: User "system:serviceaccount:cluster-tests:amq-broker-controller-manager" cannot list resource "activemqartemissecurities" in API group "broker.amq.io" in the namespace "cluster-testsa" E0103 10:04:54.425221 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:cluster-tests:amq-broker-controller-manager" cannot list resource "pods" in API group "" in the namespace "cluster-testsa" W0103 10:04:54.425296 1 reflector.go:324] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: failed to list *v1beta1.ActiveMQArtemisScaledown: activemqartemisscaledowns.broker.amq.io is forbidden: User "system:serviceaccount:cluster-tests:amq-broker-controller-manager" cannot list resource "activemqartemisscaledowns" in API group "broker.amq.io" in the namespace "cluster-testsa"
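For context, the multi-namespace watch is driven by the OperatorGroup. A minimal sketch of the kind of object involved; the name and target namespaces are illustrative, not the exact test fixtures:
$ cat <<'EOF' | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: amq-broker-og
  namespace: cluster-tests
spec:
  targetNamespaces:
    - cluster-tests
    - cluster-testsa
EOF
# The RBAC errors above show the operator's service account in cluster-tests
# being denied list/watch in cluster-testsa, i.e. OLM did not generate the
# expected roles for the extra target namespace.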
Description of problem:
Because of the CentOS Stream 8 EOL, the libvirt-installer image is failing to build in CI. [1] https://blog.centos.org/2023/04/end-dates-are-coming-for-centos-stream-8-and-centos-linux-7/
Version-Release number of selected component (if applicable):
4.15
How reproducible:
always
Steps to Reproduce:
1. 2. 3.
Actual results:
[5/5] STEP 6/16: RUN yum update -y && yum install --setopt=tsflags=nodocs -y genisoimage gettext google-cloud-sdk-365.0.1 libvirt-client libvirt-libs nss_wrapper openssh-clients && yum clean all && rm -rf /var/cache/yum/* CentOS Stream 8 - AppStream 187 B/s | 38 B 00:00 Error: Failed to download metadata for repo 'appstream': Cannot prepare internal mirrorlist: No URLs in mirrorlist error: build error: building at STEP "RUN yum update -y && yum install --setopt=tsflags=nodocs -y genisoimage gettext google-cloud-sdk-365.0.1 libvirt-client libvirt-libs nss_wrapper openssh-clients && yum clean all && rm -rf /var/cache/yum/*": while running runtime: exit status 1
Expected results:
Additional info:
For now only 4.15 is affected, but I expect older releases will be hit when CentOS 7 goes EOL on 30 Jun 2024.
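A common mitigation sketch for EOL CentOS releases is to point the image's repo definitions at the vault archive before running yum; the repo file names and mirror paths below are the stock CentOS Stream 8 defaults and would need to match the actual Dockerfile:
sed -i -e 's|^mirrorlist=|#mirrorlist=|' \
       -e 's|^#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|' \
       /etc/yum.repos.d/CentOS-Stream-*.repo
yum clean all && yum makecache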
Description of problem:
4.15.0-0.nightly-2023-10-06-123200, Prometheus Operator version is 0.68.0, there is "duplicate port definition" warning message in 4.15 prometheus-operator
$ oc logs deployment/prometheus-operator -n openshift-monitoring | grep "duplicate port definition with" -C2 level=info ts=2023-10-08T01:44:40.586511278Z caller=operator.go:655 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager" level=info ts=2023-10-08T01:44:40.626492507Z caller=operator.go:655 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager" level=warn ts=2023-10-08T01:44:40.628520232Z caller=klog.go:96 component=k8s_client_runtime func=Warning msg="spec.template.spec.containers[5].ports[0]: duplicate port definition with spec.template.spec.containers[2].ports[0]" level=info ts=2023-10-08T01:44:40.63072762Z caller=operator.go:1189 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus" level=info ts=2023-10-08T01:44:40.91709494Z caller=operator.go:1189 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus" -- level=info ts=2023-10-08T01:45:19.85277831Z caller=operator.go:655 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager" level=info ts=2023-10-08T01:45:24.014118091Z caller=operator.go:1189 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus" level=warn ts=2023-10-08T01:45:24.256334754Z caller=klog.go:96 component=k8s_client_runtime func=Warning msg="spec.template.spec.containers[5].ports[0]: duplicate port definition with spec.template.spec.containers[2].ports[0]" level=info ts=2023-10-08T01:45:24.259230552Z caller=operator.go:1189 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus" level=info ts=2023-10-08T01:45:24.50510448Z caller=operator.go:1189 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus" -- level=info ts=2023-10-08T07:33:33.724893975Z caller=operator.go:1310 component=prometheusoperator key=openshift-monitoring/k8s statefulset=prometheus-k8s shard=0 msg="recreating StatefulSet because the update operation wasn't possible" reason="Forbidden: updates to statefulset spec for fields other than 'replicas', 'ordinals', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden" level=info ts=2023-10-08T07:33:35.232445429Z caller=operator.go:1189 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus" level=warn ts=2023-10-08T07:33:35.442232343Z caller=klog.go:96 component=k8s_client_runtime func=Warning msg="spec.template.spec.containers[5].ports[0]: duplicate port definition with spec.template.spec.containers[2].ports[0]" level=info ts=2023-10-08T07:33:35.445827197Z caller=operator.go:1189 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus" level=info ts=2023-10-08T07:33:35.708322936Z caller=operator.go:1189 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
The kube-rbac-proxy-thanos and thanos-sidecar containers use the same port 10902; there is no functional effect, and the warning may be expected. If so, we could close this bug.
$ oc -n openshift-monitoring get sts prometheus-k8s -ojsonpath='{.spec.template.spec.containers[5].ports[0]}' | jq { "containerPort": 10902, "name": "thanos-proxy", "protocol": "TCP" } $ oc -n openshift-monitoring get sts prometheus-k8s -ojsonpath='{.spec.template.spec.containers[2].ports[0]}' | jq { "containerPort": 10902, "name": "http", "protocol": "TCP" } $ oc -n openshift-monitoring get sts prometheus-k8s -ojsonpath='{.spec.template.spec.containers[5].name}' kube-rbac-proxy-thanos $ oc -n openshift-monitoring get sts prometheus-k8s -ojsonpath='{.spec.template.spec.containers[2].name}' thanos-sidecar
Checked in 4.14, where the prometheus-operator version is 0.67.1: no such issue.
$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.14.0-0.nightly-2023-10-06-234925 True False 3h33m Cluster version is 4.14.0-0.nightly-2023-10-06-234925 $ oc logs deployment/prometheus-operator -n openshift-monitoring | grep "duplicate port definition with" -C2 no result $ oc -n openshift-monitoring get sts prometheus-k8s -ojsonpath='{.spec.template.spec.containers[5].ports[0]}' | jq { "containerPort": 10902, "name": "thanos-proxy", "protocol": "TCP" } $ oc -n openshift-monitoring get sts prometheus-k8s -ojsonpath='{.spec.template.spec.containers[2].ports[0]}' | jq { "containerPort": 10902, "name": "http", "protocol": "TCP" } $ oc -n openshift-monitoring get sts prometheus-k8s -ojsonpath='{.spec.template.spec.containers[5].name}' kube-rbac-proxy-thanos $ oc -n openshift-monitoring get sts prometheus-k8s -ojsonpath='{.spec.template.spec.containers[2].name}' thanos-sidecar
Version-Release number of selected component (if applicable):
$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.15.0-0.nightly-2023-10-06-123200 True False 7h1m Cluster version is 4.15.0-0.nightly-2023-10-06-123200
How reproducible:
always in 4.15
Steps to Reproduce:
1. check prometheus-operator logs
Actual results:
"duplicate port definition" warning message in 4.15 prometheus-operator
Expected results:
Additional info:
We could close this bug, since the warning seems to be expected.
This is a clone of issue OCPBUGS-29391. The following is the description of the original issue:
—
Description of problem:
AWS HyperShift clusters' nodes cannot join the cluster when a custom domain name is set in the DHCP Option Set.
Version-Release number of selected component (if applicable):
Any
How reproducible:
100%
Steps to Reproduce:
1. Create a VPC for a HyperShift/ROSA HCP cluster in AWS 2. Replace the VPC's DHCP Option Set with another with a custom domain name (example.com or really any domain of your choice) 3. Attempt to install a HyperShift/ROSA HCP cluster with a nodepool
Actual results:
All EC2 instances will fail to become nodes. They will generate CSRs based on the default domain name - ec2.internal for us-east-1 or ${region}.compute.internal for other regions (e.g. us-east-2.compute.internal)
Expected results:
Either that they become nodes or that we document that custom domain names in DHCP Option Sets are not allowed with HyperShift at this time. There is currently no pressing need for this feature, though customers do use this in ROSA Classic/OCP successfully.
Additional info:
This is a known gap currently in cluster-api-provider-aws (CAPA) https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/1691
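To reproduce step 2 outside the installer, the DHCP option set swap can be done with the AWS CLI. A sketch with placeholder IDs and domain:
$ aws ec2 create-dhcp-options \
    --dhcp-configurations 'Key=domain-name,Values=example.com' 'Key=domain-name-servers,Values=AmazonProvidedDNS'
$ aws ec2 associate-dhcp-options --dhcp-options-id dopt-0123456789abcdef0 --vpc-id vpc-0123456789abcdef0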
The IP range 168.254.0.0/16 that we chose as default for the transit switch is a public one. Let's use a private one instead, making sure it won't collide with address blocks already in use.
In the future we might want to make this configurable, but for now let's just make sure we pick an IP range that is not used elsewhere in OpenShift.
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-26940. The following is the description of the original issue:
—
Description of problem:
If OLMPlacement is set to management and the cluster comes up with disableAllDefaultSources set to true, then after removing that setting from the HostedCluster CR, disableAllDefaultSources isn't removed in the guest cluster and remains set to true.
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-37550. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-30955. The following is the description of the original issue:
—
Description of problem:
Apply an nncp to configure DNS, then edit the nncp to update the nameserver; /etc/resolv.conf is not updated.
Version-Release number of selected component (if applicable):
OCP version: 4.16.0-0.nightly-2024-03-13-061822 knmstate operator version: kubernetes-nmstate-operator.4.16.0-202403111814
How reproducible:
always
Steps to Reproduce:
1. install knmstate operator 2. apply below nncp to configure dns on one of the node --- apiVersion: nmstate.io/v1 kind: NodeNetworkConfigurationPolicy metadata: name: dns-staticip-4 spec: nodeSelector: kubernetes.io/hostname: qiowang-031510-k4cjs-worker-0-rw4nt desiredState: dns-resolver: config: search: - example.org server: - 192.168.221.146 - 8.8.9.9 interfaces: - name: dummy44 type: dummy state: up ipv4: address: - ip: 192.0.2.251 prefix-length: 24 dhcp: false enabled: true auto-dns: false % oc apply -f dns-staticip-noroute.yaml nodenetworkconfigurationpolicy.nmstate.io/dns-staticip-4 created % oc get nncp NAME STATUS REASON dns-staticip-4 Available SuccessfullyConfigured % oc get nnce NAME STATUS STATUS AGE REASON qiowang-031510-k4cjs-worker-0-rw4nt.dns-staticip-4 Available 5s SuccessfullyConfigured 3. check dns on the node, dns configured correctly sh-5.1# cat /etc/resolv.conf # Generated by KNI resolv prepender NM dispatcher script search qiowang-031510.qe.devcluster.openshift.com example.org nameserver 192.168.221.146 nameserver 192.168.221.146 nameserver 8.8.9.9 # nameserver 192.168.221.1 sh-5.1# sh-5.1# cat /var/run/NetworkManager/resolv.conf # Generated by NetworkManager search example.org nameserver 192.168.221.146 nameserver 8.8.9.9 nameserver 192.168.221.1 sh-5.1# sh-5.1# nmcli | grep 'DNS configuration' -A 10 DNS configuration: servers: 192.168.221.146 8.8.9.9 domains: example.org interface: dummy44 ... ... 4. edit nncp, update nameserver, save the modification --- spec: desiredState: dns-resolver: config: search: - example.org server: - 192.168.221.146 - 8.8.8.8 <---- update from 8.8.9.9 to 8.8.8.8 interfaces: - ipv4: address: - ip: 192.0.2.251 prefix-length: 24 auto-dns: false dhcp: false enabled: true name: dummy44 state: up type: dummy nodeSelector: kubernetes.io/hostname: qiowang-031510-k4cjs-worker-0-rw4nt % oc edit nncp dns-staticip-4 nodenetworkconfigurationpolicy.nmstate.io/dns-staticip-4 edited % oc get nncp NAME STATUS REASON dns-staticip-4 Available SuccessfullyConfigured % oc get nnce NAME STATUS STATUS AGE REASON qiowang-031510-k4cjs-worker-0-rw4nt.dns-staticip-4 Available 8s SuccessfullyConfigured 5. check dns on the node again
Actual results:
the dns nameserver in file /etc/resolv.conf is not updated after nncp updated, file /var/run/NetworkManager/resolv.conf updated correctly: sh-5.1# cat /etc/resolv.conf # Generated by KNI resolv prepender NM dispatcher script search qiowang-031510.qe.devcluster.openshift.com example.org nameserver 192.168.221.146 nameserver 192.168.221.146 nameserver 8.8.9.9 <---- it is not updated # nameserver 192.168.221.1 sh-5.1# sh-5.1# cat /var/run/NetworkManager/resolv.conf # Generated by NetworkManager search example.org nameserver 192.168.221.146 nameserver 8.8.8.8 <---- updated correctly nameserver 192.168.221.1 sh-5.1# sh-5.1# nmcli | grep 'DNS configuration' -A 10 DNS configuration: servers: 192.168.221.146 8.8.8.8 domains: example.org interface: dummy44 ... ...
Expected results:
the dns nameserver in file /etc/resolv.conf can be updated accordingly
Additional info:
Description of problem:
An invalid egressIP object caused ovnkube-node pods to enter CrashLoopBackOff (CLBO).
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-10-05-195247
How reproducible:
Always
Steps to Reproduce:
1. Label one node as egress node 2. Created an egressIP object, with empty label key and value oc get egressip -o yaml apiVersion: v1 items: - apiVersion: k8s.ovn.org/v1 kind: EgressIP metadata: creationTimestamp: "2023-10-07T09:08:28Z" generation: 2 name: egressip-test resourceVersion: "122021" uid: 23445450-37d5-4ec3-b8fe-d8352a19e703 spec: egressIPs: - 10.0.70.100 namespaceSelector: matchLabels: "": "" podSelector: matchLabels: "": "" status: items: - egressIP: 10.0.70.100 node: ip-10-0-70-135 kind: List metadata: resourceVersion: "" 3. Created namespace and test pods
Actual results:
Test pods was stuck in ContainerCreating status % oc get pods -n hrw NAME READY STATUS RESTARTS AGE test-rc-hwmns 0/1 ContainerCreating 0 45s test-rc-p9kl8 0/1 ContainerCreating 0 45s % oc describe pod test-rc-hwmns -n hrw Name: test-rc-hwmns Namespace: hrw Priority: 0 Service Account: default Node: ip-10-0-70-125/10.0.70.125 Start Time: Sat, 07 Oct 2023 17:08:50 +0800 Labels: name=test-pods Annotations: k8s.ovn.org/pod-networks: {"default":{"ip_addresses":["10.129.2.11/23"],"mac_address":"0a:58:0a:81:02:0b","gateway_ips":["10.129.2.1"],"routes":[{"dest":"10.128.0.0... openshift.io/scc: restricted-v2 seccomp.security.alpha.kubernetes.io/pod: runtime/default Status: Pending IP: IPs: <none> Controlled By: ReplicationController/test-rc Containers: test-pod: Container ID: Image: quay.io/openshifttest/hello-sdn@sha256:c89445416459e7adea9a5a416b3365ed3d74f2491beb904d61dc8d1eb89a72a4 Image ID: Port: <none> Host Port: <none> State: Waiting Reason: ContainerCreating Ready: False Restart Count: 0 Limits: memory: 340Mi Requests: memory: 340Mi Environment: <none> Mounts: /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7vlz8 (ro) Conditions: Type Status Initialized True Ready False ContainersReady False PodScheduled True Volumes: kube-api-access-7vlz8: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: <nil> DownwardAPI: true ConfigMapName: openshift-service-ca.crt ConfigMapOptional: <nil> QoS Class: Burstable Node-Selectors: <none> Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 59s default-scheduler Successfully assigned hrw/test-rc-hwmns to ip-10-0-70-125 Warning FailedCreatePodSandBox 59s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_test-rc-hwmns_hrw_d72a4216-b94b-4034-a9f7-526758055994_0(1ad74472b9e985cee4a3081f5912b3d4553351d14764d3bfece1d174146f90ca): error adding pod hrw_test-rc-hwmns to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: '&{ContainerID:1ad74472b9e985cee4a3081f5912b3d4553351d14764d3bfece1d174146f90ca Netns:/var/run/netns/131f3670-1a49-4088-9002-5624a3acc6d3 IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=hrw;K8S_POD_NAME=test-rc-hwmns;K8S_POD_INFRA_CONTAINER_ID=1ad74472b9e985cee4a3081f5912b3d4553351d14764d3bfece1d174146f90ca;K8S_POD_UID=d72a4216-b94b-4034-a9f7-526758055994 Path: StdinData:[123 34 98 105 110 68 105 114 34 58 34 47 118 97 114 47 108 105 98 47 99 110 105 47 98 105 110 34 44 34 99 104 114 111 111 116 68 105 114 34 58 34 47 104 111 115 116 114 111 111 116 34 44 34 99 108 117 115 116 101 114 78 101 116 119 111 114 107 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 99 110 105 47 110 101 116 46 100 47 49 48 45 111 118 110 45 107 117 98 101 114 110 101 116 101 115 46 99 111 110 102 34 44 34 99 110 105 67 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 101 116 99 47 99 110 105 47 110 101 116 46 100 34 44 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 100 97 101 109 111 110 83 111 99 107 101 116 68 105 114 34 58 34 47 114 117 110 47 109 117 108 116 117 115 47 115 111 99 107 101 116 34 44 34 103 108 
111 98 97 108 78 97 109 101 115 112 97 99 101 115 34 58 34 100 101 102 97 117 108 116 44 111 112 101 110 115 104 105 102 116 45 109 117 108 116 117 115 44 111 112 101 110 115 104 105 102 116 45 115 114 105 111 118 45 110 101 116 119 111 114 107 45 111 112 101 114 97 116 111 114 34 44 34 108 111 103 76 101 118 101 108 34 58 34 118 101 114 98 111 115 101 34 44 34 108 111 103 84 111 83 116 100 101 114 114 34 58 116 114 117 101 44 34 109 117 108 116 117 115 65 117 116 111 99 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 99 110 105 47 110 101 116 46 100 34 44 34 109 117 108 116 117 115 67 111 110 102 105 103 70 105 108 101 34 58 34 97 117 116 111 34 44 34 110 97 109 101 34 58 34 109 117 108 116 117 115 45 99 110 105 45 110 101 116 119 111 114 107 34 44 34 110 97 109 101 115 112 97 99 101 73 115 111 108 97 116 105 111 110 34 58 116 114 117 101 44 34 112 101 114 78 111 100 101 67 101 114 116 105 102 105 99 97 116 101 34 58 123 34 98 111 111 116 115 116 114 97 112 75 117 98 101 99 111 110 102 105 103 34 58 34 47 104 111 115 116 114 111 111 116 47 118 97 114 47 108 105 98 47 107 117 98 101 108 101 116 47 107 117 98 101 99 111 110 102 105 103 34 44 34 99 101 114 116 68 105 114 34 58 34 47 101 116 99 47 99 110 105 47 109 117 108 116 117 115 47 99 101 114 116 115 34 44 34 99 101 114 116 68 117 114 97 116 105 111 110 34 58 34 50 52 104 34 44 34 101 110 97 98 108 101 100 34 58 116 114 117 101 125 44 34 115 111 99 107 101 116 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 115 111 99 107 101 116 34 44 34 116 121 112 101 34 58 34 109 117 108 116 117 115 45 115 104 105 109 34 125]} ContainerID:"1ad74472b9e985cee4a3081f5912b3d4553351d14764d3bfece1d174146f90ca" Netns:"/var/run/netns/131f3670-1a49-4088-9002-5624a3acc6d3" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=hrw;K8S_POD_NAME=test-rc-hwmns;K8S_POD_INFRA_CONTAINER_ID=1ad74472b9e985cee4a3081f5912b3d4553351d14764d3bfece1d174146f90ca;K8S_POD_UID=d72a4216-b94b-4034-a9f7-526758055994" Path:"" ERRORED: error configuring pod [hrw/test-rc-hwmns] networking: [hrw/test-rc-hwmns/d72a4216-b94b-4034-a9f7-526758055994:ovn-kubernetes]: error adding container to network "ovn-kubernetes": failed to send CNI request: Post "http://dummy/": dial unix /var/run/ovn-kubernetes/cni//ovn-cni-server.sock: connect: connection refused ' % oc get pods -n openshift-ovn-kubernetes NAME READY STATUS RESTARTS AGE ovnkube-control-plane-85f96b444b-2bdwf 2/2 Running 0 5h27m ovnkube-control-plane-85f96b444b-2mhfj 2/2 Running 0 5h27m ovnkube-control-plane-85f96b444b-ddjhx 2/2 Running 0 5h27m ovnkube-node-5fkb5 7/8 CrashLoopBackOff 6 (2m52s ago) 13m ovnkube-node-p7qvr 7/8 CrashLoopBackOff 6 (2m56s ago) 13m ovnkube-node-tzhlb 7/8 CrashLoopBackOff 6 (2m51s ago) 13m ovnkube-node-x5849 7/8 CrashLoopBackOff 6 (2m57s ago) 13m ovnkube-node-xscbr 7/8 CrashLoopBackOff 6 (2m35s ago) 13m exec /usr/bin/ovnkube --init-ovnkube-controller "${K8S_NODE}" --init-node "${K8S_NODE}" \ --config-file=/run/ovnkube-config/ovnkube.conf \ --ovn-empty-lb-events \ --loglevel "${OVN_KUBE_LOG_LEVEL}" \ --inactivity-probe="${OVN_CONTROLLER_INACTIVITY_PROBE}" \ ${gateway_mode_flags} \ ${node_mgmt_port_netdev_flags} \ --metrics-bind-address "127.0.0.1:29103" \ --ovn-metrics-bind-address "127.0.0.1:29105" \ --metrics-enable-pprof \ --metrics-enable-config-duration \ --export-ovs-metrics \ --disable-snat-multiple-gws \ ${export_network_flows_flags} \ ${multi_network_enabled_flag} \ ${multi_network_policy_enabled_flag} \ 
${admin_network_policy_enabled_flag} \ --enable-multicast \ --zone ${K8S_NODE} \ --enable-interconnect \ --acl-logging-rate-limit "20" \ ${gw_interface_flag} \ --enable-multi-external-gateway=true \ ${ip_forwarding_flag} \ ${NETWORK_NODE_IDENTITY_ENABLE} State: Waiting Reason: CrashLoopBackOff Last State: Terminated Reason: Error Message: vn-kubernetes/go-controller/pkg/retry.(*RetryFramework).WatchResourceFiltered.func1.1({0xc0007cb368, 0x11}) /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/retry/obj_retry.go:531 +0x2c7 github.com/ovn-org/ovn-kubernetes/go-controller/pkg/retry.(*RetryFramework).DoWithLock(0xc000d4eb40, {0xc0007cb368, 0x11}, 0xc000e43dd0) /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/retry/obj_retry.go:137 +0xce github.com/ovn-org/ovn-kubernetes/go-controller/pkg/retry.(*RetryFramework).WatchResourceFiltered.func1({0x22eede0, 0xc000c6fec0}) /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/retry/obj_retry.go:504 +0x265 k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(...) /go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:243 k8s.io/client-go/tools/cache.FilteringResourceEventHandler.OnAdd({0xc00111bdc0?, {0x26d0aa0?, 0xc001580570?}}, {0x22eede0, 0xc000c6fec0}, 0xa0?) /go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:306 +0x6e github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*Handler).OnAdd(...) /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/factory/handler.go:52 github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.newQueuedInformer.func1.1(0xc000e43da0?) Exit Code: 2 Started: Sat, 07 Oct 2023 17:14:38 +0800 Finished: Sat, 07 Oct 2023 17:14:39 +0800 Ready: False Restart Count: 6 Requests: cpu: 10m memory: 600Mi
Expected results:
Add some validation for labels: warn that the selector key must not be empty, and reject the object instead of applying it.
Additional info:
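For contrast, a valid object sketch with non-empty selector keys (the label names are illustrative); the expectation above is that the empty-key variant would be rejected by validation instead of crash-looping ovnkube-node:
$ cat <<'EOF' | oc apply -f -
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip-test
spec:
  egressIPs:
    - 10.0.70.100
  namespaceSelector:
    matchLabels:
      env: qe
  podSelector:
    matchLabels:
      name: test-pods
EOF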
This is a clone of issue OCPBUGS-29479. The following is the description of the original issue:
—
Component Readiness has found a potential regression in [sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers.
Probability of significant regression: 100.00%
Sample (being evaluated) Release: 4.15
Start Time: 2024-02-08T00:00:00Z
End Time: 2024-02-14T23:59:59Z
Success Rate: 91.30%
Successes: 63
Failures: 6
Flakes: 0
Base (historical) Release: 4.14
Start Time: 2023-10-04T00:00:00Z
End Time: 2023-10-31T23:59:59Z
Success Rate: 100.00%
Successes: 735
Failures: 0
Flakes: 0
Note: When you look at the link above you will notice some of the failures mention the bare metal operator. That's being investigated as part of https://issues.redhat.com/browse/OCPBUGS-27760. There have been 3 cases in the last week where the console was in a fail loop. Here's an example:
We need help understanding why this is happening and what needs to be done to avoid it.
We heavily rely on scripts located in
https://github.com/dougsland/bz-query
in order to assign Jiras to members of the SDN team.
As the person in charge of knowing the bug load on each of our developers, in order to decide who is the best person to own unassigned Jiras, we should have the scripts in a more formal location.
This is a clone of issue OCPBUGS-35874. The following is the description of the original issue:
—
Description of problem:
The ovnkube-sbdb route removal is missing a management cluster capabilities check and thus fails on a Kubernetes based management cluster.
Version-Release number of selected component (if applicable):
4.15.z, 4.16.0, 4.17.0
How reproducible:
Always
Steps to Reproduce:
Deploy an OpenShift version 4.16.0-rc.6 cluster control plane using HyperShift on a Kubernetes based management cluster.
Actual results:
Cluster control plane deployment fails because the cluster-network-operator pod is stuck in Init state due to the following error: {"level":"error","ts":"2024-06-19T20:51:37Z","msg":"Reconciler error","controller":"hostedcontrolplane","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedControlPlane","HostedControlPlane":{"name":"cppjslm10715curja3qg","namespace":"master-cppjslm10715curja3qg"},"namespace":"master-cppjslm10715curja3qg","name":"cppjslm10715curja3qg","reconcileID":"037842e8-82ea-4f6e-bf28-deb63abc9f22","error":"failed to update control plane: failed to reconcile cluster network operator: failed to clean up ovnkube-sbdb route: error getting *v1.Route: no matches for kind \"Route\" in version \"route.openshift.io/v1\"","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
Expected results:
Cluster control plane deployment succeeds.
Additional info:
https://ibm-argonauts.slack.com/archives/C01C8502FMM/p1718832205747529
This is a clone of issue OCPBUGS-29088. The following is the description of the original issue:
—
Description of problem:
Customer has no method to revoke break-glass signer certificate for HCP.
Version-Release number of selected component (if applicable):
4.16.0
How reproducible:
always
Steps to Reproduce:
1. not possible
Actual results:
nothing
Expected results:
expected a path to do this
Additional info:
In order to use the new flow introduced to fix this, create a CertificateRevocationRequest in the namespace of a HostedControlPlane as described in the test:
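A hedged sketch of such an object; the API group/version and signerClass value here are assumptions from memory of the HyperShift API and should be verified against that test:
$ cat <<'EOF' | oc apply -f -
apiVersion: certificates.hypershift.openshift.io/v1alpha1   # assumed group/version
kind: CertificateRevocationRequest
metadata:
  name: revoke-break-glass
  namespace: <hosted-control-plane-namespace>   # placeholder
spec:
  signerClass: customer-break-glass   # assumed signer class
EOF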
This is a clone of issue OCPBUGS-25843. The following is the description of the original issue:
—
Description of problem:
On customer feedback modal, there are 3 links for user to feedback to Red Hat, the third link lacks a title.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-12-21-155123
How reproducible:
Always
Steps to Reproduce:
1.Login admin console. Click on "?"->"Share Feedback", check the links on the modal 2. 3.
Actual results:
1. The third link lacks a link title (the link for "Learn about opportunities to ……").
Expected results:
1. There is link title "Inform the direction of Red Hat" in 4.14, it should also exists for 4.15.
Additional info:
screenshot for 4.14 page: https://drive.google.com/file/d/19AnPlE0h9WwvIjxV0gLuf5x27jLN7TLS/view?usp=drive_link screenshot for 4.15 page: https://drive.google.com/file/d/19MRjzNGRWfYnK-zcoMozh7Z7eaDDG2L-/view?usp=drive_link
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
I've noticed that 'agent-cluster-install.yaml' and 'journal.export' from the agent gather process contain passwords. It's important not to expose password information in any of these generated files.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. Generate an agent ISO by utilising agent-config and install-config, including platform credentials 2. Boot the ISO that was created 3. Run the agent-gather command on the node 0 machine to generate files.
Actual results:
The 'agent-cluster-install.yaml' and 'journal.export' files contain password information.
Expected results:
Password should be redacted.
Additional info:
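A quick verification sketch once the fix lands: scan the gathered files for leftover credential material (file names from this report):
$ grep -niE 'password|passwd' agent-cluster-install.yaml journal.export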
Description of problem:
On Azure, when kube-controller-manager verifies whether a machine exists or not, if the machine was already deleted, the code may panic with SIGSEGV:
I0320 12:02:55.806321 1 azure_backoff.go:91] GetVirtualMachineWithRetry(worker-e32ads-westeurope2-f72dr): backoff success
I0320 12:02:56.028287 1 azure_wrap.go:201] Virtual machine "worker-e16as-westeurope1-hpz2t" is under deleting
I0320 12:02:56.028328 1 azure_standard.go:752] GetPrimaryInterface(worker-e16as-westeurope1-hpz2t, ) abort backoff
E0320 12:02:56.028334 1 azure_standard.go:825] error: az.EnsureHostInPool(worker-e16as-westeurope1-hpz2t), az.VMSet.GetPrimaryInterface.Get(worker-e16as-westeurope1-hpz2t, ), err=instance not found
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x60 pc=0x33d21f6]
goroutine 240642 [running]:
k8s.io/legacy-cloud-providers/azure.(*availabilitySet).EnsureHostInPool(0xc000016580, 0xc0262fb400, {0xc02d8a5080, 0x32}, {0xc021c1bc70, 0xc4}, {0x0, 0x0}, 0xa8?)
  vendor/k8s.io/legacy-cloud-providers/azure/azure_standard.go:831 +0x4f6
k8s.io/legacy-cloud-providers/azure.(*availabilitySet).EnsureHostsInPool.func2()
  vendor/k8s.io/legacy-cloud-providers/azure/azure_standard.go:928 +0x5f
k8s.io/apimachinery/pkg/util/errors.AggregateGoroutines.func1(0xc0159d0788?)
Version-Release number of selected component (if applicable):
4.12.48
(ships https://github.com/openshift/kubernetes/commit/6df21776c7879727ab53895df8a03e53fb725d74)
issue introduced by https://github.com/kubernetes/kubernetes/pull/111428/files#diff-0414c3aba906b2c0cdb2f09da32bd45c6bf1df71cbb2fc55950743c99a4a5fe4
How reproducible:
was unable to reproduce, happens occasionally
Steps to Reproduce:
1. 2. 3.
Actual results:
panic
Expected results:
no panic
Additional info:
internal case 03772590
Description of the problem:
Installed an IPv6 disconnected agent-based hosted cluster and added 3 workers to it using the boot-it-yourself flow. When scaling down the nodepool to 2 replicas, the agent that should be unbound is stuck in the unbinding-pending-user-action state:
state: unbinding-pending-user-action
stateInfo: Host is waiting to be unbound from the cluster
How reproducible:
100%
Steps to reproduce:
1.
2.
3.
Actual results:
Agent stuck in unbinding-pending-user-action state
Expected results:
Agent reaches known-unbound state
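A sketch for checking where an agent is stuck; the debugInfo path matches the status fields quoted above but should be verified against the installed CRD:
$ oc get agents.agent-install.openshift.io -A \
    -o custom-columns='NAME:.metadata.name,STATE:.status.debugInfo.state,INFO:.status.debugInfo.stateInfo'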
Please review the following PR: https://github.com/openshift/images/pull/152
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/ovn-kubernetes/pull/1963
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
openshift/ibm-powervs-block-csi-driver: bump x/net to mitigate denial-of-service attacks over the HTTP/2 protocol
Version-Release number of selected component (if applicable):
v4.15.0
This security tracking issue was filed based on manifesting data available to Product Security in https://deptopia.prodsec.redhat.com/ui/home. This data indicates that the component noted in the "pscomponent" label was found to be affected by this vulnerability. If you believe this issue is not actionable and was created erroneously, please fill out the following form and close this issue as Closed with a resolution of Obsolete. This will prompt Product Security to review what type of error caused this Jira issue to be created, and prevent further mistakes of this type in the future.
https://forms.gle/LnXaf5aCAHaV6g8T8
To better understand the distinction between a component being Affected vs Not Affected, please read the following article:
https://docs.engineering.redhat.com/pages/viewpage.action?spaceKey=PRODSEC&title=Understanding+Affected+and+Not+Affected
Description of problem:
When configured with a single identity provider that's not capable of login authentication flows, the oauth-server returns error when accessed from the browser. When the oauth-server is accessed from the web console, this error causes redirect loop between the oauth-server and the console.
Version-Release number of selected component (if applicable):
4.5
How reproducible:
100%
Steps to Reproduce:
1. configure request header IdP with some bogus ChallengeURL and no LoginURL
2. disable the kubeadmin user by deleting the kube-system/kubeadmin secret
3. wait for the changes to be applied to the oauth-server's deployment
4. go to the console's URL
Actual results:
The console tries to access a resource and gets an "unauthorized" error, so it redirects the user to the oauth-server; the oauth-server errors out because it does not allow browser login and redirects the user back to the console, and the loop repeats infinitely.
Expected results:
The oauth-server presents the user with a login page that won't allow them to log in OR the server errors out with a clear error that tells the console not to try to loop back to it again.
Description of problem:
Pipeline E2E tests have been disabled because the CI is failing. The probable cause is that our clusters now report 4.15, and the operator couldn't be found because it's only compatible with 4.x-4.14.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Cluster configuration page fields are not visible.
Screenshot : https://drive.google.com/file/d/17TrZNE2dY-AH-vUwcsjvC4E8wxiyPb9n/view?usp=drive_link
This is a clone of issue OCPBUGS-30052. The following is the description of the original issue:
—
Description of problem:
Repositories list page breaks with a TypeError: cannot read properties of undefined (reading `pipelinesascode.tekton.dev/repository`).
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
https://drive.google.com/file/d/1TpH_PTyBxNX0b9SPZ2yS8b-q-tbvp6Ok/view?usp=sharing
Hit seemingly every job in the last payload:
Credentials request shows:
"conditions": [ { "lastProbeTime": "2023-11-26T11:20:40Z", "lastTransitionTime": "2023-11-26T11:20:40Z", "message": "failed to grant creds: error syncing creds in mint-mode: error creating custom role: rpc error: code = ResourceExhausted desc = Maximum number of roles reached. Maximum is: 300\nerror details: retry in 24h0m1s", "reason": "CredentialsProvisionFailure", "status": "True", "type": "CredentialsProvisionFailure" } ],
We've heard a new GCP account is live, but we're not sure whether these jobs are landing in it or not. Perhaps they are and a limit needs to be bumped?
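One way to check how close the project is to the 300 custom-role limit (a sketch; the project ID is a placeholder):
$ gcloud iam roles list --project my-ci-project --format='value(name)' | wc -l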
This issue shows up as a Cluster Version Operator component readiness regression due to failing the following tests:
Please review the following PR: https://github.com/openshift/operator-framework-olm/pull/564
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
The following test is permafailing (see below for sippy link)
[sig-storage] [Serial] Volume metrics PVC should create metrics for total time taken in volume operations in P/V Controller [Suite:openshift/conformance/serial] [Suite:k8s]
Example failure
The test doesn't seem to always run in serial jobs, but whenever it does run, it fails. And it's often the only test that fails in the run. This only started a few days ago, around the 4th.
Additional context here:
This is a clone of issue OCPBUGS-25810. The following is the description of the original issue:
—
No QA required, updating approvers across releases
Please review the following PR: https://github.com/openshift/ovn-kubernetes/pull/1884
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Creating a PipelineRun that carries annotations from a previous run leads to the Result not being created, though Records are updated with the new TaskRuns.
https://github.com/tektoncd/results/issues/556
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Install TektonResults on the cluster
2. Create a Pipeline and start the Pipeline
3. Rerun the PipelineRun
4. Check the records endpoint, e.g. https://tekton-results-api-service-openshift-pipelines.apps.viraj-11-10-2023.devcluster.openshift.com/apis/results.tekton.dev/v1alpha2/parents/viraj/results/-/records — the new PipelineRun is not saved.
Actual results:
The new PipelineRun created by the rerun is not saved in the records.
Expected results:
All PipelineRuns should be saved in the records.
Additional info:
Document to install TektonResults on the cluster https://gist.github.com/vikram-raj/257d672a38eb2159b0368eaed8f8970a
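A quick way to query the records endpoint from the CLI (a sketch; the route host and parent namespace are placeholders, and the token is assumed to have access to the Results API):
$ curl -sk -H "Authorization: Bearer $(oc whoami -t)" \
    "https://tekton-results-api-service-openshift-pipelines.apps.example.com/apis/results.tekton.dev/v1alpha2/parents/my-namespace/results/-/records"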
Description of problem:
- Upgrade the cluster.
- 2 or more kube-apiserver pods do not come online. Network access could be lost due to misconfiguration or a wrong RHEL update. We can simulate this by ssh-ing into a node and running:
  iptables -A INPUT -p tcp --destination-port 6443 -j DROP
- 2 or more kube-apiserver-guard pods lose readiness.
- The kube-apiserver-guard-pdb PDB blocks the node drain because status.currentHealthy is less than status.desiredHealthy.
- It is not possible to drain the node without overriding eviction requests (forcefully deleting the guard pods).
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
in a description
Actual results:
evicting pod openshift-kube-apiserver/kube-apiserver-guard-ip-10-0-19-181.eu-north-1.compute.internal error when evicting pods/"kube-apiserver-guard-ip-10-0-19-181.eu-north-1.compute.internal" -n "openshift-kube-apiserver" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
Expected results:
it is possible to evict the unready pods
Additional info:
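A sketch of how to observe the blocking PDB and the workaround mentioned above (the guard pod name is taken from the example output):
$ oc get pdb kube-apiserver-guard-pdb -n openshift-kube-apiserver \
    -o jsonpath='{.status.currentHealthy}/{.status.desiredHealthy}{"\n"}'
$ oc delete pod kube-apiserver-guard-ip-10-0-19-181.eu-north-1.compute.internal \
    -n openshift-kube-apiserver --force --grace-period=0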
This is a clone of issue OCPBUGS-32979. The following is the description of the original issue:
—
Description of problem:
CNO doesn't set the operConfig status while deploying the IPsec machine configs and the IPsec DaemonSet; it must set the Progressing condition to true. When one of the components fails to deploy, it must report the Degraded condition set to true.
For more details, see the discussion here:
https://github.com/openshift/release/pull/50740#issuecomment-2076698580
Please review the following PR: https://github.com/openshift/prometheus-operator/pull/259
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/117
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Reviewing 4.15 Install failures there are a number of variants impacted by recent install failures.
search.ci: Cluster operator console is not available
Jobs like periodic-ci-openshift-release-master-nightly-4.15-e2e-gcp-sdn-serial show failures that appear to start with 4.15.0-0.nightly-2023-12-07-225558: installation fails due to the console-operator:
ConsoleOperator reconciliation failed: Operation cannot be fulfilled on consoles.operator.openshift.io "cluster": the object has been modified; please apply your changes to the latest version and try again
4.15.0-0.nightly-2023-12-07-225558 contains console-operator/pull/814, noting in case it is related
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Steps to Reproduce:
1. Review the link to the install failures above
2.
3.
Actual results:
Expected results:
Additional info:
periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-sdn
periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-sdn-upgrade
periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-ovn-upgrade
Description of problem:
Picked up 4.14-ec-4 (which uses cgroups v1 by default) and tried to create a cluster with the following PerformanceProfile (and corresponding MCP) by placing them in the manifests folder:
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: clusterbotpp
spec:
  cpu:
    isolated: "1-3"
    reserved: "0"
  realTimeKernel:
    enabled: false
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/worker: ""
and,
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker
spec:
  machineConfigSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: worker
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""
The cluster often fails to install because bootkube spends a lot of time chasing this error,
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: Created "clusterbotpp_kubeletconfig.yaml" kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: Failed to update status for the "clusterbotpp_kubeletconfig.yaml" kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n : Operation cannot be fulfilled on kubeletconfigs.machineconfiguration.openshift.io "performance-clusterbotpp": StorageError: invalid object, Code: 4, Key: /kubernetes.io/machineconfiguration.openshift.io/kubeletconfigs/performance-clusterbotpp, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 11f98d74-af1b-4a4c-9692-6dce56ee5cd9, UID in object meta:
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: [#1717] failed to create some manifests:
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: "clusterbotpp_kubeletconfig.yaml": failed to update status for kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n : Operation cannot be fulfilled on kubeletconfigs.machineconfiguration.openshift.io "performance-clusterbotpp": StorageError: invalid object, Code: 4, Key: /kubernetes.io/machineconfiguration.openshift.io/kubeletconfigs/performance-clusterbotpp, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 11f98d74-af1b-4a4c-9692-6dce56ee5cd9, UID in object meta:
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: Created "clusterbotpp_kubeletconfig.yaml" kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: Failed to update status for the "clusterbotpp_kubeletconfig.yaml" kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n : Operation cannot be fulfilled on kubeletconfigs.machineconfiguration.openshift.io "performance-clusterbotpp": StorageError: invalid object, Code: 4, Key: /kubernetes.io/machineconfiguration.openshift.io/kubeletconfigs/performance-clusterbotpp, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 597dfcf3-012d-4730-912a-78efabb920ba, UID in object meta:
This leads to worker nodes not becoming ready in time, which leads to the installer marking the cluster installation failed. Ironically, even after the installer returns with failure, if you wait long enough I have (sometimes) observed that the cluster eventually reconciles and the worker nodes get provisioned.
I am attaching the installation logs from one such run with this issue.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Often
Steps to Reproduce:
1. Try to install a new cluster by placing the PerformanceProfile in the manifests folder
2.
3.
Actual results:
Cluster installation failed.
Expected results:
Cluster installation should succeed.
Additional info:
Also, I didn't observe this occurring in 4.13.9.
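For reference, a minimal sketch of the reproduction flow (directory and file names are placeholders):
$ openshift-install create manifests --dir mycluster
$ cp performanceprofile.yaml mcp-worker.yaml mycluster/manifests/
$ openshift-install create cluster --dir mycluster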
Please review the following PR: https://github.com/openshift/azure-file-csi-driver/pull/34
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Assisted environment: SaaS (console.redhat.com)
Interface: REST API
OCP version:
Configuration:
3 masters, 3 workers
3 masters having a small extra disk (2GB) for etcd
3 workers having an extra disk 100GB
The ODF validation fails when checking the small disk on the masters; increasing the disk for etcd solves the issue. The validation code: https://github.com/openshift/assisted-service/blob/7e715004c9a4c77e056bd91fe698f7f68232418f/internal/operators/odf/validations.go#L162 — the code should check only the workers when the cluster is not a compact cluster.
Please review the following PR: https://github.com/openshift/cluster-openshift-controller-manager-operator/pull/304
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-35446. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-35215. The following is the description of the original issue:
—
Description of problem:
[0]
$ omc -n openshift-cluster-storage-operator logs vsphere-problem-detector-operator-78cbc7fdbb-2g9mx | grep -i -e datastore.go -e E0508
2024-05-08T07:44:05.842165300Z I0508 07:44:05.839356 1 datastore.go:329] checking datastore ds:///vmfs/volumes/vsan:526390016b19d2b5-21ae3fd76fa61150/ for permissions
2024-05-08T07:44:05.842165300Z I0508 07:44:05.839504 1 datastore.go:125] CheckStorageClasses: thin-csi: storage policy openshift-storage-policy-tc01-rpdd7: unable to find datastore with URL ds:///vmfs/volumes/vsan:526390016b19d2b5-21ae3fd76fa61150/
2024-05-08T07:44:05.842165300Z I0508 07:44:05.839522 1 datastore.go:142] CheckStorageClasses checked 7 storage classes, 1 problems found
2024-05-08T07:44:05.848251057Z E0508 07:44:05.848212 1 operator.go:204] failed to run checks: StorageClass thin-csi: storage policy openshift-storage-policy-tc01-rpdd7: unable to find datastore with URL ds:///vmfs/volumes/vsan:526390016b19d2b5-21ae3fd76fa61150/
[...]
[1] https://github.com/openshift/vsphere-problem-detector/compare/release-4.13...release-4.14
[2] https://github.com/openshift/vsphere-problem-detector/blame/release-4.14/pkg/check/datastore.go#L328-L344
[3] https://github.com/openshift/vsphere-problem-detector/pull/119
[4] https://issues.redhat.com/browse/OCPBUGS-28879
Description
Windows host process containers are in alpha, as of Kubernetes 1.22. With this new feature, it should be possible to add `oc debug` functionality for Windows nodes. This would help us as developers, and has the potential to be useful for debugging customer issues as well.
Please review the following PR: https://github.com/openshift/cloud-provider-alibaba-cloud/pull/39
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-27842. The following is the description of the original issue:
—
The current description of HighOverallControlPlaneCPU is wrong for SNO cases and can mislead users. We need to add information regarding SNO clusters to the description of the alert.
Please review the following PR: https://github.com/openshift/cluster-dns-operator/pull/380
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-31398. The following is the description of the original issue:
—
Description of problem:
Recycler pods are not starting on the hostedcontrolplane in disconnected environments (ImagePullBackOff on quay.io/openshift/origin-tools:latest). The root cause is that the recycler-pod template (stored in the recycler-config ConfigMap) on hosted clusters always points to `quay.io/openshift/origin-tools:latest`. The same ConfigMap for the management cluster correctly points to an image which is part of the release payload:
$ oc get cm -n openshift-kube-controller-manager recycler-config -o json | jq -r '.data["recycler-pod.yaml"]' | grep "image"
  image: "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e458f24c40d41c2c802f7396a61658a5effee823f274be103ac22c717c157308"
but on hosted clusters we have:
$ oc get cm -n clusters-guest414a recycler-config -o json | jq -r '.data["recycler-pod.yaml"]' | grep "image"
  image: quay.io/openshift/origin-tools:latest
This is likely due to: https://github.com/openshift/hypershift/blob/e1b75598a62a06534fab6385d60d0f9a808ccc52/control-plane-operator/controllers/hostedcontrolplane/kcm/config.go#L80
quay.io/openshift/origin-tools:latest is not part of any mirrored release payload and it's referenced by tag, so it will not be available in disconnected environments.
Version-Release number of selected component (if applicable):
v4.14, v4.15, v4.16
How reproducible:
100%
Steps to Reproduce:
1. Create a hosted cluster
2. Check the content of the recycler-config configmap in a hostedcontrolplane namespace
3.
Actual results:
image field for the recycler-pod template is always pointing to `quay.io/openshift/origin-tools:latest` which is not part of the release payload
Expected results:
image field for the recycler-pod template is pointing to the right image (which one???) as extracted from the release payload
Additional info:
see: https://github.com/openshift/cluster-kube-controller-manager-operator/blob/64b4c1ba/bindata/assets/kube-controller-manager/recycler-cm.yaml#L21 to compare with cluster-kube-controller-manager-operator on OCP
This is a clone of issue OCPBUGS-36855. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-33973. The following is the description of the original issue:
—
Description of problem:
The network resource provisioning playbook for 4.15 dualstack UPI contains a task for adding an IPv6 subnet to the existing external router [1]. This task fails with:
- ansible-2.9.27-1.el8ae.noarch & ansible-collections-openstack-1.8.0-2.20220513065417.5bb8312.el8ost.noarch in an OSP 16 env (RHEL 8.5), or
- openstack-ansible-core-2.14.2-4.1.el9ost.x86_64 & ansible-collections-openstack-1.9.1-17.1.20230621074746.0e9a6f2.el9ost.noarch in an OSP 17 env (RHEL 9.2)
Besides that, we need a way to identify the resources belonging to a particular deployment, as they may interfere with an existing one.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2024-01-22-160236
How reproducible:
Always
Steps to Reproduce:
1. Set os_subnet6 in the inventory file to enable dualstack
2. Run the 4.15 network.yaml playbook
Actual results:
Playbook fails:
TASK [Add IPv6 subnet to the external router] **********************************
fatal: [localhost]: FAILED! => {"changed": false, "extra_data": {"data": null, "details": "Invalid input for external_gateway_info. Reason: Validation of dictionary's keys failed. Expected keys: {'network_id'} Provided keys: {'external_fixed_ips'}.", "response": "{\"NeutronError\": {\"type\": \"HTTPBadRequest\", \"message\": \"Invalid input for external_gateway_info. Reason: Validation of dictionary's keys failed. Expected keys: {'network_id'} Provided keys: {'external_fixed_ips'}.\", \"detail\": \"\"}}"}, "msg": "Error updating router 8352c9c0-dc39-46ed-94ed-c038f6987cad: Client Error for url: https://10.46.43.81:13696/v2.0/routers/8352c9c0-dc39-46ed-94ed-c038f6987cad, Invalid input for external_gateway_info. Reason: Validation of dictionary's keys failed. Expected keys: {'network_id'} Provided keys: {'external_fixed_ips'}."}
Expected results:
Successful playbook execution
Additional info:
The router can be created in two separate tasks; the playbook [2] worked for me.
[1] https://github.com/openshift/installer/blob/1349161e2bb8606574696bf1e3bc20ae054e60f8/upi/openstack/network.yaml#L43
[2] https://file.rdu.redhat.com/juriarte/upi/network.yaml
After https://github.com/openshift/console/pull/13102 got merged, it isn't possible to start the local console bridge anymore.
The UI crashes with this error:
Uncaught TypeError: Failed to construct 'URL': Invalid URL
    at ./public/module/auth.js (main-c115e44b78283c32bc69.js:81514:7)
    at __webpack_require__ (runtime~main-bundle.js:90:30)
The loginErrorURL is a string that couldn't be parsed with new URL:
window.SERVER_FLAGS.loginErrorURL
'/auth/error'
new URL(window.SERVER_FLAGS.loginErrorURL)
VM55:1 Uncaught TypeError: Failed to construct 'URL': Invalid URL
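The failure is reproducible outside the console: a relative path like '/auth/error' is not a valid absolute URL, so the URL constructor needs a base (a sketch using node; the base origin is a placeholder):
$ node -e "new URL('/auth/error')"
# throws: TypeError [ERR_INVALID_URL]: Invalid URL
$ node -e "console.log(new URL('/auth/error', 'https://console.example.com').href)"
https://console.example.com/auth/error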
This is a clone of issue OCPBUGS-26772. The following is the description of the original issue:
—
Description of problem:
When cloning a PVC of 60GiB size, the system autofills the requested size of the clone to 8192 PeB. This size cannot be changed in the UI before starting the clone.
Version-Release number of selected component (if applicable):
CNV - 4.14.3
How reproducible:
always
Steps to Reproduce:
1. Create a VM with a PVC of 60GiB
2. Power off the VM
3. As a cluster admin, clone the 60GiB PVC (Storage -> PersistentVolumeClaims -> kebab menu next to the PVC)
Actual results:
The system tries to clone the 60 GiB PVC as a 8192 PeB
Expected results:
A new pvc of the 60 GiB
Additional info:
This seems like the closed BZ 2177979. I will upload a screenshot of the UI.
Here is the yaml for the original pvc:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    cdi.kubevirt.io/storage.bind.immediate.requested: "true"
    cdi.kubevirt.io/storage.contentType: kubevirt
    cdi.kubevirt.io/storage.pod.phase: Succeeded
    cdi.kubevirt.io/storage.populator.progress: 100.0%
    cdi.kubevirt.io/storage.preallocation.requested: "false"
    cdi.kubevirt.io/storage.usePopulator: "true"
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
    volume.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
  creationTimestamp: "2023-12-05T17:34:19Z"
  finalizers:
  - kubernetes.io/pvc-protection
  - provisioner.storage.kubernetes.io/cloning-protection
  labels:
    app: containerized-data-importer
    app.kubernetes.io/component: storage
    app.kubernetes.io/managed-by: cdi-controller
    app.kubernetes.io/part-of: hyperconverged-cluster
    app.kubernetes.io/version: 4.14.0
    kubevirt.io/created-by: 60f46f91-2db3-4118-aaba-b1697b29c496
  name: win2k19-base
  namespace: base-images
  ownerReferences:
  - apiVersion: cdi.kubevirt.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: DataVolume
    name: win2k19-base
    uid: 8980e7b7-ce0b-47b4-a7e4-f4c79e984ebe
  resourceVersion: "697047"
  uid: fccb0aa9-8541-4b51-b49e-ddceaa22b68c
spec:
  accessModes:
  - ReadWriteMany
  dataSource:
    apiGroup: cdi.kubevirt.io
    kind: VolumeImportSource
    name: volume-import-source-8980e7b7-ce0b-47b4-a7e4-f4c79e984ebe
  dataSourceRef:
    apiGroup: cdi.kubevirt.io
    kind: VolumeImportSource
    name: volume-import-source-8980e7b7-ce0b-47b4-a7e4-f4c79e984ebe
  resources:
    requests:
      storage: "64424509440"
  storageClassName: ocs-storagecluster-ceph-rbd
  volumeMode: Block
  volumeName: pvc-dbfc9fe9-5677-469d-9402-c2f3a22dab3f
status:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 60Gi
  phase: Bound
Here is the yaml for the cloning pvc:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
    volume.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
  creationTimestamp: "2023-12-06T14:24:07Z"
  finalizers:
  - kubernetes.io/pvc-protection
  name: win2k19-base-clone
  namespace: base-images
  resourceVersion: "1551054"
  uid: f72665c3-6408-4129-82a2-e663d8ecc0cc
spec:
  accessModes:
  - ReadWriteMany
  dataSource:
    apiGroup: ""
    kind: PersistentVolumeClaim
    name: win2k19-base
  dataSourceRef:
    apiGroup: ""
    kind: PersistentVolumeClaim
    name: win2k19-base
  resources:
    requests:
      storage: "9223372036854775807"
  storageClassName: ocs-storagecluster-ceph-rbd
  volumeMode: Block
status:
  phase: Pending
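A quick sanity check on the bogus requested size: 9223372036854775807 bytes is 2^63 - 1 (math.MaxInt64), which matches the 8192 PeB (pebibytes) shown in the UI (a sketch):
$ awk 'BEGIN { printf "%.0f PiB\n", 9223372036854775807 / 2^50 }'
8192 PiB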
Description of problem:
This is part of the plan to improve stability of ipsec in ocp releases.
There are several regressions identified in libreswan-4.9 (the default in 4.14.z and 4.15.z) which need to be addressed in an incremental approach. The first step is to introduce libreswan-4.6-3.el9_0.3, the oldest major version (4.6) that can still be released on RHEL 9. It includes a libreswan crash fix and some CVE backports that are present in libreswan-4.9 but not in libreswan-4.5 (so that it can pass the internal CVE scanner check).
This pinning of libreswan-4.6-3.el9_0.3 is only needed for 4.14.z, since containerized IPsec is used in 4.14. Starting in 4.15, IPsec moves to the host, and this CNO PR (about to merge as of writing) will allow ovn-kubernetes to use the host IPsec executables, which only requires a libreswan package update in the RHCOS extension.
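To verify which libreswan build a node is actually running (a sketch; the node name is a placeholder):
$ oc debug node/worker-0 -- chroot /host rpm -q libreswan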
Please review the following PR: https://github.com/openshift/prom-label-proxy/pull/357
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
When accessing the URL https://api.test.lab.domain.com:6443/.well-known/openid-configuration, a jwks_uri endpoint containing an api-int URL is returned. We expect that this endpoint would be on api instead of api-int.
Version-Release number of selected component (if applicable):
4.11
How reproducible:
100%
Steps to Reproduce:
1. From a web browser, access https://api.test.lab.domain.com:6443/.well-known/openid-configuration
2. From the CLI, try: curl -kvv https://api.test.lab.domain.com:6443/.well-known/openid-configuration
3. The output is as below. The jwks_uri returned points to api-int, but I think it should be api:
~~~~~
{"issuer":"https://kubernetes.default.svc","jwks_uri":"https://api-int.test.lab.domain.com:6443/openid/v1/jwks","response_types_supported":["id_token"],"subject_types_supported":["public"],"id_token_signing_alg_values_supported":["RS256"]}
~~~~~
Actual results:
"jwks_uri":"https://api-int.test.lab.domain.com:6443/openid/v1/jwks
Expected results:
"jwks_uri":"https://api.test.lab.domain.com:6443/openid/v1/jwks
Additional info:
This is a clone of issue OCPBUGS-42138. The following is the description of the original issue:
—
Description of problem:
The doc installing_ibm_cloud_public/installing-ibm-cloud-customizations.html does not include the list of tested instance types.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. https://docs.openshift.com/container-platform/4.15/installing/installing_ibm_cloud_public/installing-ibm-cloud-customizations.html does not list the tested VM types
Actual results:
The tested instance types are not listed.
Expected results:
List the tested instance types, as is done for Azure: https://docs.openshift.com/container-platform/4.15/installing/installing_azure/installing-azure-customizations.html#installation-azure-tested-machine-types_installing-azure-customizations
Additional info:
This is a clone of issue OCPBUGS-34530. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-33645. The following is the description of the original issue:
—
Description of problem:
After enabling separate alertmanager instance for user-defined alert routing, the alertmanager-user-workload pods are initialized but the configmap alertmanager-trusted-ca-bundle is not injected in the pods. [-] https://docs.openshift.com/container-platform/4.15/observability/monitoring/enabling-alert-routing-for-user-defined-projects.html#enabling-a-separate-alertmanager-instance-for-user-defined-alert-routing_enabling-alert-routing-for-user-defined-projects
Version-Release number of selected component (if applicable):
RHOCP 4.13, 4.14 and 4.15
How reproducible:
100%
Steps to Reproduce:
1. Enable user-workload monitoring using [a]
2. Enable a separate alertmanager instance for user-defined alert routing using [b]
3. Check if the alertmanager-trusted-ca-bundle configmap is injected in the alertmanager-user-workload pods running in the openshift-user-workload-monitoring project:
$ oc describe pod alertmanager-user-workload-0 -n openshift-user-workload-monitoring | grep alertmanager-trusted-ca-bundle
[a] https://docs.openshift.com/container-platform/4.15/observability/monitoring/enabling-monitoring-for-user-defined-projects.html#enabling-monitoring-for-user-defined-projects_enabling-monitoring-for-user-defined-projects
[b] https://docs.openshift.com/container-platform/4.15/observability/monitoring/enabling-alert-routing-for-user-defined-projects.html#enabling-a-separate-alertmanager-instance-for-user-defined-alert-routing_enabling-alert-routing-for-user-defined-projects
Actual results:
alertmanager-user-workload pods are NOT injected with alertmanager-trusted-ca-bundle configmap.
Expected results:
alertmanager-user-workload pods should be injected with alertmanager-trusted-ca-bundle configmap.
Additional info:
Similar configmap is injected fine in alertmanager-main pods which are running in openshift-monitoring project.
Description of problem:
GCP e2-custom-* instance types are not supported by our E2E test framework. Now that Test Platform has started using those instance types, we are seeing permafailing E2E job runs on our CPMS E2E periodic tests. Error sample:
• [FAILED] [285.539 seconds]
ControlPlaneMachineSet Operator With an active ControlPlaneMachineSet and the instance type is changed [BeforeEach] should perform a rolling update [Periodic]
[BeforeEach] /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/test/e2e/periodic_test.go:39
[It] /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/test/e2e/periodic_test.go:43
[FAILED] provider spec should be updated with bigger instance size
Expected success, but got an error:
    <*fmt.wrapError | 0xc000358380>:
    failed to get next instance size: instance type did not match expected format: e2-custom-6-16384
    {
        msg: "failed to get next instance size: instance type did not match expected format: e2-custom-6-16384",
        err: <*fmt.wrapError | 0xc000358360>{
            msg: "instance type did not match expected format: e2-custom-6-16384",
            err: <*errors.errorString | 0xc0001489f0>{
                s: "instance type did not match expected format",
            },
        },
    }
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Use e2-custom in GCP in a cluster, run CPMSO E2E periodics 2. 3.
Actual results:
Permafailing E2Es
Expected results:
Successful E2Es
Additional info:
Backport for 4.15 - Manually Cloned from https://issues.redhat.com/browse/OCPBUGS-26498
Description of problem:
Due to RHEL9 incorporating OpenSSL 3.0, HAProxy will refuse to start if provided with a cert using a SHA1-based signature algorithm. RHEL9 is being introduced in 4.16. This means customers updating from 4.15 to 4.16 with a SHA1 cert will find their router in a failure state.
My notes from experimenting with various ways of using a cert in ingress:
- Routes with a SHA1 spec.tls.certificate WILL prevent HAProxy from reloading/starting.
- It is NOT limited to FIPS; I broke a non-FIPS cluster with this.
- Routes with a SHA1 spec.tls.caCertificate will NOT prevent HAProxy from starting, but the route is rejected due to an extended route validation failure:
  - lastTransitionTime: "2024-01-04T20:18:01Z"
    message: 'spec.tls.certificate: Invalid value: "redacted certificate data": error verifying certificate: x509: certificate signed by unknown authority (possibly because of "x509: cannot verify signature: insecure algorithm SHA1-RSA (temporarily override with GODEBUG=x509sha1=1)" while trying to verify candidate authority certificate "www.exampleca.com")'
- Routes with a SHA1 spec.tls.destinationCACertificate will NOT prevent HAProxy from starting. It actually seems to work as expected.
- IngressController with a SHA1 spec.defaultCertificate WILL prevent HAProxy from starting.
- IngressController with a SHA1 spec.clientTLS.clientCA will NOT prevent HAProxy from starting.
Version-Release number of selected component (if applicable):
4.16
How reproducible:
100%
Steps to Reproduce:
1. Create an IngressController with spec.defaultCertificate, or a Route with spec.tls.certificate, set to a SHA1 cert (see the sketch below)
2. Roll out the router
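A minimal sketch for step 1 (route, service, and subject names are placeholders; generating a SHA1-signed cert may require legacy settings on newer OpenSSL clients):
$ openssl req -x509 -newkey rsa:2048 -sha1 -nodes -days 30 \
    -keyout tls.key -out tls.crt -subj "/CN=sha1.example.com"
$ oc create route edge sha1-test --service=my-service \
    --hostname=sha1.example.com --cert=tls.crt --key=tls.key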
Actual results:
Router fails to start
Expected results:
Router should start
Additional info:
We've previously documented via story in RHEL9 epic: https://issues.redhat.com/browse/NE-1449
This fix contains the following changes coming from updated version of kubernetes up to v1.28.9:
Changelog:
v1.28.9: https://github.com/kubernetes/kubernetes/blob/release-1.28/CHANGELOG/CHANGELOG-1.28.md#changelog-since-v1288
This is a clone of issue OCPBUGS-30604. The following is the description of the original issue:
—
Description of problem:
Panic thrown by origin-tests
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. Create an AWS or ROSA 4.15 cluster
2. Run origin tests
3.
Actual results:
time="2024-03-07T17:03:50Z" level=info msg="resulting interval message" message="{RegisteredNode Node ip-10-0-8-83.ec2.internal event: Registered Node ip-10-0-8-83.ec2.internal in Controller map[reason:RegisteredNode roles:worker]}" E0307 17:03:50.319617 71 runtime.go:79] Observed a panic: runtime.boundsError{x:24, y:23, signed:true, code:0x3} (runtime error: slice bounds out of range [24:23]) goroutine 310 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x84c6f20?, 0xc006fdc588}) k8s.io/apimachinery@v0.29.0/pkg/util/runtime/runtime.go:75 +0x99 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc008c38120?}) k8s.io/apimachinery@v0.29.0/pkg/util/runtime/runtime.go:49 +0x75 panic({0x84c6f20, 0xc006fdc588}) runtime/panic.go:884 +0x213 github.com/openshift/origin/pkg/monitortests/testframework/watchevents.nodeRoles(0x0?) github.com/openshift/origin/pkg/monitortests/testframework/watchevents/event.go:251 +0x1e5 github.com/openshift/origin/pkg/monitortests/testframework/watchevents.recordAddOrUpdateEvent({0x96bcc00, 0xc0076e3310}, {0x7f2a0e47a1b8, 0xc007732330}, {0x281d36d?, 0x0?}, {0x9710b50, 0xc000c5e000}, {0x9777af, 0xedd7be6b7, ...}, ...) github.com/openshift/origin/pkg/monitortests/testframework/watchevents/event.go:116 +0x41b github.com/openshift/origin/pkg/monitortests/testframework/watchevents.startEventMonitoring.func2({0x8928f00?, 0xc00b528c80}) github.com/openshift/origin/pkg/monitortests/testframework/watchevents/event.go:65 +0x185 k8s.io/client-go/tools/cache.(*FakeCustomStore).Add(0x8928f00?, {0x8928f00?, 0xc00b528c80?}) k8s.io/client-go@v0.29.0/tools/cache/fake_custom_store.go:35 +0x31 k8s.io/client-go/tools/cache.watchHandler({0x0?, 0x0?, 0xe16d020?}, {0x9694a10, 0xc006b00180}, {0x96d2780, 0xc0078afe00}, {0x96f9e28?, 0x8928f00}, 0x0, ...) k8s.io/client-go@v0.29.0/tools/cache/reflector.go:756 +0x603 k8s.io/client-go/tools/cache.(*Reflector).watch(0xc0005dcc40, {0x0?, 0x0?}, 0xc005cdeea0, 0xc005bf8c40?) k8s.io/client-go@v0.29.0/tools/cache/reflector.go:437 +0x53b k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc0005dcc40, 0xc005cdeea0) k8s.io/client-go@v0.29.0/tools/cache/reflector.go:357 +0x453 k8s.io/client-go/tools/cache.(*Reflector).Run.func1() k8s.io/client-go@v0.29.0/tools/cache/reflector.go:291 +0x26 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x10?) k8s.io/apimachinery@v0.29.0/pkg/util/wait/backoff.go:226 +0x3e k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc007974ec0?, {0x9683f80, 0xc0078afe50}, 0x1, 0xc005cdeea0) k8s.io/apimachinery@v0.29.0/pkg/util/wait/backoff.go:227 +0xb6 k8s.io/client-go/tools/cache.(*Reflector).Run(0xc0005dcc40, 0xc005cdeea0) k8s.io/client-go@v0.29.0/tools/cache/reflector.go:290 +0x17d created by github.com/openshift/origin/pkg/monitortests/testframework/watchevents.startEventMonitoring github.com/openshift/origin/pkg/monitortests/testframework/watchevents/event.go:83 +0x6a5 panic: runtime error: slice bounds out of range [24:23] [recovered] panic: runtime error: slice bounds out of range [24:23]
Expected results:
execution of tests
Additional info:
Description of problem:
In the developer console, go to "Observe -> openshift-monitoring -> Alerts" and silence the Watchdog alert. At first the alert state is Silenced in the Alerts tab, but it changes back to Firing quickly (the alert is actually silenced); see the attached screenshot.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-09-02-132842
How reproducible:
always
Steps to Reproduce:
1. Silence an alert in the dev console, and check the alert state in the Alerts tab
2.
3.
Actual results:
alert state is changed from Silenced to Firing quickly
Expected results:
state should be Silenced
The agent-interactive-console service is required by both sshd and systemd-logind, so if it exits with an error code there is no way to connect or log in to the box to debug.
CI is almost permafailing on MTU migration in 4.14 (both SDN and OVN-Kubernetes):
Looks like the common issue is waiting for MCO times out:
+ echo '[2023-08-31T03:58:16+00:00] Waiting for final Machine Controller Config...'
[2023-08-31T03:58:16+00:00] Waiting for final Machine Controller Config...
+ timeout 900s bash
migration field is not cleaned by MCO
migration field is not cleaned by MCO
migration field is not cleaned by MCO
(the same line repeats until the 900s timeout expires)
...
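The field the script is polling can be checked directly on the cluster (a sketch):
$ oc get network.operator.openshift.io cluster -o jsonpath='{.spec.migration}{"\n"}'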
This is a clone of issue OCPBUGS-33505. The following is the description of the original issue:
—
Noticed in k8s 1.30 PR, here's the run where it happened:
https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_kubernetes/1953/pull-ci-openshift-kubernetes-master-e2e-aws-ovn-fips/1788800196772106240
E0510 05:58:26.315444       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 992 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x26915e0?, 0x471dff0})
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x0?})
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x26915e0?, 0x471dff0?})
	/usr/lib/golang/src/runtime/panic.go:914 +0x21f
github.com/openshift/console-operator/pkg/console/controllers/healthcheck.(*HealthCheckController).CheckRouteHealth.func2()
	/go/src/github.com/openshift/console-operator/pkg/console/controllers/healthcheck/controller.go:156 +0x62
k8s.io/client-go/util/retry.OnError.func1()
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/client-go/util/retry/util.go:51 +0x30
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection(0x2fdcde8?)
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:145 +0x3e
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff({0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0}, 0x2fdcde8?)
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:461 +0x5a
k8s.io/client-go/util/retry.OnError({0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0}, 0x2667a00?, 0xc001b185d0?)
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/client-go/util/retry/util.go:50 +0xa5
github.com/openshift/console-operator/pkg/console/controllers/healthcheck.(*HealthCheckController).CheckRouteHealth(0xc001b097e8?, {0x2fdce90?, 0xc00057c870?}, 0x16?, 0x2faf140?)
	/go/src/github.com/openshift/console-operator/pkg/console/controllers/healthcheck/controller.go:152 +0x9a
github.com/openshift/console-operator/pkg/console/controllers/healthcheck.(*HealthCheckController).Sync(0xc000748ae0, {0x2fdce90, 0xc00057c870}, {0x7f84e80672b0?, 0x7f852f941108?})
	/go/src/github.com/openshift/console-operator/pkg/console/controllers/healthcheck/controller.go:143 +0x8eb
github.com/openshift/library-go/pkg/controller/factory.(*baseController).reconcile(0xc000b57950, {0x2fdce90, 0xc00057c870}, {0x2fd5350?, 0xc001b185a0?})
	/go/src/github.com/openshift/console-operator/vendor/github.com/openshift/library-go/pkg/controller/factory/base_controller.go:201 +0x43
github.com/openshift/library-go/pkg/controller/factory.(*baseController).processNextWorkItem(0xc000b57950, {0x2fdce90, 0xc00057c870})
	/go/src/github.com/openshift/console-operator/vendor/github.com/openshift/library-go/pkg/controller/factory/base_controller.go:260 +0x1b4
github.com/openshift/library-go/pkg/controller/factory.(*baseController).runWorker.func1({0x2fdce90, 0xc00057c870})
	/go/src/github.com/openshift/console-operator/vendor/github.com/openshift/library-go/pkg/controller/factory/base_controller.go:192 +0x89
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:259 +0x22
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0014b7b60?, {0x2faf040, 0xc001b18570}, 0x1, 0xc0014b7b60)
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00057c870?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext({0x2fdce90, 0xc00057c870}, 0xc00139c770, 0x0?, 0x0?, 0x0?)
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:259 +0x93
k8s.io/apimachinery/pkg/util/wait.UntilWithContext(...)
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:170
github.com/openshift/library-go/pkg/controller/factory.(*baseController).runWorker(0x0?, {0x2fdce90?, 0xc00057c870?})
	/go/src/github.com/openshift/console-operator/vendor/github.com/openshift/library-go/pkg/controller/factory/base_controller.go:183 +0x4d
github.com/openshift/library-go/pkg/controller/factory.(*baseController).Run.func2()
	/go/src/github.com/openshift/console-operator/vendor/github.com/openshift/library-go/pkg/controller/factory/base_controller.go:117 +0x65
created by github.com/openshift/library-go/pkg/controller/factory.(*baseController).Run in goroutine 749
	/go/src/github.com/openshift/console-operator/vendor/github.com/openshift/library-go/pkg/controller/factory/base_controller.go:112 +0x2ba
This is a clone of issue OCPBUGS-41540. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-39458. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-37819. The following is the description of the original issue:
—
Description of problem:
When we added the new bundle metadata encoding `olm.csv.metadata` in https://github.com/operator-framework/operator-registry/pull/1094 (downstreamed for 4.15+), we created situations where:
- Konflux-onboarded operators, encouraged to use upstream:latest to generate FBC from templates, and
- IIB-generated catalog images which used earlier opm versions to serve content
could generate the new format but not be able to serve it. One only has to `opm render` an SQLite catalog image, or expand a catalog template.
Version-Release number of selected component (if applicable):
How reproducible:
every time
Steps to Reproduce:
1. opm render an SQLite catalog image (see the sketch under Additional info)
2.
3.
Actual results:
uses `olm.csv.metadata` in the output
Expected results:
only using `olm.bundle.object` in the output
Additional info:
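A quick way to observe the behavior (a sketch; the index image reference is a placeholder for any SQLite-based catalog image):
$ opm render registry.example.com/my-sqlite-index:v4.14 > rendered.json
$ grep -c '"olm.csv.metadata"' rendered.json   # a non-zero count shows the new schema in the output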
Description of the problem:
A user with an invalid pull secret cannot correct the issue without deleting the infraenv
How reproducible:
100%
Steps to reproduce:
1. Create a malformed pull secret (like this one)
kind: Secret
apiVersion: v1
metadata:
  name: pullsecret
data:
  '.dockerconfigjson': eyJhdXRocyI6eyJub3RoaW5nLmNvbSI6eyJhdXRoIjoiWTJsaGJ3PT09PSIsImVtYWlsIjoiZmFrZUBjaWFvLmNvbSJ9fX0=
type: 'kubernetes.io/dockerconfigjson'
2. Create an infraenv referencing this secret as the pull secret
3. Correct the pull secret
Actual results:
Infraenv still has error message about a malformed pull secret
Expected results:
Infraenv uses the updated pull secret
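A sketch of the expected fix flow (namespace and file names are placeholders): decode the data to spot the malformed auth value, then replace the secret with a well-formed dockerconfigjson, after which the infraenv should pick up the corrected secret.
$ oc get secret pullsecret -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
$ oc create secret generic pullsecret \
    --from-file=.dockerconfigjson=fixed-pull-secret.json \
    --type=kubernetes.io/dockerconfigjson \
    --dry-run=client -o yaml | oc replace -f -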
E1106 21:44:31.805740 18 apiaccess_count_controller.go:168] APIRequestCount.apiserver.openshift.io "nodes.v1" is invalid: [status.currentHour.byNode[0].byUser: Too many: 708: must have at most 500 items, status.last24h[21].byNode[0].byUser: Too many: 708: must have at most 500 items]
Seen in a large-scale test: 750 nodes, 180,000 pods, 90,000 services, with pods/services being created at 20 objects/second.
https://redhat-internal.slack.com/archives/CB48XQ4KZ/p1699307146216599
Luis Sanchez said "Just confirmed that under certain circumstances, the .spec.numberOfUsersToReport field is not being applied correctly. Open a bug please."
This is a clone of issue OCPBUGS-27446. The following is the description of the original issue:
—
Steps to Reproduce:
1. Install a cluster using Azure Workload Identity
2. Check the value of the cco_credentials_mode metric
Actual results:
mode = manual
Expected results:
mode = manualpodidentity
Additional info:
The cco_credentials_mode metric reports manualpodidentity mode for an AWS STS cluster.
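A way to inspect the reported mode (a sketch; this queries Prometheus from inside one of its pods):
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
    curl -s 'http://localhost:9090/api/v1/query?query=cco_credentials_mode'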
Description of problem:
After installing the "VerticalPodAutoscaler" operator from the OperatorHub page, there is a "VerticalPodAutoscalers" field with a "No VerticalPodAutoscalers" value. The translations for "No VerticalPodAutoscalers" in the ja/ko/zh files are not correct.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2024-02-22-193834
How reproducible:
Always
Steps to Reproduce:
1. Install the "VerticalPodAutoscaler" operator from the OperatorHub page.
2. Check any one deployment details page.
3. Check the translation files for the VerticalPodAutoscaler field.
Actual results:
2. There is a "VerticalPodAutoscalers" field with a "No VerticalPodAutoscalers" value.
3. In the ja/ko/zh translation files, the key is "No VerticalPodAutoscaler":
# git checkout release-4.15
# grep -nr "No VerticalPodAutoscaler" frontend/packages/*
frontend/packages/console-app/locales/en/console-app.json:577: "No VerticalPodAutoscalers": "No VerticalPodAutoscalers",
frontend/packages/console-app/locales/ja/console-app.json:557: "No VerticalPodAutoscaler": "No VerticalPodAutoscaler",
frontend/packages/console-app/locales/ko/console-app.json:557: "No VerticalPodAutoscaler": "No VerticalPodAutoscaler",
frontend/packages/console-app/locales/zh/console-app.json:557: "No VerticalPodAutoscaler": "没有 VerticalPodAutoscaler",
frontend/packages/console-app/src/components/vpa/VerticalPodAutoscalerRecommendations.tsx:66: : t('console-app~No VerticalPodAutoscalers')}
Expected results:
3. The translations in the ja/ko/zh files should match the master branch (currently 4.16):
# git checkout master
Switched to branch 'master'
# grep -nr "No VerticalPodAutoscaler" frontend/packages/*
frontend/packages/console-app/locales/en/console-app.json:585: "No VerticalPodAutoscalers": "No VerticalPodAutoscalers",
frontend/packages/console-app/locales/ja/console-app.json:567: "No VerticalPodAutoscalers": "VerticalPodAutoscalers がありません",
frontend/packages/console-app/locales/ko/console-app.json:567: "No VerticalPodAutoscalers": "VerticalPodAutoscalers 없음",
frontend/packages/console-app/locales/zh/console-app.json:567: "No VerticalPodAutoscalers": "没有 VerticalPodAutoscalers",
frontend/packages/console-app/src/components/vpa/VerticalPodAutoscalerRecommendations.tsx:66: : t('console-app~No VerticalPodAutoscalers')}
Additional info:
Screenshot: https://drive.google.com/file/d/1I91oMV09CdBGabBpcm0TVSPH0NAkgdxl/view?usp=drive_link
Please review the following PR: https://github.com/openshift/machine-api-operator/pull/1179
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-31249. The following is the description of the original issue:
—
Description of problem:
/var/log/etcd/etcd-health-probe.log exists on control plane nodes, but we only touch it in code: https://github.com/openshift/cluster-etcd-operator/blob/master/bindata/etcd/pod.yaml#L26
etcd's /var/log/etcd/etcd-health-probe.log can be mistaken for an audit log, because there are audit logs in the same directory tree for the apiserver and auth:
/var/log/kube-apiserver/audit-2024-03-21T04-27-49.470.log
/var/log/oauth-server/audit.log
etcd-health-probe.log will cause some misunderstanding for users.
How reproducible: always
Steps to Reproduce:
1. Log in to a control plane node
2. Check /var/log/etcd/etcd-health-probe.log
3. The file size is always zero
Actual results:
Expected results: remove this file in code / don't touch this file
Additional info:
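To confirm the file on a node (a sketch; the node name is a placeholder):
$ oc debug node/master-0 -- chroot /host ls -l /var/log/etcd/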
This is a clone of issue OCPBUGS-34403. The following is the description of the original issue:
—
Description of problem:
If a cluster is running with user-workload-monitoring enabled, running an ose-tests suite against the cluster will fail the data collection step. This is because there is a query in the test framework that assumes that the number of prometheus instances that the thanos pods will connect to will match exactly the number of platform prometheus instances. However, it doesn't account for thanos also connecting to the user-workload-monitoring instances. As such, the test suite will always fail against a cluster that is healthy and running user-workload-monitoring in addition to the normal openshift-monitoring stack.
Version-Release number of selected component (if applicable):
4.15.13
How reproducible:
Consistent
Steps to Reproduce:
1. Create an OpenShift cluster
2. Enable workload monitoring
3. Attempt to run an ose-tests suite. For example, the CNI conformance suite documented here: https://access.redhat.com/documentation/en-us/red_hat_software_certification/2024/html/red_hat_software_certification_workflow_guide/con_cni-certification_openshift-sw-cert-workflow-working-with-cloud-native-network-function#running-the-cni-tests_openshift-sw-cert-workflow-working-with-container-network-interface
Actual results:
The error message `#### at least one Prometheus sidecar isn't ready` will be displayed, and the metrics collection will fail
Expected results:
Metrics collection succeeds with no errors
Additional info:
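A sketch showing why the count assumption breaks: with UWM enabled there are Prometheus pods in two namespaces, so the Thanos pods have more sidecar endpoints than the platform Prometheus replicas alone (the label selector is assumed from upstream conventions):
$ oc -n openshift-monitoring get pods -l app.kubernetes.io/name=prometheus
$ oc -n openshift-user-workload-monitoring get pods -l app.kubernetes.io/name=prometheus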
Description of problem:
During cluster installations/upgrades with an imageContentSourcePolicy in place but with access to quay.io, the ICSP is not honored to pull the machine-os-content image from a private registry.
Version-Release number of selected component (if applicable):
$ oc logs -n openshift-machine-config-operator ds/machine-config-daemon -c machine-config-daemon | head -1
Found 6 pods, using pod/machine-config-daemon-znknf
I0503 10:53:00.925942 2377 start.go:112] Version: v4.12.0-202304070941.p0.g87fedee.assembly.stream-dirty (87fedee690ae487f8ae044ac416000172c9576a5)
How reproducible:
100% in clusters with ICSP configured BUT with access to quay.io
Steps to Reproduce:
1. Create mirror repo:
$ cat <<EOF > /tmp/isc.yaml
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
archiveSize: 4
storageConfig:
  registry:
    imageURL: quay.example.com/mirror/oc-mirror-metadata
    skipTLS: true
mirror:
  platform:
    channels:
    - name: stable-4.12
      type: ocp
      minVersion: 4.12.13
    graph: true
EOF
$ oc mirror --dest-skip-tls --config=/tmp/isc.yaml docker://quay.example.com/mirror/oc-mirror-metadata
<...>
info: Mirroring completed in 2m27.91s (138.6MB/s)
Writing image mapping to oc-mirror-workspace/results-1683104229/mapping.txt
Writing UpdateService manifests to oc-mirror-workspace/results-1683104229
Writing ICSP manifests to oc-mirror-workspace/results-1683104229
2. Confirm machine-os-content digest:
$ oc adm release info 4.12.13 -o jsonpath='{.references.spec.tags[?(@.name=="machine-os-content")].from}'|jq
{
  "kind": "DockerImage",
  "name": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a1660c8086ff85e569e10b3bc9db344e1e1f7530581d742ad98b670a81477b1b"
}
$ oc adm release info 4.12.14 -o jsonpath='{.references.spec.tags[?(@.name=="machine-os-content")].from}'|jq
{
  "kind": "DockerImage",
  "name": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ed68d04d720a83366626a11297a4f3c5761c0b44d02ef66fe4cbcc70a6854563"
}
3. Create 4.12.13 cluster with ICSP at install time:
$ grep imageContentSources -A6 ./install-config.yaml
imageContentSources:
- mirrors:
  - quay.example.com/mirror/oc-mirror-metadata/openshift/release
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
- mirrors:
  - quay.example.com/mirror/oc-mirror-metadata/openshift/release-images
  source: quay.io/openshift-release-dev/ocp-release
Actual results:
1. After the installation is completed, no pulls for a166 (4.12.13-x86_64-machine-os-content) are logged in the Quay usage logs, whereas e.g. digest 22d2 (4.12.13-x86_64-machine-os-images) is reported to be pulled from the mirror.
2. After upgrading to 4.12.14, no pulls for ed68 (4.12.14-x86_64-machine-os-content) are logged in the mirror-registry, while the image was pulled as part of `oc image extract` in the machine-config-daemon:
[core@master-1 ~]$ sudo less /var/log/pods/openshift-machine-config-operator_machine-config-daemon-7fnjz_e2a3de54-1355-44f9-a516-2f89d6c6ab8f/machine-config-daemon/0.log
2023-05-03T10:51:43.308996195+00:00 stderr F I0503 10:51:43.308932 11290 run.go:19] Running: nice -- ionice -c 3 oc image extract -v 10 --path /:/run/mco-extensions/os-extensions-content-4035545447 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad48fe01f3e82584197797ce2151eecdfdcce67ae1096f06412e5ace416f66ce
2023-05-03T10:51:43.418211869+00:00 stderr F I0503 10:51:43.418008 184455 client_mirrored.go:174] Attempting to connect to quay.io/openshift-release-dev/ocp-v4.0-art-dev
2023-05-03T10:51:43.418211869+00:00 stderr F I0503 10:51:43.418174 184455 round_trippers.go:466] curl -v -XGET -H "User-Agent: oc/4.12.0 (linux/amd64) kubernetes/31aa3e8" 'https://quay.io/v2/'
2023-05-03T10:51:43.419618513+00:00 stderr F I0503 10:51:43.419517 184455 round_trippers.go:495] HTTP Trace: DNS Lookup for quay.io resolved to [{34.206.15.82 } {54.209.210.231 } {52.5.187.29 } {52.3.168.193 } {52.21.36.23 } {50.17.122.58 } {44.194.68.221 } {34.194.241.136 } {2600:1f18:483:cf01:ebba:a861:1150:e245 } {2600:1f18:483:cf02:40f9:477f:ea6b:8a2b } {2600:1f18:483:cf02:8601:2257:9919:cd9e } {2600:1f18:483:cf01:8212:fcdc:2a2a:50a7 } {2600:1f18:483:cf00:915d:9d2f:fc1f:40a7 } {2600:1f18:483:cf02:7a8b:1901:f1cf:3ab3 } {2600:1f18:483:cf00:27e2:dfeb:a6c7:c4db } {2600:1f18:483:cf01:ca3f:d96e:196c:7867 }]
2023-05-03T10:51:43.429298245+00:00 stderr F I0503 10:51:43.429151 184455 round_trippers.go:510] HTTP Trace: Dial to tcp:34.206.15.82:443 succeed
Expected results:
All images are pulled from the location as configured in the ICSP.
Additional info:
Please review the following PR: https://github.com/openshift/kubernetes-kube-storage-version-migrator/pull/199
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/installer/pull/7496
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
The ServiceAccounts for both the in-cluster and UWM Alertmanager set automountServiceAccountToken: true.
This should be improved by setting it at the pod level instead, which will require a change in prometheus-operator and its configuration of Alertmanager pods.
A similar change for Prometheus pods was implemented in https://github.com/prometheus-operator/prometheus-operator/pull/4514.
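For illustration, a minimal Go sketch of the desired end state (not the actual prometheus-operator change; the ServiceAccount name is an assumption): the ServiceAccount stops auto-mounting its token, and the Alertmanager pod spec opts back in explicitly.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func boolPtr(b bool) *bool { return &b }

func main() {
	// The ServiceAccount no longer auto-mounts its token for every consumer.
	sa := corev1.ServiceAccount{AutomountServiceAccountToken: boolPtr(false)}

	// The Alertmanager pod template opts back in explicitly, scoping the
	// token mount to the pods that actually need it.
	podSpec := corev1.PodSpec{
		ServiceAccountName:           "alertmanager-main", // assumed name
		AutomountServiceAccountToken: boolPtr(true),
	}

	fmt.Println("SA automount:", *sa.AutomountServiceAccountToken)
	fmt.Println("pod automount:", *podSpec.AutomountServiceAccountToken)
}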
Please review the following PR: https://github.com/openshift/csi-external-attacher/pull/57
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-36717. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-31250. The following is the description of the original issue:
—
Description of problem:
1. For Linux nodes, the container runtime is CRI-O, and port 9537 has a crio process listening on it. Windows nodes, however, do not have the CRI-O container runtime.
2. Prometheus tries to connect to the /metrics endpoint on the Windows nodes on port 9537, which has no process listening on it.
3. TargetDown alerts for the crio job since it cannot reach the endpoint http://windows-node-ip:9537/metrics.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Install a 4.13 cluster with the windows operator
2. In the Prometheus UI, go to Status > Targets to see which targets are down.
Actual results:
It gives the alert for targetDown
Expected results:
It should not give any such alert.
Additional info:
This is a clone of issue OCPBUGS-37689. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-36619. The following is the description of the original issue:
—
The labels added by PAC have been deprecated and moved to PLR annotations. Use the annotations to get the value on the repository list page, the repository PLRs list page, and the PLR details page.
This is a clone of issue OCPBUGS-32161. The following is the description of the original issue:
—
Description of problem:
We created 24000 EIPs for 24000 pods (where each namespace has 1 EIP and 1 pod) on a 120-node baremetal environment, failed over the node that hosts 200 EIPs by blocking port 9107 using iptables, and observed high pod connection latencies (ranging from 221 msec to 41 sec) for the EIPs that failed over to other nodes.
pod | EIP failover latency |
---|---|
client-1-13103-78c6585bbb-jkr8h4 | 41.0 sec |
client-1-2777-7d86cd47bf-djgnf | 38.0 sec |
client-1-2609-79cfd5ff55-7z446 | 23.2 sec |
client-1-22868-7bf96cd49-fjrtj | 16.0 sec |
client-1-23491-56f499cc69-w5hbr | 9.01 sec |
client-1-11301-78b5bbc987-vrs8s | 9.01 sec |
client-1-6098-64b7d9d4f4-b62zm | 2.00 sec |
client-1-22599-5975f8bdc4-hgng2 | 2.00 sec |
client-1-15570-86b979d584-j7cpb | 221 msec |
CPU usage and OVS flow metrics are available in the Grafana dashboard https://grafana.rdu2.scalelab.redhat.com:3000/d/FwPsenbaa/kube-burner-report-eip?orgId=1&from=1712835501022&to=1712857101023&var-Datasource=AWS+Pro+-+ripsaw-kube-burner&var-workload=egressip&var-uuid=7f8a09af-8ed6-4027-bbc7-0583aa18db10&var-master=f20-h02-000-r640.rdu2.scalelab.redhat.com&var-worker=f20-h11-000-r640.rdu2.scalelab.redhat.com&var-infra=f36-h10-000-r640.rdu2.scalelab.redhat.com&var-namespace=All&var-latencyPercentile=P99
must-gather: http://storage.scalelab.redhat.com/anilvenkata/eip_failover_mg/must-gather.local.2880304935723177257.tgz
All the resources were already created before we issued the node failover. The node on which port 9107 is blocked also hosts 200 pods and 200 EIPs. We only issued the iptables command to block port 9107:
sudo iptables -A INPUT -p tcp --dport 9107 -j DROP
and we didn't delete any conntrack entries or OVS flows etc. for the failover simulation.
Please review the following PR: https://github.com/openshift/prometheus-alertmanager/pull/75
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/ovn-kubernetes/pull/1883
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Deploying a cluster results in:
time="2023-10-30T19:10:59-04:00" level=debug msg="Apply complete! Resources: 0 added, 0 changed, 3 destroyed."
time="2023-10-30T19:10:59-04:00" level=fatal msg="error destroying bootstrap resources failed disabling bootstrap load balancing: %!w(<nil>)"
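The %!w(<nil>) in the fatal message is how the fmt package renders a %w verb whose operand is nil, which suggests the destroy path wrapped and reported an error that was actually nil. A minimal runnable sketch (stand-in function, not the installer code) reproducing the artifact:

package main

import "fmt"

// disableBootstrapLB stands in for the installer step; it succeeds.
func disableBootstrapLB() error { return nil }

func main() {
	// Wrapping a nil error with %w renders as %!w(<nil>) and still returns a
	// non-nil error, so a later `if err != nil` check treats success as fatal.
	err := fmt.Errorf("failed disabling bootstrap load balancing: %w", disableBootstrapLB())
	fmt.Println(err != nil) // true
	fmt.Println(err)        // failed disabling bootstrap load balancing: %!w(<nil>)
}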
Version-Release number of selected component (if applicable):
4.15.0
How reproducible:
Occasionally
Steps to Reproduce:
1. Deploy a PowerVS cluster in a zone with PER
Actual results:
Expected results:
It should deploy correctly
Additional info:
Story (Required)
As an ODC helm backend developer, I would like to be able to bump the version of helm to 3.13 to stay synced with the version we will ship with OCP 4.15.
Background (Required)
Normal activity we do every time a new OCP version is released, to stay current.
Glossary
NA
Out of scope
NA
Approach(Required)
Bump the version of helm to 3.13; run, build, and unit test, and make sure everything works as expected. Last time we had a conflict with the DevFile backend.
Dependencies
Might have dependencies on the DevFile team to move some dependencies forward.
This is a clone of issue OCPBUGS-45181. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-43486. The following is the description of the original issue:
—
Description of problem:
Feature https://issues.redhat.com/browse/MGMT-18411 went into assisted-installer v2.34.0, but apparently it is not included in any OpenShift version to be used in an ABI installation.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Went through the different commits to verify whether this is delivered in any OCP version.
Actual results:
Expected results:
Additional info:
2023-07-19T16:52:37Z reason/ReusedPodIP podIP 10.128.0.39 is currently assigned to multiple pods: ns/e2e-replicaset-4951 pod/test-rs-ddhkn node/ip-10-0-151-233.us-west-1.compute.internal uid/117115dd-dc8f-4333-b972-ed880fcf8dd9;ns/openshift-apiserver pod/apiserver-5f7d4599b4-dvpdk node/ip-10-0-151-233.us-west-1.compute.internal uid/293cba9c-11ea-4258-9d38-4ff5b2cb52bd
2023-07-19T16:58:40Z reason/ReusedPodIP podIP 10.128.0.39 is currently assigned to multiple pods: ns/e2e-job-1076 pod/pod-disruption-failure-ignore-2-qlxp2 node/ip-10-0-151-233.us-west-1.compute.internal uid/3dda8eea-b221-433a-b254-fc7cf487189b;ns/openshift-apiserver pod/apiserver-5f7d4599b4-dvpdk node/ip-10-0-151-233.us-west-1.compute.internal uid/293cba9c-11ea-4258-9d38-4ff5b2cb52bd
I0719 16:44:56.659916 49761 base_network_controller_pods.go:444] [default/openshift-apiserver/apiserver-5f7d4599b4-dvpdk] creating logical port openshift-apiserver_apiserver-5f7d4599b4-dvpdk for pod on switch ip-10-0-151-233.us-west-1.compute.internal
W0719 16:44:56.666407 49761 base_network_controller_pods.go:198] No cached port info for deleting pod default/openshift-kube-controller-manager/installer-7-ip-10-0-151-233.us-west-1.compute.internal. Using logical switch ip-10-0-151-233.us-west-1.compute.internal port uuid and addrs [10.128.0.39/23]
I0719 16:44:56.680604 49761 base_network_controller_pods.go:234] Releasing IPs for Completed pod: openshift-kube-controller-manager/installer-7-ip-10-0-151-233.us-west-1.compute.internal, ips: 10.128.0.39
I0719 16:44:56.699279 49761 pods.go:134] Attempting to release IPs for pod: openshift-kube-controller-manager/installer-7-ip-10-0-151-233.us-west-1.compute.internal, ips: 10.128.0.39
I0719 16:44:56.790903 49761 client.go:783] "msg"="transacting operations" "database"="OVN_Northbound" "operations"="[{Op:insert Table:Logical_Switch_Port Row:map[addresses:{GoSet:[0a:58:0a:80:00:27 10.128.0.39]} external_ids:{GoMap:map[namespace:openshift-apiserver pod:true]} name:openshift-apiserver_apiserver-5f7d4599b4-dvpdk
Observed in
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-net[…]perator-master-e2e-aws-ovn-single-node/1681699276796727296
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_clus[…]netes_ovnkube-node-bsbt9_ovnkube-controller.log
Please review the following PR: https://github.com/openshift/operator-framework-catalogd/pull/27
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
The OpenShift DNS daemonset has the rolling update strategy. The "maxSurge" parameter is set to a non-zero value, which means the "maxUnavailable" parameter is set to zero. When the user replaces the toleration in the daemonset's template spec (via the OpenShift DNS config API) from the one that allows scheduling on master nodes to any other toleration, the new pods still try to be scheduled on the master nodes. The old pods on the tolerated nodes may be lucky enough to be recreated, but only if they come before any pod from an intolerable node. The new pods are not expected to be scheduled on nodes that are not tolerated by the new daemonset template spec. The daemonset controller should simply delete the old pods from the nodes that can no longer be tolerated; the old pods on nodes that can still be tolerated should be recreated according to the rolling update parameters.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create the daemonset which tolerates "node-role.kubernetes.io/master" taint and has the following rolling update parameters:
$ oc -n openshift-dns get ds dns-default -o yaml | yq .spec.updateStrategy
rollingUpdate:
  maxSurge: 10%
  maxUnavailable: 0
type: RollingUpdate
$ oc -n openshift-dns get ds dns-default -o yaml | yq .spec.template.spec.tolerations
- key: node-role.kubernetes.io/master
  operator: Exists
2. Let the daemonset be scheduled on all the target nodes (e.g. all masters and all workers)
$ oc -n openshift-dns get pods -o wide | grep dns-default
dns-default-6bfmf   2/2   Running   0   119m    10.129.0.40   ci-ln-sb5ply2-72292-qlhc8-master-2         <none>   <none>
dns-default-9cjdf   2/2   Running   0   2m35s   10.129.2.15   ci-ln-sb5ply2-72292-qlhc8-worker-c-m5wzq   <none>   <none>
dns-default-c6j9x   2/2   Running   0   119m    10.128.0.13   ci-ln-sb5ply2-72292-qlhc8-master-0         <none>   <none>
dns-default-fhqrs   2/2   Running   0   2m12s   10.131.0.29   ci-ln-sb5ply2-72292-qlhc8-worker-a-6q7hs   <none>   <none>
dns-default-lx2nf   2/2   Running   0   119m    10.130.0.15   ci-ln-sb5ply2-72292-qlhc8-master-1         <none>   <none>
dns-default-mmc78   2/2   Running   0   112m    10.128.2.7    ci-ln-sb5ply2-72292-qlhc8-worker-b-bpjdk   <none>   <none>
3. Update the daemonset's tolerations by removing "node-role.kubernetes.io/master" and adding any other toleration (a non-existent one works too):
$ oc -n openshift-dns get ds dns-default -o yaml | yq .spec.template.spec.tolerations
- key: test-taint
  operator: Exists
Actual results:
$ oc -n openshift-dns get pods -o wide | grep dns-default
dns-default-6bfmf   2/2   Running   0   124m    10.129.0.40   ci-ln-sb5ply2-72292-qlhc8-master-2         <none>   <none>
dns-default-76vjz   0/2   Pending   0   3m2s    <none>        <none>                                     <none>   <none>
dns-default-9cjdf   2/2   Running   0   7m24s   10.129.2.15   ci-ln-sb5ply2-72292-qlhc8-worker-c-m5wzq   <none>   <none>
dns-default-c6j9x   2/2   Running   0   124m    10.128.0.13   ci-ln-sb5ply2-72292-qlhc8-master-0         <none>   <none>
dns-default-fhqrs   2/2   Running   0   7m1s    10.131.0.29   ci-ln-sb5ply2-72292-qlhc8-worker-a-6q7hs   <none>   <none>
dns-default-lx2nf   2/2   Running   0   124m    10.130.0.15   ci-ln-sb5ply2-72292-qlhc8-master-1         <none>   <none>
dns-default-mmc78   2/2   Running   0   117m    10.128.2.7    ci-ln-sb5ply2-72292-qlhc8-worker-b-bpjdk   <none>   <none>
Expected results:
$ oc -n openshift-dns get pods -o wide | grep dns-default
dns-default-9cjdf   2/2   Running   0   7m24s   10.129.2.15   ci-ln-sb5ply2-72292-qlhc8-worker-c-m5wzq   <none>   <none>
dns-default-fhqrs   2/2   Running   0   7m1s    10.131.0.29   ci-ln-sb5ply2-72292-qlhc8-worker-a-6q7hs   <none>   <none>
dns-default-mmc78   2/2   Running   0   7m54s   10.128.2.7    ci-ln-sb5ply2-72292-qlhc8-worker-b-bpjdk   <none>   <none>
Additional info:
Upstream issue: https://github.com/kubernetes/kubernetes/issues/118823
Slack discussion: https://redhat-internal.slack.com/archives/CKJR6200N/p1687455135950439
After creating a 4.14 ARO cluster, some cluster operators are not available because the load balancer can't be created.
It is because of the change of the default value of vmType in cloud-provider-azure.
https://github.com/kubernetes-sigs/cloud-provider-azure/pull/4214
In ARO, we use the standard vmType and don't use any vmss as a cluster node, but the installer doesn't specify vmType, which causes a vmType mismatch, and cloud-provider-azure can't configure the load balancer.
We would like vmType to default to `standard`, or to have an option to change it via the install config or similar.
discussion thread: https://redhat-internal.slack.com/archives/C68TNFWA2/p1700814868246649
Reproducible steps:
Create a 4.14 ARO cluster. Creating a normal cluster with standard VMs in Azure might also reproduce the issue.
What I got:
❯ oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.14.1    False       True          True       21m     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.atokubi.eastus.osadev.cloud/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)...
cloud-controller-manager                   4.14.1    True        False         False      24m
cloud-credential                           4.14.1    True        False         False      26m
cluster-autoscaler                         4.14.1    True        False         False      20m
config-operator                            4.14.1    True        False         False      21m
console                                    4.14.1    False       True          False      13m     DeploymentAvailable: 0 replicas available for console deployment...
control-plane-machine-set                  4.14.1    True        False         False      14m
csi-snapshot-controller                    4.14.1    True        False         False      20m
dns                                        4.14.1    True        False         False      20m
etcd                                       4.14.1    True        False         False      19m
image-registry                             4.14.1    True        False         False      8m11s
ingress                                              False       True          True       7m36s   The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: failed to map VM Name to NodeName: VM Name atokubi-vnkt5-master-0...
insights                                   4.14.1    True        False         False      14m
kube-apiserver                             4.14.1    True        True          False      10m     NodeInstallerProgressing: 1 nodes are at revision 5; 2 nodes are at revision 6
kube-controller-manager                    4.14.1    True        False         False      18m
kube-scheduler                             4.14.1    True        False         False      17m
kube-storage-version-migrator              4.14.1    True        False         False      21m
machine-api                                4.14.1    True        False         False      11m
machine-approver                           4.14.1    True        False         False      20m
machine-config                             4.14.1    True        False         False      15m
marketplace                                4.14.1    True        False         False      20m
monitoring                                 4.14.1    True        False         False      6m53s
network                                    4.14.1    True        False         False      22m
node-tuning                                4.14.1    True        False         False      20m
openshift-apiserver                        4.14.1    True        False         False      14m
openshift-controller-manager               4.14.1    True        False         False      20m
openshift-samples                          4.14.1    True        False         False      14m
operator-lifecycle-manager                 4.14.1    True        False         False      20m
operator-lifecycle-manager-catalog         4.14.1    True        False         False      20m
operator-lifecycle-manager-packageserver   4.14.1    True        False         False      14m
service-ca                                 4.14.1    True        False         False      21m
storage                                    4.14.1    True        False         False      20m
❯ oc get svc -A | grep LoadBalancer
openshift-ingress router-default LoadBalancer 172.30.43.24 <pending> 80:32538/TCP,443:31115/TCP 38m
❯ oc get cm cloud-provider-config -n openshift-config -oyaml
apiVersion: v1
data:
  config: '{"cloud":"AzurePublicCloud","tenantId":"<reducted>","aadClientId":"","aadClientSecret":"","aadClientCertPath":"","aadClientCertPassword":"","useManagedIdentityExtension":false,"userAssignedIdentityID":"","subscriptionId":"<reducted>","resourceGroup":"aro-atokubi","location":"eastus","vnetName":"dev-vnet","vnetResourceGroup":"v4-eastus","subnetName":"atokubi-worker","securityGroupName":"atokubi-vnkt5-nsg","routeTableName":"atokubi-vnkt5-node-routetable","primaryAvailabilitySetName":"","vmType":"","primaryScaleSetName":"","cloudProviderBackoff":true,"cloudProviderBackoffRetries":0,"cloudProviderBackoffExponent":0,"cloudProviderBackoffDuration":6,"cloudProviderBackoffJitter":0,"cloudProviderRateLimit":false,"cloudProviderRateLimitQPS":0,"cloudProviderRateLimitBucket":0,"cloudProviderRateLimitQPSWrite":0,"cloudProviderRateLimitBucketWrite":0,"useInstanceMetadata":true,"loadBalancerSku":"standard","excludeMasterFromStandardLB":false,"disableOutboundSNAT":true,"maximumLoadBalancerRuleCount":0}'
kind: ConfigMap
metadata:
  creationTimestamp: "2023-11-29T10:08:19Z"
  name: cloud-provider-config
  namespace: openshift-config
  resourceVersion: "33363"
  uid: 8b35cf3f-65ee-428d-92e6-304165301e96
❯ oc logs azure-cloud-controller-manager-fbdfbdb86-hk646 -n openshift-cloud-controller-manager
Defaulted container "cloud-controller-manager" out of: cloud-controller-manager, azure-inject-credentials (init)
<omitted>
I1129 10:46:47.401672 1 controller.go:388] Ensuring load balancer for service openshift-ingress/router-default
I1129 10:46:47.401732 1 azure_loadbalancer.go:122] reconcileService: Start reconciling Service "openshift-ingress/router-default" with its resource basename "ac376ce0f66164eebb9fc0fa76a9c697"
I1129 10:46:47.401742 1 azure_loadbalancer.go:1533] reconcileLoadBalancer for service(openshift-ingress/router-default) - wantLb(true): started
I1129 10:46:47.401849 1 event.go:307] "Event occurred" object="openshift-ingress/router-default" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I1129 10:46:47.505374 1 azure_loadbalancer_repo.go:73] LoadBalancerClient.List(aro-atokubi) success
I1129 10:46:47.573290 1 azure_loadbalancer.go:1557] reconcileLoadBalancer for service(openshift-ingress/router-default): lb(aro-atokubi/atokubi-vnkt5) wantLb(true) resolved load balancer name
I1129 10:46:47.643053 1 azure_vmssflex_cache.go:162] Could not find node () in the existing cache. Forcely freshing the cache to check again...
E1129 10:46:47.716774 1 azure_vmssflex.go:379] fs.GetNodeNameByIPConfigurationID(/subscriptions/fe16a035-e540-4ab7-80d9-373fa9a3d6ae/resourceGroups/aro-atokubi/providers/Microsoft.Network/networkInterfaces/atokubi-vnkt5-master0-nic/ipConfigurations/pipConfig) failed. Error: failed to map VM Name to NodeName: VM Name atokubi-vnkt5-master-0
E1129 10:46:47.716802 1 azure_loadbalancer.go:126] reconcileLoadBalancer(openshift-ingress/router-default) failed: failed to map VM Name to NodeName: VM Name atokubi-vnkt5-master-0
I1129 10:46:47.716835 1 azure_metrics.go:115] "Observed Request Latency" latency_seconds=0.315082823 request="services_ensure_loadbalancer" resource_group="aro-atokubi" subscription_id="fe16a035-e540-4ab7-80d9-373fa9a3d6ae" source="openshift-ingress/router-default" result_code="failed_ensure_loadbalancer"
E1129 10:46:47.716866 1 controller.go:291] error processing service openshift-ingress/router-default (will retry): failed to ensure load balancer: failed to map VM Name to NodeName: VM Name atokubi-vnkt5-master-0
I1129 10:46:47.716964 1 event.go:307] "Event occurred" object="openshift-ingress/router-default" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: failed to map VM Name to NodeName: VM Name atokubi-vnkt5-master-0"
After changing vmType from empty to "standard" in cloud-provider-config, it can configure load balancer and errors are gone.
Description of problem:
Issue: profiles are degraded [1] even after being applied, due to the error below [2]:
[1]
$ oc get profile -A
NAMESPACE                                NAME       TUNED                APPLIED   DEGRADED   AGE
openshift-cluster-node-tuning-operator   master0    rdpmc-patch-master   True      True       5d
openshift-cluster-node-tuning-operator   master1    rdpmc-patch-master   True      True       5d
openshift-cluster-node-tuning-operator   master2    rdpmc-patch-master   True      True       5d
openshift-cluster-node-tuning-operator   worker0    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker1    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker10   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker11   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker12   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker13   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker14   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker15   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker2    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker3    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker4    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker5    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker6    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker7    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker8    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker9    rdpmc-patch-worker   True      True       5d
[2]
lastTransitionTime: "2023-12-05T22:43:12Z"
message: TuneD daemon issued one or more sysctl override message(s) during profile application. Use reapply_sysctl=true or remove conflicting sysctl net.core.rps_default_mask
reason: TunedSysctlOverride
status: "True"
If we see in rdpmc-patch-master tuned:
NAMESPACE                                NAME      TUNED                APPLIED   DEGRADED   AGE
openshift-cluster-node-tuning-operator   master0   rdpmc-patch-master   True      True       5d
openshift-cluster-node-tuning-operator   master1   rdpmc-patch-master   True      True       5d
openshift-cluster-node-tuning-operator   master2   rdpmc-patch-master   True      True       5d
We are configuring below in rdpmc-patch-master tuned:
$ oc get tuned rdpmc-patch-master -n openshift-cluster-node-tuning-operator -oyaml |less
spec:
profile:
- data: |
[main]
include=performance-patch-master
[sysfs]
/sys/devices/cpu/rdpmc = 2
name: rdpmc-patch-master
recommend:
Below in Performance-patch-master which is included in above tuned:
spec:
  profile:
  - data: |
      [main]
      summary=Custom tuned profile to adjust performance
      include=openshift-node-performance-master-profile
      [bootloader]
      cmdline_removeKernelArgs=-nohz_full=${isolated_cores}
The setting below (which appears in the error) is in openshift-node-performance-master-profile, included by the tuned profile above:
net.core.rps_default_mask=${not_isolated_cpumask}
An RHEL bug has been raised for the same: https://issues.redhat.com/browse/RHEL-18972
Version-Release number of selected component (if applicable):
4.14
Observed behavior: default OpenShift OLM catalog pods do not survive an outage of the node they are currently executing on. The pods remain in terminating state, despite the tolerations that should move them away from unresponsive nodes after at most 5 minutes.
Impact: Operators can no longer be installed or update from catalogs that were previously executed on a node that has gone down.
Expected behavior: The catalog pods get automatically rescheduled on remaining nodes and their gRPC API endpoint recovers as a result.
This is a clone of issue OCPBUGS-39341. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-34877. The following is the description of the original issue:
—
Description of problem:
`oc adm prune deployments` does not work and gives the error below when using the --replica-sets option.
[root@weyb1525 ~]# oc adm prune deployments --orphans --keep-complete=1 --keep-failed=0 --keep-younger-than=1440m --replica-sets --v=6
I0603 09:55:39.588085 1540280 loader.go:373] Config loaded from file: /root/openshift-install/paas-03.build.net.intra.laposte.fr/auth/kubeconfig
I0603 09:55:39.890672 1540280 round_trippers.go:553] GET https://api-int.paas-03.build.net.intra.laposte.fr:6443/apis/apps.openshift.io/v1/deploymentconfigs 200 OK in 301 milliseconds
Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
I0603 09:55:40.529367 1540280 round_trippers.go:553] GET https://api-int.paas-03.build.net.intra.laposte.fr:6443/apis/apps/v1/deployments 200 OK in 65 milliseconds
I0603 09:55:41.369413 1540280 round_trippers.go:553] GET https://api-int.paas-03.build.net.intra.laposte.fr:6443/api/v1/replicationcontrollers 200 OK in 706 milliseconds
I0603 09:55:43.083804 1540280 round_trippers.go:553] GET https://api-int.paas-03.build.net.intra.laposte.fr:6443/apis/apps/v1/replicasets 200 OK in 118 milliseconds
I0603 09:55:43.320700 1540280 prune.go:58] Creating deployment pruner with keepYoungerThan=24h0m0s, orphans=true, replicaSets=true, keepComplete=1, keepFailed=0
Dry run enabled - no modifications will be made. Add --confirm to remove deployments
panic: interface conversion: interface {} is *v1.Deployment, not *v1.DeploymentConfig
goroutine 1 [running]:
github.com/openshift/oc/pkg/cli/admin/prune/deployments.(*dataSet).GetDeployment(0xc007fa9bc0, {0x5052780?, 0xc00a0b67b0?})
	/go/src/github.com/openshift/oc/pkg/cli/admin/prune/deployments/data.go:171 +0x3d6
github.com/openshift/oc/pkg/cli/admin/prune/deployments.(*orphanReplicaResolver).Resolve(0xc006ec87f8)
	/go/src/github.com/openshift/oc/pkg/cli/admin/prune/deployments/resolvers.go:78 +0x1a6
github.com/openshift/oc/pkg/cli/admin/prune/deployments.(*mergeResolver).Resolve(0x55?)
	/go/src/github.com/openshift/oc/pkg/cli/admin/prune/deployments/resolvers.go:28 +0xcf
github.com/openshift/oc/pkg/cli/admin/prune/deployments.(*pruner).Prune(0x5007c40?, {0x50033e0, 0xc0083c19e0})
	/go/src/github.com/openshift/oc/pkg/cli/admin/prune/deployments/prune.go:96 +0x2f
github.com/openshift/oc/pkg/cli/admin/prune/deployments.PruneDeploymentsOptions.Run({0x0, 0x1, 0x1, 0x4e94914f0000, 0x1, 0x0, {0x0, 0x0}, {0x5002d00, 0xc000ba78c0}, ...})
	/go/src/github.com/openshift/oc/pkg/cli/admin/prune/deployments/deployments.go:206 +0xa03
github.com/openshift/oc/pkg/cli/admin/prune/deployments.NewCmdPruneDeployments.func1(0xc0005f4900?, {0xc0006db020?, 0x0?, 0x6?})
	/go/src/github.com/openshift/oc/pkg/cli/admin/prune/deployments/deployments.go:78 +0x118
github.com/spf13/cobra.(*Command).execute(0xc0005f4900, {0xc0006dafc0, 0x6, 0x6})
	/go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:944 +0x847
github.com/spf13/cobra.(*Command).ExecuteC(0xc000e5b800)
	/go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:1068 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
	/go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:992
k8s.io/component-base/cli.run(0xc000e5b800)
	/go/src/github.com/openshift/oc/vendor/k8s.io/component-base/cli/run.go:146 +0x317
k8s.io/component-base/cli.RunNoErrOutput(...)
	/go/src/github.com/openshift/oc/vendor/k8s.io/component-base/cli/run.go:84
main.main()
	/go/src/github.com/openshift/oc/cmd/oc/oc.go:77 +0x365
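The panic is a blind type assertion: with --replica-sets, the data set can hold *v1.Deployment items, but the code asserts *v1.DeploymentConfig. A minimal sketch with stand-in types (not the actual oc code) showing the failing pattern and the comma-ok form that degrades gracefully:

package main

import "fmt"

// Stand-ins for the API types named in the panic message.
type Deployment struct{ Name string }
type DeploymentConfig struct{ Name string }

func main() {
	var obj interface{} = &Deployment{Name: "web"}

	// Blind assertion, as in the stack trace above, panics with:
	// interface conversion: interface {} is *main.Deployment, not *main.DeploymentConfig
	// dc := obj.(*DeploymentConfig)

	// The comma-ok form lets the resolver branch instead of panicking.
	if dc, ok := obj.(*DeploymentConfig); ok {
		fmt.Println("pruning DeploymentConfig:", dc.Name)
	} else {
		fmt.Println("not a DeploymentConfig; handle as a Deployment instead")
	}
}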
Version-Release number of selected component (if applicable):
How reproducible:
Run oc adm prune deployments command with --replica-sets option
# oc adm prune deployments --keep-younger-than=168h --orphans --keep-complete=5 --keep-failed=1 --replica-sets=true
Actual results:
It fails with the error below: panic: interface conversion: interface {} is *v1.Deployment, not *v1.DeploymentConfig
Expected results:
It should not fail, and should work as expected.
Additional info:
Slack thread https://redhat-internal.slack.com/archives/CKJR6200N/p1717519017531979
Description of problem:
Master-only installations, with workers set to 0 replicas, should be supported in UPI. At the moment, the ingress rules that are enabled on workers are not also enabled on masters.
Context: https://bugzilla.redhat.com/show_bug.cgi?id=1955544
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-22899. The following is the description of the original issue:
—
Description of problem:
In the self-managed HCP use case, if the on-premise baremetal management cluster does not have nodes labeled with the "topology.kubernetes.io/zone" key, then all HCP pods for a High Available cluster are scheduled to a single mgmt cluster node. This is a result of the way the affinity rules are constructed. Take the pod affinity/antiAffinity example below, which is generated for a HA HCP cluster. If the "topology.kubernetes.io/zone" label does not exist on the mgmt cluster nodes, then the pod will still get scheduled but that antiAffinity rule is effectively ignored. That seems odd due to the usage of the "requiredDuringSchedulingIgnoredDuringExecution" value, but I have tested this and the rule truly is ignored if the topologyKey is not present.
podAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
  - podAffinityTerm:
      labelSelector:
        matchLabels:
          hypershift.openshift.io/hosted-control-plane: clusters-vossel1
      topologyKey: kubernetes.io/hostname
    weight: 100
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchLabels:
        app: kube-apiserver
        hypershift.openshift.io/control-plane-component: kube-apiserver
    topologyKey: topology.kubernetes.io/zone
In the event that no "zones" are configured for the baremetal mgmt cluster, then the only other pod affinity rule is one that actually colocates the pods together. This results in a HA HCP having all the etcd, apiservers, etc... scheduled to a single node.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
100%
Steps to Reproduce:
1. Create a self-managed HA HCP cluster on a mgmt cluster with nodes that lack the "topology.kubernetes.io/zone" label
Actual results:
all HCP pods are scheduled to a single node.
Expected results:
HCP pods should always be spread across multiple nodes.
Additional info:
A way to address this is to add another anti-affinity rule that prevents every component from being scheduled on the same node as its replicas; see the sketch below.
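A minimal sketch of such a rule (labels illustrative, not the actual HyperShift change), keyed on kubernetes.io/hostname, which exists on every node and therefore cannot be silently ignored the way a missing zone label is:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	antiAffinity := &corev1.PodAntiAffinity{}

	// Keep replicas of the same component off the same node; the hostname
	// label always exists, so this required rule is always enforced.
	antiAffinity.RequiredDuringSchedulingIgnoredDuringExecution = append(
		antiAffinity.RequiredDuringSchedulingIgnoredDuringExecution,
		corev1.PodAffinityTerm{
			LabelSelector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "kube-apiserver"},
			},
			TopologyKey: "kubernetes.io/hostname",
		},
	)

	fmt.Printf("%+v\n", antiAffinity)
}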
This is a clone of issue OCPBUGS-28870. The following is the description of the original issue:
—
Description of problem:
Install a private cluster whose base domain in install-config.yaml is the same as an existing CIS domain name. After destroying the private cluster, the DNS resource records remain.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. Create a DNS service instance, setting its domain to "ibmcloud.qe.devcluster.openshift.com". Note that this domain name is also being used in another existing CIS domain.
2. Install a private ibmcloud cluster; the base domain set in install-config is "ibmcloud.qe.devcluster.openshift.com"
3. Destroy the cluster
4. Check the remaining DNS records
Actual results:
$ ibmcloud dns resource-records 5f8a0c4d-46c2-4daa-9157-97cb9ad9033a -i preserved-openshift-qe-private | grep ci-op-17qygd06-23ac4
api-int.ci-op-17qygd06-23ac4.ibmcloud.qe.devcluster.openshift.com
*.apps.ci-op-17qygd06-23ac4.ibmcloud.qe.devcluster.openshift.com
api.ci-op-17qygd06-23ac4.ibmcloud.qe.devcluster.openshift.com
Expected results:
No DNS records for the cluster remain.
Additional info:
$ ibmcloud dns zones -i preserved-openshift-qe-private | awk '{print $2}'
Name
private-ibmcloud.qe.devcluster.openshift.com
private-ibmcloud-1.qe.devcluster.openshift.com
ibmcloud.qe.devcluster.openshift.com
$ ibmcloud cis domains
Name
ibmcloud.qe.devcluster.openshift.com
When private-ibmcloud.qe.devcluster.openshift.com or private-ibmcloud-1.qe.devcluster.openshift.com is used as the domain, there is no such issue; when ibmcloud.qe.devcluster.openshift.com is used as the domain, the DNS records remain.
This is a clone of issue OCPBUGS-23306. The following is the description of the original issue:
—
Related with https://issues.redhat.com/browse/OCPBUGS-23000
By default, the cluster-autoscaler evicts all pods, including those coming from daemon sets. In the case of EFS CSI drivers, which are mounted as NFS volumes, this causes stale NFS handles, and application workloads are not terminated gracefully.
Version-Release number of selected component (if applicable):
4.11
How reproducible:
- While scaling down a node from the cluster-autoscaler-operator, the DS pods are being evicted.
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
CSI pods should not be evicted by the cluster autoscaler (at least not prior to workload termination), as this can produce data corruption.
Additional info:
It is possible to disable CSI pod eviction by adding the following annotation on the CSI driver pod: cluster-autoscaler.kubernetes.io/enable-ds-eviction: "false"
This is a clone of issue OCPBUGS-28662. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-38842. The following is the description of the original issue:
—
Component Readiness has found a potential regression in the following test:
[sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers for ns/openshift-image-registry
Probability of significant regression: 98.02%
Sample (being evaluated) Release: 4.17
Start Time: 2024-08-15T00:00:00Z
End Time: 2024-08-22T23:59:59Z
Success Rate: 94.74%
Successes: 180
Failures: 10
Flakes: 0
Base (historical) Release: 4.16
Start Time: 2024-05-31T00:00:00Z
End Time: 2024-06-27T23:59:59Z
Success Rate: 100.00%
Successes: 89
Failures: 0
Flakes: 0
Also hitting 4.17, I've aligned this bug to 4.18 so the backport process is cleaner.
The problem appears to be a permissions error preventing the pods from starting:
2024-08-22T06:14:14.743856620Z ln: failed to create symbolic link '/etc/pki/ca-trust/extracted/pem/directory-hash/ca-certificates.crt': Permission denied
Originating from this code: https://github.com/openshift/cluster-image-registry-operator/blob/master/pkg/resource/podtemplatespec.go#L489
Both 4.17 and 4.18 nightlies bumped rhcos and in there is an upgrade like this:
container-selinux-3-2.231.0-1.rhaos4.16.el9-noarch
container-selinux-3-2.231.0-2.rhaos4.17.el9-noarch
With slightly different versions in each stream, but both were on 3-2.231.
Hits other tests too:
operator conditions image-registry
Operator upgrade image-registry
[sig-cluster-lifecycle] Cluster completes upgrade
[sig-arch][Feature:ClusterUpgrade] Cluster should remain functional during upgrade [Disruptive] [Serial]
[sig-arch][Feature:ClusterUpgrade] Cluster should be upgradeable after finishing upgrade [Late][Suite:upgrade]
Description of problem:
Even though fakefish is not a supported redfish interface, it is very useful to have it working for "special" scenarios, like NC-SI, while their support is implemented. On OCP 4.14 and later, the converged flow is enabled by default, and in this configuration Ironic sends a soft power_off command to the ironic agent running on the ramdisk. Since this power operation does not go through the redfish interface, it is not processed by fakefish, preventing fakefish from working on some NC-SI configurations where a full power-off would mean the BMC loses power. Ironic already supports using out-of-band power off for the agent [1], so having an option to use it would be very helpful.
[1] https://opendev.org/openstack/ironic/commit/824ad1676bd8032fb4a4eb8ffc7625a376a64371
Version-Release number of selected component (if applicable):
Seen with OCP 4.14.26 and 4.14.33, expected to happen on later versions
How reproducible:
Always
Steps to Reproduce:
1. Deploy an SNO node using ACM and fakefish as the redfish interface
2. Check the metal3-ironic pod logs
Actual results:
We can see a soft power_off command sent to the ironic agent running on the ramdisk:
2024-08-07 15:00:45.545 1 DEBUG ironic.drivers.modules.agent_client [None req-74c0c3ed-011f-4718-bdce-53f2ba412e85 - - - - - -] Executing agent command standby.power_off for node df006e90-02ee-4847-b532-be4838e844e6 with params {'wait': 'false', 'agent_token': '***'} _command /usr/lib/python3.9/site-packages/ironic/drivers/modules/agent_client.py:197
2024-08-07 15:00:45.551 1 DEBUG ironic.drivers.modules.agent_client [None req-74c0c3ed-011f-4718-bdce-53f2ba412e85 - - - - - -] Agent command standby.power_off for node df006e90-02ee-4847-b532-be4838e844e6 returned result None, error None, HTTP status code 200 _command /usr/lib/python3.9/site-packages/ironic/drivers/modules/agent_client.py:234
Expected results:
There is an option to prevent this soft power_off command, so all power actions happen via redfish. This would allow fakefish to capture them and behave as needed.
Additional info:
Please review the following PR: https://github.com/openshift/cluster-kube-apiserver-operator/pull/1589
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-29115. The following is the description of the original issue:
—
Description of problem:
Running without the --node-upgrade-type param fails with "spec.management.upgradeType: Unsupported value: \"\": supported values: \"Replace\", \"InPlace\"", although --help documents it as having a default value of 'InPlace'.
Version-Release number of selected component (if applicable):
[kni@ocp-edge119 ~]$ ~/hypershift_working/hypershift/bin/hcp -v
hcp version openshift/hypershift: af9c0b3ce9c612ec738762a8df893c7598cbf157. Latest supported OCP: 4.15.0
How reproducible:
happens all the time
Steps to Reproduce:
1. On a hosted cluster setup, run:
[kni@ocp-edge119 ~]$ ~/hypershift_working/hypershift/bin/hcp create nodepool agent --cluster-name hosted-0 --name nodepool-of-extra1 --node-count 2 --node-upgrade-type Replace --help
Creates basic functional NodePool resources for Agent platform
Usage:
  hcp create nodepool agent [flags]
Flags:
  -h, --help   help for agent
Global Flags:
      --cluster-name string             The name of the HostedCluster nodes in this pool will join. (default "example")
      --name string                     The name of the NodePool.
      --namespace string                The namespace in which to create the NodePool. (default "clusters")
      --node-count int32                The number of nodes to create in the NodePool. (default 2)
      --node-upgrade-type UpgradeType   The NodePool upgrade strategy for how nodes should behave when upgraded. Supported options: Replace, InPlace (default )
      --release-image string            The release image for nodes; if this is empty, defaults to the same release image as the HostedCluster.
      --render                          Render output as YAML to stdout instead of applying.
2. Try to run with the default value of --node-upgrade-type:
[kni@ocp-edge119 ~]$ ~/hypershift_working/hypershift/bin/hcp create nodepool agent --cluster-name hosted-0 --name nodepool-of-extra1 --node-count 2
Actual results:
[kni@ocp-edge119 ~]$ ~/hypershift_working/hypershift/bin/hcp create nodepool agent --cluster-name hosted-0 --name nodepool-of-extra1 --node-count 2
2024-02-06T19:57:03+02:00 ERROR Failed to create nodepool {"error": "NodePool.hypershift.openshift.io \"nodepool-of-extra1\" is invalid: spec.management.upgradeType: Unsupported value: \"\": supported values: \"Replace\", \"InPlace\""}
github.com/openshift/hypershift/cmd/nodepool/core.(*CreateNodePoolOptions).CreateRunFunc.func1
	/home/kni/hypershift_working/hypershift/cmd/nodepool/core/create.go:39
github.com/spf13/cobra.(*Command).execute
	/home/kni/hypershift_working/hypershift/vendor/github.com/spf13/cobra/command.go:983
github.com/spf13/cobra.(*Command).ExecuteC
	/home/kni/hypershift_working/hypershift/vendor/github.com/spf13/cobra/command.go:1115
github.com/spf13/cobra.(*Command).Execute
	/home/kni/hypershift_working/hypershift/vendor/github.com/spf13/cobra/command.go:1039
github.com/spf13/cobra.(*Command).ExecuteContext
	/home/kni/hypershift_working/hypershift/vendor/github.com/spf13/cobra/command.go:1032
main.main
	/home/kni/hypershift_working/hypershift/product-cli/main.go:60
runtime.main
	/home/kni/hypershift_working/go/src/runtime/proc.go:250
Error: NodePool.hypershift.openshift.io "nodepool-of-extra1" is invalid: spec.management.upgradeType: Unsupported value: "": supported values: "Replace", "InPlace"
NodePool.hypershift.openshift.io "nodepool-of-extra1" is invalid: spec.management.upgradeType: Unsupported value: "": supported values: "Replace", "InPlace"
Expected results:
It should pass, as it does when adding the param:
[kni@ocp-edge119 ~]$ ~/hypershift_working/hypershift/bin/hcp create nodepool agent --cluster-name hosted-0 --name nodepool-of-extra1 --node-count 2 --node-upgrade-type InPlace
NodePool nodepool-of-extra1 created
[kni@ocp-edge119 ~]$
Additional info:
A related issue is that the --help output differs depending on whether other parameters are passed alongside it:
[kni@ocp-edge119 ~]$ ~/hypershift_working/hypershift/bin/hcp create nodepool agent --cluster-name hosted-0 --name nodepool-of-extra1 --node-count 2 --node-upgrade-type Replace --help > long.help.out
[kni@ocp-edge119 ~]$ ~/hypershift_working/hypershift/bin/hcp create nodepool agent --help > short.help.out
[kni@ocp-edge119 ~]$ diff long.help.out short.help.out
14c14
<       --node-upgrade-type UpgradeType   The NodePool upgrade strategy for how nodes should behave when upgraded. Supported options: Replace, InPlace (default )
---
>       --node-upgrade-type UpgradeType   The NodePool upgrade strategy for how nodes should behave when upgraded. Supported options: Replace, InPlace
[kni@ocp-edge119 ~]$
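A plausible contributing factor (an assumption, not a confirmed root cause) is how pflag handles Var-based flags: the "default" printed by --help is whatever the bound variable holds at registration time, and leaving the flag unset simply keeps that value, so an uninitialized variable yields both the "(default )" help text and the empty value rejected by the server. A runnable sketch:

package main

import (
	"fmt"

	"github.com/spf13/pflag"
)

// upgradeType stands in for the NodePool UpgradeType flag value.
type upgradeType string

func (u *upgradeType) String() string     { return string(*u) }
func (u *upgradeType) Set(s string) error { *u = upgradeType(s); return nil }
func (u *upgradeType) Type() string       { return "UpgradeType" }

func main() {
	var v upgradeType
	fs := pflag.NewFlagSet("demo", pflag.ContinueOnError)

	// Initializing before registration gives both a real default and correct
	// help text; leaving v empty reproduces the "(default )" output above.
	v = "InPlace"
	fs.Var(&v, "node-upgrade-type", "Supported options: Replace, InPlace")

	_ = fs.Parse([]string{})
	fmt.Println(v) // InPlace
}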
Please review the following PR: https://github.com/openshift/image-registry/pull/379
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-25830. The following is the description of the original issue:
—
Component Readiness has found a potential regression in [sig-arch] events should not repeat pathologically for ns/openshift-operator-lifecycle-manager.
Probability of significant regression: 100.00%
Sample (being evaluated) Release: 4.15
Start Time: 2023-12-05T00:00:00Z
End Time: 2023-12-11T23:59:59Z
Success Rate: 94.30%
Successes: 248
Failures: 15
Flakes: 0
Base (historical) Release: 4.14
Start Time: 2023-10-04T00:00:00Z
End Time: 2023-10-31T23:59:59Z
Success Rate: 100.00%
Successes: 730
Failures: 0
Flakes: 0
Description of problem:
A user destroying a HostedCluster can cause it to hang indefinitely if the destroy command times out during execution. This is because the hcp CLI places a finalizer on the HostedCluster during deletion, which the CLI later removes after waiting for some cleanup actions to occur. If a user cancels the `hcp destroy cluster` command (or the command times out) while the CLI is waiting for cleanup, the HostedCluster hangs indefinitely with a non-nil DeletionTimestamp. The CLI tool should not put the HostedCluster into an unreconcilable state; all this finalizer cleanup logic belongs on the backend.
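A minimal sketch (the finalizer name and flow are assumptions) of backend-driven cleanup: the controller, not the CLI, removes the finalizer once cleanup succeeds, so an aborted CLI process cannot strand the object:

package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const finalizerName = "hypershift.openshift.io/finalizer" // illustrative

// removeFinalizer drops one finalizer from the object's metadata; in a real
// controller this runs inside Reconcile once cleanup has finished, followed
// by an Update call against the API server.
func removeFinalizer(meta *metav1.ObjectMeta, name string) {
	kept := meta.Finalizers[:0]
	for _, f := range meta.Finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	meta.Finalizers = kept
}

func main() {
	m := metav1.ObjectMeta{Finalizers: []string{finalizerName, "other.io/keep"}}
	removeFinalizer(&m, finalizerName)
	fmt.Println(m.Finalizers) // [other.io/keep]
}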
Version-Release number of selected component (if applicable):
4.14
How reproducible:
100%
Steps to Reproduce:
1. Create an hcp cluster
2. Destroy the hcp cluster with the cli tool and immediately abort the cli process
Actual results:
HostedCluster is stuck indefinitely during deletion
Expected results:
HostedCluster is able to delete despite the cli being cancelled.
Additional info:
related to https://access.redhat.com/support/cases/#/case/03660218
This is a clone of issue OCPBUGS-24473. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-36341. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of the problem:
When preparing to skip the reboot, the partition names are generated by appending "4" and "3" to the installation disk name. This is not always correct: for NVMe disks we should append "p4" and "p3".
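A minimal sketch (not the actual assisted-installer code) of the naming rule: Linux block devices whose name ends in a digit, such as nvme0n1 or mmcblk0, take a "p" separator before the partition number:

package main

import (
	"fmt"
	"unicode"
)

// partitionDevice appends a partition number to a disk device path, inserting
// the "p" separator when the device name ends in a digit (nvme, mmcblk, ...).
func partitionDevice(disk string, part int) string {
	r := []rune(disk)
	if len(r) > 0 && unicode.IsDigit(r[len(r)-1]) {
		return fmt.Sprintf("%sp%d", disk, part)
	}
	return fmt.Sprintf("%s%d", disk, part)
}

func main() {
	fmt.Println(partitionDevice("/dev/sda", 4))     // /dev/sda4
	fmt.Println(partitionDevice("/dev/nvme0n1", 4)) // /dev/nvme0n1p4
}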
How reproducible:
Always with nvme
Steps to reproduce:
1. Try to install with an NVMe installation disk
2.
3.
Actual results:
The reboot is not skipped
Expected results:
The reboot should be skipped
This is a clone of issue OCPBUGS-26977. The following is the description of the original issue:
—
Description of problem:
When using a custom CNI plugin in a hostedcluster, multus requires some CSRs to be approved. The component approving these CSRs is network-node-identity, and it only gets the proper RBAC rules configured when networkType is set to Calico: in the current implementation, a condition applies the required RBAC only if the networkType is set to Calico[1]. When using other CNI plugins, like Cilium, you are supposed to set networkType to Other; with the current implementation the required RBAC is not put in place, and as a result the required CSRs are not approved automatically.
[1] https://github.com/openshift/hypershift/blob/release-4.14/control-plane-operator/controllers/hostedcontrolplane/cno/clusternetworkoperator.go#L139
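A sketch of the kind of change implied, with simplified stand-ins rather than the actual hypershift source: the RBAC condition should cover any networkType whose CNI is deployed by a third party, not just Calico:

package main

import "fmt"

// NetworkType mirrors the shape of the hosted cluster networking field.
type NetworkType string

const (
	OVNKubernetes NetworkType = "OVNKubernetes"
	Calico        NetworkType = "Calico"
	Other         NetworkType = "Other"
)

// needsCSRApproverRBAC reports whether network-node-identity must be allowed
// to approve the CSRs that third-party CNI plugins (Calico, Cilium, ...) need.
func needsCSRApproverRBAC(nt NetworkType) bool {
	return nt == Calico || nt == Other // previously: nt == Calico only
}

func main() {
	fmt.Println(needsCSRApproverRBAC(Other)) // true after the fix
}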
Version-Release number of selected component (if applicable):
Latest
How reproducible:
Always
Steps to Reproduce:
1. Set hostedcluster.spec.networking.networkType to Other
2. Wait for the HC to start deploying and for the Nodes to join the cluster
3. The nodes will remain NotReady. Multus pods will complain about certificates not being ready.
4. If you list CSRs you will find pending CSRs.
Actual results:
RBAC not properly configured when networkType set to Other
Expected results:
RBAC properly configured when networkType set to Other
Additional info:
Slack discussion: https://redhat-internal.slack.com/archives/C01C8502FMM/p1704824277049609
Description of problem:
After adding additional CPU and Memory to the OpenShift Container Platform 4 Control-Plane Node(s), it was noticed that a new MachineConfig was rolled out, causing all Node(s) to reboot unexpectedly. Interestingly, no new MachineConfig was rendered; instead a slightly older MachineConfig was picked and applied to all Node(s) after the change on the Control-Plane Node(s) was performed. The only visible change found in the MachineConfig was that nodeStatusUpdateFrequency was updated from 10s to 0s, even though nodeStatusUpdateFrequency is not specified or configured in any MachineConfig or KubeletConfig. https://issues.redhat.com/browse/OCPBUGS-6723 was found, but given that the affected cluster is running 4.11.35, it is difficult to understand what happened, as this problem was/is generally suspected to be solved.
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.11.35
How reproducible:
Unknown
Steps to Reproduce:
1. OpenShift Container Platform 4 on AWS
2. Update the Control-Plane Node(s) to add more CPU and Memory
3. Check whether a potential MachineConfig update is being applied
Actual results:
A MachineConfig update is rolled out to all Node(s) after adding CPU and Memory to the Control-Plane Node(s), because nodeStatusUpdateFrequency is updated; it is rather unexpected and unclear why this happens.
Expected results:
Either no new MachineConfig should be rolled out after such a change, or a newly rendered MachineConfig should be rolled out with information about what changed and why the change was applied.
Additional info:
Description of problem:
The cluster-dns-operator repository vendors k8s.io/* v0.27.2 and controller-runtime v0.15.0. OpenShift 4.15 is based on Kubernetes 1.28.
Version-Release number of selected component (if applicable):
4.15.
How reproducible:
Always.
Steps to Reproduce:
Check https://github.com/openshift/cluster-dns-operator/blob/release-4.15/go.mod.
Actual results:
The k8s.io/* packages are at v0.27.2, and the sigs.k8s.io/controller-runtime package is at v0.15.0.
Expected results:
The k8s.io/* packages are at v0.28.0 or newer, and the sigs.k8s.io/controller-runtime package is at v0.16.0 or newer.
Additional info:
The controller-runtime v0.16 release includes some breaking changes; see the release notes at https://github.com/kubernetes-sigs/controller-runtime/releases/tag/v0.16.0.
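For example, one of the documented v0.16 breaking changes is that the metrics bind address moved from a flat Options field into a dedicated options struct (the manager setup below is a sketch):

package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
	metricsserver "sigs.k8s.io/controller-runtime/pkg/metrics/server"
)

func main() {
	// controller-runtime v0.15:
	//   mgr, err := ctrl.NewManager(cfg, ctrl.Options{MetricsBindAddress: ":8080"})
	cfg := ctrl.GetConfigOrDie()
	mgr, err := ctrl.NewManager(cfg, ctrl.Options{
		Metrics: metricsserver.Options{BindAddress: ":8080"}, // v0.16 form
	})
	if err != nil {
		panic(err)
	}
	_ = mgr
}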
Please review the following PR: https://github.com/openshift/cloud-provider-nutanix/pull/23
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/116
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-30091. The following is the description of the original issue:
—
CI is flaky because the TestHostNetworkPort test fails:
=== NAME TestAll/serial/TestHostNetworkPortBinding operator_test.go:1034: Expected conditions: map[Admitted:True Available:True DNSManaged:False DeploymentReplicasAllAvailable:True LoadBalancerManaged:False] Current conditions: map[Admitted:True Available:True DNSManaged:False Degraded:False DeploymentAvailable:True DeploymentReplicasAllAvailable:False DeploymentReplicasMinAvailable:True DeploymentRollingOut:True EvaluationConditionsDetected:False LoadBalancerManaged:False LoadBalancerProgressing:False Progressing:True Upgradeable:True] operator_test.go:1034: Ingress Controller openshift-ingress-operator/samehost status: { "availableReplicas": 0, "selector": "ingresscontroller.operator.openshift.io/deployment-ingresscontroller=samehost", "domain": "samehost.ci-op-xlwngvym-43abb.origin-ci-int-aws.dev.rhcloud.com", "endpointPublishingStrategy": { "type": "HostNetwork", "hostNetwork": { "protocol": "TCP", "httpPort": 9080, "httpsPort": 9443, "statsPort": 9936 } }, "conditions": [ { "type": "Admitted", "status": "True", "lastTransitionTime": "2024-02-26T17:25:39Z", "reason": "Valid" }, { "type": "DeploymentAvailable", "status": "True", "lastTransitionTime": "2024-02-26T17:25:39Z", "reason": "DeploymentAvailable", "message": "The deployment has Available status condition set to True" }, { "type": "DeploymentReplicasMinAvailable", "status": "True", "lastTransitionTime": "2024-02-26T17:25:39Z", "reason": "DeploymentMinimumReplicasMet", "message": "Minimum replicas requirement is met" }, { "type": "DeploymentReplicasAllAvailable", "status": "False", "lastTransitionTime": "2024-02-26T17:25:39Z", "reason": "DeploymentReplicasNotAvailable", "message": "0/1 of replicas are available" }, { "type": "DeploymentRollingOut", "status": "True", "lastTransitionTime": "2024-02-26T17:25:39Z", "reason": "DeploymentRollingOut", "message": "Waiting for router deployment rollout to finish: 0 of 1 updated replica(s) are available...\n" }, { "type": "LoadBalancerManaged", "status": "False", "lastTransitionTime": "2024-02-26T17:25:39Z", "reason": "EndpointPublishingStrategyExcludesManagedLoadBalancer", "message": "The configured endpoint publishing strategy does not include a managed load balancer" }, { "type": "LoadBalancerProgressing", "status": "False", "lastTransitionTime": "2024-02-26T17:25:39Z", "reason": "LoadBalancerNotProgressing", "message": "LoadBalancer is not progressing" }, { "type": "DNSManaged", "status": "False", "lastTransitionTime": "2024-02-26T17:25:39Z", "reason": "UnsupportedEndpointPublishingStrategy", "message": "The endpoint publishing strategy doesn't support DNS management." }, { "type": "Available", "status": "True", "lastTransitionTime": "2024-02-26T17:25:39Z" }, { "type": "Progressing", "status": "True", "lastTransitionTime": "2024-02-26T17:25:39Z", "reason": "IngressControllerProgressing", "message": "One or more status conditions indicate progressing: DeploymentRollingOut=True (DeploymentRollingOut: Waiting for router deployment rollout to finish: 0 of 1 updated replica(s) are available...\n)" }, { "type": "Degraded", "status": "False", "lastTransitionTime": "2024-02-26T17:25:39Z" }, { "type": "Upgradeable", "status": "True", "lastTransitionTime": "2024-02-26T17:25:39Z", "reason": "Upgradeable", "message": "IngressController is upgradeable." }, { "type": "EvaluationConditionsDetected", "status": "False", "lastTransitionTime": "2024-02-26T17:25:39Z", "reason": "NoEvaluationCondition", "message": "No evaluation condition is detected." 
} ], "tlsProfile": { "ciphers": [ "ECDHE-ECDSA-AES128-GCM-SHA256", "ECDHE-RSA-AES128-GCM-SHA256", "ECDHE-ECDSA-AES256-GCM-SHA384", "ECDHE-RSA-AES256-GCM-SHA384", "ECDHE-ECDSA-CHACHA20-POLY1305", "ECDHE-RSA-CHACHA20-POLY1305", "DHE-RSA-AES128-GCM-SHA256", "DHE-RSA-AES256-GCM-SHA384", "TLS_AES_128_GCM_SHA256", "TLS_AES_256_GCM_SHA384", "TLS_CHACHA20_POLY1305_SHA256" ], "minTLSVersion": "VersionTLS12" }, "observedGeneration": 1 } operator_test.go:1036: failed to observe expected conditions for the second ingresscontroller: timed out waiting for the condition operator_test.go:1059: deleted ingresscontroller samehost operator_test.go:1059: deleted ingresscontroller hostnetworkportbinding
This particular failure comes from https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1017/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/1762147882179235840. Search.ci shows another failure: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/48873/rehearse-48873-pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-gatewayapi/1762576595890999296. The test has failed sporadically in the past, beyond what search.ci is able to search.
TestHostNetworkPort is marked as a serial test in TestAll and marked with t.Parallel() in the test itself. Because t.Parallel() signals the parent test to proceed while the subtest waits to run concurrently, a test on the serial list that calls t.Parallel() can end up running alongside the tests that follow it. It is not clear whether this is what is causing the new failure seen in this test, but something is incorrect either way.
The test failures have been observed recently on 4.16 as well as on 4.12 (https://github.com/openshift/cluster-ingress-operator/pull/828#issuecomment-1292888086) and 4.11 (https://github.com/openshift/cluster-ingress-operator/pull/914#issuecomment-1526808286). The logic error was introduced in 4.11 (https://github.com/openshift/cluster-ingress-operator/pull/756/commits/a22322b25569059c61e1973f37f0a4b49e9407bc).
The logic error is self-evident. The test failure is very rare; it has been observed sporadically over the past couple of years. Presently, search.ci shows two failures, with the following impact, for the past 14 days:
rehearse-48873-pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-gatewayapi (all) - 3 runs, 33% failed, 100% of failures match = 33% impact
pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator (all) - 16 runs, 25% failed, 25% of failures match = 6% impact
N/A.
The TestHostNetworkPort test fails. The test is marked as both serial and parallel.
The test should be marked as either serial or parallel, and it should pass consistently.
When TestAll was introduced, TestHostNetworkPortBinding was initially marked parallel in https://github.com/openshift/cluster-ingress-operator/pull/756/commits/a22322b25569059c61e1973f37f0a4b49e9407bc. After some discussion, it was moved to the serial list in https://github.com/openshift/cluster-ingress-operator/pull/756/commits/a449e497e35fafeecbee9ea656e0631393182f70, but the commit to remove t.Parallel() was evidently dropped inadvertently.
This is a clone of issue OCPBUGS-13726. The following is the description of the original issue:
—
Description of problem:
Adding image configuration for a HyperShift Hosted Cluster is not working as expected.
Version-Release number of selected component (if applicable):
# oc get clusterversions.config.openshift.io NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.13.0-rc.8 True False 6h46m Cluster version is 4.13.0-rc.8
How reproducible:
Always
Steps to Reproduce:
1. Get hypershift hosted cluster detail from management cluster. # hostedcluster=$( oc get -n clusters hostedclusters -o json | jq -r '.items[].metadata.name') 2. Apply image setting for hypershift hosted cluster. # oc patch hc/$hostedcluster -p '{"spec":{"configuration":{"image":{"registrySources":{"allowedRegistries":["quay.io","registry.redhat.io","image-registry.openshift-image-registry.svc:5000","insecure.com"],"insecureRegistries":["insecure.com"]}}}}}' --type=merge -n clusters hostedcluster.hypershift.openshift.io/85ea85757a5a14355124 patched # oc get HostedCluster $hostedcluster -n clusters -ojson | jq .spec.configuration.image { "registrySources": { "allowedRegistries": [ "quay.io", "registry.redhat.io", "image-registry.openshift-image-registry.svc:5000", "insecure.com" ], "insecureRegistries": [ "insecure.com" ] } } 3. Check Pod or operator restart to apply configuration changes. # oc get pods -l app=kube-apiserver -n clusters-${hostedcluster} NAME READY STATUS RESTARTS AGE kube-apiserver-67b6d4556b-9nk8s 5/5 Running 0 49m kube-apiserver-67b6d4556b-v4fnj 5/5 Running 0 47m kube-apiserver-67b6d4556b-zldpr 5/5 Running 0 51m #oc get pods -l app=kube-apiserver -n clusters-${hostedcluster} -l app=openshift-apiserver NAME READY STATUS RESTARTS AGE openshift-apiserver-7c69d68f45-4xj8c 3/3 Running 0 136m openshift-apiserver-7c69d68f45-dfmk9 3/3 Running 0 135m openshift-apiserver-7c69d68f45-r7dqn 3/3 Running 0 136m 4. Check image.config in hosted cluster. # oc get image.config -o yaml ... spec: allowedRegistriesForImport: [] status: externalRegistryHostnames: - default-route-openshift-image-registry.apps.hypershift-ci-32506.qe.devcluster.openshift.com internalRegistryHostname: image-registry.openshift-image-registry.svc:5000 #oc get node NAME STATUS ROLES AGE VERSION ip-10-0-128-61.us-east-2.compute.internal Ready worker 6h42m v1.26.3+b404935 ip-10-0-130-68.us-east-2.compute.internal Ready worker 6h42m v1.26.3+b404935 ip-10-0-134-89.us-east-2.compute.internal Ready worker 6h42m v1.26.3+b404935 ip-10-0-138-169.us-east-2.compute.internal Ready worker 6h42m v1.26.3+b404935 # oc debug node/ip-10-0-128-61.us-east-2.compute.internal Temporary namespace openshift-debug-mtfcw is created for debugging node... Starting pod/ip-10-0-128-61us-east-2computeinternal-debug-mctvr ... To use host binaries, run `chroot /host` Pod IP: 10.0.128.61 If you don't see a command prompt, try pressing enter. sh-4.4# chroot /host sh-5.1# cat /etc/containers/registries.conf unqualified-search-registries = ["registry.access.redhat.com", "docker.io"] short-name-mode = ""[[registry]] prefix = "" location = "registry-proxy.engineering.redhat.com" [[registry.mirror]] location = "brew.registry.redhat.io" pull-from-mirror = "digest-only"[[registry]] prefix = "" location = "registry.redhat.io" [[registry.mirror]] location = "brew.registry.redhat.io" pull-from-mirror = "digest-only"[[registry]] prefix = "" location = "registry.stage.redhat.io" [[registry.mirror]] location = "brew.registry.redhat.io" pull-from-mirror = "digest-only"
Actual results:
The config changes are not applied on the backend; neither the operators nor their pods restart.
Expected results:
The configuration should be applied, and the affected pods and operators should restart after the config changes.
Additional info:
This is a clone of issue OCPBUGS-45974. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-45918. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-45889. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-45306. The following is the description of the original issue:
—
Description of problem:
The customer is trying to install a self-managed OCP cluster on AWS, using an AWS VPC DHCP option set that has a trailing dot (.) at the end of its domain name. Due to this setting, the master nodes' hostnames also carry the trailing dot, and this causes the OpenShift installation to fail.
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. Create an AWS VPC with a DHCP option set whose domain name ends in a trailing dot (see the illustrative commands below).
2. Try an IPI installation of the cluster.
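For reference, such a DHCP option set can be created and attached with the AWS CLI along these lines (the domain name, DHCP options ID, and VPC ID are placeholders):
$ aws ec2 create-dhcp-options --dhcp-configurations 'Key=domain-name,Values=example.internal.' 'Key=domain-name-servers,Values=AmazonProvidedDNS'
$ aws ec2 associate-dhcp-options --dhcp-options-id dopt-0123456789abcdef0 --vpc-id vpc-0123456789abcdef0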
Actual results:
The installation fails because the master nodes' hostnames carry the trailing dot.
Expected results:
The OpenShift installer should allow creating AWS master nodes where the domain has a trailing dot (.), and the installation should succeed.
Additional info:
Description of problem:
Please check https://issues.redhat.com/browse/OCPBUGS-18702?focusedId=23021716&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-23021716 for more details, and https://drive.google.com/drive/folders/14aSJs-lO6HC-2xYFlOTJtCZIQg3ekE85?usp=sharing (please check the recording "sc_form_typeerror.mp4").
Issues:
1. The TypeError mentioned above.
2. Default params added by an extension are not getting added to the created StorageClass.
3. Validation for parameters added by an extension is not working correctly either.
4. The Provisioner child details get stuck once the user selects 'openshift-storage.cephfs.csi.ceph.com'.
Version-Release number of selected component (if applicable):
4.14 (OCP)
How reproducible:
Steps to Reproduce:
1. Install ODF operator. 2. Create StorageSystem (once dynamic plugin is loaded). 3. Wait for a while for ODF related StorageClasses gets created. 4. Once they are created, go to "Create StorageSystem" form. 5. Switch to provisioners (rbd.csi.ceph) added by ODF dynamic plugin.
Actual results:
Page breaks with an error.
Expected results:
The page should not break, and functionality should behave as it did before the refactoring introduced by PR https://github.com/openshift/console/pull/13036
Additional info:
Stack trace: Caught error in a child component: TypeError: Cannot read properties of undefined (reading 'parameters') at allRequiredFieldsFilled (storage-class-form.tsx:204:1) at validateForm (storage-class-form.tsx:235:1) at storage-class-form.tsx:262:1 at invokePassiveEffectCreate (react-dom.development.js:23487:1) at HTMLUnknownElement.callCallback (react-dom.development.js:3945:1) at Object.invokeGuardedCallbackDev (react-dom.development.js:3994:1) at invokeGuardedCallback (react-dom.development.js:4056:1) at flushPassiveEffectsImpl (react-dom.development.js:23574:1) at unstable_runWithPriority (scheduler.development.js:646:1) at runWithPriority$1 (react-dom.development.js:11276:1) {componentStack: '\n at StorageClassFormInner (http://localhost:90...c03030668ef271da51f.js:491534:20)\n at Suspense'}
Please review the following PR: https://github.com/openshift/csi-driver-shared-resource/pull/142
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
- Pods that reside in a namespace utilizing EgressIP are experiencing intermittent TCP IO timeouts when attempting to communicate with external services.
❯ oc exec gitlab-runner-aj-02-56998875b-n6xxb -- bash -c 'while true; do timeout 3 bash -c "</dev/tcp/10.135.108.56/443" && echo "Connection success" || echo "Connection timeout"; sleep 0.5; done'
Connection success
Connection timeout
Connection timeout
Connection timeout
Connection timeout
Connection timeout
Connection success
Connection timeout
Connection success
# Get pod node and podIP variable for the problematic pod ❯ oc get pod gitlab-runner-aj-02-56998875b-n6xxb -ojson 2>/dev/null | jq -r '"\(.metadata.name) \(.spec.nodeName) \(.status.podIP)"' | read -r pod node podip # Find the ovn-kubernetes pod running on the same node as gitlab-runner-aj-02-56998875b-n6xxb ❯ oc get pods -n openshift-ovn-kubernetes -lapp=ovnkube-node -ojson | jq --arg node "$node" -r '.items[] | select(.spec.nodeName == $node)| .metadata.name' | read -r ovn_pod # Collect each possible logical switch port address into variable LSP_ADDRESSES ❯ LSP_ADDRESSES=$(oc -n openshift-ovn-kubernetes exec ${ovn_pod} -it -c northd -- bash -c 'ovn-nbctl lsp-list transit_switch | while read guid name; do printf "%s " "${name}"; ovn-nbctl lsp-get-addresses "${guid}"; done') # List the logical router policy for the problematic pod ❯ oc -n openshift-ovn-kubernetes exec ${ovn_pod} -c northd -- ovn-nbctl find logical_router_policy match="\"ip4.src == ${podip}\"" _uuid : c55bec59-6f9a-4f01-a0b1-67157039edb8 action : reroute external_ids : {name=gitlab-runner-caasandpaas-egress} match : "ip4.src == 172.40.114.40" nexthop : [] nexthops : ["100.88.0.22", "100.88.0.57"] options : {} priority : 100 # Check whether each nexthop entry exists in the LSP addresses table ❯ echo $LSP_ADDRESSES | grep 100.88.0.22 (tstor-c1nmedi01-9x2g9-worker-cloud-paks-m9t6b) 0a:58:64:58:00:16 100.88.0.22/16 ❯ echo $LSP_ADDRESSES | grep 100.88.0.57
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/baremetal-runtimecfg/pull/274
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-34976. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-34975. The following is the description of the original issue:
—
Description of problem:
See https://issues.redhat.com//browse/CORS-3523 and https://issues.redhat.com//browse/CORS-3524 for the overall issue. Creating this bug for backporting purposes.
Version-Release number of selected component (if applicable):
all
How reproducible:
always in the terraform path
Steps to Reproduce:
1. 2. 3.
Actual results:
Spot instances are only supported for worker nodes.
Expected results:
Spot instances are used for all nodes.
Additional info:
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
We have detected several bugs in Console dynamic plugin SDK v1 as part of Kubevirt plugin PR #1804
These bugs affect dynamic plugins which target Console 4.15+
ERROR in [entry] [initial] kubevirt-plugin.494371abc020603eb01f.hot-update.js Missing call to loadPluginEntry
LOG from @openshift-console/dynamic-plugin-sdk-webpack/lib/webpack/loaders/dynamic-module-import-loader ../node_modules/ts-loader/index.js??ruleSet[1].rules[0].use[0]!./utils/hooks/useKubevirtWatchResource.ts
<w> Detected parse errors in /home/vszocs/work/kubevirt-plugin/src/utils/hooks/useKubevirtWatchResource.ts
WARNING in shared module @patternfly/react-core No required version specified and unable to automatically determine one. Unable to find required version for "@patternfly/react-core" in description file (/home/vszocs/work/kubevirt-plugin/node_modules/@openshift-console/dynamic-plugin-sdk/package.json). It need to be in dependencies, devDependencies or peerDependencies.
1. git clone Kubevirt plugin repo
2. switch to commit containing changes from PR #1804
3. yarn install && yarn dev to update dependencies and start local dev server
This is a clone of issue OCPBUGS-44846. The following is the description of the original issue:
—
Description of problem:
A regression found in libreswan 4.9 and later versions breaks the IPsec tunnel and makes pod-to-pod traffic fail intermittently. This issue is not seen with libreswan 4.5. We must therefore provide the flexibility for users to install their own IPsec machine config and choose their own libreswan version, instead of sticking with the CNO-managed IPsec machine config, which installs the libreswan version that ships with the RHCOS distro.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-27341. The following is the description of the original issue:
—
The test "[bz-Routing] clusteroperator/ingress should not change" has been failing for over a month in the e2e-metal-ipi-sdn-bm-upgrade jobs.
I think this is because there are only two worker nodes in the BM environment, and some HA services lose redundancy when one of the workers is rebooted.
In the medium term I hope to add another node to each cluster, but in the short term we should skip the test.
The name in setup.cfg is incorrectly set to ironic-image; it should be ironic-agent-image.
This is a clone of issue OCPBUGS-26762. The following is the description of the original issue:
—
Description of problem:
When a proxy.config.openshift.io is specified on a HyperShift cluster (in this case ROSA HCP), the network cluster operator is degraded:
❯ k get co network NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE network 4.14.6 True False True 2d1h The configuration is invalid for proxy 'cluster' (readinessEndpoint probe failed for endpoint 'https://api.openshift.com': endpoint probe failed for endpoint 'https://api.openshift.com' using proxy 'http://ip-172-17-1-38.ec2.internal:3128': Get "https://api.openshift.com": Service Unavailable). Use 'oc edit proxy.config.openshift.io cluster' to fix.
because the CNO pod runs on the management cluster and does not have connectivity to the customer's proxy, which is only accessible from the HyperShift worker nodes' network.
Version-Release number of selected component (if applicable):
4.14.6
How reproducible:
100%
Steps to Reproduce:
1. Create a proxy that's only accessible from a HyperShift cluster's workers network 2. Update the cluster's proxy.config.openshift.io cluster object accordingly 3. Observe that the network ClusterOperator is degraded
Actual results:
I'm not sure how important it is that the CNO has connectivity to api.openshift.com, so I leave that up for discussion. Maybe the CNO should ignore the proxy configuration in HyperShift for its own health checks, for example.
Expected results:
The network ClusterOperator is not degraded
Additional info:
Description of the problem:
We have a validation on vSphere that ensures the disk UUID property is set. However, the agent reports a fake disk in appliance mode, with the "hasUUID" property always set to false.
How reproducible:
100%
Steps to reproduce:
1. Try to install on vSphere
Actual results:
The UUID validation always fails
Expected results:
The UUID validation passes if the UUID property is set on the VM
This is a clone of issue OCPBUGS-33048. The following is the description of the original issue:
—
Description of problem:
When you delete a cluster, or just a BMH, before the installation starts (before Assisted Service takes control), the metal3-operator tries to generate a PreprovisioningImage.
In previous versions, a fix ensured that the creation of the PreprovisioningImage is not invoked during certain early installation phases;
the fix keyed off the "StateDeleting" status.
Recently, a new status, "StatePoweringOffBeforeDelete", was added,
but this status is not covered by the previous fix, and the image should not be created during this new phase either.
The problem with creating the PreprovisioningImage when it should not be created is that it breaks ZTP, where the BMH and all the related objects are deleted at the same time and the operator cannot create the image because the namespace is being deleted.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Steps to Reproduce:
1. Create a cluster.
2. Wait until the provisioning phase.
3. Delete the cluster.
4. The metal3-operator wrongly tries to create the PreprovisioningImage.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-36608. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-35347. The following is the description of the original issue:
—
Description of problem:
OCP/RHCOS system daemons like ovs-vswitchd (revalidator process) use the same vCPUs (from the isolated vCPU pool) that are already reserved by CPU Manager for CNF workloads, causing intermittent CNF workload performance issues (and also vCPU-level overload). Note: NCP 23.11 uses CPU Manager with static policy and Topology Manager set to "single-numa-node". Also, specific isolated and reserved vCPU pools have been defined.
Version-Release number of selected component (if applicable):
4.14.22
How reproducible:
Intermittent in the customer environment.
Steps to Reproduce:
1. 2. 3.
Actual results:
ovs-vswitchd is using isolated CPUs
Expected results:
ovs-vswitchd to use only reserved CPUs
Additional info:
We want to understand whether the customer is hitting this bug: https://issues.redhat.com/browse/OCPBUGS-32407. That bug was fixed in 4.14.25; the customer cluster is on 4.14.22. The customer is also asking whether it is possible to get a private fix, since they cannot update at the moment. All case files have been yanked at both the US and EU instances of Supportshell. In case updates or attachments are not accessible, please let me know.
This is a clone of issue OCPBUGS-36816. The following is the description of the original issue:
—
Description of problem:
Dynamic plugins using PatternFly 4 could be referring to PF4 variables that do not exist in OpenShift 4.15+. Currently this is causing contrast issues for ACM in dark mode for donut charts.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. Install ACM on OpenShift 4.15 2. Switch to dark mode 3. Observe Home > Overview page
Actual results:
Some categories in the donut charts cannot be seen due to low contrast
Expected results:
Colors should match those seen in OpenShift 4.14 and earlier
Additional info:
Also posted about this on Slack: https://redhat-internal.slack.com/archives/C011BL0FEKZ/p1720467671332249 Variables like --pf-chart-color-gold-300 are no longer provided, although the PF5 equivalent, --pf-v5-chart-color-gold-300, is available. The stylesheet @patternfly/patternfly/patternfly-charts.scss is present, but not the V4 version. Hopefully it is possible to also include these styles since the names now include a version.
This is a clone of issue OCPBUGS-30162. The following is the description of the original issue:
—
Description of problem:
Introduce the --issuer-url flag in oc login.
Version-Release number of selected component (if applicable):
[xxia@2024-03-01 21:03:30 CST my]$ oc version --client Client Version: 4.16.0-0.ci-2024-03-01-033249 Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3 [xxia@2024-03-01 21:03:50 CST my]$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.16.0-0.ci-2024-02-29-213249 True False 8h Cluster version is 4.16.0-0.ci-2024-02-29-213249
How reproducible:
Always
Steps to Reproduce:
1. Launch fresh HCP cluster. 2. Login to https://entra.microsoft.com. Register application and set properly. 3. Prepare variables. HC_NAME=hypershift-ci-267920 MGMT_KUBECONFIG=/home/xxia/my/env/xxia-hs416-2-267920-4.16/kubeconfig HOSTED_KUBECONFIG=/home/xxia/my/env/xxia-hs416-2-267920-4.16/hypershift-ci-267920.kubeconfig AUDIENCE=7686xxxxxx ISSUER_URL=https://login.microsoftonline.com/64dcxxxxxxxx/v2.0 CLIENT_ID=7686xxxxxx CLIENT_SECRET_VALUE="xxxxxxxx" CLIENT_SECRET_NAME=console-secret 4. Configure HC without oauthMetadata. [xxia@2024-03-01 20:29:21 CST my]$ oc create secret generic console-secret -n clusters --from-literal=clientSecret=$CLIENT_SECRET_VALUE --kubeconfig $MGMT_KUBECONFIG [xxia@2024-03-01 20:34:05 CST my]$ oc patch hc $HC_NAME -n clusters --kubeconfig $MGMT_KUBECONFIG --type=merge -p=" spec: configuration: authentication: oauthMetadata: name: '' oidcProviders: - claimMappings: groups: claim: groups prefix: 'oidc-groups-test:' username: claim: email prefixPolicy: Prefix prefix: prefixString: 'oidc-user-test:' issuer: audiences: - $AUDIENCE issuerURL: $ISSUER_URL name: microsoft-entra-id oidcClients: - clientID: $CLIENT_ID clientSecret: name: $CLIENT_SECRET_NAME componentName: console componentNamespace: openshift-console type: OIDC " Wait pods to renew: [xxia@2024-03-01 20:52:41 CST my]$ oc get po -n clusters-$HC_NAME --kubeconfig $MGMT_KUBECONFIG --sort-by metadata.creationTimestamp ... certified-operators-catalog-7ff9cffc8f-z5dlg 1/1 Running 0 5h44m kube-apiserver-6bd9f7ccbd-kqzm7 5/5 Running 0 17m kube-apiserver-6bd9f7ccbd-p2fw7 5/5 Running 0 15m kube-apiserver-6bd9f7ccbd-fmsgl 5/5 Running 0 13m openshift-apiserver-7ffc9fd764-qgd4z 3/3 Running 0 11m openshift-apiserver-7ffc9fd764-vh6x9 3/3 Running 0 10m openshift-apiserver-7ffc9fd764-b7znk 3/3 Running 0 10m konnectivity-agent-577944765c-qxq75 1/1 Running 0 9m42s hosted-cluster-config-operator-695c5854c-dlzwh 1/1 Running 0 9m42s cluster-version-operator-7c99cf68cd-22k84 1/1 Running 0 9m42s konnectivity-agent-577944765c-kqfpq 1/1 Running 0 9m40s konnectivity-agent-577944765c-7t5ds 1/1 Running 0 9m37s 5. Check console login and oc login. $ export KUBECONFIG=$HOSTED_KUBECONFIG $ curl -ksS $(oc whoami --show-server)/.well-known/oauth-authorization-server { "issuer": "https://:0", "authorization_endpoint": "https://:0/oauth/authorize", "token_endpoint": "https://:0/oauth/token", ... } Check console login, it succeeds, console upper right shows correctly user name oidc-user-test:xxia@redhat.com. Check oc login: $ rm -rf ~/.kube/cache/oc/ $ oc login --exec-plugin=oc-oidc --client-id=$CLIENT_ID --client-secret=$CLIENT_SECRET_VALUE --extra-scopes=email --callback-port=8080 error: oidc authenticator error: oidc discovery error: Get "https://:0/.well-known/openid-configuration": dial tcp :0: connect: connection refused error: oidc authenticator error: oidc discovery error: Get "https://:0/.well-known/openid-configuration": dial tcp :0: connect: connection refused Unable to connect to the server: getting credentials: exec: executable oc failed with exit code 1
Actual results:
Console login succeeds. oc login fails.
Expected results:
oc login should also succeed.
Additional info:
Description of problem:
8.1
tagged from docker.io/openshift/wildfly-81-centos7:latest
prefer registry pullthrough when referencing this tag

Build and run WildFly 8.1 applications on CentOS 7. For more information about using this builder image, including OpenShift considerations, see https://github.com/openshift-s2i/s2i-wildfly/blob/master/README.md.
Tags: builder, wildfly, java
Supports: wildfly:8.1, jee, java
Example Repo: https://github.com/openshift/openshift-jee-sample.git

! error: Import failed (Unauthorized): you may not have access to the container image "docker.io/openshift/wildfly-81-centos7:latest"
20 minutes ago

error: imported completed with errors
[Mon Oct 23 15:23:32 UTC 2023] Retrying image import openshift/wildfly:10.1
error: tag latest failed: you may not have access to the container image "docker.io/openshift/wildfly-101-centos7:latest"
imagestream.image.openshift.io/wildfly imported with errors

Name: wildfly
Namespace: openshift
Created: 21 minutes ago
Version-Release number of selected component (if applicable):
4.14 / 4.15
How reproducible:
Often on vSphere jobs, perhaps because they lack a local mirror?
Steps to Reproduce:
1. 2. 3.
Actual results:
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/44127/rehearse-44127-periodic-ci-openshift-release-master-okd-scos-4.14-e2e-aws-ovn-serial/1716463869561409536
Expected results:
CI jobs run successfully.
Additional info:
Description of problem:
OCP 4.14 installation fails in AWS environments where S3 versioning is enforced. OCP 4.13 installs successfully in the same environment.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Use any native AWS way to enforce versioning on S3 (AWS Config is easiest; see the example command below). This will enable versioning on S3 buckets after creation.
2. Install OCP 4.13 on AWS just using the defaults. It will succeed.
3. Install OCP 4.14 on AWS just using the defaults. It will fail.
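For reference, versioning can also be enforced on an existing bucket directly with the AWS CLI (the bucket name is a placeholder); AWS Config remediation achieves the same effect automatically after bucket creation, which is the scenario described above:
$ aws s3api put-bucket-versioning --bucket <bucket-name> --versioning-configuration Status=Enabled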
Actual results:
OCP 4.14 installation fails fatally.
Expected results:
OCP 4.14 installation succeeds just like OCP 4.13 installation. OR - if defaults are changed, provided documentation.
Additional info:
1. Related 4.14 feature: https://docs.openshift.com/container-platform/4.14/release_notes/ocp-4-14-release-notes.html#ocp-4-14-aws-s3-deletion - provides the ability to skip deletion of S3 buckets altogether. 2. Attached OCP logs. 3. Strategic enterprise customers of managed services use data-governance policies that enforce versioning, bucket policies, etc., and are therefore blocked from installing.
Description of problem:
The MAPI metric mapi_current_pending_csr fires even when there are no pending MAPI CSRs; however, non-MAPI CSRs are present. The metric may not be appropriately scoped to only MAPI's own CSRs.
Version-Release number of selected component (if applicable):
Observed in 4.11.25
How reproducible:
Consistent
Steps to Reproduce:
1. Install a component that uses CSRs (like ACM) but leave the CSRs in a pending state 2. Observe metric firing 3.
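One way to compare the metric against what is actually pending is to count pending CSRs per signer; for example (the jq filter is illustrative, treating a CSR with no status conditions as pending):
$ oc get csr -o json | jq -r '.items[] | select((.status.conditions // []) | length == 0) | .spec.signerName' | sort | uniq -c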
Actual results:
Metric is firing
Expected results:
Metric only fires if there are MAPI specific CSRs pending
Additional info:
This impacts SRE alerting
This is a clone of issue OCPBUGS-28742. The following is the description of the original issue:
—
Description of problem:
ovnkube-node doesn't issue a CSR to get new certificates when the node is suspended for 30 days.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Set up a libvirt cluster on a machine.
2. Disable chronyd on all nodes and on the host machine.
3. Suspend the nodes.
4. Change the time on the host 30 days forward.
5. Resume the nodes.
6. Wait for the API server to come up.
7. Wait for all operators to become ready.
(A sketch of steps 2-5 follows this list.)
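A rough sketch of steps 2-5 on the libvirt host (domain names are taken from virsh; adjust as needed):
$ systemctl disable --now chronyd
$ for d in $(virsh list --name); do virsh suspend "$d"; done
$ date -s "$(date -d '+30 days')"
$ for d in $(virsh list --name); do virsh resume "$d"; done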
Actual results:
ovnkube-node would attempt to use expired certs: 2024-01-21T01:24:41.576365431+00:00 stderr F I0121 01:24:41.573615 8852 master.go:740] Adding or Updating Node "test-infra-cluster-4832ebf8-master-0" 2024-04-20T01:25:08.519622252+00:00 stderr F I0420 01:25:08.516550 8852 services_controller.go:567] Deleting service openshift-operator-lifecycle-manager/packageserver-service 2024-04-20T01:25:08.900228370+00:00 stderr F I0420 01:25:08.898580 8852 services_controller.go:567] Deleting service openshift-operator-lifecycle-manager/packageserver-service 2024-04-20T01:25:17.137956433+00:00 stderr F I0420 01:25:17.137891 8852 obj_retry.go:296] Retry object setup: *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp 2024-04-20T01:25:17.137956433+00:00 stderr F I0420 01:25:17.137933 8852 obj_retry.go:358] Adding new object: *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp 2024-04-20T01:25:17.137997952+00:00 stderr F I0420 01:25:17.137979 8852 obj_retry.go:370] Retry add failed for *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp, will try again later: failed to obtain IPs to add remote pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp: suppressed error logged: pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp: no pod IPs found 2024-04-20T01:25:19.099635059+00:00 stderr F I0420 01:25:19.099057 8852 egressservice_zone_node.go:110] Processing sync for Egress Service node test-infra-cluster-4832ebf8-master-1 2024-04-20T01:25:19.099635059+00:00 stderr F I0420 01:25:19.099080 8852 egressservice_zone_node.go:113] Finished syncing Egress Service node test-infra-cluster-4832ebf8-master-1: 35.077µs 2024-04-20T01:25:22.245550966+00:00 stderr F W0420 01:25:22.242774 8852 base_network_controller_namespace.go:458] Unable to remove remote zone pod's openshift-controller-manager/controller-manager-5485d88c84-xztxq IP address from the namespace address-set, err: pod openshift-controller-manager/controller-manager-5485d88c84-xztxq: no pod IPs found 2024-04-20T01:25:22.262446336+00:00 stderr F W0420 01:25:22.261351 8852 base_network_controller_namespace.go:458] Unable to remove remote zone pod's openshift-route-controller-manager/route-controller-manager-6b5868f887-n6jj9 IP address from the namespace address-set, err: pod openshift-route-controller-manager/route-controller-manager-6b5868f887-n6jj9: no pod IPs found 2024-04-20T01:25:27.154790226+00:00 stderr F I0420 01:25:27.154744 8852 egressservice_zone_node.go:110] Processing sync for Egress Service node test-infra-cluster-4832ebf8-worker-0 2024-04-20T01:25:27.154790226+00:00 stderr F I0420 01:25:27.154770 8852 egressservice_zone_node.go:113] Finished syncing Egress Service node test-infra-cluster-4832ebf8-worker-0: 31.72µs 2024-04-20T01:25:27.172301639+00:00 stderr F I0420 01:25:27.168666 8852 egressservice_zone_node.go:110] Processing sync for Egress Service node test-infra-cluster-4832ebf8-master-2 2024-04-20T01:25:27.172301639+00:00 stderr F I0420 01:25:27.168692 8852 egressservice_zone_node.go:113] Finished syncing Egress Service node test-infra-cluster-4832ebf8-master-2: 34.346µs 2024-04-20T01:25:27.196078022+00:00 stderr F I0420 01:25:27.194311 8852 egressservice_zone_node.go:110] Processing sync for Egress Service node test-infra-cluster-4832ebf8-master-0 2024-04-20T01:25:27.196078022+00:00 stderr F I0420 01:25:27.194339 8852 egressservice_zone_node.go:113] Finished syncing Egress Service node test-infra-cluster-4832ebf8-master-0: 
40.027µs 2024-04-20T01:25:27.196078022+00:00 stderr F I0420 01:25:27.194582 8852 master.go:740] Adding or Updating Node "test-infra-cluster-4832ebf8-master-0" 2024-04-20T01:25:27.215435944+00:00 stderr F I0420 01:25:27.215387 8852 master.go:740] Adding or Updating Node "test-infra-cluster-4832ebf8-master-0" 2024-04-20T01:25:35.789830706+00:00 stderr F I0420 01:25:35.789782 8852 egressservice_zone_node.go:110] Processing sync for Egress Service node test-infra-cluster-4832ebf8-worker-1 2024-04-20T01:25:35.790044794+00:00 stderr F I0420 01:25:35.790025 8852 egressservice_zone_node.go:113] Finished syncing Egress Service node test-infra-cluster-4832ebf8-worker-1: 250.227µs 2024-04-20T01:25:37.596875642+00:00 stderr F I0420 01:25:37.596834 8852 iptables.go:358] "Running" command="iptables-save" arguments=["-t","nat"] 2024-04-20T01:25:47.138312366+00:00 stderr F I0420 01:25:47.138266 8852 obj_retry.go:296] Retry object setup: *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp 2024-04-20T01:25:47.138382299+00:00 stderr F I0420 01:25:47.138370 8852 obj_retry.go:358] Adding new object: *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp 2024-04-20T01:25:47.138453866+00:00 stderr F I0420 01:25:47.138440 8852 obj_retry.go:370] Retry add failed for *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp, will try again later: failed to obtain IPs to add remote pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp: suppressed error logged: pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp: no pod IPs found 2024-04-20T01:26:17.138583468+00:00 stderr F I0420 01:26:17.138544 8852 obj_retry.go:296] Retry object setup: *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp 2024-04-20T01:26:17.138640587+00:00 stderr F I0420 01:26:17.138629 8852 obj_retry.go:358] Adding new object: *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp 2024-04-20T01:26:17.138708817+00:00 stderr F I0420 01:26:17.138696 8852 obj_retry.go:370] Retry add failed for *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp, will try again later: failed to obtain IPs to add remote pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp: suppressed error logged: pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp: no pod IPs found 2024-04-20T01:26:39.474787436+00:00 stderr F I0420 01:26:39.474744 8852 reflector.go:790] k8s.io/client-go/informers/factory.go:159: Watch close - *v1.EndpointSlice total 130 items received 2024-04-20T01:26:39.475670148+00:00 stderr F E0420 01:26:39.475653 8852 reflector.go:147] k8s.io/client-go/informers/factory.go:159: Failed to watch *v1.EndpointSlice: the server has asked for the client to provide credentials (get endpointslices.discovery.k8s.io) 2024-04-20T01:26:40.786339334+00:00 stderr F I0420 01:26:40.786255 8852 reflector.go:325] Listing and watching *v1.EndpointSlice from k8s.io/client-go/informers/factory.go:159 2024-04-20T01:26:40.806238387+00:00 stderr F W0420 01:26:40.804542 8852 reflector.go:535] k8s.io/client-go/informers/factory.go:159: failed to list *v1.EndpointSlice: Unauthorized 2024-04-20T01:26:40.806238387+00:00 stderr F E0420 01:26:40.804571 8852 reflector.go:147] k8s.io/client-go/informers/factory.go:159: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Unauthorized
Expected results:
ovnkube-node detects that the certificate is expired, requests new certificates via the CSR flow, and reloads them.
Additional info:
CI periodic to check this flow: https://prow.ci.openshift.org/job-history/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-metal-ovn-sno-cert-rotation-suspend-30d (artifacts contain a sosreport). Applies to SNO and HA clusters; works as expected when nodes are properly shut down instead of suspended.
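When checking this flow manually, any pending node-client CSRs (if the flow kicked in) can be listed and approved with standard commands:
$ oc get csr --sort-by=.metadata.creationTimestamp
$ oc adm certificate approve <csr-name>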
Description of problem:
A cluster installed with baselineCapabilitySet: None has the build resource available even though the Build capability is disabled: ❯ oc get -o json clusterversion version | jq '.spec.capabilities' { "baselineCapabilitySet": "None" } ❯ oc get -o json clusterversion version | jq '.status.capabilities.enabledCapabilities' null ❯ oc get build -A NAME AGE cluster 5h23m
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-10-04-143709
How reproducible:
100%
Steps to Reproduce:
1. Install a cluster with baselineCapabilitySet: None
Actual results:
❯ oc get build -A NAME AGE cluster 5h23m
Expected results:
❯ oc get -A build error: the server doesn't have a resource type "build"
slack thread with more info: https://redhat-internal.slack.com/archives/CF8SMALS1/p1696527133380269
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Backport to 4.15 of AUTH-482 specifically for the oc node debug pods.
Description of problem:
An error message 'Restricted Access' and a 'Create Pod' button are shown on the Pods page for a normal user without any project.
Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-03-04-063157
How reproducible:
Always
Steps to Reproduce:
1. Login in OCP with a normal user, navigate to Pods page
2. Check that the 'No Pods found' message is shown on the page and that the 'Create Pod' button is hidden
3.
Actual results:
2. An error message 'Restricted Access' and an enabled 'Create Pod' button are shown on the Pods page
Expected results:
2. The 'No Pods found' message should be shown, and the 'Create Pod' button should be hidden
Additional info:
The same behavior can be checked on the 'Deployment, Stateful Set, Job, Service' pages, which behave correctly.
Update hash creation to use sha512 instead of sha1
Please review the following PR: https://github.com/openshift/ibm-vpc-block-csi-driver/pull/44
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
From profiling of cert rotation we know that the node informer is triggered every couple of seconds by node heartbeats. This PR ensures that all our node listers only ever listen/inform on master node updates, to reduce the frequency of unnecessary sync calls. Also related to this issue (increased number of node status updates): OCPBUGS-29713, OCPBUGS-29424
Version-Release number of selected component (if applicable):
4.16 down to 4.12, we need to check all versions
How reproducible:
always
Steps to Reproduce:
1. Create a cluster. 2. Look at some metric, e.g. sum(rate(apiserver_request_total{resource="nodes"}[5m])) (see the query sketch below). 3. Observe some improvement over the previous state.
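If useful, the same query can be run against the in-cluster Thanos querier (a minimal sketch, assuming your user has a valid token and access to the openshift-monitoring route):
$ TOKEN=$(oc whoami -t)
$ HOST=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
$ curl -ks -H "Authorization: Bearer $TOKEN" "https://$HOST/api/v1/query" --data-urlencode 'query=sum(rate(apiserver_request_total{resource="nodes"}[5m]))'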
Actual results:
increased amount of CPU usage for CEO / QPS to apiserver
Expected results:
less amount of CPU consumed for CEO / QPS to apiserver
Additional info:
Already fixed in 4.16 with https://github.com/openshift/cluster-etcd-operator/pull/1205; creating this ticket for backporting.
Description of problem: Updating the ovn-kubernetes submodules in the windows-machine-config-operator causes nodes to have permission errors setting annotations
E0927 19:37:53.178022 4932 kube.go:130] Error in setting annotation on node ci-op-56c3qr7h-8411c-wdmq9-e2e-wm-xs6sc: admission webhook "node.network-node-identity.openshift.io" denied the request: user "system:node:ci-op-56c3qr7h-8411c-wdmq9-e2e-wm-xs6sc" is not allowed to set the following annotations on node: "ci-op-56c3qr7h-8411c-wdmq9-e2e-wm-xs6sc": [k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac]
Seen in https://github.com/openshift/windows-machine-config-operator/pull/1836
Description of problem:
The bubble box renders with a wrong layout
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-11-16-110328
How reproducible:
Always
Steps to Reproduce:
1. Make sure there are no pods under the project you are using. 2. Navigate to the Networking -> NetworkPolicies -> Create NetworkPolicy page and click 'affected pods' in the Pod selector section. 3. Check the layout of the bubble component.
Actual results:
The layout is incorrect (shared file: https://drive.google.com/file/d/1I8e2ZkiFO2Gu4nSt9kJ6JmRG3LdvkE-u/view?usp=drive_link)
Expected results:
The layout should be correct.
Additional info:
Description of problem:
The whereabouts reconciler is responsible for reclaiming dangling IPs and freeing them to be available for allocation to new pods. This is crucial for scenarios where the number of addresses is limited and dangling IPs prevent whereabouts from successfully allocating new IPs to new pods. The reconciliation schedule is currently hard-coded to run once a day, with no user-friendly way to configure it.
Version-Release number of selected component (if applicable):
How reproducible:
Create a Whereabouts reconciler daemon set; there is no way to configure the reconciler schedule.
Steps to Reproduce:
1. Create a Whereabouts reconciler daemonset, following the instructions at https://docs.openshift.com/container-platform/4.14/networking/multiple_networks/configuring-additional-network.html#nw-multus-creating-whereabouts-reconciler-daemon-set_configuring-additional-network
2. Run `oc get pods -n openshift-multus | grep whereabouts-reconciler`
3. Run `oc logs whereabouts-reconciler-xxxxx`
Actual results:
You can't configure the cron-schedule of the reconciler.
Expected results:
Be able to modify the reconciler cron schedule.
Additional info:
The fix for this bug is in two places: whereabouts and cluster-network-operator. For this reason, in order to verify correctly we need to use both fixed components. Please read below for more details about how to apply the new configuration.
How to Verify:
Create a whereabouts-config ConfigMap with a custom value, and check in the whereabouts-reconciler pods' logs that it is updated, and triggering the clean up.
Steps to Verify:
1. Create a Whereabouts reconciler daemonset.
2. Wait for the whereabouts-reconciler pods to be running (it takes time for the daemonset to get created).
3. See in the logs: "[error] could not read file: <nil>, using expression from flatfile: 30 4 * * *" — this means it uses the hardcoded default value (because there is no ConfigMap yet).
4. Run: oc create configmap whereabouts-config -n openshift-multus --from-literal=reconciler_cron_expression="*/2 * * * *"
5. Check the logs for: "successfully updated CRON configuration"
6. Check that within the next 2 minutes the reconciler runs: "[verbose] starting reconciler run"
This is a clone of issue OCPBUGS-28666. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-32328. The following is the description of the original issue:
—
Description of problem:
A cluster with user-provisioned image registry storage accounts fails to upgrade to 4.14.20 because the image-registry operator is degraded: message: "Progressing: The registry is ready\nNodeCADaemonProgressing: The daemon set node-ca is deployed\nAzurePathFixProgressing: Migration failed: panic: AZURE_CLIENT_ID is required for authentication\nAzurePathFixProgressing: \nAzurePathFixProgressing: goroutine 1 [running]:\nAzurePathFixProgressing: main.main()\nAzurePathFixProgressing: \t/go/src/github.com/openshift/cluster-image-registry-operator/cmd/move-blobs/main.go:25 +0x15c\nAzurePathFixProgressing: " The cmd/move-blobs command was introduced due to https://issues.redhat.com/browse/OCPBUGS-29003.
Version-Release number of selected component (if applicable):
4.14.15+
How reproducible:
I have not reproduced this myself, but I imagine you would hit it every time when upgrading from 4.13 to 4.14.15+ with Azure UPI image registry storage.
Steps to Reproduce:
1. Starting on version 4.13, configure the registry for Azure user-provisioned infrastructure: https://docs.openshift.com/container-platform/4.14/registry/configuring_registry_storage/configuring-registry-storage-azure-user-infrastructure.html
2. Upgrade to 4.14.15+.
Actual results:
Upgrade does not complete succesfully $ oc get co .... image-registry 4.14.20 True False True 617d AzurePathFixControllerDegraded: Migration failed: panic: AZURE_CLIENT_ID is required for authentication... $ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.13.38 True True 7h41m Unable to apply 4.14.20: wait has exceeded 40 minutes for these operators: image-registry
Expected results:
Upgrade to complete successfully
Additional info:
Description of the problem:
The assisted installer doesn't freeze and unmount the file systems used while overwriting the OS image.
This causes the file system to become corrupt.
How reproducible:
Always for ZTP flow.
Steps to reproduce:
1. Run ZTP with enable-skip-mco-reboot set to true
2.
3.
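For context, the missing quiesce amounts to something like the following before the raw image write (the mountpoint is a placeholder; this is a conceptual sketch, not the installer's actual code path):
$ sync
$ fsfreeze --freeze /var/mnt/scratch && fsfreeze --unfreeze /var/mnt/scratch
$ umount /var/mnt/scratch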
Actual results:
Installation fails. Host drops to emergency shell.
Expected results:
Successful installation.
This is a clone of issue OCPBUGS-24186. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-29895. The following is the description of the original issue:
—
Description of problem:
A user noticed on cluster deletion that the IPI-generated service instance was not cleaned up. Add more debugging statements to find out why.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create cluster 2. Delete cluster
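When reproducing, additional detail on the deprovision flow can be captured with the installer's debug logging (the assets directory is a placeholder):
$ openshift-install destroy cluster --dir <assets-dir> --log-level debug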
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-36484. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-31685. The following is the description of the original issue:
—
See the bug reported here https://github.com/openshift/console/issues/13696
Description of problem:
During a highly escalated scenario, we found the following:
- Due to an unrelated problem, 2 control plane nodes had the "localhost.localdomain" hostname when their respective sdn-controller pods started (that problem is out of the scope of this bug report).
- As both sdn-controller pods had (and retained) the "localhost.localdomain" hostname, both of them used "localhost.localdomain" while trying to acquire and renew the controller lease in the openshift-network-controller configmap.
- This ultimately caused both sdn-controller pods to mistakenly believe that they were the active sdn-controller, so both of them were active at the same time. Such a situation might have a number of undesired (and unknown) side effects. In our case, the result was that two nodes were allocated the same hostsubnet, disrupting pod communication between the 2 nodes and with the other nodes.
What we expect from this bug report: that the sdn-controller never tries to acquire a lease as "localhost.localdomain" during a failure scenario. The ideal solution would be to acquire the lease in a way that avoids collisions (more on this in the comments), but at the very least, sdn-controller should prefer crash-looping rather than starting with a lease that can collide and wreak havoc.
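For reference, the current lease holder can be inspected via the standard leader-election annotation on the configmap (the openshift-sdn namespace is an assumption here):
$ oc get configmap openshift-network-controller -n openshift-sdn -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'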
Version-Release number of selected component (if applicable):
Found on 4.11, but it should be reproducible in 4.13 as well.
How reproducible:
Under some error scenarios where 2 control plane nodes temporarily have "localhost.localdomain" hostname by mistake.
Steps to Reproduce:
1. Start sdn-controller pods 2. 3.
Actual results:
2 sdn-controller pods acquire the lease with "localhost.localdomain" holderIdentity and become active at the same time.
Expected results:
No sdn-controller pod to acquire the lease with "localhost.localdomain" holderIdentity. Either use unique identities even when there is failure scenario or just crash-loop.
Additional info:
Just FYI, the trigger that caused the wrong domain was investigated at this other bug: https://issues.redhat.com/browse/OCPBUGS-11997 However, this situation may happen under other possible failure scenarios, so it is worth preventing it somehow.
Description of problem:
IPI on IBM Cloud does not currently support the new eu-es region
Version-Release number of selected component (if applicable):
4.15
How reproducible:
100%
Steps to Reproduce:
1. Create install-config.yaml for IBM Cloud, per docs, using eu-es region 2. Create the manifests (or cluster) using IPI
Actual results:
level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: invalid "install-config.yaml" file: platform.ibmcloud.region: Unsupported value: "eu-es": supported values: "us-south", "us-east", "jp-tok", "jp-osa", "au-syd", "ca-tor", "eu-gb", "eu-de", "br-sao"
Expected results:
Successful IBM Cloud OCP cluster in eu-es
Additional info:
IBM Cloud has started testing a potential fix in eu-es to confirm that the supported cluster types (Public, Private, BYON) all work properly in eu-es.
After updating our husky dependency, the pre-commit hook might fail on some systems if their PATH env var is not properly configured:
Running husky pre-commit hook...
frontend/.husky/pre-commit: line 6: lint-staged: command not found
husky - pre-commit hook exited with code 127 (error)
husky - command not found in PATH=<user path>
The PATH env var must include "./node_modules/.bin" for the husky pre-commit hook to work, which should be documented in the README.
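A minimal sketch of that workaround, assuming the hook resolves the path relative to the directory containing node_modules:
$ export PATH="$PATH:./node_modules/.bin"
$ git commit # the pre-commit hook can now resolve lint-staged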
This is a clone of issue OCPBUGS-31699. The following is the description of the original issue:
—
Description of problem:
The gstreamer1 package (and its plugins) includes certain video/audio codecs, which create licensing concerns for our Partners, who embed our solutions (OCP) and deliver them to their end customers. The ose-network-tools container image (this seems applicable to all OCP releases) includes a dependency on the gstreamer1 rpm (and its plugin rpms, like gstreamer1-plugins-bad-free). The request is to reconsider this dependency and, if possible, remove it entirely. It is a blocking issue which prevents our partners from delivering their solution in the field. It is an indirect dependency: ose-network-tools includes wireshark, wireshark has a dependency on qt5-multimedia, which in turn depends on gstreamer1-plugins-bad-free. First question: is wireshark really needed for network-tools? Wireshark is a GUI tool, so the dependency is not clear. Second question: would wireshark-cli be sufficient for the needed purposes instead? The CLI version does not carry the dependency on qt5 and so on.
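One way to double-check the chain against the matching RHEL repos (illustrative; package names taken from the description above):
$ dnf repoquery --requires wireshark
$ dnf repoquery --whatrequires gstreamer1-plugins-bad-free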
Version-Release number of selected component (if applicable):
Seems applicable to all active OCP releases.
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
The oauthclients degraded condition never gets removed, meaning that once it is set due to an issue on a cluster, it won't be unset.
Version-Release number of selected component (if applicable):
How reproducible:
Sporadically, when the AuthStatusHandlerFailedApply condition is set on the console operator status conditions.
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-31384. The following is the description of the original issue:
—
In a cluster updating from 4.5.11 through many intermediate versions to 4.14.17 and on to 4.15.3 (initiated 2024-03-18T07:33:11Z), multus pods are sad about api-int X.509:
$ tar -xOz inspect.local.5020316083985214391/namespaces/openshift-kube-apiserver/core/events.yaml <hivei01ue1.inspect.local.5020316083985214391.gz | yaml2json | jq -r '[.items[] | select(.reason == "FailedCreatePodSandBox")][0].message' (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_installer-928-ip-10-164-221-242.ec2.internal_openshift-kube-apiserver_9e87f20b-471a-447e-9679-edce26b4ef78_0(8322d383c477c29fe0221fdca5eaf5ca5b2f57f8a7077c7dd7d2861be0f5288c): error adding pod openshift-kube-apiserver_installer-928-ip-10-164-221-242.ec2.internal to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: '&{ContainerID:8322d383c477c29fe0221fdca5eaf5ca5b2f57f8a7077c7dd7d2861be0f5288c Netns:/var/run/netns/6e2b0b10-5006-4bf9-bd74-17333e0cdceb IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=openshift-kube-apiserver;K8S_POD_NAME=installer-928-ip-10-164-221-242.ec2.internal;K8S_POD_INFRA_CONTAINER_ID=8322d383c477c29fe0221fdca5eaf5ca5b2f57f8a7077c7dd7d2861be0f5288c;K8S_POD_UID=9e87f20b-471a-447e-9679-edce26b4ef78 Path: StdinData:[REDACTED]} ContainerID:"8322d383c477c29fe0221fdca5eaf5ca5b2f57f8a7077c7dd7d2861be0f5288c" Netns:"/var/run/netns/6e2b0b10-5006-4bf9-bd74-17333e0cdceb" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=openshift-kube-apiserver;K8S_POD_NAME=installer-928-ip-10-164-221-242.ec2.internal;K8S_POD_INFRA_CONTAINER_ID=8322d383c477c29fe0221fdca5eaf5ca5b2f57f8a7077c7dd7d2861be0f5288c;K8S_POD_UID=9e87f20b-471a-447e-9679-edce26b4ef78" Path:"" ERRORED: error configuring pod [openshift-kube-apiserver/installer-928-ip-10-164-221-242.ec2.internal] networking: Multus: [openshift-kube-apiserver/installer-928-ip-10-164-221-242.ec2.internal/9e87f20b-471a-447e-9679-edce26b4ef78]: error waiting for pod: Get "https://api-int.REDACTED:6443/api/v1/namespaces/openshift-kube-apiserver/pods/installer-928-ip-10-164-221-242.ec2.internal?timeout=1m0s": tls: failed to verify certificate: x509: certificate signed by unknown authority
4.15.3, so we have 4.15.2's OCPBUGS-30304 but not 4.15.5's OCPBUGS-30237.
Seen in two clusters after updating from 4.14 to 4.15.3.
Unclear.
Sad multus pods.
Happy cluster.
$ openssl s_client -showcerts -connect api-int.REDACTED:6443 < /dev/null ... Certificate chain 0 s:CN = api-int.REDACTED i:CN = openshift-kube-apiserver-operator_loadbalancer-serving-signer@1710747228 a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256 v:NotBefore: Mar 25 19:35:55 2024 GMT; NotAfter: Apr 24 19:35:56 2024 GMT ... 1 s:CN = openshift-kube-apiserver-operator_loadbalancer-serving-signer@1710747228 i:CN = openshift-kube-apiserver-operator_loadbalancer-serving-signer@1710747228 a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256 v:NotBefore: Mar 18 07:33:47 2024 GMT; NotAfter: Mar 16 07:33:48 2034 GMT ...
So that's created seconds after the update was initiated. We have inspect logs for some namespaces, but they don't go back quite that far, because the machine-config roll at the end of the update into 4.15.3 rolled all the pods:
$ tar -xOz inspect.local.5020316083985214391/namespaces/openshift-kube-apiserver-operator/pods/kube-apiserver-operator-6cbfdd467c-4ctq7/kube-apiserver-operator/kube-apiserver-operator/logs/current.log <hivei01ue1.inspect.local.5020316083985214391.gz | head -n2 2024-03-18T08:22:05.058253904Z I0318 08:22:05.056255 1 cmd.go:241] Using service-serving-cert provided certificates 2024-03-18T08:22:05.058253904Z I0318 08:22:05.056351 1 leaderelection.go:122] The leader election gives 4 retries and allows for 30s of clock skew. The kube-apiserver downtime tolerance is 78s. Worst non-graceful lease acquisition is 2m43s. Worst graceful lease acquisition is {26s}.
We were able to recover individual nodes via:
This is a clone of issue OCPBUGS-32257. The following is the description of the original issue:
—
Description of problem:
When using the registry-overrides flag to override registries for control plane components, the current implementation appears to propagate the override to some data plane components. Certain components like multus, dns, and ingress get the values for their containers' images from env vars set in operators on the control plane (cno/dns operator/konnectivity), and hence also get the overridden registry propagated to them.
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Input a registry override through the HyperShift Operator 2. Check registry fields for components on the data plane 3.
Actual results:
Data plane components that get registry values from env vars set in dns-operator, ingress-operator, cluster-network-operator, and cluster-node-tuning-operator get overridden registries.
Expected results:
Overridden registries should not get propagated to the data plane
Additional info:
This is a clone of issue OCPBUGS-33649. The following is the description of the original issue:
—
Description of problem:
The ca-west-1 region is missing from https://github.com/openshift/installer/blob/master/pkg/quota/aws/limits.go#L15
Version-Release number of selected component (if applicable):
4.15+
How reproducible:
always
Steps to Reproduce:
1. 2. 3.
Actual results:
Quota checking is skipped as if the region were not supported
Expected results:
Additional info:
This is a clone of issue OCPBUGS-43668. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-43667. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-43378. The following is the description of the original issue:
—
In https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-shiftstack-ci-release-4.18-e2e-openstack-ovn-etcd-scaling/1834144693181485056 I noticed the following panic:
Undiagnosed panic detected in pod expand_less 0s { pods/openshift-monitoring_prometheus-k8s-1_prometheus_previous.log.gz:ts=2024-09-12T09:30:09.273Z caller=klog.go:124 level=error component=k8s_client_runtime func=Errorf msg="Observed a panic: &runtime.TypeAssertionError{_interface:(*abi.Type)(0x3180480), concrete:(*abi.Type)(0x34a31c0), asserted:(*abi.Type)(0x3a0ac40), missingMethod:\"\"} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Node)\ngoroutine 13218 [running]:\nk8s.io/apimachinery/pkg/util/runtime.logPanic({0x32f1080, 0xc05be06840})\n\t/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x90\nk8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc010ef6000?})\n\t/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b\npanic({0x32f1080?, 0xc05be06840?})\n\t/usr/lib/golang/src/runtime/panic.go:770 +0x132\ngithub.com/prometheus/prometheus/discovery/kubernetes.NewEndpoints.func11({0x34a31c0?, 0xc05bf3a580?})\n\t/go/src/github.com/prometheus/prometheus/discovery/kubernetes/endpoints.go:170 +0x4e\nk8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnDelete(...)\n\t/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/controller.go:253\nk8s.io/client-go/tools/cache.(*processorListener).run.func1()\n\t/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/shared_informer.go:977 +0x9f\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)\n\t/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00fc92f70, {0x456ed60, 0xc031a6ba10}, 0x1, 0xc015a04fc0)\n\t/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf\nk8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc011678f70, 0x3b9aca00, 0x0, 0x1, 0xc015a04fc0)\n\t/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f\nk8s.io/apimachinery/pkg/util/wait.Until(...)\n\t/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161\nk8s.io/client-go/tools/cache.(*processorListener).run(0xc04c607440)\n\t/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/shared_informer.go:966 +0x69\nk8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()\n\t/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x52\ncreated by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 12933\n\t/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73\n"}
This issue seems relatively common on openstack; these runs very frequently hit this failure.
Linked test name: Undiagnosed panic detected in pod
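For reference, a minimal sketch (using client-go, with an illustrative handler name) of the tombstone-handling pattern that avoids this class of panic: an OnDelete handler can receive either the deleted object itself or a cache.DeletedFinalStateUnknown wrapper, so it must unwrap the wrapper before type-asserting.

    package sketch

    import (
    	"fmt"

    	v1 "k8s.io/api/core/v1"
    	"k8s.io/client-go/tools/cache"
    )

    // onNodeDelete unwraps a possible DeletedFinalStateUnknown tombstone before
    // asserting the concrete type, instead of asserting *v1.Node directly.
    func onNodeDelete(obj interface{}) {
    	node, ok := obj.(*v1.Node)
    	if !ok {
    		// The informer hands the handler a tombstone when it missed the
    		// actual delete event and the final state is unknown.
    		tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
    		if !ok {
    			fmt.Printf("unexpected object on delete: %T\n", obj)
    			return
    		}
    		node, ok = tombstone.Obj.(*v1.Node)
    		if !ok {
    			fmt.Printf("tombstone contained unexpected object: %T\n", tombstone.Obj)
    			return
    		}
    	}
    	fmt.Printf("node deleted: %s\n", node.Name)
    }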
This is a clone of issue OCPBUGS-39135. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-39134. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-31738. The following is the description of the original issue:
—
Description of problem:
The [Jira:"Network / ovn-kubernetes"] monitor test pod-network-avalibility setup test frequently fails on OpenStack platform, which in turn also causes the [sig-network] can collect pod-to-service poller pod logs and [sig-network] can collect host-to-service poller pod logs tests to fail.
These failures happen frequently in vh-mecha, for example for all CSI jobs, such as 4.16-e2e-openstack-csi-cinder.
In https://github.com/openshift/release/blob/master/ci-operator/step-registry/ipi/install/heterogeneous/ipi-install-heterogeneous-commands.sh#L37-L42, it downloads yq-v4 from GitHub and uses it in the following step.
This is a potential issue when multiple concurrent jobs run at the same time; GitHub may deny the access.
We have hit such issues before, so we installed yq-3.3.0 in the upi-installer image; refer to https://github.com/openshift/installer/blob/master/images/installer/Dockerfile.upi.ci.rhel8#L46-L50. Is it possible to migrate the code to use yq-3.3.0 from the upi-installer image?
Before we migrate a lot of ci jobs from arm and amd to multiarch ci, we need to resolve such issues.
cc Lin Wang
Description of problem:
While installing many SNOs via ZTP using ACM, two SNOs failed to complete install because the image-registry was degraded during the install process. # cat clusters | xargs -I % sh -c "echo '%'; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get clusterversion" vm01831 NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version False False 18h Error while reconciling 4.14.0-rc.0: the cluster operator image-registry is degraded vm02740 NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version False False 18h Error while reconciling 4.14.0-rc.0: the cluster operator image-registry is degraded # cat clusters | xargs -I % sh -c "echo '%'; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get co image-registry" vm01831 NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE image-registry 4.14.0-rc.0 True False True 18h Degraded: The registry is removed... vm02740 NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE image-registry 4.14.0-rc.0 True False True 18h Degraded: The registry is removed... Both showed the image-pruner job pod in error state: # cat clusters | xargs -I % sh -c "echo '%'; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get po -n openshift-image-registry" vm01831 NAME READY STATUS RESTARTS AGE cluster-image-registry-operator-5d497944d4-czn64 1/1 Running 0 18h image-pruner-28242720-w6jmv 0/1 Error 0 18h node-ca-vtfj8 1/1 Running 0 18h vm02740 NAME READY STATUS RESTARTS AGE cluster-image-registry-operator-5d497944d4-lbtqw 1/1 Running 1 (18h ago) 18h image-pruner-28242720-ltqzk 0/1 Error 0 18h node-ca-4fntj 1/1 Running 0 18h
Version-Release number of selected component (if applicable):
Deployed SNO OCP - 4.14.0-rc.0 Hub 4.13.11 ACM - 2.9.0-DOWNSTREAM-2023-09-07-04-47-52
How reproducible:
Rare, only 2 clusters were found in this state after the test
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Seems like some permissions might have been lacking: # oc --kubeconfig /root/hv-vm/kc/vm01831/kubeconfig logs -n openshift-image-registry image-pruner-28242720-w6jmv Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:openshift-image-registry:pruner" cannot list resource "pods" in API group "" at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io "system:image-pruner" not found attempt #1 has failed (exit code 1), going to make another attempt... Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:openshift-image-registry:pruner" cannot list resource "pods" in API group "" at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io "system:image-pruner" not found attempt #2 has failed (exit code 1), going to make another attempt... Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:openshift-image-registry:pruner" cannot list resource "pods" in API group "" at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io "system:image-pruner" not found attempt #3 has failed (exit code 1), going to make another attempt... Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:openshift-image-registry:pruner" cannot list resource "pods" in API group "" at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io "system:image-pruner" not found attempt #4 has failed (exit code 1), going to make another attempt... Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:openshift-image-registry:pruner" cannot list resource "pods" in API group "" at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io "system:image-pruner" not found attempt #5 has failed (exit code 1), going to make another attempt... Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:openshift-image-registry:pruner" cannot list resource "pods" in API group "" at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io "system:image-pruner" not found
This is a clone of issue OCPBUGS-30641. The following is the description of the original issue:
—
Description of problem:
When deploying with a service ID, the installer is unable to query resource groups.
Version-Release number of selected component (if applicable):
4.13-4.16
How reproducible:
Easily
Steps to Reproduce:
1. Create a service ID with seemingly enough permissions to do an IPI install 2. Deploy to Power VS with IPI 3. Fail
Actual results:
Fail to deploy a cluster with service ID
Expected results:
cluster create should succeed
Additional info:
This is a clone of issue OCPBUGS-34924. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-34911. The following is the description of the original issue:
—
We need to add more people to the OWNERS file of the multus repo.
Please review the following PR: https://github.com/openshift/operator-framework-operator-controller/pull/26
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
vSphere IPI installation fails with 'panic: runtime error: invalid memory address or nil pointer dereference'.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Download 4.13 installation binary 2. Run openshift-install create cluster command.
Actual results:
Error: DEBUG Generating Platform Provisioning Check... panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x3401c4e]goroutine 1 [running]: github.com/openshift/installer/pkg/asset/installconfig/vsphere.validateESXiVersion(0xc001524060?, {0xc00018aff0, 0x43}, 0x1?, 0x1?) /go/src/github.com/openshift/installer/pkg/asset/installconfig/vsphere/validation.go:279 +0xb6e github.com/openshift/installer/pkg/asset/installconfig/vsphere.validateFailureDomain(0xc001524060, 0xc00022c840, 0x0) /go/src/github.com/openshift/installer/pkg/asset/installconfig/vsphere/validation.go:167 +0x6b6 github.com/openshift/installer/pkg/asset/installconfig/vsphere.ValidateForProvisioning(0xc0003d4780) /go/src/github.com/openshift/installer/pkg/asset/installconfig/vsphere/validation.go:132 +0x675 github.com/openshift/installer/pkg/asset/installconfig.(*PlatformProvisionCheck).Generate(0xc0000f2000?, 0x5?) /go/src/github.com/openshift/installer/pkg/asset/installconfig/platformprovisioncheck.go:112 +0x45f github.com/openshift/installer/pkg/asset/store.(*storeImpl).fetch(0xc000925e90, {0x1dc012d0, 0x2279afa8}, {0x7c34091, 0x2}) /go/src/github.com/openshift/installer/pkg/asset/store/store.go:226 +0x5fa github.com/openshift/installer/pkg/asset/store.(*storeImpl).fetch(0xc000925e90, {0x1dc01090, 0x22749ce0}, {0x0, 0x0}) /go/src/github.com/openshift/installer/pkg/asset/store/store.go:220 +0x75b github.com/openshift/installer/pkg/asset/store.(*storeImpl).Fetch(0x7ffe670305f1?, {0x1dc01090, 0x22749ce0}, {0x227267a0, 0x8, 0x8}) /go/src/github.com/openshift/installer/pkg/asset/store/store.go:76 +0x48 main.runTargetCmd.func1({0x7ffe670305f1, 0x6}) /go/src/github.com/openshift/installer/cmd/openshift-install/create.go:260 +0x125 main.runTargetCmd.func2(0x2272da00?, {0xc000925410?, 0x3?, 0x3?}) /go/src/github.com/openshift/installer/cmd/openshift-install/create.go:290 +0xe7 github.com/spf13/cobra.(*Command).execute(0x2272da00, {0xc000925380, 0x3, 0x3}) /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:920 +0x847 github.com/spf13/cobra.(*Command).ExecuteC(0xc000210900) /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:1040 +0x3bd github.com/spf13/cobra.(*Command).Execute(...) /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:968 main.installerMain() /go/src/github.com/openshift/installer/cmd/openshift-install/main.go:61 +0x2b0 main.main() /go/src/github.com/openshift/installer/cmd/openshift-install/main.go:38 +0xff
Expected results:
Installation to be completed successfully.
Additional info:
This is a clone of issue OCPBUGS-34759. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-18711. The following is the description of the original issue:
—
Description of problem:
secrets-store-csi-driver with the AWS provider does not work in a HyperShift hosted cluster; pods can't mount the volume successfully.
Version-Release number of selected component (if applicable):
secrets-store-csi-driver-operator.v4.14.0-202308281544 in 4.14.0-0.nightly-2023-09-06-235710 HyperShift hosted cluster.
How reproducible:
Always
Steps to Reproduce:
1. Follow test case OCP-66032 "Setup" part to install secrets-store-csi-driver-operator.v4.14.0-202308281544 , secrets-store-csi-driver and AWS provider successfully: $ oc get po -n openshift-cluster-csi-drivers NAME READY STATUS RESTARTS AGE aws-ebs-csi-driver-node-7xxgr 3/3 Running 0 5h18m aws-ebs-csi-driver-node-fmzwf 3/3 Running 0 5h18m aws-ebs-csi-driver-node-rgrxd 3/3 Running 0 5h18m aws-ebs-csi-driver-node-tpcxq 3/3 Running 0 5h18m csi-secrets-store-provider-aws-2fm6q 1/1 Running 0 5m14s csi-secrets-store-provider-aws-9xtw7 1/1 Running 0 5m15s csi-secrets-store-provider-aws-q5lvb 1/1 Running 0 5m15s csi-secrets-store-provider-aws-q6m65 1/1 Running 0 5m15s secrets-store-csi-driver-node-4wdc8 3/3 Running 0 6m22s secrets-store-csi-driver-node-n7gkj 3/3 Running 0 6m23s secrets-store-csi-driver-node-xqr52 3/3 Running 0 6m22s secrets-store-csi-driver-node-xr24v 3/3 Running 0 6m22s secrets-store-csi-driver-operator-9cb55b76f-7cbvz 1/1 Running 0 7m16s 2. Follow test case OCP-66032 steps to create AWS secret, set up AWS IRSA successfully. 3. Follow test case OCP-66032 steps SecretProviderClass, deployment with the secretProviderClass successfully. Then check pod, pod is stuck in ContainerCreating: $ oc get po NAME READY STATUS RESTARTS AGE hello-openshift-84c76c5b89-p5k4f 0/1 ContainerCreating 0 10m $ oc describe po hello-openshift-84c76c5b89-p5k4f ... Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 11m default-scheduler Successfully assigned xxia-proj/hello-openshift-84c76c5b89-p5k4f to ip-10-0-136-205.us-east-2.compute.internal Warning FailedMount 11m kubelet MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod xxia-proj/hello-openshift-84c76c5b89-p5k4f, err: rpc error: code = Unknown desc = us-east-2: Failed fetching secret xxiaSecret: WebIdentityErr: failed to retrieve credentials caused by: InvalidIdentityToken: Incorrect token audience status code: 400, request id: 92d1ff5b-36be-4cc5-9b55-b12279edd78e Warning FailedMount 11m kubelet MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod xxia-proj/hello-openshift-84c76c5b89-p5k4f, err: rpc error: code = Unknown desc = us-east-2: Failed fetching secret xxiaSecret: WebIdentityErr: failed to retrieve credentials caused by: InvalidIdentityToken: Incorrect token audience status code: 400, request id: 50907328-70a6-44e0-9f05-80a31acef0b4 Warning FailedMount 11m kubelet MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod xxia-proj/hello-openshift-84c76c5b89-p5k4f, err: rpc error: code = Unknown desc = us-east-2: Failed fetching secret xxiaSecret: WebIdentityErr: failed to retrieve credentials caused by: InvalidIdentityToken: Incorrect token audience status code: 400, request id: 617dc3bc-a5e3-47b0-b37c-825f8dd84920 Warning FailedMount 11m kubelet MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod xxia-proj/hello-openshift-84c76c5b89-p5k4f, err: rpc error: code = Unknown desc = us-east-2: Failed fetching secret xxiaSecret: WebIdentityErr: failed to retrieve credentials caused by: InvalidIdentityToken: Incorrect token audience status code: 400, request id: 8ab5fc2c-00ca-45e2-9a82-7b1765a5df1a Warning FailedMount 11m kubelet MountVolume.SetUp failed for volume 
"secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod xxia-proj/hello-openshift-84c76c5b89-p5k4f, err: rpc error: code = Unknown desc = us-east-2: Failed fetching secret xxiaSecret: WebIdentityErr: failed to retrieve credentials caused by: InvalidIdentityToken: Incorrect token audience status code: 400, request id: b76019ca-dc04-4e3e-a305-6db902b0a863 Warning FailedMount 11m kubelet MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod xxia-proj/hello-openshift-84c76c5b89-p5k4f, err: rpc error: code = Unknown desc = us-east-2: Failed fetching secret xxiaSecret: WebIdentityErr: failed to retrieve credentials caused by: InvalidIdentityToken: Incorrect token audience status code: 400, request id: b395e3b2-52a2-4fc2-80c6-9a9722e26375 Warning FailedMount 11m kubelet MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod xxia-proj/hello-openshift-84c76c5b89-p5k4f, err: rpc error: code = Unknown desc = us-east-2: Failed fetching secret xxiaSecret: WebIdentityErr: failed to retrieve credentials caused by: InvalidIdentityToken: Incorrect token audience status code: 400, request id: ec325057-9c0a-4327-80c9-a9b6233a64dd Warning FailedMount 10m kubelet MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod xxia-proj/hello-openshift-84c76c5b89-p5k4f, err: rpc error: code = Unknown desc = us-east-2: Failed fetching secret xxiaSecret: WebIdentityErr: failed to retrieve credentials caused by: InvalidIdentityToken: Incorrect token audience status code: 400, request id: 405492b2-ed52-429b-b253-6a7c098c26cb Warning FailedMount 82s (x5 over 9m35s) kubelet Unable to attach or mount volumes: unmounted volumes=[secrets-store-inline], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition Warning FailedMount 74s (x5 over 9m25s) kubelet (combined from similar events): MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod xxia-proj/hello-openshift-84c76c5b89-p5k4f, err: rpc error: code = Unknown desc = us-east-2: Failed fetching secret xxiaSecret: WebIdentityErr: failed to retrieve credentials caused by: InvalidIdentityToken: Incorrect token audience status code: 400, request id: c38bbed1-012d-4250-b674-24ab40607920
Actual results:
Hit above stuck issue.
Expected results:
Pod should be Running.
Additional info:
Compared with another operator (cert-manager-operator) that also uses AWS IRSA (OCP-62500): that case works well, so secrets-store-csi-driver-operator has a bug.
Description of problem:
Add code to MCO to create kube-rbac-proxies for crio's metrics port.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
release-4.15 of openshift/cloud-provider-openstack is missing some commits that were backported in upstream project into the release-1.28 branch.
We should import them in our downstream fork.
This is a clone of issue OCPBUGS-27892. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-29363. The following is the description of the original issue:
—
Description of problem:
1. The TaskRuns list page loads constantly for all projects 2. The archive icon is not displayed for some tasks in the TaskRun list page 3. On changing the namespace to All Projects, PipelineRuns and TaskRuns do not load properly
Version-Release number of selected component (if applicable):
4.15.z
How reproducible:
Always
Steps to Reproduce:
1. Create some TaskRuns 2. Go to the TaskRun list page 3. Select All Projects in the project dropdown
Actual results:
The screen keeps loading indefinitely
Expected results:
Should load TaskRuns from all projects
Additional info:
Description of problem:
checked with 4.15.0-0.nightly-2023-12-11-033133; there are no PodMetrics/NodeMetrics resources on the server
$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.15.0-0.nightly-2023-12-11-033133 True False 122m Cluster version is 4.15.0-0.nightly-2023-12-11-033133 $ oc api-resources | grep -i metrics nodes metrics.k8s.io/v1beta1 false NodeMetrics pods metrics.k8s.io/v1beta1 true PodMetrics $ oc explain PodMetrics the server doesn't have a resource type "PodMetrics" $ oc explain NodeMetrics the server doesn't have a resource type "NodeMetrics" $ oc get NodeMetrics error: the server doesn't have a resource type "NodeMetrics" $ oc get PodMetrics -A error: the server doesn't have a resource type "PodMetrics"
no issue with 4.14.0-0.nightly-2023-12-11-135902
$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.14.0-0.nightly-2023-12-11-135902 True False 88m Cluster version is 4.14.0-0.nightly-2023-12-11-135902 $ oc api-resources | grep -i metrics nodes metrics.k8s.io/v1beta1 false NodeMetrics pods metrics.k8s.io/v1beta1 true PodMetrics $ oc explain PodMetrics GROUP: metrics.k8s.io KIND: PodMetrics VERSION: v1beta1DESCRIPTION: PodMetrics sets resource usage metrics of a pod. ... $ oc explain NodeMetrics GROUP: metrics.k8s.io KIND: NodeMetrics VERSION: v1beta1DESCRIPTION: NodeMetrics sets resource usage metrics of a node. ... $ oc get PodMetrics -A NAMESPACE NAME CPU MEMORY WINDOW openshift-apiserver apiserver-65f777466-4m8nj 9m 297512Ki 5m0s openshift-apiserver apiserver-65f777466-g7n72 10m 313308Ki 5m0s openshift-apiserver apiserver-65f777466-xzd8l 12m 293008Ki 5m0s openshift-apiserver-operator openshift-apiserver-operator-54945b8bbd-bxkcj 3m 119264Ki 5m0s ... $ oc get NodeMetrics NAME CPU MEMORY WINDOW ip-10-0-20-163.us-east-2.compute.internal 765m 8349848Ki 5m0s ip-10-0-22-189.us-east-2.compute.internal 388m 5363132Ki 5m0s ip-10-0-41-231.us-east-2.compute.internal 1274m 7243548Ki 5m0s ...
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-12-11-033133
How reproducible:
always
Steps to Reproduce:
1. see the description
Actual results:
4.15 server does not have PodMetrics/NodeMetrics
Expected results:
should have
Some commands have been around for so long and are used so regularly that they are considered GA. Some commands are no longer that useful.
Description of problem:
IBM VPC CSI Driver failed to provision volumes in a proxy cluster. (If I understand correctly) it seems the proxy is not injected, because in our definition (https://github.com/openshift/ibm-vpc-block-csi-driver-operator/blob/master/assets/controller.yaml) we are injecting the proxy into csi-driver: config.openshift.io/inject-proxy: csi-driver config.openshift.io/inject-proxy-cabundle: csi-driver but the container name is iks-vpc-block-driver in https://github.com/openshift/ibm-vpc-block-csi-driver-operator/blob/master/assets/controller.yaml#L153 I checked that the proxy is not defined in the controller pod or the driver container ENV.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-08-11-055332
How reproducible:
Always
Steps to Reproduce:
1. Create IBM cluster with proxy setting 2. create pvc/pod with IBM VPC CSI Driver
Actual results:
It failed to provision the volume
Expected results:
Provisioning volumes works well on a proxy cluster
Additional info:
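For illustration, a minimal hypothetical sketch (not the actual operator code) of why the annotation value matters: proxy env vars are only injected into containers whose name matches the annotation value, so an annotation pointing at csi-driver never touches the iks-vpc-block-driver container.

    package sketch

    import corev1 "k8s.io/api/core/v1"

    // injectProxyEnv appends proxy env vars only to the container whose name
    // matches the inject-proxy annotation value. With the annotation set to
    // "csi-driver" and the container named "iks-vpc-block-driver", the loop
    // never matches and nothing is injected.
    func injectProxyEnv(spec *corev1.PodSpec, annotatedName string, proxyEnv []corev1.EnvVar) {
    	for i := range spec.Containers {
    		if spec.Containers[i].Name == annotatedName {
    			spec.Containers[i].Env = append(spec.Containers[i].Env, proxyEnv...)
    		}
    	}
    }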
Please review the following PR: https://github.com/openshift/router/pull/513
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem: Multus currently uses a certificate that is valid for 10 minutes; we need to add configuration for certificates that are valid for 24 hours
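For reference, a minimal hypothetical sketch of the shape such a change could take (not the Multus signer code): the certificate template takes its validity window as a parameter instead of a hard-coded 10 minutes.

    package sketch

    import (
    	"crypto/x509"
    	"time"
    )

    // certTemplate builds a certificate template whose validity window is
    // configurable; passing 24*time.Hour would replace the previous
    // hard-coded 10-minute lifetime.
    func certTemplate(validity time.Duration) x509.Certificate {
    	now := time.Now()
    	return x509.Certificate{
    		NotBefore: now,
    		NotAfter:  now.Add(validity),
    	}
    }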
A table in a dashboard relies on the order of the metric labels to merge results
Create a dashboard with a table including this query:
label_replace(sort_desc(sum(sum_over_time(ALERTS{alertstate="firing"}[24h])) by ( alertstate, alertname)), "aaa", "$1", "alertstate", "(.+)")
A single row will be displayed as the query is simulating that the first label `aaa` has a single value.
Expected result:
The table should not rely on a single metric label to merge results but consider all the labels so the expected rows are displayed.
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/cloud-provider-azure/pull/98
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-27848. The following is the description of the original issue:
—
When an ESXi host is in maintenance mode, the installer is unable to query the host's version, causing validation to fail.
time="2024-01-04T05:30:45-05:00" level=fatal msg="failed to fetch Terraform Variables: failed to fetch dependency of \"Terraform Variables\": failed to generate asset \"Platform Provisioning Check\": platform.vsphere: Internal error: vCenter is failing to retrieve config product version information for the ESXi host: "
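A minimal sketch of one possible guard, assuming govmomi-style host objects: tolerate hosts in maintenance mode, and fail with a clear error rather than a panic when vCenter returns no product information.

    package sketch

    import (
    	"fmt"

    	"github.com/vmware/govmomi/vim25/mo"
    )

    // esxiVersion returns the host's product version, skipping hosts in
    // maintenance mode, for which vCenter may not report config data.
    func esxiVersion(host mo.HostSystem) (string, error) {
    	if host.Runtime.InMaintenanceMode {
    		return "", nil // skip version validation for this host
    	}
    	if host.Config == nil {
    		return "", fmt.Errorf("vCenter returned no config for host %s", host.Name)
    	}
    	return host.Config.Product.Version, nil
    }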
This is a clone of issue OCPBUGS-35883. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-35368. The following is the description of the original issue:
—
Description of problem:
The TestAllowedSourceRangesStatus test is flaking with the error: allowed_source_ranges_test.go:197: expected the annotation to be reflected in status.allowedSourceRanges: timed out waiting for the condition. I also notice it sometimes coincides with a TestScopeChange error. It may be related to LoadBalancer-type update operations; for example, https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/978/pull-ci-openshift-cluster-ingress-operator-master-e2e-azure-operator/1800249453098045440
Version-Release number of selected component (if applicable):
4.17
How reproducible:
~25-50%
Steps to Reproduce:
1. Run cluster-ingress-operator TestAllowedSourceRangesStatus E2E tests 2. 3.
Actual results:
Test is flaking
Expected results:
Test shouldn't flake
Additional info:
OCPBUGS-18596 and OCPBUGS-22382 track issues on metal and vSphere jobs with disruption for the image registry. By default the image registry is not enabled for these platforms, but it is enabled, in a non-HA manner, for the tests. During discussion around the issue it was decided that unless/until these teams support HA deployments of the image registry we should not be monitoring them for disruption.
Devan floated the idea of checking whether the image registry deployment has replicas enabled and, if not, selectively disabling disruption monitoring.
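A minimal sketch of that check, with assumed namespace, deployment name, and client wiring: only monitor registry disruption when the deployment actually runs more than one replica.

    package sketch

    import (
    	"context"

    	apierrors "k8s.io/apimachinery/pkg/api/errors"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/client-go/kubernetes"
    )

    // shouldMonitorRegistryDisruption reports whether the image registry runs
    // with more than one replica; an absent or single-replica registry cannot
    // be expected to serve without disruption during upgrades.
    func shouldMonitorRegistryDisruption(ctx context.Context, c kubernetes.Interface) (bool, error) {
    	dep, err := c.AppsV1().Deployments("openshift-image-registry").Get(ctx, "image-registry", metav1.GetOptions{})
    	if apierrors.IsNotFound(err) {
    		return false, nil
    	}
    	if err != nil {
    		return false, err
    	}
    	return dep.Spec.Replicas != nil && *dep.Spec.Replicas > 1, nil
    }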
Using metal-ipi on 4.14, the cluster is failing to come up:
the network cluster operator fails to start, and the sdn pod shows the error
bash: RHEL_VERSION: unbound variable
Please review the following PR: https://github.com/openshift/aws-pod-identity-webhook/pull/167
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-27192. The following is the description of the original issue:
—
Description of problem:
Based on Azure doc [1], NCv2 series Azure virtual machines (VMs) were retired on September 6, 2023. VMs can no longer be provisioned on those instance types, so remove standardNCSv2Family from the azure doc tested_instance_types_x86_64 on 4.13+. [1] https://learn.microsoft.com/en-us/azure/virtual-machines/ncv2-series
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Cluster installation fails on NCv2 series instance types 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-20085. The following is the description of the original issue:
—
Description of problem:
During the destroy cluster operation, unexpected results from the IBM Cloud API calls for Disks can cause panics when response data (or entire responses) is missing, resulting in unexpected failures during destroy.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Unknown, dependent on IBM Cloud API responses
Steps to Reproduce:
1. Successfully create IPI cluster on IBM Cloud 2. Attempt to cleanup (destroy) the cluster
Actual results:
Golang panic attempting to parse a HTTP response that is missing or lacking data. level=info msg=Deleted instance "ci-op-97fkzvv2-e6ed7-5n5zg-master-0" E0918 18:03:44.787843 33 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 228 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x6a3d760?, 0x274b5790}) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xfffffffe?}) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75 panic({0x6a3d760, 0x274b5790}) /usr/lib/golang/src/runtime/panic.go:884 +0x213 github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).waitForDiskDeletion.func1() /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:84 +0x12a github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).Retry(0xc000791ce0, 0xc000573700) /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:99 +0x73 github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).waitForDiskDeletion(0xc000791ce0, {{0xc00160c060, 0x29}, {0xc00160c090, 0x28}, {0xc0016141f4, 0x9}, {0x82b9f0d, 0x4}, {0xc00160c060, ...}}) /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:78 +0x14f github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).destroyDisks(0xc000791ce0) /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:118 +0x485 github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).executeStageFunction.func1() /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:201 +0x3f k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1({0x7f7801e503c8, 0x18}) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:109 +0x1b k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext({0x227a2f78?, 0xc00013c000?}, 0xc000a9b690?) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:154 +0x57 k8s.io/apimachinery/pkg/util/wait.poll({0x227a2f78, 0xc00013c000}, 0xd0?, 0x146fea5?, 0x7f7801e503c8?) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:245 +0x38 k8s.io/apimachinery/pkg/util/wait.PollImmediateInfiniteWithContext({0x227a2f78, 0xc00013c000}, 0x4136e7?, 0x28?) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:229 +0x49 k8s.io/apimachinery/pkg/util/wait.PollImmediateInfinite(0x100000000000000?, 0x806f00?) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:214 +0x46 github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).executeStageFunction(0xc000791ce0, {{0x82bb9a3?, 0xc000a9b7d0?}, 0xc000111de0?}, 0x840366?, 0xc00054e900?) /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:198 +0x108 created by github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).destroyCluster /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:172 +0xa87 panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference
Expected results:
Destroy IBM Cloud Disks during cluster destroy, or provide a useful error message to follow up on.
Additional info:
The ability to reproduce is relatively low, as it requires the IBM Cloud APIs to return specific data (or the lack thereof); it is currently unknown why the HTTP response and/or data is missing. IBM Cloud already has a PR to attempt to mitigate this issue, as was done with other destroy resource calls; potential follow-up for additional resources as necessary. https://github.com/openshift/installer/pull/7515
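For illustration, a minimal hypothetical sketch of the defensive pattern such a fix needs (Disk and getDisk are stand-ins, not the installer's real types): check the error, the raw response, and the payload before dereferencing anything.

    package sketch

    import (
    	"fmt"
    	"net/http"
    )

    // Disk is a stand-in for the cloud SDK's disk payload.
    type Disk struct{ Name string }

    // diskGone reports whether the disk is confirmed deleted, without ever
    // dereferencing a response or payload the API did not return.
    func diskGone(getDisk func(id string) (*Disk, *http.Response, error), id string) (bool, error) {
    	disk, resp, err := getDisk(id)
    	if err != nil {
    		if resp != nil && resp.StatusCode == http.StatusNotFound {
    			return true, nil // already deleted
    		}
    		return false, err
    	}
    	if disk == nil {
    		// A response with no payload is an error, not a panic.
    		return false, fmt.Errorf("empty response while checking disk %s", id)
    	}
    	return false, nil
    }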
This is a clone of issue OCPBUGS-29114. The following is the description of the original issue:
—
Description of problem:
When installing a new vSphere cluster with static IPs, control plane machine sets (CPMS) are also enabled in TechPreviewNoUpgrade, and the installer applies the incorrect config to the CPMS, resulting in masters being recreated.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
always
Steps to Reproduce:
1. create install-config.yaml with static IPs following documentation 2. run `openshift-install create cluster` 3. as install progresses, watch the machines definitions
Actual results:
new master machines are created
Expected results:
all machines are the same as what was created by the installer.
Additional info:
Description of problem:
HCP deployment fails with error: Hardware: > Disk encryption requirements: Missing ignition information. Installation is stuck because ignition is not available. Checking the ignition server pod logs shows the below error message. ~~~ 2024-07-04T13:27:53.272449113Z {"level":"info","ts":"2024-07-04T13:27:53Z","logger":"get-payload","msg":"machine-config-operator process completed","time":"0s","output":"FIPS mode is enabled, but the required OpenSSL backend is unavailable\n"} 2024-07-04T13:27:53.291079347Z {"level":"error","ts":"2024-07-04T13:27:53Z","msg":"Reconciler error","controller":"secret","controllerGroup":"","controllerKind":"Secret","Secret":{"name":"token-nodepool-example-1-2efdaff7","namespace":"example-example"},"namespace":"example-example","name":"token-nodepool-example-1-2efdaff7","reconcileID":"4b82d54e-5a0b-4551-95e0-800dffb6bc26","error":"error getting ignition payload: failed to execute machine-config-operator: machine-config-operator process failed: exit status 1 ~~~
Version-Release number of selected component (if applicable):
How reproducible:
100 %
Steps to Reproduce:
1. Deploy HCP cluster from ACM 2. Check the node status 3. Check the ignition server pod logs
Actual results:
installation get stuck
Expected results:
installation should be completed
Additional info:
Please review the following PR: https://github.com/openshift/cluster-baremetal-operator/pull/392
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-18115. The following is the description of the original issue:
—
Description of problem:
After enabling user-defined monitoring on a HyperShift hosted cluster, PrometheusOperatorRejectedResources starts firing.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Start an hypershift-hosted cluster with cluster-bot 2. Enable user-defined monitoring 3.
Actual results:
PrometheusOperatorRejectedResources alert becomes firing
Expected results:
No alert firing
Additional info:
Need to reach out to the HyperShift folks as the fix should probably be in their code base.
Description of problem:
Test case failure- OpenShift alerting rules [apigroup:image.openshift.io] should have description and summary annotations The obtained response seems to have unmarshalling errors. Failed to fetch alerting rules: unable to parse response invalid character 's' after object key
Expected output- The response should be proper and the unmarshalling should have worked
Openshift Version- 4.13 & 4.14
Cloud Provider/Platform- PowerVS
Prow Job Link/Must gather path- https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.13-ocp-e2e-ovn-ppc64le-powervs/1700992665824268288/artifacts/ocp-e2e-ovn-ppc64le-powervs/
This is a clone of issue OCPBUGS-27282. The following is the description of the original issue:
—
Description of problem:
We need to make the controllerAvailabilityPolicy field immutable in the HostedCluster spec section to ensure the customer cannot switch between SingleReplica and HighAvailability.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-25931. The following is the description of the original issue:
—
The single-line execute markdown reference is not working.
The inline code is not rendered properly, specifically for the single-line execute syntax.
The inline code should show a code block with a small execute icon to run the commands in the web terminal.
Please review the following PR: https://github.com/openshift/ironic-image/pull/397
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
The flaky-e2e-test suite has been failing consistently due to some changes made to how the test environments are set up in each test. Two tests in particular have been failing and need to be fixed: "[FLAKE] should clear up the condition in the InstallPlan status that contains an error message when a valid OperatorGroup is created" and "[FLAKE] consistent generation"
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Run flaky-e2e-test suite
Actual results:
Tests never pass
Expected results:
Tests pass at least a majority of the time
Additional info:
Please review the following PR: https://github.com/openshift/openshift-state-metrics/pull/111
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
CCO supports creating a credentials request in manual mode to specify the fields required to perform short-term authentication using workload identity federation, but the console fields and warnings that are supposed to be present are not.
Version-Release number of selected component (if applicable):
How reproducible:
create a catalog containing a bundle that has the annotation to support WIF and apply it to an oidc manual azure cluster.
Steps to Reproduce:
1. 2. 3.
Actual results:
No warnings or additional field options for subscription are present
Expected results:
Warnings and additional fields for subscription should be present
Additional info:
Description of the problem:
Installation fails to complete for 3+1 while 1 worker is in Error
CVO cannot complete installation
1/22/2023, 10:00:33 PM Operator cvo status: progressing message: Unable to apply 4.12.0: the cluster operator machine-api is not available
1/22/2023, 9:56:33 PM Operator cvo status: progressing message: Unable to apply 4.12.0: some cluster operators are not available
1/22/2023, 9:56:33 PM Operator console status: available message: All is well
1/22/2023, 9:55:13 PM Updated status of the cluster to finalizing
[Detected by regression test: test_delete_host_during_installation_success]
How reproducible:
100%, started a while ago.
Steps to reproduce:
1. Start Install 3+1
2. Once the worker starts its installation, kill the worker's agent
3. The worker goes to Error and the installation continues
Actual results:
CVO fails to install, eventually timing out, and the cluster ends with a failure
Expected results:
Cluster completes installation with 1 failed worker
Bump KSM to the latest v2.10.1 release that addresses a regression in the previous upstream release as well as builds with a newer Golang patch version (v1.20.8).
Please review the following PR: https://github.com/openshift/cloud-network-config-controller/pull/122
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/cloud-provider-azure/pull/85
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
The issue was found in CI on an Azure private cluster: all the egressIP cases failed because the EgressIP could not be applied to the egress node. It could also be reproduced manually.
Version-Release number of selected component (if applicable):
4.12.0-0.nightly-2023-01-08-142418
How reproducible:
Always
Steps to Reproduce:
1. Label one worker node as egress node 2. Create one egressIP object 3.
Actual results:
% oc get egressip NAME EGRESSIPS ASSIGNED NODE ASSIGNED EGRESSIPS egressip-2 10.0.1.10 egressip-47164 10.0.1.217 % oc get cloudprivateipconfig NAME AGE 10.0.1.10 18m 10.0.1.217 22m % oc get cloudprivateipconfig -o yaml apiVersion: v1 items: - apiVersion: cloud.network.openshift.io/v1 kind: CloudPrivateIPConfig metadata: annotations: k8s.ovn.org/egressip-owner-ref: egressip-2 creationTimestamp: "2023-01-09T10:11:33Z" finalizers: - cloudprivateipconfig.cloud.network.openshift.io/finalizer generation: 1 name: 10.0.1.10 resourceVersion: "59723" uid: d697568a-7d7c-471a-b5e1-d7b814244549 spec: node: huirwang-0109b-bv4ld-worker-eastus1-llmpb status: conditions: - lastTransitionTime: "2023-01-09T10:17:06Z" message: 'Error processing cloud assignment request, err: network.InterfacesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code="OutboundRuleCannotBeUsedWithBackendAddressPoolThatIsReferencedBySecondaryIpConfigs" Message="OutboundRule /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/huirwang-0109b-bv4ld-rg/providers/Microsoft.Network/loadBalancers/huirwang-0109b-bv4ld/outboundRules/outbound-rule-v4 cannot be used with Backend Address Pool /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/huirwang-0109b-bv4ld-rg/providers/Microsoft.Network/loadBalancers/huirwang-0109b-bv4ld/backendAddressPools/huirwang-0109b-bv4ld that contains Secondary IPConfig /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/huirwang-0109b-bv4ld-rg/providers/Microsoft.Network/networkInterfaces/huirwang-0109b-bv4ld-worker-eastus1-llmpb-nic/ipConfigurations/huirwang-0109b-bv4ld-worker-eastus1-llmpb_10.0.1.10" Details=[]' observedGeneration: 1 reason: CloudResponseError status: "False" type: Assigned node: huirwang-0109b-bv4ld-worker-eastus1-llmpb - apiVersion: cloud.network.openshift.io/v1 kind: CloudPrivateIPConfig metadata: annotations: k8s.ovn.org/egressip-owner-ref: egressip-47164 creationTimestamp: "2023-01-09T10:07:56Z" finalizers: - cloudprivateipconfig.cloud.network.openshift.io/finalizer generation: 1 name: 10.0.1.217 resourceVersion: "58333" uid: 6a7d6196-cfc9-4859-9150-7371f5818b74 spec: node: huirwang-0109b-bv4ld-worker-eastus1-llmpb status: conditions: - lastTransitionTime: "2023-01-09T10:13:29Z" message: 'Error processing cloud assignment request, err: network.InterfacesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code="OutboundRuleCannotBeUsedWithBackendAddressPoolThatIsReferencedBySecondaryIpConfigs" Message="OutboundRule /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/huirwang-0109b-bv4ld-rg/providers/Microsoft.Network/loadBalancers/huirwang-0109b-bv4ld/outboundRules/outbound-rule-v4 cannot be used with Backend Address Pool /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/huirwang-0109b-bv4ld-rg/providers/Microsoft.Network/loadBalancers/huirwang-0109b-bv4ld/backendAddressPools/huirwang-0109b-bv4ld that contains Secondary IPConfig /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/huirwang-0109b-bv4ld-rg/providers/Microsoft.Network/networkInterfaces/huirwang-0109b-bv4ld-worker-eastus1-llmpb-nic/ipConfigurations/huirwang-0109b-bv4ld-worker-eastus1-llmpb_10.0.1.217" Details=[]' observedGeneration: 1 reason: CloudResponseError status: "False" type: Assigned node: huirwang-0109b-bv4ld-worker-eastus1-llmpb kind: List metadata: resourceVersion: ""
Expected results:
EgressIP can be applied correctly
Additional info:
Description of problem:
As a logged-in user, I'm unable to log out from a cluster with an external OIDC provider.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Login into cluster with external OIDC setup 2. 3.
Actual results:
Unable to logout
Expected results:
Logout successfully
Additional info:
This is a clone of issue OCPBUGS-32110. The following is the description of the original issue:
—
Description of problem:
When using OpenShift 4.15 on ROSA Hosted Control Planes, after disabling the ImageRegistry, the default secrets and service accounts are still being created. This functionality should not be occurring once the registry is removed: https://docs.openshift.com/rosa/nodes/pods/nodes-pods-secrets.html#auto-generated-sa-token-secrets_nodes-pods-secrets
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. Deploy ROSA 4.15 HCP Cluster 2. Set spec.managementState = "Removed" on the cluster.config.imageregistry.operator.openshift.io. The image registry will be removed 3. Create a new OpenShift Project 4. Observe the builder, default and deployer ServiceAccounts and their associated Secrets are still created
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/etcd/pull/215
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-31722. The following is the description of the original issue:
—
Description of problem:
The image quay.io/centos7/httpd-24-centos7 used in TestMTLSWithCRLs and TestCRLUpdate is no longer being rebuilt, and has had its 'latest' tag removed. Containers using this image fail to start, and cause the tests to fail.
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
Run 'TEST="(TestMTLSWithCRLs|TestCRLUpdate)" make test-e2e' from the cluster-ingress-operator repo
Actual results:
Both tests and all their subtests fail
Expected results:
Tests pass
Additional info:
This PR fixed a bug related to the nginx default assets directory in the Dockerfile. This was not backported to 4.15, which causes OCP consoles launched with CI images, like cluster bot, to fail to display the Observe menu. Backporting fixes the issue for 4.15 CI images.
Please review the following PR: https://github.com/openshift/csi-driver-manila-operator/pull/204
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/csi-driver-shared-resource/pull/157
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-19054. The following is the description of the original issue:
—
Description of problem:
The agent-based installer does not support the TechPreviewNoUpgrade featureSet, and by extension nor does it support any of the features gated by it. Because of this, there is no warning about one of these features being specified - we expect the TechPreviewNoUpgrade feature gate to error out when any of them are used.
However, we don't warn about TechPreviewNoUpgrade itself being ignored, so if the user does specify it then they can use some of these non-supported features without being warned that their configuration is ignored.
We should fail with an error when TechPreviewNoUpgrade is specified, until such time as AGENT-554 is implemented.
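A minimal sketch of the proposed behavior, with illustrative names: reject the feature set outright instead of silently ignoring it.

    package sketch

    import "fmt"

    // validateFeatureSet fails when the agent-based installer is given a
    // feature set it does not support, rather than ignoring it.
    func validateFeatureSet(featureSet string) error {
    	if featureSet == "TechPreviewNoUpgrade" {
    		return fmt.Errorf("the agent-based installer does not currently support featureSet %q", featureSet)
    	}
    	return nil
    }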
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
When deploying a HostedCluster, if you define a KAS AdvertiseAddress, it could conflict with the current deployment by overlapping with other networks like the Service, Cluster, or Machine network, causing a deployment failure.
Version-Release number of selected component (if applicable):
latest
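A minimal sketch of the validation this calls for, with illustrative names: reject an advertise address that falls inside the Service, Cluster, or Machine networks.

    package sketch

    import (
    	"fmt"
    	"net"
    )

    // validateAdvertiseAddress rejects a KAS advertise address that overlaps
    // any of the given network CIDRs (service, cluster, machine).
    func validateAdvertiseAddress(addr string, cidrs []string) error {
    	ip := net.ParseIP(addr)
    	if ip == nil {
    		return fmt.Errorf("invalid advertise address %q", addr)
    	}
    	for _, c := range cidrs {
    		_, ipnet, err := net.ParseCIDR(c)
    		if err != nil {
    			return fmt.Errorf("invalid network CIDR %q: %w", c, err)
    		}
    		if ipnet.Contains(ip) {
    			return fmt.Errorf("advertise address %s overlaps network %s", addr, c)
    		}
    	}
    	return nil
    }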
As part of OCPBUGS-18641 we created code that appends the internal OVN-K8s subnet `fd69::2/128` to the `ExcludeNetworkSubnetCIDR` list for dual-stack installations.
What has been discovered now is that for IPv6-only clusters this network is not present on this list even though it should be.
This is causing vSphere IPv6-only setups to work incorrectly.
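A quick way to confirm whether the subnet made it into the rendered configuration, assuming the vSphere cloud-provider settings are still published through the cloud-provider-config ConfigMap as in recent releases:
# Look for the exclude-subnet entries in the rendered cloud config;
# fd69::2/128 should appear for IPv6-only clusters as well.
oc get cm cloud-provider-config -n openshift-config -o yaml | grep -i exclude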
Please review the following PR: https://github.com/openshift/baremetal-operator/pull/323
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
If a user changes the providerSpec in CPMS to add a block device for etcd, we need to check that the size is valid, or it can result in unhealthy clusters.
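As a sketch of what such a check could gate on (AWS shown for illustration; the 10 GiB floor mentioned below is a hypothetical threshold, not an official minimum), the block device sizes are visible in the CPMS providerSpec:
# Inspect the block device sizes carried in the CPMS providerSpec (AWS example).
oc get controlplanemachineset cluster -n openshift-machine-api \
  -o jsonpath='{.spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value.blockDevices[*].ebs.volumeSize}{"\n"}'
# A validation would then reject values below a sane floor, e.g. 10 (GiB).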
This is a clone of issue OCPBUGS-37767. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-30192. The following is the description of the original issue:
—
I took a look at Component Readiness today and noticed that "[sig-cluster-lifecycle] cluster upgrade should complete in a reasonable time" is permafailing. I modified the sample start time to see that it appears to have started around February 19th.
Is this expected with 4.16 or do we have a problem?
Component Readiness has found a potential regression in [sig-cluster-lifecycle] cluster upgrade should complete in a reasonable time.
Probability of significant regression: 100.00%
Sample (being evaluated) Release: 4.16
Start Time: 2024-02-27T00:00:00Z
End Time: 2024-03-04T23:59:59Z
Success Rate: 0.00%
Successes: 0
Failures: 4
Flakes: 0
Base (historical) Release: 4.15
Start Time: 2024-02-01T00:00:00Z
End Time: 2024-02-28T23:59:59Z
Success Rate: 100.00%
Successes: 47
Failures: 0
Flakes: 0
We should warn loudly in the logs when customers change the managementState of a CSI operator, rather than logging lower-level log messages.
I spent a non-trivial amount of time debugging a cluster where the CSI driver would not get installed, only to find out that the customer had somehow set managementState to Removed.
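A quick check for this misconfiguration (the AWS EBS driver is used as an example; substitute the relevant ClusterCSIDriver name):
# Anything other than "Managed" here explains a driver that never installs.
oc get clustercsidriver ebs.csi.aws.com -o jsonpath='{.spec.managementState}{"\n"}'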
Please review the following PR: https://github.com/openshift/cluster-csi-snapshot-controller-operator/pull/176
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/cluster-api/pull/188
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-28969. The following is the description of the original issue:
—
Description of problem:
Bootstrap process failed due to a coredns.yaml manifest generation issue:
Feb 04 05:14:34 yunjiang-p2-2r2b2-bootstrap bootkube.sh[11219]: I0204 05:14:34.966343 1 bootstrap.go:188] manifests/on-prem/coredns.yaml
Feb 04 05:14:34 yunjiang-p2-2r2b2-bootstrap bootkube.sh[11219]: F0204 05:14:34.966513 1 bootstrap.go:188] error rendering bootstrap manifests: failed to execute template: template: manifests/on-prem/coredns.yaml:34:32: executing "manifests/on-prem/coredns.yaml" at <onPremPlatformAPIServerInternalIPs .ControllerConfig>: error calling onPremPlatformAPIServerInternalIPs: invalid platform for API Server Internal IP
Feb 04 05:14:35 yunjiang-p2-2r2b2-bootstrap systemd[1]: bootkube.service: Main process exited, code=exited, status=255/EXCEPTION
Feb 04 05:14:35 yunjiang-p2-2r2b2-bootstrap systemd[1]: bootkube.service: Failed with result 'exit-code'.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2024-02-03-192446 4.16.0-0.nightly-2024-02-03-221256
How reproducible:
Always
Steps to Reproduce:
1. Enable custom DNS on GCP: platform.gcp.userProvisionedDNS: Enabled and featureSet: TechPreviewNoUpgrade
Actual results:
coredns.yaml cannot be generated; bootstrap fails.
Expected results:
Bootstrap process succeeds.
Additional info:
This is a clone of issue OCPBUGS-29453. The following is the description of the original issue:
—
Description of problem:
The etcd team has introduced an e2e test that exercises a full etcd backup and restore cycle in OCP [1]. We run those tests as part of our PR builds, and since 4.15 [2] (also 4.16 [3]) we have failed runs with the catalogd-controller-manager crash looping:
1 events happened too frequently event [namespace/openshift-catalogd node/ip-10-0-25-29.us-west-2.compute.internal pod/catalogd-controller-manager-768bb57cdb-nwbhr hmsg/47b381d71b - Back-off restarting failed container manager in pod catalogd-controller-manager-768bb57cdb-nwbhr_openshift-catalogd(aa38d084-ecb7-4588-bd75-f95adb4f5636)] happened 44 times}
I assume something in that controller doesn't deal gracefully with the restoration process of etcd, or with the apiserver being down for some time.
[1] https://github.com/openshift/origin/blob/master/test/extended/dr/recovery.go#L97
[2] https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-etcd-operator/1205/pull-ci-openshift-cluster-etcd-operator-master-e2e-aws-etcd-recovery/1757443629380538368
[3] https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-etcd-operator/1191/pull-ci-openshift-cluster-etcd-operator-release-4.15-e2e-aws-etcd-recovery/1752293248543494144
Version-Release number of selected component (if applicable):
> 4.15
How reproducible:
always by running the test
Steps to Reproduce:
Run the test: [sig-etcd][Feature:DisasterRecovery][Suite:openshift/etcd/recovery][Timeout:2h] [Feature:EtcdRecovery][Disruptive] Recover with snapshot with two unhealthy nodes and lost quorum [Serial] and observe the event invariant failing on the crash loop
Actual results:
catalogd-controller-manager crash loops and causes our CI jobs to fail
Expected results:
our e2e job is green again and catalogd-controller-manager doesn't crash loop
Additional info:
This is a clone of issue OCPBUGS-26048. The following is the description of the original issue:
—
Description of problem:
The default channel of 4.15, 4.16 clusters is stable-4.14.
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-01-03-193825
How reproducible:
Always
Steps to Reproduce:
1. Install a 4.16 cluster
2. Check the default channel:
# oc adm upgrade
warning: Cannot display available updates:
Reason: VersionNotFound
Message: Unable to retrieve available updates: currently reconciling cluster version 4.16.0-0.nightly-2024-01-03-193825 not found in the "stable-4.14" channel
Cluster version is 4.16.0-0.nightly-2024-01-03-193825
Upgradeable=False
Reason: MissingUpgradeableAnnotation
Message: Cluster operator cloud-credential should not be upgraded between minor versions: Upgradeable annotation cloudcredential.openshift.io/upgradeable-to on cloudcredential.operator.openshift.io/cluster object needs updating before upgrade. See Manually Creating IAM documentation for instructions on preparing a cluster for upgrade.
Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.14
Actual results:
Default channel is stable-4.14 in a 4.16 cluster
Expected results:
Default channel should be stable-4.16 in a 4.16 cluster
Additional info:
4.15 clusters have the issue as well.
This is a clone of issue OCPBUGS-29103. The following is the description of the original issue:
—
Description of problem:
The HCP CSR flow allows any CN in the incoming CSR.
Version-Release number of selected component (if applicable):
4.16.0
How reproducible:
Using the CSR flow, any name you add to the CN in the CSR becomes your username against the Kubernetes API server; check your username using the SelfSubjectReview API (kubectl auth whoami)
Steps to Reproduce:
1. Create a CSR with CN=whatever
2. Get the CSR signed and create a kubeconfig
3. Using the kubeconfig, kubectl auth whoami should show the chosen CN
Actual results:
any CN in CSR is the username against the cluster
Expected results:
we should only allow CNs with some known prefix (system:customer-break-glass:...)
Additional info:
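A minimal sketch of the reproduction (key and CSR generation only; the signing and kubeconfig assembly follow the usual CSR API flow):
# Generate a client key and a CSR with an arbitrary CN.
openssl genrsa -out client.key 2048
openssl req -new -key client.key -subj "/CN=whatever" -out client.csr
# After the CSR is signed and a kubeconfig is built from the issued cert:
kubectl auth whoami   # reports "whatever" as the username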
This is a clone of issue OCPBUGS-22410. The following is the description of the original issue:
—
Description of problem:
Original issue reported here: https://issues.redhat.com/browse/ACM-6189 reported by QE and customer.
Using ACM/Hive, customers can deploy OpenShift on vSphere. In the upcoming release of ACM 2.9, we support customers on OCP 4.12 - 4.15. The ACM UI updates the install config as users add configuration details.
This has worked for several releases over the last few years. However, in OCP 4.13+ the format has changed and there is now additional validation to check whether the datastore is a full path.
As per https://issues.redhat.com/browse/SPLAT-1093, removal of the legacy fields should not happen until later, so any legacy configurations such as relative paths should still work.
Version-Release number of selected component (if applicable):
ACM 2.9.0-DOWNSTREAM-2023-10-24-01-06-09 OpenShift 4.14.0-rc.7 OpenShift 4.13.18 OpenShift 4.12.39
How reproducible:
Always
Steps to Reproduce:
1. Deploy OCP 4.12 on vSphere using the legacy field and a relative path without folder (e.g. platform.vsphere.defaultDatastore: WORKLOAD-DS)
2. Installer passes.
3. Deploy OCP 4.12 on vSphere using the legacy field and a relative path WITH folder (e.g. platform.vsphere.defaultDatastore: WORKLOAD-DS-Folder/WORKLOAD-DS)
4. Installer fails.
5. Deploy OCP 4.12 on vSphere using the legacy field and the FULL path (e.g. platform.vsphere.defaultDatastore: /Workload Datacenter/datastore/WORKLOAD-DS-Folder/WORKLOAD-DS)
6. Installer fails.
7. Deploy OCP 4.13 on vSphere using the legacy field and a relative path without folder (e.g. platform.vsphere.defaultDatastore: WORKLOAD-DS)
8. Installer fails.
9. Deploy OCP 4.13 on vSphere using the legacy field and a relative path WITH folder (e.g. platform.vsphere.defaultDatastore: WORKLOAD-DS-Folder/WORKLOAD-DS)
10. Installer passes.
11. Deploy OCP 4.13 on vSphere using the legacy field and the FULL path (e.g. platform.vsphere.defaultDatastore: /Workload Datacenter/datastore/WORKLOAD-DS-Folder/WORKLOAD-DS)
12. Installer fails.
Actual results:
Default Datastore Value | OCP 4.12 | OCP 4.13 | OCP 4.14 |
---|---|---|---|
/Workload Datacenter/datastore/WORKLOAD-DS-Folder/WORKLOAD-DS | No | Yes | Yes |
WORKLOAD-DS-Folder/WORKLOAD-DS | No | Yes | Yes |
WORKLOAD-DS | Yes | No | No |
For OCP 4.12.z managed cluster deployments, the name-only path is the only one that works as expected.
For OCP 4.13.z+ managed cluster deployments, only the full path and the relative path with folder work as expected.
Expected results:
OCP 4.13.z+ should accept a relative path without specifying the folder, as OCP 4.12.z does.
Additional info:
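For quick reference, the three defaultDatastore forms exercised above (values taken verbatim from the reproduction steps; behavior per the results table):
# Check which form the install-config currently uses:
grep defaultDatastore install-config.yaml
# full path:              /Workload Datacenter/datastore/WORKLOAD-DS-Folder/WORKLOAD-DS
# relative with folder:   WORKLOAD-DS-Folder/WORKLOAD-DS
# name only:              WORKLOAD-DS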
This is a clone of issue OCPBUGS-27323. The following is the description of the original issue:
—
Description of problem:
The following test case failure is observed continuously in 4.14-to-4.15 and 4.15-to-4.16 upgrade CI runs: [bz-Image Registry] clusteroperator/image-registry should not change condition/Available
4.14 Image: registry.ci.openshift.org/ocp-ppc64le/release-ppc64le:4.14.0-0.nightly-ppc64le-2024-01-15-085349
4.15 Image: registry.ci.openshift.org/ocp-ppc64le/release-ppc64le:4.15.0-0.nightly-ppc64le-2024-01-15-042536
This is a clone of issue OCPBUGS-31809. The following is the description of the original issue:
—
Description of problem:
While migrating the Pipeline details page, it expects customData from the Details page - https://github.com/openshift/console/blob/master/frontend/packages/pipelines-plugin/src/components/pipelines/pipeline-metrics/PipelineMetrics.tsx - but the horizontalnav component exposed to dynamic plugins does not have a customData prop. https://github.com/openshift/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#horizontalnav
Version-Release number of selected component (if applicable):
4.16
How reproducible:
Always
Steps to Reproduce:
1. Pipeline details page PR to be up for testing this [WIP] [Story - https://issues.redhat.com/browse/ODC-7525]
2. Install the Pipelines Operator and don't install Tekton Results
3. Enable the Pipeline details page in the dynamic plugin
4. Create a pipeline and go to the Metrics tab in the details page
Actual results:
Expected results:
Additional info:
This bug focuses on the /etc/docker/certs.d not found issue that is causing nodes to be marked degraded occasionally.
As a result of fixing https://issues.redhat.com/browse/OCPBUGS-19722 , I noticed a few additional logs in the controller where it was failing to get controllerconfig during cluster installation.
I1005 08:32:43.003013 1 container_runtime_config_controller.go:417] Error syncing image config openshift-config: could not get ControllerConfig controllerconfig.machineconfiguration.openshift.io "machine-config-controller" not found
I1005 08:32:44.284624 1 container_runtime_config_controller.go:417] Error syncing image config openshift-config: could not get ControllerConfig controllerconfig.machineconfiguration.openshift.io "machine-config-controller" not found
I1005 08:32:46.735315 1 render_controller.go:377] Error syncing machineconfigpool master: controllerconfig.machineconfiguration.openshift.io "machine-config-controller" not found
I1005 08:32:46.735386 1 render_controller.go:377] Error syncing machineconfigpool worker: controllerconfig.machineconfiguration.openshift.io "machine-config-controller" not found
I1005 08:32:46.755690 1 render_controller.go:377] Error syncing machineconfigpool master: controllerconfig.machineconfiguration.openshift.io "machine-config-controller" not found
I1005 08:32:46.755751 1 render_controller.go:377] Error syncing machineconfigpool worker: controllerconfig.machineconfiguration.openshift.io "machine-config-controller" not found
I also noticed these on the daemon logs, but they seem to exist prior to the fix made in the above PR.
E1004 15:10:37.497119 12299 writer.go:226] Marking Degraded due to: open /etc/docker/certs.d: no such file or directory
E1004 15:10:38.807323 12299 writer.go:226] Marking Degraded due to: open /etc/docker/certs.d: no such file or directory
E1004 15:10:41.392855 12299 writer.go:226] Marking Degraded due to: open /etc/docker/certs.d: no such file or directory
E1004 15:10:46.544369 12299 writer.go:226] Marking Degraded due to: open /etc/docker/certs.d: no such file or directory
E1004 15:10:56.815668 12299 writer.go:226] Marking Degraded due to: open /etc/docker/certs.d: no such file or directory
This manifests as the following in the controller:
I1005 08:32:54.162695 1 status.go:126] Degraded Machine: ip-10-0-89-70.us-east-2.compute.internal and Degraded Reason: open /etc/docker/certs.d: no such file or directory
I1005 08:32:54.162712 1 status.go:126] Degraded Machine: ip-10-0-1-133.us-east-2.compute.internal and Degraded Reason: open /etc/docker/certs.d: no such file or directory
I1005 08:32:54.162724 1 status.go:126] Degraded Machine: ip-10-0-60-194.us-east-2.compute.internal and Degraded Reason: open /etc/docker/certs.d: no such file or directory
I1005 08:32:54.174177 1 kubelet_config_features.go:118] Applied FeatureSet cluster on MachineConfigPool master
None of these seem fatal; they show up during installation and go away as the installation completes. We may end up needing to do nothing, as this could be a completely harmless timing issue, but it does seem worth taking a closer look. I'll attach the full log to this bug.
Description of problem:
While upgrading a loaded 250-node ROSA cluster from 4.13.13 to 4.14.rc2, the cluster failed to upgrade and was stuck while the network operator was trying to upgrade.
Around 20 multus pods were in CrashLoopBackOff state with the log:
oc logs multus-4px8t
2023-10-10T00:54:34+00:00 [cnibincopy] Successfully copied files in /usr/src/multus-cni/rhel9/bin/ to /host/opt/cni/bin/upgrade_6dcb644a-4164-42a5-8f1e-4ae2c04dc315
2023-10-10T00:54:34+00:00 [cnibincopy] Successfully moved files in /host/opt/cni/bin/upgrade_6dcb644a-4164-42a5-8f1e-4ae2c04dc315 to /host/opt/cni/bin/
2023-10-10T00:54:34Z [verbose] multus-daemon started
2023-10-10T00:54:34Z [verbose] Readiness Indicator file check
2023-10-10T00:55:19Z [error] have you checked that your default network is ready? still waiting for readinessindicatorfile @ /host/run/multus/cni/net.d/10-ovn-kubernetes.conf. pollimmediate error: timed out waiting for the condition
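To confirm on an affected node whether the readiness indicator file the daemon is waiting for actually exists:
# The path inside the pod is /host/run/multus/cni/net.d/, i.e. /run/multus/cni/net.d/ on the host.
oc debug node/<node-name> -- chroot /host ls -l /run/multus/cni/net.d/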
This is a clone of issue OCPBUGS-42933. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42812. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42514. The following is the description of the original issue:
—
Description of problem:
When configuring the OpenShift image registry to use a custom Azure storage account in a different resource group, following the official documentation [1], the image-registry CO degrades and the upgrade from version 4.14.x to 4.15.x fails. The image registry operator reports misconfiguration errors related to Azure storage credentials, preventing the upgrade and causing instability in the control plane.
[1] Configuring registry storage in Azure user infrastructure
Version-Release number of selected component (if applicable):
4.14.33, 4.15.33
How reproducible:
Steps to Reproduce:
We got the error
NodeCADaemonProgressing: The daemon set node-ca is deployed Progressing: Unable to apply resources: unable to sync storage configuration: client misconfigured, missing 'TenantID', 'ClientID', 'ClientSecret', 'FederatedTokenFile', 'Creds', 'SubscriptionID' option(s)
The operator will also generate a new secret image-registry-private-configuration with the same content as image-registry-private-configuration-user:
$ oc get secret image-registry-private-configuration -o yaml
apiVersion: v1
data:
  REGISTRY_STORAGE_AZURE_ACCOUNTKEY: xxxxxxxxxxxxxxxxx
kind: Secret
metadata:
  annotations:
    imageregistry.operator.openshift.io/checksum: sha256:524fab8dd71302f1a9ade9b152b3f9576edb2b670752e1bae1cb49b4de992eee
  creationTimestamp: "2024-09-26T19:52:17Z"
  name: image-registry-private-configuration
  namespace: openshift-image-registry
  resourceVersion: "126426"
  uid: e2064353-2511-4666-bd43-29dd020573fe
type: Opaque
2. Then we delete the secret image-registry-private-configuration-user.
Now the secret image-registry-private-configuration will still exist with the same content, but the image-registry CO got a new error:
NodeCADaemonProgressing: The daemon set node-ca is deployed Progressing: Unable to apply resources: unable to sync storage configuration: failed to get keys for the storage account arojudesa: storage.AccountsClient#ListKeys: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceNotFound" Message="The Resource 'Microsoft.Storage/storageAccounts/arojudesa' under resource group 'aro-ufjvmbl1' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix"
3. Apply the workaround of manually changing the azure_resourcegroup key in the installer-cloud-credentials secret to the custom storage account's resource group:
$ oc get secret installer-cloud-credentials -o yaml
apiVersion: v1
data:
  azure_client_id: xxxxxxxxxxxxxxxxx
  azure_client_secret: xxxxxxxxxxxxxxxxx
  azure_region: xxxxxxxxxxxxxxxxx
  azure_resource_prefix: xxxxxxxxxxxxxxxxx
  azure_resourcegroup: xxxxxxxxxxxxxxxxx <<<<<-----THIS
  azure_subscription_id: xxxxxxxxxxxxxxxxx
  azure_tenant_id: xxxxxxxxxxxxxxxxx
kind: Secret
metadata:
  annotations:
    cloudcredential.openshift.io/credentials-request: openshift-cloud-credential-operator/openshift-image-registry-azure
  creationTimestamp: "2024-09-26T16:49:57Z"
  labels:
    cloudcredential.openshift.io/credentials-request: "true"
  name: installer-cloud-credentials
  namespace: openshift-image-registry
  resourceVersion: "133921"
  uid: d1268e2c-1825-49f0-aa44-d0e1cbcda383
type: Opaque
The image-registry CO then reports healthy, and this allows the upgrade to continue.
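For reference, the documented way to hand the operator the account key is the -user secret (this is the step from the user-infrastructure registry storage docs that the flow above starts from):
oc create secret generic image-registry-private-configuration-user \
  --from-literal=REGISTRY_STORAGE_AZURE_ACCOUNTKEY=<account-key> \
  -n openshift-image-registry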
Actual results:
The image registry still seems to use the service principal for Azure storage account authentication.
Expected results:
We expect REGISTRY_STORAGE_AZURE_ACCOUNTKEY to be the only thing the image registry operator needs for storage account authentication when the customer provides it.
Additional info:
Slack : https://redhat-internal.slack.com/archives/CCV9YF9PD/p1727379313014789
Description of problem:
Following https://issues.redhat.com/browse/CNV-28040: on CNV, when a virtual machine with secondary interfaces connected via the bridge CNI is live-migrated, we observe disruption of the VM's inbound traffic. The root cause is that the migration target's bridge interface advertises itself before the migration is completed. When the migration destination pod is created, IPv6 NS (Neighbor Solicitation) and NA (Neighbor Advertisement) packets are sent automatically by the kernel. The switch tables at the endpoints (e.g. the migration destination node) get updated, and traffic is forwarded to the migration destination before the migration is completed [1]. The solution is to have the bridge CNI create the pod interface in "link-down" state [2], so the IPv6 NS/NA packets are avoided; CNV, in turn, sets the pod interface to "link-up" [3]. CNV depends on a bridge CNI with the [2] bits, which is deployed by cluster-network-operator.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2186372#c6
[2] https://github.com/kubevirt/kubevirt/pull/11069
[3] https://github.com/containernetworking/plugins/pull/997
Version-Release number of selected component (if applicable):
4.15.0
How reproducible:
100%
Steps to Reproduce:
1. 2. 3.
Actual results:
CNO deploys the bridge CNI without an option to set the bridge interface down.
Expected results:
CNO should deploy the bridge CNI with the [1] changes. [1] https://github.com/containernetworking/plugins/pull/997
Additional info:
More https://issues.redhat.com/browse/CNV-28040
This is a clone of issue OCPBUGS-38259. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-38258. The following is the description of the original issue:
—
The issue we're trying to address is that nodes go NotReady for a few seconds.
See slack thread https://redhat-external.slack.com/archives/C01C8502FMM/p1717767390381249
Please review the following PR: https://github.com/openshift/thanos/pull/117
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
The cluster-ingress-operator repository vendors controller-runtime v0.15.0, which uses Kubernetes 1.27 packages. OpenShift 4.15 is based on Kubernetes 1.28.
Version-Release number of selected component (if applicable):
4.15.
How reproducible:
Always.
Steps to Reproduce:
Check https://github.com/openshift/cluster-ingress-operator/blob/release-4.15/go.mod.
Actual results:
The sigs.k8s.io/controller-runtime package is at v0.15.0.
Expected results:
The sigs.k8s.io/controller-runtime package is at v0.16.0 or newer.
Additional info:
https://github.com/openshift/cluster-ingress-operator/pull/990 already bumped the k8s.io/* packages to v0.28.2, but ideally the controller-runtime package should be bumped too. The controller-runtime v0.16 release includes some breaking changes; see the release notes at https://github.com/kubernetes-sigs/controller-runtime/releases/tag/v0.16.0.
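The mechanics of the bump itself are the usual ones (the code changes needed for the v0.16 breaking changes are the real work and are not shown here):
go get sigs.k8s.io/controller-runtime@v0.16.0
go mod tidy
go mod vendor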
Description of problem:
OpenShift 4.15 is expected to be built with go1.20. The linked bug bumps openshift/etcd to 3.5.13, which includes a bump to go1.21, which has caused issues building MicroShift.
The sno_arm.txt integration test fails because it tries to extract arm64 PXE bits from the OKD release payload, which is x86_64.
AC:
skip or remove the sno_arm.txt test.
Description of problem:
Failed to install cluster on SC2S region as: level=error msg=Error: reading Security Group (sg-0b0cd054dd599602f) Rules: UnsupportedOperation: The functionality you requested is not available in this region.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-09-11-201102
How reproducible:
Always
Steps to Reproduce:
1. Create an OCP cluster on SC2S
Actual results:
Install fail: level=error msg=Error: reading Security Group (sg-0b0cd054dd599602f) Rules: UnsupportedOperation: The functionality you requested is not available in this region.
Expected results:
Install succeed.
Additional info:
* C2S region is not affected
Using metal-ipi with okd-scos, Ironic fails to provision nodes
This is a clone of issue OCPBUGS-41555. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-39313. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-29497. The following is the description of the original issue:
—
While updating an HC with a controllerAvailabilityPolicy of SingleReplica, the HCP doesn't fully roll out, with 3 pods stuck in Pending:
multus-admission-controller-5b5c95684b-v5qgd 0/2 Pending 0 4m36s
network-node-identity-7b54d84df4-dxx27 0/3 Pending 0 4m12s
ovnkube-control-plane-647ffb5f4d-hk6fg 0/3 Pending 0 4m21s
This is because these deployments all have requiredDuringSchedulingIgnoredDuringExecution zone anti-affinity and maxUnavailable: 25% (i.e. 1).
Thus the old pod blocks scheduling of the new pod.
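The deadlock can be confirmed by inspecting the rollout strategy and anti-affinity of one of the stuck deployments (substitute the hosted control plane namespace):
oc get deployment multus-admission-controller -n <hcp-namespace> \
  -o jsonpath='{.spec.strategy.rollingUpdate.maxUnavailable}{"\n"}{.spec.template.spec.affinity.podAntiAffinity}{"\n"}'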
This is a clone of issue OCPBUGS-31563. The following is the description of the original issue:
—
Description of problem:
Port 22 is added to the worker node security group in the Terraform install [1]:
resource "aws_security_group_rule" "worker_ingress_ssh" {
  type              = "ingress"
  security_group_id = aws_security_group.worker.id
  description       = local.description
  protocol          = "tcp"
  cidr_blocks       = var.cidr_blocks
  from_port         = 22
  to_port           = 22
}
But it's missing in the SDK install [2].
[1] https://github.com/openshift/installer/blob/master/data/data/aws/cluster/vpc/sg-worker.tf#L39-L48
[2] https://github.com/openshift/installer/pull/7676/files#diff-c89a0152f7d51be6e3830081d1c166d9333628982773c154d8fc9a071c8ff765R272
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-03-31-180021
How reproducible:
Always
Steps to Reproduce:
1. Create a cluster using SDK installation method 2. 3.
Actual results:
See description.
Expected results:
Port 22 is added to worker node's security group.
Additional info:
This is a clone of issue OCPBUGS-42342. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-41552. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-39420. The following is the description of the original issue:
—
Description of problem:
ROSA HCP allows customers to select hostedcluster and nodepool OCP z-stream versions, respecting version skew requirements. E.g.:
Version-Release number of selected component (if applicable):
Reproducible on 4.14-4.16.z, this bug report demonstrates it for a 4.15.28 hostedcluster with a 4.15.25 nodepool
How reproducible:
100%
Steps to Reproduce:
1. Create a ROSA HCP cluster, which comes with a 2-replica nodepool with the same z-stream version (4.15.28) 2. Create an additional nodepool at a different version (4.15.25)
Actual results:
Observe that while nodepool objects report the different version (4.15.25), the resulting kernel version of the node is that of the hostedcluster (4.15.28):
❯ k get nodepool -n ocm-staging-2didt6btjtl55vo3k9hckju8eeiffli8
NAME                     CLUSTER       DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION   UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
mshen-hyper-np-4-15-25   mshen-hyper   1               1               False         True         4.15.25   False             False
mshen-hyper-workers      mshen-hyper   2               2               False         True         4.15.28   False             False
❯ k get no -owide
NAME                                         STATUS   ROLES    AGE   VERSION            INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                  CONTAINER-RUNTIME
ip-10-0-129-139.us-west-2.compute.internal   Ready    worker   24m   v1.28.12+396c881   10.0.129.139   <none>        Red Hat Enterprise Linux CoreOS 415.92.202408100433-0 (Plow)   5.14.0-284.79.1.el9_2.aarch64   cri-o://1.28.9-5.rhaos4.15.git674ed4c.el9
ip-10-0-129-165.us-west-2.compute.internal   Ready    worker   98s   v1.28.12+396c881   10.0.129.165   <none>        Red Hat Enterprise Linux CoreOS 415.92.202408100433-0 (Plow)   5.14.0-284.79.1.el9_2.aarch64   cri-o://1.28.9-5.rhaos4.15.git674ed4c.el9
ip-10-0-132-50.us-west-2.compute.internal    Ready    worker   30m   v1.28.12+396c881   10.0.132.50    <none>        Red Hat Enterprise Linux CoreOS 415.92.202408100433-0 (Plow)   5.14.0-284.79.1.el9_2.aarch64   cri-o://1.28.9-5.rhaos4.15.git674ed4c.el9
Expected results:
Additional info:
This is a clone of issue OCPBUGS-37241. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-37222. The following is the description of the original issue:
—
Description of problem:
Some 3rd party clouds do not require the use of an external CCM. The installer enables an external CCM by default whenever the platform is external.
Version-Release number of selected component (if applicable):
4.14 nightly
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
The external CCM can not be disabled when the platform type is external.
Expected results:
The external CCM should be able to be disabled when the platform type is external.
Additional info:
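A sketch of the relevant install-config knob, assuming the existing platform.external schema (the platformName value here is only an example):
platform:
  external:
    platformName: my-provider       # example value
    cloudControllerManager: ""      # today only "External" is honored; the ask is for "" to disable the external CCM expectation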
Description of problem:
In a baremetal multinode OCP cluster, a node ends up in NotReady state. On the node there are a couple of failed services:
● cpuset-configure.service loaded failed failed Move services to reserved cpuset
● on-prem-resolv-prepender.service loaded failed failed Populates resolv.conf according to on-prem IPI needs
journalctl --boot --no-pager -u cpuset-configure.service
Sep 18 16:57:37 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com systemd[1]: Starting Move services to reserved cpuset...
Sep 18 16:57:37 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com cpuset-configure.sh[3014]: /usr/local/bin/cpuset-configure.sh: line 17: /sys/fs/cgroup/cpuset/cpuset.sched_load_balance: Read-only file system
Sep 18 16:57:38 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com systemd[1]: cpuset-configure.service: Main process exited, code=exited, status=1/FAILURE
Sep 18 16:57:38 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com systemd[1]: cpuset-configure.service: Failed with result 'exit-code'.
Sep 18 16:57:38 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com systemd[1]: Failed to start Move services to reserved cpuset.
Sep 18 16:57:52 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com systemd[1]: Failed to start Populates resolv.conf according to on-prem IPI needs.
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com systemd[1]: Starting Populates resolv.conf according to on-prem IPI needs...
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4852]: nameserver 10.47.242.10
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4851]: NM resolv-prepender: Starting download of baremetal runtime cfg image
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:23012b3380ffce706aa8f204cdc26745d8a69b0218150ec3bcb495202694fdab...
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Getting image source signatures
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Copying blob sha256:916ead524b9e54b9d5534b65534253c02ce66f1d784e683389aa3c4cb4d12389
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Copying blob sha256:d8190195889efb5333eeec18af9b6c82313edd4db62989bd3a357caca4f13f0e
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Copying blob sha256:c71d2589fba7989ecd29ea120fe7add01fab70126fc653a863d5844e35ee5403
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Copying blob sha256:97da74cc6d8fa5d1634eb1760fd1da5c6048619c264c23e62d75f3bf6b8ef5c4
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Copying blob sha256:d4dc6e74b6ce09e24dc284cc1967451f3dda2d485bc92fc95d24d91f939e4849
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Copying config sha256:ba2c86ef11c4e341cd0870b6d5b7ad39aa39724389d9d2dfead4ea3d75582071
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Writing manifest to image destination
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Storing signatures
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: ba2c86ef11c4e341cd0870b6d5b7ad39aa39724389d9d2dfead4ea3d75582071
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4851]: NM resolv-prepender: Download of baremetal runtime cfg image completed
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4863]: Your kernel does not support pids limit capabilities or the cgroup is not mounted. PIDs limit discarded.
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4863]: Error: OCI runtime error: runc: runc create failed: mountpoint for devices not found
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com systemd[1]: on-prem-resolv-prepender.service: Main process exited, code=exited, status=127/n/a
When checking the CGroup config:
oc describe node.config
Name:         cluster
Namespace:
Labels:       <none>
Annotations:  include.release.openshift.io/ibm-cloud-managed: true
              include.release.openshift.io/self-managed-high-availability: true
              include.release.openshift.io/single-node-developer: true
              release.openshift.io/create-only: true
API Version:  config.openshift.io/v1
Kind:         Node
Metadata:
  Creation Timestamp:  2023-09-18T15:27:44Z
  Generation:          3
  Owner References:
    API Version:  config.openshift.io/v1
    Kind:         ClusterVersion
    Name:         version
    UID:          c62da215-6526-4306-8fc6-035612c8605e
  Resource Version:  91518
  UID:               cf2189ba-cd69-45e9-868c-7c2589decb25
Spec:
  Cgroup Mode:  v1
Events:  <none>
Version-Release number of selected component (if applicable):
4.14.0-rc.1
How reproducible:
so far 100%
Steps to Reproduce:
1. Deploy baremetal multinode cluster with GitOps-ZTP workflow 2. 3.
Actual results:
While all policies report Compliant state, some configs are still being applied:
oc get mcp
NAME       CONFIG                                               UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
ht100gb    rendered-ht100gb-572f5aef443a21b21a8c5cfe816708e2    False     True       False      2              0                   0                     0                      77m
master     rendered-master-3c44ec28c389693028ad2cc6b74741ca     True      False      False      3              3                   3                     0                      103m
standard   rendered-standard-1942568110455a377b735e15f18c7ba8   True      False      False      2              2                   2                     0                      77m
worker     rendered-worker-033d4f0a2568efce241d02a2c54ab88e     True      False      False      0              0                   0                     0                      103m
Expected results:
All nodes are in Ready state
Additional info:
This is a clone of issue OCPBUGS-13551.
This is a clone of issue OCPBUGS-35822. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-35801. The following is the description of the original issue:
—
Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
at:
github.com/openshift/cluster-openshift-controller-manager-operator/pkg/operator/internalimageregistry/cleanup_controller.go:146 +0xd65
Description of problem:
In a 4.12 OCP cluster the default podPidsLimit is 4096 when checked at the node level in the /sys/fs/cgroup/pids/kubepods.slice/kubepods-*/*/pids.max path.
for f in /sys/fs/cgroup/pids/kubepods.slice/kubepods-*/*/pids.max; do echo $f.; cat $f; done
/sys/fs/cgroup/pids/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podb910bab7_528b_48c1_a0d5_72493eea2e0d.slice/pids.max.
4096
/sys/fs/cgroup/pids/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod08371dcf_fcf7_49e7_84ef_d3887fcc7694.slice/pids.max.
4096
/sys/fs/cgroup/pids/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod10795289d34b5e76d3845007b0111048.slice/pids.max.
4096
/sys/fs/cgroup/pids/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod188fca37_2aef_4668_8f5d_c2a390e86cc6.slice/pids.max.
4096
/sys/fs/cgroup/pids/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod2a47340c_46ef_41aa_94ce_10bc726ab328.slice/pids.max.
4096
/sys/fs/cgroup/pids/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod3c0e1a64_46bc_41f9_9adf_b73157d7ae86.slice/pids.max.
4096
/sys/fs/cgroup/pids/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod3e77ad6b_6fdc_4936_8054_14be023f26d8.slice/pids.max.
4096
However, when we check inside any pod or node, we see a pids.max file with a random number. See sections [A] and [B]:
[A]
[root@mirrorreg1 ~]# oc rsh <podname>
sh-4.4$ cat /sys/fs/cgroup/pids/pids.max
1288812
[B]
cat /sys/fs/cgroup/pids/kubepods.slice/pids.max
127385
Can someone please help me to understand :
Version-Release number of selected component (if applicable):
How reproducible:
On node:
1. Login to ocp node
2. check path /sys/fs/cgroup/pids/kubepods.slice/pids.max
On pod
1. Login to any pod
2. Check path cat /sys/fs/cgroup/pids/pids.max
Actual results:
Expected results:
The pod pids limit should only be seen in /sys/fs/cgroup/pids/kubepods.slice/kubepods-*/*/pids.max
Additional info:
This behavior can be seen in any OCP cluster. Do let me know if you need any logs.
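The 4096 figure is the kubelet's per-pod PIDs limit. For completeness, a sketch of raising it cluster-wide with a KubeletConfig (the resource name and the 8192 value below are arbitrary examples):
cat <<'EOF' | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: worker-pids-limit
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  kubeletConfig:
    podPidsLimit: 8192
EOF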
Description of problem:
vSphere dual-stack install fails in bootstrap.
All nodes are tainted node.cloudprovider.kubernetes.io/uninitialized.
cloud-controller-manager can't find the nodes?
I0906 15:05:22.922183 1 search.go:49] WhichVCandDCByNodeID called but nodeID is empty
E0906 15:05:22.922187 1 nodemanager.go:197] shakeOutNodeIDLookup failed. Err=nodeID is empty
Version-Release number of selected component (if applicable):
4.14.0-0.ci.test-2023-09-06-141839-ci-ln-98f4iqb-latest
How reproducible:
Always
Steps to Reproduce:
1. Install vSphere IPI with OVN Dual-stack
platform:
  vsphere:
    apiVIPs:
    - 192.168.134.3
    - fd65:a1a8:60ad:271c::200
    ingressVIPs:
    - 192.168.134.4
    - fd65:a1a8:60ad:271c::201
networking:
  networkType: OVNKubernetes
  machineNetwork:
  - cidr: 192.168.0.0/16
  - cidr: fd65:a1a8:60ad:271c::/64
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  - cidr: fd65:10:128::/56
    hostPrefix: 64
  serviceNetwork:
  - 172.30.0.0/16
  - fd65:172:16::/112
Actual results:
Install fails in bootstrap
Expected results:
Install succeeds
Additional info:
I0906 15:03:21.393629 1 search.go:69] WhichVCandDCByNodeID by UUID
I0906 15:03:21.393632 1 search.go:76] WhichVCandDCByNodeID nodeID: 421b78c3-f8bb-970c-781b-76827306e89e
I0906 15:03:21.406797 1 search.go:208] Found node 421b78c3-f8bb-970c-781b-76827306e89e
I0906 15:03:21.406816 1 search.go:210] Hostname: ci-ln-bllxr6t-c1627-5p7mq-master-2, UUID: 421b78c3-f8bb-970c-781b-76827306e89e
I0906 15:03:21.406830 1 nodemanager.go:159] Discovered VM using normal UUID format
I0906 15:03:21.416168 1 nodemanager.go:268] Adding Hostname: ci-ln-bllxr6t-c1627-5p7mq-master-2
I0906 15:03:21.416218 1 nodemanager.go:438] Adding Internal IP: 192.168.134.60
I0906 15:03:21.416229 1 nodemanager.go:443] Adding External IP: 192.168.134.60
I0906 15:03:21.416244 1 nodemanager.go:349] Found node 421b78c3-f8bb-970c-781b-76827306e89e
I0906 15:03:21.416266 1 nodemanager.go:351] Hostname: ci-ln-bllxr6t-c1627-5p7mq-master-2 UUID: 421b78c3-f8bb-970c-781b-76827306e89e
I0906 15:03:21.416278 1 instances.go:77] instances.NodeAddressesByProviderID() FOUND with 421b78c3-f8bb-970c-781b-76827306e89e
E0906 15:03:21.416326 1 node_controller.go:236] error syncing 'ci-ln-bllxr6t-c1627-5p7mq-master-2': failed to get node modifiers from cloud provider: provided node ip for node "ci-ln-bllxr6t-c1627-5p7mq-master-2" is not valid: failed to get node address from cloud provider that matches ip: fd65:a1a8:60ad:271c::70, requeuing
I0906 15:03:21.623573 1 instances.go:102] instances.InstanceID() CACHED with ci-ln-bllxr6t-c1627-5p7mq-master-1
Description of problem:
Currently, vmware-vsphere-csi-driver-webhook exposes HTTP/2 endpoints:
$ oc -n openshift-cluster-csi-drivers exec deployment/vmware-vsphere-csi-driver-webhook -- curl -kv https://localhost:8443/readyz
...
* ALPN, server accepted to use h2
> GET /readyz HTTP/2
< HTTP/2 404
To err on the side of caution, we should discontinue the handling of HTTP/2 requests.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. oc -n openshift-cluster-csi-drivers exec deployment/vmware-vsphere-csi-driver-webhook -- curl -kv https://localhost:8443/readyz 2. 3.
Actual results:
HTTP/2 requests are accepted
Expected results:
HTTP/2 requests shouldn't be accepted by the webhook
Additional info:
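Once HTTP/2 handling is disabled, the same probe should negotiate HTTP/1.1 (a quick way to verify the fix):
# Force an HTTP/2 attempt and watch the negotiated protocol.
oc -n openshift-cluster-csi-drivers exec deployment/vmware-vsphere-csi-driver-webhook -- \
  curl -skv --http2 https://localhost:8443/readyz 2>&1 | grep -i alpn
# With the fix, the server should no longer accept "h2".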
This is a clone of issue OCPBUGS-30941. The following is the description of the original issue:
—
In 4.15 when the agent installer is run using the openshift-baremetal-installer binary using an install-config containing platform data, it attempts to contact libvirt to validate the provisioning network interfaces for the bootstrap VM. This should never happen, as the agent installer doesn't use the bootstrap VM.
It is possible that users in the process of converting from baremetal IPI to the agent installer might run into this issue, since they would already be using the openshift-baremetal-installer binary.
This is a clone of issue OCPBUGS-31931. The following is the description of the original issue:
—
Description of problem:
When creating an alerting silence from the RHOCP UI without specifying the "Creator" field, the error "createdBy in body is required" is shown, even though the "Creator" field is not marked as mandatory.
Version-Release number of selected component (if applicable):
4.15.5
How reproducible:
100%
Steps to Reproduce:
1. Login to the web console (Admin view)
2. Observe > Alerting
3. Select the alert to silence
4. Click Create Silence
5. In the Info section, update the "Comment" field and skip the "Creator" field. Now, click on the Create button.
6. It will throw an error "createdBy in body is required".
Actual results:
Able to create an alerting silence without specifying the "Creator" field.
Expected results:
User should not be able to create silences without specifying the "Creator" field, as it should be mandatory.
Additional info:
The steps work well on versions prior to RHOCP 4.15 (tested on 4.14).
Backport to 4.15 of OCPBUGS-35007 specifically for the cluster-openshift-controller-manager-operator.
All workloads of the following namespaces need SCC pinning:
This is a clone of issue OCPBUGS-25337.
This is going to block the next payload; it failed 10/10 runs. The payload is https://amd64.ocp.releases.ci.openshift.org/releasestream/4.15.0-0.nightly/release/4.15.0-0.nightly-2023-11-30-112918
Suspect
https://github.com/openshift/machine-config-operator/pull/3965/files
Description of problem:
On a HyperShift (guest) cluster, the EFS driver pod is stuck in ContainerCreating state
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-08-11-055332
How reproducible:
Always
Steps to Reproduce:
1. Create a Hypershift cluster. Flexy template: aos-4_14/ipi-on-aws/versioned-installer-ovn-hypershift-ci
2. Try to install the EFS operator and driver from a yaml file/web console as mentioned in the below steps.
a) Create an IAM role with the ccoctl tool; the output provides the ROLE ARN value
b) Install the EFS operator using the above ROLE ARN value
c) Check that the EFS operator, node, and controller pods are up and running
// og-sub-hcp.yaml
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  generateName: openshift-cluster-csi-drivers-
  namespace: openshift-cluster-csi-drivers
spec:
  namespaces:
  - ""
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: aws-efs-csi-driver-operator
  namespace: openshift-cluster-csi-drivers
spec:
  channel: stable
  name: aws-efs-csi-driver-operator
  source: qe-app-registry
  sourceNamespace: openshift-marketplace
  config:
    env:
    - name: ROLEARN
      value: arn:aws:iam::301721915996:role/hypershift-ci-16666-openshift-cluster-csi-drivers-aws-efs-cloud-
// driver.yaml
apiVersion: operator.openshift.io/v1
kind: ClusterCSIDriver
metadata:
  name: efs.csi.aws.com
spec:
  logLevel: TraceAll
  managementState: Managed
  operatorLogLevel: TraceAll
Actual results:
aws-efs-csi-driver-controller-699664644f-dkfdk 0/4 ContainerCreating 0 87m
Expected results:
EFS controller pods should be up and running
Additional info:
oc -n openshift-cluster-csi-drivers logs aws-efs-csi-driver-operator-6758c5dc46-b75hb
E0821 08:51:25.160599 1 base_controller.go:266] "AWSEFSDriverCredentialsRequestController" controller failed to sync "key", err: cloudcredential.operator.openshift.io "cluster" not found
Discussion: https://redhat-internal.slack.com/archives/GK0DA0JR5/p1692606247221239
Installation steps epic: https://issues.redhat.com/browse/STOR-1421
Description of problem:
'404: Not Found' will show on Knative-serving Details page
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-06-13-223353
How reproducible:
Always
Steps to Reproduce:
1. Install the 'Serverless' Operator; make sure the operator has been installed successfully and the Knative Serving instance is created without any error
2. Navigate to Administration -> Cluster Settings -> Global Configuration
3. Go to the Knative-serving details page and check if the 404 Not Found message is there
Actual results:
Page will show 404 not found
Expected results:
The 404 Not Found page should not show
Additional info:
The dependency ticket is OCPBUGS-15008; more information can be found in the comments.
Description of problem:
Impossible to create NFV workers
Version-Release number of selected component (if applicable):
4.15 (current master)
Actual results:
I1024 02:36:28.388445 1 controller.go:156] sj6vp0y3-56ae0-2f4wl-worker-0-ph4nw: reconciling Machine
I1024 02:36:29.068382 1 controller.go:349] sj6vp0y3-56ae0-2f4wl-worker-0-ph4nw: reconciling machine triggers idempotent create
I1024 02:36:31.426442 1 controller.go:115] "msg"="Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" "controller"="machine-controller" "name"="sj6vp0y3-56ae0-2f4wl-worker-0-ph4nw" "namespace"="openshift-machine-api" "object"={"name":"sj6vp0y3-56ae0-2f4wl-worker-0-ph4nw","namespace":"openshift-machine-api"} "reconcileID"="1041b0ba-067a-4e94-8a2a-f71f46821275"
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x27c49ff]
goroutine 247 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
  /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x1fa
panic({0x2a72f60, 0x430deb0})
  /usr/lib/golang/src/runtime/panic.go:884 +0x213
github.com/openshift/machine-api-provider-openstack/pkg/machine.MachineToInstanceSpec(0xc0006698c0, {0xc000a49940, 0x1, 0x4}, {0xc000a49980, 0x1, 0x4}, {0xc00029aa00, 0x6a6}, {0x30ab820, ...}, ...)
  /go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/machine/convert.go:317 +0xb9f
github.com/openshift/machine-api-provider-openstack/pkg/machine.(*OpenstackClient).convertMachineToCapoInstanceSpec(0xc0000f11f0, {0x30cb3b0, 0xc000c50b80}, 0xc0006698c0)
  /go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/machine/actuator.go:157 +0x23b
github.com/openshift/machine-api-provider-openstack/pkg/machine.(*OpenstackClient).createInstance(0xc0000f11f0, {0xc000c50b80?, 0xc00072a1b0?}, 0xc0006698c0, {0x30cb3b0, 0xc000c50b80})
  /go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/machine/actuator.go:246 +0x137
github.com/openshift/machine-api-provider-openstack/pkg/machine.(*OpenstackClient).reconcile(0xc0000f11f0, {0x30c5530, 0xc00072a1b0}, 0xc0006698c0)
  /go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/machine/actuator.go:201 +0x23e
github.com/openshift/machine-api-provider-openstack/pkg/machine.(*OpenstackClient).Create(0xc000a42150?, {0x30c5530?, 0xc00072a1b0?}, 0x0?)
  /go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/machine/actuator.go:172 +0x25
github.com/openshift/machine-api-operator/pkg/controller/machine.(*ReconcileMachine).Reconcile(0xc0002ab6d0, {0x30c5530, 0xc00072a1b0}, {{{0xc000c90a50?, 0x0?}, {0xc0000014a0?, 0xc00087bd48?}}})
  /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/github.com/openshift/machine-api-operator/pkg/controller/machine/controller.go:350 +0xbb8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x30c9578?, {0x30c5530?, 0xc00072a1b0?}, {{{0xc000c90a50?, 0xb?}, {0xc0000014a0?, 0x0?}}})
  /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119 +0xc8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000322a00, {0x30c5488, 0xc00028e2d0}, {0x2b57480?, 0xc0000e64c0?})
  /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316 +0x3ca
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000322a00, {0x30c5488, 0xc00028e2d0})
  /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
  /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
  /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x587
Expected results:
It should work
I think this is related to https://github.com/openshift/machine-api-provider-openstack/pull/87
Please review the following PR: https://github.com/openshift/sdn/pull/592
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-20368. The following is the description of the original issue:
—
Description of problem:
Automate E2E tests of Dynamic OVS Pinning. This bug is created for merging
https://github.com/openshift/cluster-node-tuning-operator/pull/746
Version-Release number of selected component (if applicable):
4.15.0
This is a clone of issue OCPBUGS-25760. The following is the description of the original issue:
—
Description of problem:
During live OVN migration, the network operator shows the error message: Not applying unsafe configuration change: invalid configuration: [cannot change default network type when not doing migration]. Use 'oc edit network.operator.openshift.io cluster' to undo the change.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create a 4.15 nightly SDN ROSA cluster
2. oc delete validatingwebhookconfigurations.admissionregistration.k8s.io/sre-techpreviewnoupgrade-validation
3. oc edit featuregate cluster to enable feature gates
4. Wait for all nodes to reboot and come back to normal
5. oc patch Network.config.openshift.io cluster --type='merge' --patch '{"metadata":{"annotations":{"network.openshift.io/network-type-migration":""}},"spec":{"networkType":"OVNKubernetes"}}'
Actual results:
[weliang@weliang ~]$ oc delete validatingwebhookconfigurations.admissionregistration.k8s.io/sre-techpreviewnoupgrade-validation
[weliang@weliang ~]$ oc edit featuregate cluster
[weliang@weliang ~]$ oc patch Network.config.openshift.io cluster --type='merge' --patch '{"metadata":{"annotations":{"network.openshift.io/network-type-migration":""}},"spec":{"networkType":"OVNKubernetes"}}'
network.config.openshift.io/cluster patched
[weliang@weliang ~]$ oc get co network
NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
network   4.15.0-0.nightly-2023-12-18-220750   True        False         True       105m    Not applying unsafe configuration change: invalid configuration: [cannot change default network type when not doing migration]. Use 'oc edit network.operator.openshift.io cluster' to undo the change.
[weliang@weliang ~]$ oc describe Network.config.openshift.io cluster
Name:         cluster
Namespace:
Labels:       <none>
Annotations:  network.openshift.io/network-type-migration:
API Version:  config.openshift.io/v1
Kind:         Network
Metadata:
  Creation Timestamp:  2023-12-20T15:13:39Z
  Generation:          3
  Resource Version:    119899
  UID:                 6a621b88-ac4f-4918-a7f6-98dba7df222c
Spec:
  Cluster Network:
    Cidr:         10.128.0.0/14
    Host Prefix:  23
  External IP:
    Policy:
  Network Type:  OVNKubernetes
  Service Network:
    172.30.0.0/16
Status:
  Cluster Network:
    Cidr:         10.128.0.0/14
    Host Prefix:  23
  Cluster Network MTU:  8951
  Network Type:         OpenShiftSDN
  Service Network:
    172.30.0.0/16
Events:  <none>
[weliang@weliang ~]$ oc describe Network.operator.openshift.io cluster
Name:         cluster
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  operator.openshift.io/v1
Kind:         Network
Metadata:
  Creation Timestamp:  2023-12-20T15:15:37Z
  Generation:          275
  Resource Version:    120026
  UID:                 278bd491-ac88-4038-887f-d1defc450740
Spec:
  Cluster Network:
    Cidr:         10.128.0.0/14
    Host Prefix:  23
  Default Network:
    Openshift SDN Config:
      Enable Unidling:  true
      Mode:             NetworkPolicy
      Mtu:              8951
      Vxlan Port:       4789
    Type:               OVNKubernetes
  Deploy Kube Proxy:            false
  Disable Multi Network:        false
  Disable Network Diagnostics:  false
  Kube Proxy Config:
    Bind Address:  0.0.0.0
  Log Level:           Normal
  Management State:    Managed
  Observed Config:     <nil>
  Operator Log Level:  Normal
  Service Network:
    172.30.0.0/16
  Unsupported Config Overrides:  <nil>
  Use Multi Network Policy:      false
Status:
  Conditions:
    Last Transition Time:  2023-12-20T15:15:37Z
    Status:                False
    Type:                  ManagementStateDegraded
    Last Transition Time:  2023-12-20T16:58:58Z
    Message:               Not applying unsafe configuration change: invalid configuration: [cannot change default network type when not doing migration]. Use 'oc edit network.operator.openshift.io cluster' to undo the change.
    Reason:                InvalidOperatorConfig
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2023-12-20T15:15:37Z
    Status:                True
    Type:                  Upgradeable
    Last Transition Time:  2023-12-20T16:52:11Z
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2023-12-20T15:15:45Z
    Status:                True
    Type:                  Available
  Ready Replicas:  0
  Version:         4.15.0-0.nightly-2023-12-18-220750
Events:  <none>
[weliang@weliang ~]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.nightly-2023-12-18-220750   True        False         84m     Error while reconciling 4.15.0-0.nightly-2023-12-18-220750: the cluster operator network is degraded
[weliang@weliang ~]$
Expected results:
Migration succeeds
Additional info:
The same error message is seen on both ROSA and GCP clusters.
Description of problem:
Install a cluster with azure marketplace image 413.92.2023101700, and set the field osImage:plan to NoPurchasePlan.

install-config.yaml:
--------------------
platform:
  azure:
    baseDomainResourceGroupName: os4-common
    cloudName: AzurePublicCloud
    outboundType: Loadbalancer
    region: southcentralus
    defaultMachinePlatform:
      osImage:
        offer: rh-ocp-worker
        publisher: redhat
        sku: rh-ocp-worker-gen1
        version: 413.92.2023101700
        plan: NoPurchasePlan

The bootstrap VM fails to provision with the terraform error below:

DEBUG In addition to the other similar warnings shown, 3 other variable(s) defined
DEBUG without being declared.
ERROR
ERROR Error: waiting for creation of Linux Virtual Machine: (Name "jima02test-7jf8d-bootstrap" / Resource Group "jima02test-7jf8d-rg"): Code="VMMarketplaceInvalidInput" Message="Creating a virtual machine from Marketplace image or a custom image sourced from a Marketplace image requires Plan information in the request. VM: '/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima02test-7jf8d-rg/providers/Microsoft.Compute/virtualMachines/jima02test-7jf8d-bootstrap'."
ERROR
ERROR   with azurerm_linux_virtual_machine.bootstrap,
ERROR   on main.tf line 194, in resource "azurerm_linux_virtual_machine" "bootstrap":
ERROR   194: resource "azurerm_linux_virtual_machine" "bootstrap" {
ERROR
ERROR failed to fetch Cluster: failed to generate asset "Cluster": failure applying terraform for "bootstrap" stage: failed to create cluster: failed to apply Terraform: exit status 1
ERROR
ERROR Error: waiting for creation of Linux Virtual Machine: (Name "jima02test-7jf8d-bootstrap" / Resource Group "jima02test-7jf8d-rg"): Code="VMMarketplaceInvalidInput" Message="Creating a virtual machine from Marketplace image or a custom image sourced from a Marketplace image requires Plan information in the request. VM: '/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima02test-7jf8d-rg/providers/Microsoft.Compute/virtualMachines/jima02test-7jf8d-bootstrap'."
ERROR
ERROR   with azurerm_linux_virtual_machine.bootstrap,
ERROR   on main.tf line 194, in resource "azurerm_linux_virtual_machine" "bootstrap":
ERROR   194: resource "azurerm_linux_virtual_machine" "bootstrap" {
ERROR
ERROR
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-11-01-235040
How reproducible:
Always
Steps to Reproduce:
1. Set an azure marketplace image (one that has a purchase plan) and plan: NoPurchasePlan in the install-config.yaml file
2. Trigger the installation
Actual results:
Bootstrap VM provisioning fails.
Expected results:
The installer should validate the plan field when using a marketplace image that has a purchase plan, and exit earlier with a proper message (see the config sketch at the end of this report).
Additional info:
Issue 50 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624
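For comparison, a sketch of the osImage fragment that should provision, under the assumption that images carrying a purchase plan use the default WithPurchasePlan value for the plan field:
~~~
platform:
  azure:
    defaultMachinePlatform:
      osImage:
        publisher: redhat
        offer: rh-ocp-worker
        sku: rh-ocp-worker-gen1
        version: 413.92.2023101700
        plan: WithPurchasePlan  # default when omitted; required for images that carry a purchase plan
~~~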
The Add page dropdown doesn't break anymore and now overlays the content if the window is too small.
Screenshots:
The provided Dockerfile is not configured properly and does not load the files generated by the build command.
Tracker issue for bootimage bump in 4.15. This issue should block issues which need a bootimage bump to fix.
The previous bump was OCPBUGS-12868.
Please review the following PR: https://github.com/openshift/machine-api-operator/pull/1167
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Tracker issue for bootimage bump in 4.15. This issue should block issues which need a bootimage bump to fix.
The previous bump was OCPBUGS-22757.
This is a clone of issue OCPBUGS-37430. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-23332. The following is the description of the original issue:
—
Description of problem:
Navigate to the Node overview and check the utilization of CPU and memory; it shows something like "6.53 GiB available of 300 MiB total limit", which looks very confusing.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Navigate to Node overview
2. Check the Utilization of CPU and memory
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/cloud-provider-gcp/pull/38
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info:
Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.
Affected Platforms:
Is it an
If it is an internal RedHat testing failure:
If it is a CI failure:
If it is a customer / SD issue:
Please review the following PR: https://github.com/openshift/cluster-samples-operator/pull/517
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-31613. The following is the description of the original issue:
—
Description of problem:
On the OCP console, if we edit a VMware-related parameter, add the same value back again, and click on Save, the nodes are rebooted.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Steps to Reproduce:
1. On any 4.14+ cluster, go to the OCP console page
2. Click on the vmware plugin
3. Edit any parameter and add the same value again
4. Click on Save
Actual results:
The nodes reboot to pick up the change
Expected results:
Nodes should not reboot if the same values are entered
Additional info:
This is a clone of issue OCPBUGS-44183. The following is the description of the original issue:
—
ClusterTask has been deprecated and will be removed in Pipelines Operator 1.17
We have to use Tasks from the `openshift-pipelines` namespace. This change will happen in the console-plugin repo (dynamic plugin), so in the console repository we have to remove all dependencies on ClusterTask when the Pipelines Operator is 1.17 or above.
Description of problem:
When running must-gather against an SNO with the Telco DU profile, the perf-node-gather-daemonset fails to start with the error below:

Warning  FailedCreate  2m37s (x16 over 5m21s)  daemonset-controller  Error creating: pods "perf-node-gather-daemonset-" is forbidden: autoscaling.openshift.io/ManagementCPUsOverride the pod namespace "openshift-must-gather-sbhml" does not allow the workload type management

must-gather shows it retrying for 300s and then reports that performance data collection was complete, even though the daemonset pod never came up:

[must-gather-nhbgr] POD 2023-09-26T10:15:39.591582116Z Waiting for performance profile collector pods to become ready: 1
[..]
[must-gather-nhbgr] POD 2023-09-26T10:21:07.108893075Z Waiting for performance profile collector pods to become ready: 300
[must-gather-nhbgr] POD 2023-09-26T10:21:08.473217146Z daemonset.apps "perf-node-gather-daemonset" deleted
[must-gather-nhbgr] POD 2023-09-26T10:21:08.480906220Z INFO: Node performance data collection complete.
Version-Release number of selected component (if applicable):
4.14.0-rc.2
How reproducible:
100%
Steps to Reproduce:
1. Deploy SNO with Telco DU profile 2. Run oc adm must-gather
Actual results:
Performance data collection doesn't run because the daemonset pods cannot be scheduled.
Expected results:
performance data collection runs.
Additional info:
DaemonSet describe:

oc -n openshift-must-gather-sbhml describe ds
Name:           perf-node-gather-daemonset
Selector:       name=perf-node-gather-daemonset
Node-Selector:  <none>
Labels:         <none>
Annotations:    deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 1
Current Number of Nodes Scheduled: 0
Number of Nodes Scheduled with Up-to-date Pods: 0
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       name=perf-node-gather-daemonset
  Annotations:  target.workload.openshift.io/management: {"effect": "PreferredDuringScheduling"}
  Containers:
   node-probe:
    Image:      registry.kni-qe-0.lab.eng.rdu2.redhat.com:5000/openshift-release-dev@sha256:2af2c135f69f162ed8e0cede609ddbd207d71a3c7bd49e9af3fcbb16737aa25a
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/bash
      -c
      echo ok > /tmp/healthy && sleep INF
    Limits:
      cpu:     100m
      memory:  256Mi
    Requests:
      cpu:        100m
      memory:     256Mi
    Readiness:    exec [cat /tmp/healthy] delay=5s timeout=1s period=5s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /host/podresources from podres (rw)
      /host/proc from proc (ro)
      /host/sys from sys (ro)
      /lib/modules from lib-modules (ro)
  Volumes:
   sys:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:  Directory
   proc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:  Directory
   lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  Directory
   podres:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/pod-resources
    HostPathType:  Directory
Events:
  Type     Reason        Age                     From                  Message
  ----     ------        ----                    ----                  -------
  Warning  FailedCreate  2m37s (x16 over 5m21s)  daemonset-controller  Error creating: pods "perf-node-gather-daemonset-" is forbidden: autoscaling.openshift.io/ManagementCPUsOverride the pod namespace "openshift-must-gather-sbhml" does not allow the workload type management
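For context, the ManagementCPUsOverride admission plugin rejects pods annotated as management workloads unless their namespace opts in. A minimal sketch of the namespace annotation that should let the daemonset schedule, assuming must-gather creates (or patches) its temporary namespace with it:
~~~
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-must-gather-sbhml   # the generated must-gather namespace from this report
  annotations:
    workload.openshift.io/allowed: management   # opts the namespace in to management workloads
~~~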
This is a clone of issue OCPBUGS-24299. The following is the description of the original issue:
—
The CNO-managed component (network-node-identity) needs to conform to the hypershift control plane expectation that all secrets are mounted without global read: change the mount mode from 420 (0644) to 416 (0640).
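In Kubernetes pod specs the secret volume mode is expressed in decimal, so 420 is octal 0644 and 416 is octal 0640. As an illustration (the secret name here is hypothetical, not the operator's actual manifest):
~~~
volumes:
- name: serving-cert
  secret:
    secretName: network-node-identity-cert  # hypothetical name, for illustration only
    defaultMode: 416                        # decimal 416 == octal 0640: group-readable, not world-readable
~~~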
This is a clone of issue OCPBUGS-29355. The following is the description of the original issue:
—
When clicking on the output image link on a Shipwright BuildRun details page, the link leads to the imagestream details page but shows 404 error.
The image link is:
https://console-openshift-console.apps...openshiftapps.com/k8s/ns/buildah-example/imagestreams/sample-kotlin-spring%3A1.0-shipwright
The BuildRun spec
apiVersion: shipwright.io/v1beta1
kind: BuildRun
metadata:
  generateName: sample-spring-kotlin-build-
  name: sample-spring-kotlin-build-xh2dq
  namespace: buildah-example
  labels:
    build.shipwright.io/generation: '2'
    build.shipwright.io/name: sample-spring-kotlin-build
spec:
  build:
    name: sample-spring-kotlin-build
status:
  buildSpec:
    output:
      image: 'image-registry.openshift-image-registry.svc:5000/buildah-example/sample-kotlin-spring:1.0-shipwright'
    paramValues:
    - name: run-image
      value: 'paketocommunity/run-ubi-base:latest'
    - name: cnb-builder-image
      value: 'paketobuildpacks/builder-jammy-tiny:0.0.176'
    - name: app-image
      value: 'image-registry.openshift-image-registry.svc:5000/buildah-example/sample-kotlin-spring:1.0-shipwright'
    source:
      git:
        url: 'https://github.com/piomin/sample-spring-kotlin-microservice.git'
      type: Git
    strategy:
      kind: ClusterBuildStrategy
      name: buildpacks
  completionTime: '2024-02-12T12:15:03Z'
  conditions:
  - lastTransitionTime: '2024-02-12T12:15:03Z'
    message: All Steps have completed executing
    reason: Succeeded
    status: 'True'
    type: Succeeded
  output:
    digest: 'sha256:dc3d44bd4d43445099ab92bbfafc43d37e19cfaf1cac48ae91dca2f4ec37534e'
  source:
    git:
      branchName: master
      commitAuthor: Piotr Mińkowski
      commitSha: aeb03d60a104161d6fd080267bf25c89c7067f61
  startTime: '2024-02-12T12:13:21Z'
  taskRunName: sample-spring-kotlin-build-xh2dq-j47ql
Description of problem:
A node fails to join the cluster as its CSR contains an incorrect hostname
oc describe csr csr-7hftm
Name:               csr-7hftm
Labels:             <none>
Annotations:        <none>
CreationTimestamp:  Tue, 24 Oct 2023 10:22:39 -0400
Requesting User:    system:serviceaccount:openshift-machine-config-operator:node-bootstrapper
Signer:             kubernetes.io/kube-apiserver-client-kubelet
Status:             Pending
Subject:
  Common Name:    system:node:openshift-worker-1
  Serial Number:
  Organization:   system:nodes
Events:  <none>
oc get csr csr-7hftm -o yaml
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  creationTimestamp: "2023-10-24T14:22:39Z"
  generateName: csr-
  name: csr-7hftm
  resourceVersion: "96957"
  uid: 84b94213-0c0c-40e4-8f90-d6612fbdab58
spec:
  groups:
  - system:serviceaccounts
  - system:serviceaccounts:openshift-machine-config-operator
  - system:authenticated
  request: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlIN01JR2lBZ0VBTUVBeEZUQVRCZ05WQkFvVERITjVjM1JsYlRwdWIyUmxjekVuTUNVR0ExVUVBeE1lYzNsegpkR1Z0T201dlpHVTZiM0JsYm5Ob2FXWjBMWGR2Y210bGNpMHhNRmt3RXdZSEtvWkl6ajBDQVFZSUtvWkl6ajBECkFRY0RRZ0FFMjRabE1JWGE1RXRKSGgwdWg2b3RVYTc3T091MC9qN0xuSnFqNDJKY0dkU01YeTJVb3pIRTFycmYKOTFPZ3pOSzZ5Z1R0Qm16NkFOdldEQTZ0dUszMlY2QUFNQW9HQ0NxR1NNNDlCQU1DQTBnQU1FVUNJRFhHMlFVWQoxMnVlWXhxSTV3blArRFBQaE5oaXhiemJvaTBpQzhHci9kMXRBaUVBdEFDcVVwRHFLYlFUNWVFZXlLOGJPN0dlCjhqVEI1UHN1SVpZM1pLU1R2WG89Ci0tLS0tRU5EIENFUlRJRklDQVRFIFJFUVVFU1QtLS0tLQo=
  signerName: kubernetes.io/kube-apiserver-client-kubelet
  uid: c3adb2e0-6d60-4f56-a08d-6b01d3d3c065
  usages:
  - digital signature
  - client auth
  username: system:serviceaccount:openshift-machine-config-operator:node-bootstrapper
status: {}
Version-Release number of selected component (if applicable):
4.14.0-rc.6
How reproducible:
So far only on one setup
Steps to Reproduce:
1. Deploy a dualstack baremetal cluster with day-1 networking and static DHCP hostnames
Actual results:
A node fails to join the cluster
Expected results:
All nodes join the cluster
Backport to 4.16 of AUTH-482 specifically for the openshift-console-operator.
Namespaces with workloads that need pinning:
Please review the following PR: https://github.com/openshift/images/pull/153
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
If secure boot is currently disabled and the user attempts to enable it via ZTP, the install will not begin the first time ZTP is triggered.
When secure boot is enabled via ZTP, the boot options are configured before the virtual CD is attached, so the first boot goes into the existing HD with secure boot on. The install then gets stuck because boot from CD is never triggered.
Version-Release number of selected component (if applicable):
4.10
How reproducible:
Always
Steps to Reproduce:
1. Secure boot is currently disabled in bios
2. Attempt to deploy a cluster with secure boot enabled via ZTP
3.
Actual results:
Expected results:
Additional info:
Secure boot config used in ZTP siteconfig:
http://registry.kni-qe-0.lab.eng.rdu2.redhat.com:3000/kni-qe/ztp-site-configs/src/ff814164cdcd355ed980f1edf269dbc2afbe09aa/siteconfig/master-2.yaml#L40
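For reference, a minimal sketch of the knob involved; in ZTP the siteconfig node entry carries the boot mode that gets propagated to the BareMetalHost, and the value below is what requests secure boot (the node name is illustrative; see the linked siteconfig for the authoritative layout):
~~~
nodes:
- hostName: master-2.example.com   # illustrative
  bootMode: UEFISecureBoot         # propagated to the BareMetalHost spec.bootMode
~~~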
Description of problem:
OCP upgrade blocks because the cluster operator csi-snapshot-controller fails to start its deployment with a fatal "read-only filesystem" message
Version-Release number of selected component (if applicable):
Red Hat OpenShift 4.11 rhacs-operator.v3.72.1
How reproducible:
At least once in user's cluster while upgrading
Steps to Reproduce:
1. Have an OCP 4.11 cluster installed
2. Install ACS on top of the OCP cluster
3. Upgrade OCP to the next z-stream version
Actual results:
Upgrade gets blocked: waiting on csi-snapshot-controller
Expected results:
Upgrade should succeed
Additional info:
The stackrox SCCs (stackrox-admission-control, stackrox-collector and stackrox-sensor) set `readOnlyRootFilesystem` to `true`. If an SCC is not explicitly defined/requested, other Pods might receive one of these SCCs, which makes their deployment fail with a `read-only filesystem` message.
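To illustrate the mechanism (a hand-written sketch, not the actual stackrox manifest), an SCC like the following can be selected for pods that do not pin an SCC themselves, forcing their root filesystem read-only:
~~~
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: example-readonly-scc     # illustrative
priority: 10                     # higher-priority SCCs are preferred for pods that don't pin one
readOnlyRootFilesystem: true     # forces a read-only root FS on any pod admitted under this SCC
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
~~~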
Problem
Hypershift requires the kas corev1.endpoint port to be exposed on the data plane hosts. This is so that when resolving traffic via SVC we capture traffic on that endpoint port and let haproxy redirect it to the LB that resolves to the KAS.
A while ago we introduced spec.networking.apiServer.port to let IBM choose which port would be exposed on the data plane hosts, as using a hardcoded one might conflict with their environment requirements.
However, as we evolved the support matrix for our endpoint publishing strategies, we mistakenly used that input as the source for other port exposures, such as the internal HCP namespace SVC. We also forced overwriting the corev1.endpoint value to avoid a discrepancy with what the kas pod was generating.
Solutions
Untangle the above by:
https://github.com/openshift/hypershift/pull/2964
https://github.com/openshift/hypershift/pull/3149
https://github.com/openshift/hypershift/pull/3147
https://github.com/openshift/hypershift/pull/3185
https://github.com/openshift/hypershift/pull/3186
Please review the following PR: https://github.com/openshift/kubernetes-autoscaler/pull/270
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-22324. The following is the description of the original issue:
—
Description of problem:
A node fails to join the cluster as its CSR contains an incorrect hostname
oc describe csr csr-7hftm
Name:               csr-7hftm
Labels:             <none>
Annotations:        <none>
CreationTimestamp:  Tue, 24 Oct 2023 10:22:39 -0400
Requesting User:    system:serviceaccount:openshift-machine-config-operator:node-bootstrapper
Signer:             kubernetes.io/kube-apiserver-client-kubelet
Status:             Pending
Subject:
  Common Name:    system:node:openshift-worker-1
  Serial Number:
  Organization:   system:nodes
Events:  <none>
oc get csr csr-7hftm -o yaml
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  creationTimestamp: "2023-10-24T14:22:39Z"
  generateName: csr-
  name: csr-7hftm
  resourceVersion: "96957"
  uid: 84b94213-0c0c-40e4-8f90-d6612fbdab58
spec:
  groups:
  - system:serviceaccounts
  - system:serviceaccounts:openshift-machine-config-operator
  - system:authenticated
  request: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlIN01JR2lBZ0VBTUVBeEZUQVRCZ05WQkFvVERITjVjM1JsYlRwdWIyUmxjekVuTUNVR0ExVUVBeE1lYzNsegpkR1Z0T201dlpHVTZiM0JsYm5Ob2FXWjBMWGR2Y210bGNpMHhNRmt3RXdZSEtvWkl6ajBDQVFZSUtvWkl6ajBECkFRY0RRZ0FFMjRabE1JWGE1RXRKSGgwdWg2b3RVYTc3T091MC9qN0xuSnFqNDJKY0dkU01YeTJVb3pIRTFycmYKOTFPZ3pOSzZ5Z1R0Qm16NkFOdldEQTZ0dUszMlY2QUFNQW9HQ0NxR1NNNDlCQU1DQTBnQU1FVUNJRFhHMlFVWQoxMnVlWXhxSTV3blArRFBQaE5oaXhiemJvaTBpQzhHci9kMXRBaUVBdEFDcVVwRHFLYlFUNWVFZXlLOGJPN0dlCjhqVEI1UHN1SVpZM1pLU1R2WG89Ci0tLS0tRU5EIENFUlRJRklDQVRFIFJFUVVFU1QtLS0tLQo=
  signerName: kubernetes.io/kube-apiserver-client-kubelet
  uid: c3adb2e0-6d60-4f56-a08d-6b01d3d3c065
  usages:
  - digital signature
  - client auth
  username: system:serviceaccount:openshift-machine-config-operator:node-bootstrapper
status: {}
Version-Release number of selected component (if applicable):
4.14.0-rc.6
How reproducible:
So far only on one setup
Steps to Reproduce:
1. Deploy a dualstack baremetal cluster with day-1 networking and static DHCP hostnames
Actual results:
A node fails to join the cluster
Expected results:
All nodes join the cluster
This is a clone of issue OCPBUGS-33088. The following is the description of the original issue:
—
Caught by the test: Undiagnosed panic detected in pod
Sample job run:
Error message
{ pods/openshift-controller-manager_controller-manager-6b66bf5587-6ghjk_controller-manager.log.gz:E0426 23:06:02.367266 1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*abi.Type)(0x3c6a2a0), concrete:(*abi.Type)(0x3e612c0), asserted:(*abi.Type)(0x419cdc0), missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Secret)
pods/openshift-controller-manager_controller-manager-6b66bf5587-6ghjk_controller-manager.log.gz:E0426 23:06:03.368403 1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*abi.Type)(0x3c6a2a0), concrete:(*abi.Type)(0x3e612c0), asserted:(*abi.Type)(0x419cdc0), missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Secret)
pods/openshift-controller-manager_controller-manager-6b66bf5587-6ghjk_controller-manager.log.gz:E0426 23:06:04.370157 1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*abi.Type)(0x3c6a2a0), concrete:(*abi.Type)(0x3e612c0), asserted:(*abi.Type)(0x419cdc0), missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Secret)}
Sippy indicates it's happening a small percentage of the time since around Apr 25th.
Took out the last payload so labeling trt-incident for now.
This is a clone of issue OCPBUGS-37460. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-33397. The following is the description of the original issue:
—
Description of problem:
A node was cordoned manually. After several days, machine-config-controller uncordoned the same node after rendering a new machine-config.
Version-Release number of selected component (if applicable):
4.13
Actual results:
The MCO rolled out a new config and the node was uncordoned by the MCO
Expected results:
The MCO should treat an unschedulable node as not ready for performing an update. Also, it may halt the update on other nodes in the pool, depending on the maxUnavailable set for that pool.
Additional info:
This is a clone of issue OCPBUGS-46434. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-46054. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-46018. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42815. The following is the description of the original issue:
—
Description of problem:
While upgrading the Fusion operator, the IBM team is facing the following error in the operator's subscription:

error validating existing CRs against new CRD's schema for "fusionserviceinstances.service.isf.ibm.com": error validating service.isf.ibm.com/v1, Kind=FusionServiceInstance "ibm-spectrum-fusion-ns/odfmanager": updated validation is too restrictive: [].status.triggerCatSrcCreateStartTime: Invalid value: "number": status.triggerCatSrcCreateStartTime in body must be of type integer: "number"

A question here: "triggerCatSrcCreateStartTime" has been present in the operator for the past few releases and its datatype (integer) hasn't changed in the latest release either. There was one "FusionServiceInstance" CR present in the cluster when this issue was hit, and the value of its "triggerCatSrcCreateStartTime" field was "1726856593000774400".
Version-Release number of selected component (if applicable):
It impacts upgrades between OCP 4.16.7 and OCP 4.16.14
How reproducible:
Always
Steps to Reproduce:
1. Upgrade the fusion operator from OCP version 4.16.7 to OCP 4.16.14
Actual results:
Upgrade fails with the error in the description
Expected results:
Upgrade should not fail
Additional info:
Description of problem:
Nutanix starts to support multi prism-element (PE) failure domains in OCP 4.15. When doing local testing, we found that when multi-PE failure domains are configured, the installer terraform still creates the 3 control-plane nodes in a single PE, not spreading them across the 3 PEs configured in the failure domains.
Version-Release number of selected component (if applicable):
4.15 nightly
How reproducible:
Always
Steps to Reproduce:
1. Use the 4.15 nightly build installer and configure multi-PE failure domains in the install-config.yaml. See the doc on how to configure: https://docs.google.com/document/d/1TA9vCH-3X_GttJ4fHg39sstdhBPCR7S83N1R8rzXFuk/
2. Run the installer to provision the Nutanix OCP cluster, and check where (which PE) the control-plane nodes are running.
Actual results:
The control-plane nodes all end up on a single PE (prism-element).
Expected results:
The control-plane nodes are evenly spread through the PEs configured in the failure domains.
Additional info:
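To make the setup concrete, a rough sketch of the failure-domain fragment being tested; the field names follow the 4.15 design doc linked above as far as we recall, and all names/UUIDs are placeholders:
~~~
platform:
  nutanix:
    failureDomains:
    - name: fd-pe1                  # placeholder
      prismElement:
        name: pe1
        uuid: 00000000-0000-0000-0000-000000000001
      subnetUUIDs:
      - 11111111-1111-1111-1111-111111111111
controlPlane:
  platform:
    nutanix:
      failureDomains:               # expected to spread the 3 masters across the PEs
      - fd-pe1
      - fd-pe2
      - fd-pe3
~~~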
Please review the following PR: https://github.com/openshift/cluster-etcd-operator/pull/1115
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-10498. The following is the description of the original issue:
—
Description of problem:
The customer requires multiple domain names in their AWS VPC's DHCP option set, which is needed for on-prem DNS (infoblox) lookups to work. The problem is that the kubelet service is unable to parse the node name properly.
~~~
hyperkube[2562]: Error: failed to run Kubelet: failed to create kubelet: could not initialize volume plugins for KubeletVolumePluginMgr: parse "http://example.compute.internal example.com:9001": invalid character " " in host name
~~~
/etc/systemd/system/kubelet.service.d/20-aws-node-name.conf
[Service]
Environment="KUBELET_NODE_NAME=ip-x-x-x-x.example.example test.example"
                                                          ^ space
The customer is aware of this KCS article. If the customer follows what the KCS article says, it will break their DNS functionality.
Kubelet fails to start on nodes during OCP 4.x IPI installation on AWS - Red Hat Customer Portal
https://access.redhat.com/solutions/6978959
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. Create/add a node with multiple domain names
2. Add the base domain to the DHCP option set in the VPC settings
Actual results:
kubelet is failing to start
Expected results:
should be able to add a worker node that has multiple domain names
Additional info:
The new version of the Cluster Logging Operator added a different set of metrics for telemetry (see MON-4051). The change in cluster-monitoring-operator necessary for these metrics to be picked up is merged to master already, but the change needs to be backported to previous versions to be useful in tracking the existing installations of the operator.
Please review the following PR: https://github.com/openshift/must-gather/pull/381
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-31585. The following is the description of the original issue:
—
The hypershift ignition endpoint needlessly supports ALPN HTTP/2. In light of CVE-2023-39325, there is no reason to support HTTP/2 if it is not being used.
This is a clone of issue OCPBUGS-43104. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42714. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-36261. The following is the description of the original issue:
—
Description of problem:
In hostedcluster installations, when the following OAuthServer service is configured without any hostname parameter, the oauth route is created in the management cluster with the standard hostname, following the pattern of the ingresscontroller wildcard domain (oauth-<hosted-cluster-namespace>.<wildcard-default-ingress-controller-domain>):
~~~
$ oc get hostedcluster -n <namespace> <hosted-cluster-name> -oyaml
    - service: OAuthServer
      servicePublishingStrategy:
        type: Route
~~~
On the other hand, if a custom hostname parameter is configured, the oauth route is created in the management cluster with the following labels:
~~~
$ oc get hostedcluster -n <namespace> <hosted-cluster-name> -oyaml
    - service: OAuthServer
      servicePublishingStrategy:
        route:
          hostname: oauth.<custom-domain>
        type: Route

$ oc get routes -n hcp-ns --show-labels
NAME    HOST/PORT               LABELS
oauth   oauth.<custom-domain>   hypershift.openshift.io/hosted-control-plane=hcp-ns   <---
~~~
The configured label makes the ingresscontroller not admit the route, as the following configuration is added by the hypershift operator to the default ingresscontroller resource:
~~~
$ oc get ingresscontroller -n openshift-ingress-operator default -oyaml
  routeSelector:
    matchExpressions:
    - key: hypershift.openshift.io/hosted-control-plane   <---
      operator: DoesNotExist                              <---
~~~
This configuration should be allowed, as there are use-cases where the route should have a customized hostname. Currently the HCP platform does not allow this configuration and the oauth route does not work.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Easily
Steps to Reproduce:
1. Install an HCP cluster
2. Configure OAuthServer with type Route
3. Add a custom hostname different from the default wildcard ingress URL of the management cluster
Actual results:
The OAuth route is not admitted
Expected results:
The OAuth route should be admitted by the ingresscontroller
Additional info:
Issue 57 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624
Open the "Add Helm Chart Repositories to extend the Developer Catalog for your project" quick start. Go to the next step. You will see a code sample that does not have the right style if you've enabled dark theme.
Note: Could we check whether we can also update the PatternFly quickstart extension?
Screenshot: https://drive.google.com/file/d/1hxh5VI2S7jLKRdNlDQsdlAXL_G7TxtME/view?usp=sharing
This is a clone of issue OCPBUGS-25600. The following is the description of the original issue:
—
Description of problem:
The installer doesn't precheck whether the node architecture and VM type are consistent for AWS and GCP; the check works on Azure.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-multi-2023-12-06-195439
How reproducible:
Always
Steps to Reproduce:
1. Configure the compute architecture field to arm64 but choose an amd64 instance type for the VM type in the install-config
2. Create the cluster
3. Check the installation
Actual results:
Azure prechecks whether the architecture is consistent with the instance type when creating manifests, like:

12-07 11:18:24.452 [INFO] Generating manifests files.....
12-07 11:18:24.452 level=info msg=Credentials loaded from file "/home/jenkins/ws/workspace/ocp-common/Flexy-install/flexy/workdir/azurecreds20231207-285-jd7gpj"
12-07 11:18:56.474 level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: controlPlane.platform.azure.type: Invalid value: "Standard_D4ps_v5": instance type architecture 'Arm64' does not match install config architecture amd64

But AWS and GCP have no such precheck: the install fails during provisioning, after many resources have already been created. The case is more likely to happen in multiarch clusters.
Expected results:
The installer should precheck the architecture against the VM type, especially for platforms that support heterogeneous clusters (AWS, GCP, Azure); see the fragment below.
Additional info:
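For illustration, a minimal install-config fragment of the kind the expected precheck should reject up front (the instance type is chosen here as a typical amd64-only example):
~~~
compute:
- name: worker
  architecture: arm64        # requested node architecture
  platform:
    aws:
      type: m6i.xlarge       # amd64-only instance type: a mismatch the installer should catch early
~~~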
UDP packets are subject to SNAT in a self-managed OCP 4.13.13 cluster on Azure (OVN-K as CNI) using a LoadBalancer Service with `externalTrafficPolicy: Local`. UDP packets correctly arrive at the node hosting the Pod, but the source IP seen by the Pod is the OVN GW router of the node.
I've reproduced the customer scenario with the following steps:
This issue is very critical because it is blocking customer business.
Testcases:
1. Create a configmap from a file with 77 characters in a line
File data:

tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt

CLI data:

$ oc get cm cm-test4 -o yaml
apiVersion: v1
data:
  cm-test4: |    ##Noticed the Literal style
    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
    eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
    ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
kind: ConfigMap
metadata:
  creationTimestamp: "2022-09-28T12:39:43Z"
  name: cm-test4
  namespace: configmap-test
  resourceVersion: "8962738"
  uid: cf0e264b-72fb-4df7-bd3a-f3ed62423367

UI data:

kind: ConfigMap
apiVersion: v1
metadata:
  name: cm-test4
  namespace: configmap-test
  uid: cf0e264b-72fb-4df7-bd3a-f3ed62423367
  resourceVersion: '8962738'
  creationTimestamp: '2022-09-28T12:39:43Z'
  managedFields:
    - manager: kubectl-create
      operation: Update
      apiVersion: v1
      time: '2022-09-28T12:39:43Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:data':
          .: {}
          'f:cm-test4': {}
data:
  cm-test4: |    ##Noticed the Literal style
    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
    eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
    ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
2. Create a configmap from a file with characters more than 78 in a line,
File data:

tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt

CLI data:

$ oc get cm cm-test5 -o yaml
apiVersion: v1
data:
  cm-test5: |    ##Noticed the Literal style
    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
    eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
    ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
kind: ConfigMap
metadata:
  creationTimestamp: "2022-09-28T12:39:54Z"
  name: cm-test5
  namespace: configmap-test
  resourceVersion: "8962813"
  uid: b8b12653-588a-4afc-8ed9-ff7c6ebaefb1

UI data:

kind: ConfigMap
apiVersion: v1
metadata:
  name: cm-test5
  namespace: configmap-test
  uid: b8b12653-588a-4afc-8ed9-ff7c6ebaefb1
  resourceVersion: '8962813'
  creationTimestamp: '2022-09-28T12:39:54Z'
  managedFields:
    - manager: kubectl-create
      operation: Update
      apiVersion: v1
      time: '2022-09-28T12:39:54Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:data':
          .: {}
          'f:cm-test5': {}
data:
  cm-test5: >    ##Noticed the Folded style and newlines in between data
    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt

    eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee

    ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss

    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
Conclusion:
When the CM is created with more than 78 characters in a single line, the yaml editor in the web UI changes the style to folded, and newlines can be seen in between the data.
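For context, the two YAML block scalar styles involved behave as follows. Both snippets below decode to the same two-line string; the folded form just looks very different in the editor:
~~~
literal: |    # '|' keeps newlines exactly as written
  first line
  second line
folded: >     # '>' folds single newlines into spaces; a blank line is needed to keep a real newline
  first line

  second line
~~~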
This is a clone of issue OCPBUGS-35743. The following is the description of the original issue:
—
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Clone of https://issues.redhat.com/browse/OCPBUGS-32141 for 4.16
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Description of problem:
VIPs are on a different network than the machine network on a 4.14 cluster
failing cluster: 4.14
Infrastructure
--------------
Platform: VSphere
Install Type: IPI
apiServerInternalIP: 10.8.0.83
apiServerInternalIPs: 10.8.0.83
ingressIP: 10.8.0.84
ingressIPs: 10.8.0.84
All internal IP addresses of all nodes match the Machine Network.
Machine Network: 10.8.42.0/23
Node name IP Address Matches CIDR
..............................................................................................................
sv1-prd-ocp-int-bn8ln-master-0 10.8.42.24 YES
sv1-prd-ocp-int-bn8ln-master-1 10.8.42.35 YES
sv1-prd-ocp-int-bn8ln-master-2 10.8.42.36 YES
sv1-prd-ocp-int-bn8ln-worker-0-5rbwr 10.8.42.32 YES
sv1-prd-ocp-int-bn8ln-worker-0-h7fq7 10.8.42.49 YES
logs from one of the haproxy pods
oc logs -n openshift-vsphere-infra haproxy-sv1-prd-ocp-int-bn8ln-master-0 haproxy-monitor
.....
2024-04-02T18:48:57.534824711Z time="2024-04-02T18:48:57Z" level=info msg="An error occurred while trying to read master nodes details from api-vip:kube-apiserver: failed find a interface for the ip 10.8.0.83"
2024-04-02T18:48:57.534849744Z time="2024-04-02T18:48:57Z" level=info msg="Trying to read master nodes details from localhost:kube-apiserver"
2024-04-02T18:48:57.544507441Z time="2024-04-02T18:48:57Z" level=error msg="Could not retrieve subnet for IP 10.8.0.83" err="failed find a interface for the ip 10.8.0.83"
2024-04-02T18:48:57.544507441Z time="2024-04-02T18:48:57Z" level=error msg="Failed to retrieve API members information" kubeconfigPath=/var/lib/kubelet/kubeconfig
2024-04-02T18:48:57.544507441Z time="2024-04-02T18:48:57Z" level=info msg="GetLBConfig failed, sleep half of interval and retry" kubeconfigPath=/var/lib/kubelet/kubeconfig
2024-04-02T18:49:00.572652095Z time="2024-04-02T18:49:00Z" level=error msg="Could not retrieve subnet for IP 10.8.0.83" err="failed find a interface for the ip 10.8.0.83"
There is a kcs that addresses this:
https://access.redhat.com/solutions/7037425
However, this same configuration works in production on 4.12
working cluster:
Infrastructure
--------------
Platform: VSphere
Install Type: IPI
apiServerInternalIP: 10.8.0.73
apiServerInternalIPs: 10.8.0.73
ingressIP: 10.8.0.72
ingressIPs: 10.8.0.72
All internal IP addresses of all nodes match the Machine Network.
Machine Network: 10.8.38.0/23
Node name IP Address Matches CIDR
..............................................................................................................
sb1-prd-ocp-int-qls2m-cp4d-4875s 10.8.38.29 YES
sb1-prd-ocp-int-qls2m-cp4d-phczw 10.8.38.19 YES
sb1-prd-ocp-int-qls2m-cp4d-ql5sj 10.8.38.43 YES
sb1-prd-ocp-int-qls2m-cp4d-svzl7 10.8.38.27 YES
sb1-prd-ocp-int-qls2m-cp4d-x286s 10.8.38.18 YES
sb1-prd-ocp-int-qls2m-cp4d-xk48m 10.8.38.40 YES
sb1-prd-ocp-int-qls2m-master-0 10.8.38.25 YES
sb1-prd-ocp-int-qls2m-master-1 10.8.38.24 YES
sb1-prd-ocp-int-qls2m-master-2 10.8.38.30 YES
sb1-prd-ocp-int-qls2m-worker-njzdx 10.8.38.15 YES
sb1-prd-ocp-int-qls2m-worker-rhqn5 10.8.38.39 YES
logs from one of the haproxy pods
2023-08-18T21:12:19.730010034Z time="2023-08-18T21:12:19Z" level=info msg="API is not reachable through HAProxy"
2023-08-18T21:12:19.755357706Z time="2023-08-18T21:12:19Z" level=info msg="Config change detected" configChangeCtr=1 curConfig="{6443 9445 29445 [{sb1-prd-ocp-int-qls2m-master-0 10.8.38.25 6443} {sb1-prd-ocp-int-qls2m-master-2 10.8.38.30 6443}] }"
The data is being redirected
Found this in the sos report: sos_commands/firewall_tables/nft_-a_list_ruleset
table ip nat { # handle 2
    chain PREROUTING { ... }
    chain INPUT { # handle 2
        type nat hook input priority 100; policy accept;
    }
    chain POSTROUTING { # handle 3
        type nat hook postrouting priority srcnat; policy accept;
        counter packets 245475292 bytes 16221809463 jump OVN-KUBE-EGRESS-SVC # handle 25
        oifname "ovn-k8s-mp0" counter packets 58115015 bytes 4184247096 jump OVN-KUBE-SNAT-MGMTPORT # handle 16
        counter packets 187360548 bytes 12037581317 jump KUBE-POSTROUTING # handle 10
    }
    chain OUTPUT { # handle 4
        type nat hook output priority -100; policy accept;
        oifname "lo" meta l4proto tcp ip daddr 10.8.0.73 tcp dport 6443 counter packets 0 bytes 0 redirect to :9445 # handle 67
        counter packets 245122162 bytes 16200621351 jump OVN-KUBE-EXTERNALIP # handle 29
        counter packets 245122163 bytes 16200621411 jump OVN-KUBE-NODEPORT # handle 27
        counter packets 245122166 bytes 16200621591 jump OVN-KUBE-ITP # handle 24
    }
    ... many more lines ...
This code was not added by the customer
None of the redirect statements are in the same file for 4.14 (the failing cluster)
Version-Release number of selected component (if applicable):
OCP 4.14
How reproducible:
100%
Steps to Reproduce:
This is the install script that our ansible job uses to install 4.12. If you need it cleared up let me know; all the items in {{}} are just variables for file paths.

cp -r {{ item.0.cluster_name }}/install-config.yaml {{ openshift_base }}{{ item.0.cluster_name }}/
./openshift-install create manifests --dir {{ openshift_base }}{{ item.0.cluster_name }}/
cp -r machineconfigs/* {{ openshift_base }}{{ item.0.cluster_name }}/openshift/
cp -r {{ item.0.cluster_name }}/customizations/* {{ openshift_base }}{{ item.0.cluster_name }}/openshift/
./openshift-install create ignition-configs --dir {{ openshift_base }}{{ item.0.cluster_name }}/
./openshift-install create cluster --dir {{ openshift_base }}{{ item.0.cluster_name }} --log-level=debug

We are installing IPI on vmware. API and Ingress VIPs are configured on our external load balancer appliance. (Citrix ADCs if that matters)
Actual results:
haproxy pods crashloop and do not work. In 4.14, following the same install workflow, neither the API nor the Ingress VIP binds to masters or workers, and we see haproxy crashlooping.
Expected results:
For 4.12: following a completed 4.12 install, if we look in vmware at our master and worker nodes we will see that all of them have an IP address from the machine network assigned, and one node from the masters and one from the workers will have the respective VIP bound as well.
Additional info:
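For reference, a sketch of the install-config fields at play in the failing cluster, using the values from this report; the crux is that the VIPs fall outside the machine network CIDR:
~~~
networking:
  machineNetwork:
  - cidr: 10.8.42.0/23
platform:
  vsphere:
    apiVIPs:
    - 10.8.0.83       # outside 10.8.42.0/23
    ingressVIPs:
    - 10.8.0.84       # outside 10.8.42.0/23
~~~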
Description of the problem:
Agents don't run the StepTypeTangConnectivityCheck step on day-2 hosts in imported clusters
How reproducible:
Unknown
Steps to reproduce:
1. Install day-1 cluster with Tang
2. Attempt to add day-2 host
Actual results:
disk-encryption-requirements-satisfied stuck pending
Expected results:
disk-encryption-requirements-satisfied should be eventually either failed or success
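For context, a rough sketch of how Tang-based disk encryption is requested in an assisted/ZTP install via the AgentClusterInstall; the field names follow the assisted-service API as far as we recall, and the URL/thumbprint are placeholders:
~~~
apiVersion: extensions.hive.openshift.io/v1beta1
kind: AgentClusterInstall
metadata:
  name: example-cluster   # illustrative
spec:
  diskEncryption:
    enableOn: all
    mode: tang
    tangServers: '[{"url":"http://tang.example.com:7500","thumbprint":"PLACEHOLDER"}]'
~~~
The StepTypeTangConnectivityCheck step reported here is what verifies that the host can actually reach those Tang servers before the validation can pass.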
This was originally reported in AWS (details below), but the OpenStack configuration suffers the same issue. If the metadata query for the instance name fails on initial boot, kubelet will start with an invalid nodename and will fail to come up.
Description of problem:
Worker CSRs are pending, so no worker nodes are available
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-10-06-234925
How reproducible:
Always
Steps to Reproduce:
Create a cluster with profile - aws-c2s-ipi-disconnected-private-fips
Actual results:
Worker CSRs are pending
Expected results:
Workers should be up and running, with all CSRs approved
Additional info:
"failed to find machine for node ip-10-143-1-120" appears in the logs of cluster-machine-approver. It seems like we should have names like "ip-10-143-1-120.ec2.internal". Failing here: https://github.com/openshift/cluster-machine-approver/blob/master/pkg/controller/csr_check.go#L263
Must-gather - https://drive.google.com/file/d/15tz9TLdTXrH6bSBSfhlIJ1l_nzeFE1R3/view?usp=sharing
template for installation - https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blob/master/functionality-testing/aos-4_14/ipi-on-aws/versioned-installer-customer_vpc-disconnected_private_cluster-fips-c2s-ci
Please review the following PR: https://github.com/openshift/machine-api-provider-openstack/pull/99
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-25193. The following is the description of the original issue:
—
Description of problem:
Install a private cluster using azure workload identity; the install failed because no worker machines were provisioned.

install-config:
----------------------
platform:
  azure:
    region: eastus
    networkResourceGroupName: jima971b-12015319-rg
    virtualNetwork: jima971b-vnet
    controlPlaneSubnet: jima971b-master-subnet
    computeSubnet: jima971b-worker-subnet
    resourceGroupName: jima971b-rg
publish: Internal
credentialsMode: Manual

A detailed check on the cluster found that the machine-api/ingress/image-registry operators reported permission issues and have no access to the customer vnet.

$ oc get machine -n openshift-machine-api
NAME                                  PHASE     TYPE              REGION   ZONE   AGE
jima971b-qqjb7-master-0               Running   Standard_D8s_v3   eastus   2      5h14m
jima971b-qqjb7-master-1               Running   Standard_D8s_v3   eastus   3      5h14m
jima971b-qqjb7-master-2               Running   Standard_D8s_v3   eastus   1      5h15m
jima971b-qqjb7-worker-eastus1-mtc47   Failed                                      4h52m
jima971b-qqjb7-worker-eastus2-ph8bk   Failed                                      4h52m
jima971b-qqjb7-worker-eastus3-hpmvj   Failed                                      4h52m

Errors on a worker machine:
--------------------
errorMessage: 'failed to reconcile machine "jima971b-qqjb7-worker-eastus1-mtc47": network.SubnetsClient#Get: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client ''705eb743-7c91-4a16-a7cf-97164edc0341'' with object id ''705eb743-7c91-4a16-a7cf-97164edc0341'' does not have authorization to perform action ''Microsoft.Network/virtualNetworks/subnets/read'' over scope ''/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima971b-12015319-rg/providers/Microsoft.Network/virtualNetworks/jima971b-vnet/subnets/jima971b-worker-subnet'' or the scope is invalid. If access was recently granted, please refresh your credentials."'
errorReason: InvalidConfiguration

After manually creating a custom role with the missing permissions for machine-api/ingress/cloud-controller-manager/image-registry, and assigning it to the machine-api/ingress/cloud-controller-manager/image-registry user-assigned identities on the scope of the customer vnet, the cluster recovered and became running.

Permissions for machine-api/cloud-controller-manager/ingress on the customer vnet:
"Microsoft.Network/virtualNetworks/subnets/read"
"Microsoft.Network/virtualNetworks/subnets/join/action"

Permissions for image-registry on the customer vnet:
"Microsoft.Network/virtualNetworks/subnets/read"
"Microsoft.Network/virtualNetworks/subnets/join/action"
"Microsoft.Network/virtualNetworks/join/action"
Version-Release number of selected component (if applicable):
4.15 nightly build
How reproducible:
always on recent 4.15 payload
Steps to Reproduce:
1. Prepare an install-config with the private cluster configuration + credentialsMode: Manual
2. Use the ccoctl tool to create the workload identity
3. Install the cluster
Actual results:
Installation failed due to permission issues
Expected results:
ccoctl also needs to assign the custom role to the machine-api/ccm/image-registry user-assigned identities on the scope of the customer vnet if one is configured in the install-config
Additional info:
Issue is only detected on 4.15, it works on 4.14.
hypershift e2e and conformance for 4.15 and 4.14 clusters under test are failing due to version skew with the mgmt cluster version (4.16).
As of 4.16.0-rc3, openshift-controller-manager switched to using bound service account tokens in https://issues.redhat.com/browse/API-1644
Two PRs were made against openshift/origin and openshift/library-go to support this for 4.16.
https://github.com/openshift/origin/pull/28679
https://github.com/openshift/library-go/pull/1697
However, for 4.15 and 4.14 tests, neither repo has those changes. The changes might not be backward compatible (need feedback from Luis Sanchez on this).
In o/o for example, 4.15 o/o code is used to do Prometheus operations against a 4.16 mgmt cluster.
https://github.com/openshift/origin/blob/2320880deab4c456c7d8b157ea7dc91b85c85302/test/extended/etcd/leader_changes.go#L32
Description of problem:
oc-mirror hits a panic when using v2 and mirroring from disk to registry
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. Create the imageset that we are using:

cat config.yaml
apiVersion: mirror.openshift.io/v1alpha2
kind: ImageSetConfiguration
mirror:
  platform:
    channels:
    - name: stable-4.13
      minVersion: 4.13.13
      maxVersion: 4.13.13
    graph: true

2. Mirror to disk with the command: `oc-mirror --config config.yaml file://out --v2`
3. Mirror from disk to registry with the command: `oc-mirror --config config.yaml --from out/working-dir/ docker://ec2-18-217-139-237.us-east-2.compute.amazonaws.com:5000/ocpv2 --v2`
Actual results:
oc-mirror --from out/working-dir/ docker://ec2-18-217-139-237.us-east-2.compute.amazonaws.com:5000/ocpv2 --v2
--v2 flag identified, flow redirected to the oc-mirror v2 version. PLEASE DO NOT USE that. V2 is still under development and it is not ready to be used.
2023/11/06 03:10:19 [ERROR] : use the --config flag it is mandatory
[root@preserve-fedora36 1106]# oc-mirror --config config.yaml --from out/working-dir/ docker://ec2-18-217-139-237.us-east-2.compute.amazonaws.com:5000/ocpv2 --v2
--v2 flag identified, flow redirected to the oc-mirror v2 version. PLEASE DO NOT USE that. V2 is still under development and it is not ready to be used.
panic: runtime error: index out of range [1] with length 1
goroutine 1 [running]:
github.com/openshift/oc-mirror/v2/pkg/cli.(*ExecutorSchema).Complete(0xc000c28a80, {0xc00012cd20, 0x1, 0x0?})
/go/src/github.com/openshift/oc-mirror/vendor/github.com/openshift/oc-mirror/v2/pkg/cli/executor.go:330 +0x1a18
github.com/openshift/oc-mirror/v2/pkg/cli.NewMirrorCmd.func1(0xc000005500?, {0xc00012cd20, 0x1, 0x6})
/go/src/github.com/openshift/oc-mirror/vendor/github.com/openshift/oc-mirror/v2/pkg/cli/executor.go:137 +0xfd
github.com/spf13/cobra.(*Command).execute(0xc000005500, {0xc000052080, 0x6, 0x6})
/go/src/github.com/openshift/oc-mirror/vendor/github.com/spf13/cobra/command.go:944 +0x847
github.com/spf13/cobra.(*Command).ExecuteC(0xc000005500)
/go/src/github.com/openshift/oc-mirror/vendor/github.com/spf13/cobra/command.go:1068 +0x3bd
github.com/spf13/cobra.(*Command).Execute(0x0?)
/go/src/github.com/openshift/oc-mirror/vendor/github.com/spf13/cobra/command.go:992 +0x19
main.main()
/go/src/github.com/openshift/oc-mirror/cmd/oc-mirror/main.go:10 +0x1e
Expected results:
No panic
Additional info:
Update package versions in the ironic-agent container to bring in the latest fixes.
A case was found recently (see https://github.com/openshift/machine-os-images/pull/27) where the rhcos image version stored within the machine-os-images was different than the one reported in the installer rhcos metadata.
This sync is particularly relevant for the agent-based installer, since the create image command logic could fetch the base ISO either from the machine-os-images content or from a direct download, depending on the availability of the oc command in the current execution environment.
Even though this scenario is very unlikely to happen in production, a missing sync between machine-os-images and the installer metadata may produce different results depending on environmental conditions, and moreover can silently hide severe issues.
Please review the following PR: https://github.com/openshift/installer/pull/7493
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Bump k8s.io/client-go to v0.28.3
This is a clone of issue OCPBUGS-41619. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-39453. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-35321. The following is the description of the original issue:
—
Description of problem:
The customer updated their cluster from 4.11.x to 4.15.11. After updating the cluster to OpenShift 4.15.11, the value for vCenter Cluster in the vSphere connection configuration is missing: the vCenter cluster name is not displayed in the GUI. We have also checked the cloud-config; everything is in place there, but some parameters are missing from the vSphere connection configuration in the OpenShift console. Please find the attached screenshot for more reference.
Version-Release number of selected component (if applicable):
How reproducible:
The customer has reproduced the issue; we are yet to do so.
Steps to Reproduce:
[x] -- The customer updated their cluster from 4.11.x to 4.15.11; after the upgrade the cluster looks fine and healthy, but a parameter is missing from the vSphere connection configuration in the OpenShift console, as shown in the attached screenshot.
Expected results:
Additional info:
Description of problem:
OCP Upgrades fail with message "Upgrade error from 4.13.X: Unable to apply 4.14.0-X: an unknown error has occurred: MultipleErrors"
Version-Release number of selected component (if applicable):
Currently 4.14.0-rc.1, but we observed the same issue with previous 4.14 nightlies too:
4.14.0-0.nightly-2023-09-12-195514
4.14.0-0.nightly-2023-09-02-132842
4.14.0-0.nightly-2023-08-28-154013
How reproducible:
1 out of 2 upgrades
Steps to Reproduce:
1. Deploy OCP 4.13 with the latest GA on a baremetal cluster with IPI and OVN-K
2. Upgrade to the latest 4.14 available
3. Check cluster version status during the upgrade; at some point the upgrade stops with the message: "Upgrade error from 4.13.X: Unable to apply 4.14.0-X: an unknown error has occurred: MultipleErrors"
4. Check OVN pods ("oc get pods -n openshift-ovn-kubernetes"): there are pods running 7 out of 8 containers (missing ovnkube-node) constantly restarting, and pods running only 5 containers that show errors connecting to the OVN DBs
5. Check cluster operators ("oc get co"): mainly dns, network, and machine-config remained on 4.13 and degraded
Actual results:
Upgrade not completed, and OVN pods remain in a restarting loop with failures.
Expected results:
Upgrade should be completed without issues, and OVN pods should remain in a Running status without restarts.
Additional info:
These are the results from our latest test from 4.13.13 to 4.14.0-rc1
$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version True True 2h8m Unable to apply 4.14.0-rc.1: an unknown error has occurred: MultipleErrors $ oc get mcp NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE master rendered-master-ebb1da47ad5cb76c396983decb7df1ea True False False 3 3 3 0 3h41m worker rendered-worker-26ccb35941236935a570dddaa0b699db False True True 3 2 2 1 3h41m $ oc get co NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE authentication 4.14.0-rc.1 True False False 2h21m baremetal 4.14.0-rc.1 True False False 3h38m cloud-controller-manager 4.14.0-rc.1 True False False 3h41m cloud-credential 4.14.0-rc.1 True False False 2h23m cluster-autoscaler 4.14.0-rc.1 True False False 2h21m config-operator 4.14.0-rc.1 True False False 3h40m console 4.14.0-rc.1 True False False 2h20m control-plane-machine-set 4.14.0-rc.1 True False False 3h40m csi-snapshot-controller 4.14.0-rc.1 True False False 2h21m dns 4.13.13 True True True 2h9m etcd 4.14.0-rc.1 True False False 2h40m image-registry 4.14.0-rc.1 True False False 2h9m ingress 4.14.0-rc.1 True True True 1h14m insights 4.14.0-rc.1 True False False 3h34m kube-apiserver 4.14.0-rc.1 True False False 2h35m kube-controller-manager 4.14.0-rc.1 True False False 2h30m kube-scheduler 4.14.0-rc.1 True False False 2h29m kube-storage-version-migrator 4.14.0-rc.1 False True False 2h9m machine-api 4.14.0-rc.1 True False False 2h24m machine-approver 4.14.0-rc.1 True False False 3h40m machine-config 4.13.13 True False True 59m marketplace 4.14.0-rc.1 True False False 3h40m monitoring 4.14.0-rc.1 False True True 2h3m network 4.13.13 True True True 2h4m node-tuning 4.14.0-rc.1 True False False 2h9m openshift-apiserver 4.14.0-rc.1 True False False 2h20m openshift-controller-manager 4.14.0-rc.1 True False False 2h20m openshift-samples 4.14.0-rc.1 True False False 2h23m operator-lifecycle-manager 4.14.0-rc.1 True False False 2h23m operator-lifecycle-manager-catalog 4.14.0-rc.1 True False False 2h18m operator-lifecycle-manager-packageserver 4.14.0-rc.1 True False False 2h20m service-ca 4.14.0-rc.1 True False False 2h23m storage 4.14.0-rc.1 True False False 3h40m
Some OVN pods are running 7 out of 8 containers (missing ovnkube-node) and constantly restarting, while pods running only 5 containers show errors connecting to the OVN DBs.
$ oc get pods -n openshift-ovn-kubernetes -o wide NAME READY STATUS RESTARTS AGE IP NODE ovnkube-control-plane-5f5c598768-czkjv 2/2 Running 0 2h16m 192.168.16.32 dciokd-master-1 ovnkube-control-plane-5f5c598768-kg69r 2/2 Running 0 2h16m 192.168.16.31 dciokd-master-0 ovnkube-control-plane-5f5c598768-prfb5 2/2 Running 0 2h16m 192.168.16.33 dciokd-master-2 ovnkube-node-9hjv9 5/5 Running 1 3h43m 192.168.16.32 dciokd-master-1 ovnkube-node-fmswc 7/8 Running 19 2h10m 192.168.16.36 dciokd-worker-2 ovnkube-node-pcjhp 7/8 Running 20 2h15m 192.168.16.35 dciokd-worker-1 ovnkube-node-q7kcj 5/5 Running 1 3h43m 192.168.16.33 dciokd-master-2 ovnkube-node-qsngm 5/5 Running 3 3h27m 192.168.16.34 dciokd-worker-0 ovnkube-node-v2d4h 7/8 Running 20 2h15m 192.168.16.31 dciokd-master-0 $ oc logs ovnkube-node-9hjv9 -c ovnkube-node -n openshift-ovn-kubernetes | less ... 2023-09-19T03:40:23.112699529Z E0919 03:40:23.112660 5883 ovn_db.go:511] Failed to retrieve cluster/status info for database "OVN_Northbound", stderr: 2023-09-19T03:40:23Z|00001|unixctl|WARN|failed to connect to /var/run/ovn/ovnnb_db.ctl 2023-09-19T03:40:23.112699529Z ovn-appctl: cannot connect to "/var/run/ovn/ovnnb_db.ctl" (No such file or directory) 2023-09-19T03:40:23.112699529Z , err: (OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=5 cluster/status OVN_Northbound' failed: exit status 1) 2023-09-19T03:40:23.112699529Z E0919 03:40:23.112677 5883 ovn_db.go:590] OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=5 cluster/status OVN_Northbound' failed: exit status 1 2023-09-19T03:40:23.114791313Z E0919 03:40:23.114777 5883 ovn_db.go:283] Failed retrieving memory/show output for "OVN_NORTHBOUND", stderr: 2023-09-19T03:40:23Z|00001|unixctl|WARN|failed to connect to /var/run/ovn/ovnnb_db.ctl 2023-09-19T03:40:23.114791313Z ovn-appctl: cannot connect to "/var/run/ovn/ovnnb_db.ctl" (No such file or directory) 2023-09-19T03:40:23.114791313Z , err: (OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=5 memory/show' failed: exit status 1) 2023-09-19T03:40:23.116492808Z E0919 03:40:23.116478 5883 ovn_db.go:511] Failed to retrieve cluster/status info for database "OVN_Southbound", stderr: 2023-09-19T03:40:23Z|00001|unixctl|WARN|failed to connect to /var/run/ovn/ovnsb_db.ctl 2023-09-19T03:40:23.116492808Z ovn-appctl: cannot connect to "/var/run/ovn/ovnsb_db.ctl" (No such file or directory) 2023-09-19T03:40:23.116492808Z , err: (OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnsb_db.ctl --timeout=5 cluster/status OVN_Southbound' failed: exit status 1) 2023-09-19T03:40:23.116492808Z E0919 03:40:23.116488 5883 ovn_db.go:590] OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnsb_db.ctl --timeout=5 cluster/status OVN_Southbound' failed: exit status 1 2023-09-19T03:40:23.118468064Z E0919 03:40:23.118450 5883 ovn_db.go:283] Failed retrieving memory/show output for "OVN_SOUTHBOUND", stderr: 2023-09-19T03:40:23Z|00001|unixctl|WARN|failed to connect to /var/run/ovn/ovnsb_db.ctl 2023-09-19T03:40:23.118468064Z ovn-appctl: cannot connect to "/var/run/ovn/ovnsb_db.ctl" (No such file or directory) 2023-09-19T03:40:23.118468064Z , err: (OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnsb_db.ctl --timeout=5 memory/show' failed: exit status 1) 2023-09-19T03:40:25.118085671Z E0919 03:40:25.118056 5883 ovn_northd.go:128] Failed to get ovn-northd status stderr() :(failed to run the command since failed to get ovn-northd's pid: open /var/run/ovn/ovn-northd.pid: no such file or directory)
Description of problem:
The oc process command fails when run with a template file.
Version-Release number of selected component (if applicable):
4.12.41
How reproducible:
100%
Steps to Reproduce:
1. Create a new project and a template file: $ oc new-project test $ oc get template httpd-example -n openshift -o yaml > /tmp/template_http.yaml
2. Run the oc process command as given below: $ oc process -f /tmp/template_http.yaml error: unable to process template: the namespace of the provided object does not match the namespace sent on the request
3. When we run this command as a template from another namespace, it runs fine: $ oc process openshift//httpd-example
4. $ oc version Client Version: 4.12.41 Kustomize Version: v4.5.7 Server Version: 4.12.42 Kubernetes Version: v1.25.14+bcb9a60
Actual results:
$ oc process -f /tmp/template_http.yaml error: unable to process template: the namespace of the provided object does not match the namespace sent on the request
Expected results:
The command should display the resources it would create.
Additional info:
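A possible workaround, assuming the mismatch is triggered by the metadata.namespace field that oc get preserves in the exported template (hypothetical; verify on a test project first): strip the namespace before processing.
$ oc get template httpd-example -n openshift -o json | jq 'del(.metadata.namespace)' > /tmp/template_http.json
$ oc process -f /tmp/template_http.json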
This is a clone of issue OCPBUGS-23925. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-29476. The following is the description of the original issue:
—
Description of problem:
Core CAPI CRDs are not deployed on unsupported platforms even when explicitly needed by other operators. An example of this is vSphere clusters: CAPI is not yet supported on vSphere, but the CAPI IPAM CRDs are needed by operators other than the usual consumers, cluster-capi-operator and the CAPI controllers.
Version-Release number of selected component (if applicable):
How reproducible:
Launch a tech-preview cluster on an unsupported platform (e.g. vSphere/Azure) and check that the Core CAPI CRDs are not present.
Steps to Reproduce:
$ oc get crds | grep cluster.x-k8s.io
Actual results:
Core CAPI CRDs are not present (only the metal ones)
Expected results:
Core CAPI CRDs should be present
Additional info:
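A more targeted check than the grep above (a sketch; these are the upstream CAPI IPAM CRD names and may differ in a given release):
$ oc get crd ipaddresses.ipam.cluster.x-k8s.io ipaddressclaims.ipam.cluster.x-k8s.io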
Description of problem:
If a cluster is put into hibernation via ACM/Hive and, while the cluster is asleep, the digest of any of the 4 catalogs that ship with OCP gets updated (i.e. a new bundle is added to the catalog tag, 4.15 in this case), then when the cluster is woken up, the catalog pods that were updated CrashLoopBackOff and leave the cluster in an unusable state. By unusable state, it means that no other operator subscriptions or catalog sources can be applied to the cluster.
Version-Release number of selected component (if applicable):
OCP 4.15
How reproducible:
See below
Steps to Reproduce:
1. Create a 4.15 cluster with ACM/Hive.
2. Wait for it to become healthy.
3. Put the cluster into hibernation.
4. While the cluster is asleep, add a new bundle to an existing catalog (yes, I understand catalogs are immutable, and this would result in a new catalog) such that the digest changes.
5. Wake the cluster up via ACM/Hive.
6. Either (both yield similar logs/results): a: Note that the pods in the `openshift-marketplace` namespace are in CrashLoopBackOff state. b: Create a new subscription and note that this fails, since the catalogs are unhealthy.
Actual results:
New catalogs, subscriptions, operators can't be applied to the cluster.
Expected results:
I'd expect that when a cluster wakes up the catalogs are healthy, no matter whether they have a different digest than when the cluster went to sleep.
Additional info:
Code in question (ie throwing the error): https://github.com/operator-framework/operator-registry/blob/master/pkg/cache/json.go#L181-L194
Log from certified-operator (note it could be any pod, we have examples of marketplace as well) pod:
time="2024-04-02T01:01:22Z" level=info msg="starting pprof endpoint" address="localhost:6060" 2 time="2024-04-02T01:01:22Z" level=fatal msg="cache requires rebuild: cache reports digest as \"2e210f20d7ad085a\", but computed digest is \"9d0c54855f748780\""
Custom Catalog Subscription {"apiVersion":"operators.coreos.com/v1alpha1","kind":"Subscription","metadata":{"creationTimestamp":"2024-04-01T22:35:35Z","generation":1,"labels":{"operators.coreos.com/nginx-ingress-operator.nginx-ingress-operator":""},"managedFields":[{"apiVersion":"operators.coreos.com/v1alpha1","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:labels":{".":{},"f:operators.coreos.com/nginx-ingress-operator.nginx-ingress-operator":{}}}},"manager":"Go-http-client","operation":"Update","time":"2024-04-01T22:35:35Z"},{"apiVersion":"operators.coreos.com/v1alpha1","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{".":{},"f:channel":{},"f:name":{},"f:source":{},"f:sourceNamespace":{}}},"manager":"preflight","operation":"Update","time":"2024-04-01T22:35:35Z"},{"apiVersion":"operators.coreos.com/v1alpha1","fieldsType":"FieldsV1","fieldsV1":{"f:status":{".":{},"f:catalogHealth":{},"f:conditions":{},"f:lastUpdated":{}}},"manager":"catalog","operation":"Update","subresource":"status","time":"2024-04-01T22:38:35Z"}],"name":"nginx-ingress-operator","namespace":"nginx-ingress-operator","resourceVersion":"40266","uid":"b5ee2e64-c43e-4062-bbb8-4a1f68518753"},"spec":{"channel":"alpha","name":"nginx-ingress-operator","source":"nginx-ingress-operator","sourceNamespace":"nginx-ingress-operator"},"status":{"catalogHealth":[{"catalogSourceRef":{"apiVersion":"operators.coreos.com/v1alpha1","kind":"CatalogSource","name":"nginx-ingress-operator","namespace":"nginx-ingress-operator","resourceVersion":"39191","uid":"a1e95a54-2302-4cd1-9ad3-1c352c8f1379"},"healthy":true,"lastUpdated":"2024-04-01T22:36:10Z"},{"catalogSourceRef":{"apiVersion":"operators.coreos.com/v1alpha1","kind":"CatalogSource","name":"certified-operators","namespace":"openshift-marketplace","resourceVersion":"38032","uid":"5180bff4-d2e2-45e3-a24e-bb37826feef5"},"healthy":true,"lastUpdated":"2024-04-01T22:36:10Z"},{"catalogSourceRef":{"apiVersion":"operators.coreos.com/v1alpha1","kind":"CatalogSource","name":"community-operators","namespace":"openshift-marketplace","resourceVersion":"38073","uid":"8e2e195a-cb09-42ac-b931-666f798ab68f"},"healthy":true,"lastUpdated":"2024-04-01T22:36:10Z"},{"catalogSourceRef":{"apiVersion":"operators.coreos.com/v1alpha1","kind":"CatalogSource","name":"redhat-marketplace","namespace":"openshift-marketplace","resourceVersion":"38078","uid":"11781d2c-03c0-48bb-b29a-06b9a5e1990f"},"healthy":true,"lastUpdated":"2024-04-01T22:36:10Z"},{"catalogSourceRef":{"apiVersion":"operators.coreos.com/v1alpha1","kind":"CatalogSource","name":"redhat-operators","namespace":"openshift-marketplace","resourceVersion":"38079","uid":"3ea5a9a4-039b-4646-9f47-484e13589e83"},"healthy":true,"lastUpdated":"2024-04-01T22:36:10Z"}],"conditions":[{"message":"[failed to populate resolver cache from source certified-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 172.30.209.124:50051: connect: connection refused\", failed to populate resolver cache from source redhat-marketplace/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 172.30.243.69:50051: connect: connection refused\"]","reason":"ErrorPreventedResolution","status":"True","type":"ResolutionFailed"},{"lastTransitionTime":"2024-04-01T22:36:10Z","message":"all available catalogsources are 
healthy","reason":"AllCatalogSourcesHealthy","status":"False","type":"CatalogSourcesUnhealthy"}],"lastUpdated":"2024-04-01T22:38:34Z"}}
Custom CatalogSource {"apiVersion":"operators.coreos.com/v1alpha1","kind":"CatalogSource","metadata":{"creationTimestamp":"2024-04-01T22:35:35Z","generation":1,"managedFields":[{"apiVersion":"operators.coreos.com/v1alpha1","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{".":{},"f:displayName":{},"f:icon":{".":{},"f:base64data":{},"f:mediatype":{}},"f:image":{},"f:secrets":{},"f:sourceType":{}}},"manager":"preflight","operation":"Update","time":"2024-04-01T22:35:35Z"},{"apiVersion":"operators.coreos.com/v1alpha1","fieldsType":"FieldsV1","fieldsV1":{"f:status":{".":{},"f:connectionState":{".":{},"f:address":{},"f:lastConnect":{},"f:lastObservedState":{}},"f:registryService":{".":{},"f:createdAt":{},"f:port":{},"f:protocol":{},"f:serviceName":{},"f:serviceNamespace":{}}}},"manager":"catalog","operation":"Update","subresource":"status","time":"2024-04-01T22:36:03Z"}],"name":"nginx-ingress-operator","namespace":"nginx-ingress-operator","resourceVersion":"39191","uid":"a1e95a54-2302-4cd1-9ad3-1c352c8f1379"},"spec":{"displayName":"nginx-ingress-operator","icon":{"base64data":"","mediatype":""},"image":"quay.io/operator-pipeline-prod/nginx-ingress-operator-index:v4.16-36a87cabd459f7be3258a7e60ef53751ea737de4","secrets":["registry-auth-keys"],"sourceType":"grpc"},"status":{"connectionState":{"address":"nginx-ingress-operator.nginx-ingress-operator.svc:50051","lastConnect":"2024-04-01T22:36:03Z","lastObservedState":"READY"},"registryService":{"createdAt":"2024-04-01T22:35:37Z","port":"50051","protocol":"grpc","serviceName":"nginx-ingress-operator","serviceNamespace":"nginx-ingress-operator"}}}
Must-gather from Prow (it might not be from the above operator test but from another operator test):
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-redhat-openshift-ecosystem-certified-operators-prod-ocp-4.15-preflight-prod-claim/1774927027938267136/
Slack Discussion:
https://redhat-internal.slack.com/archives/C3VS0LV41/p1711751930041179
Description of problem:
Azure cluster installation fails with the SDN network plugin
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-09-17-045811 4.13.0-0.nightly-2023-09-18-210322
How reproducible:
Sometimes; 2 out of 5 CI jobs failed
Steps to Reproduce:
1. Install azure cluster with template aos-4_15/ipi-on-azure/versioned-installer-customer_vpc
Actual results:
Installation failed 09-19 10:56:47.536 level=info msg=Cluster operator node-tuning Progressing is True with Reconciling: Working towards "4.15.0-0.nightly-2023-09-17-045811" 09-19 10:56:47.536 level=info msg=Cluster operator openshift-apiserver Progressing is True with APIServerDeployment_PodsUpdating: APIServerDeploymentProgressing: deployment/apiserver.openshift-apiserver: 1/3 pods have been updated to the latest generation 09-19 10:56:47.536 level=info msg=Cluster operator openshift-controller-manager Progressing is True with _DesiredStateNotYetAchieved: Progressing: deployment/controller-manager: updated replicas is 1, desired replicas is 3 09-19 10:56:47.536 level=info msg=Progressing: deployment/route-controller-manager: updated replicas is 1, desired replicas is 3 09-19 10:56:47.536 level=info msg=Cluster operator storage Progressing is True with AzureDiskCSIDriverOperatorCR_AzureDiskDriverNodeServiceController_Deploying::AzureFileCSIDriverOperatorCR_AzureFileDriverNodeServiceController_Deploying: AzureDiskCSIDriverOperatorCRProgressing: AzureDiskDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods 09-19 10:56:47.536 level=info msg=AzureFileCSIDriverOperatorCRProgressing: AzureFileDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods 09-19 10:56:47.536 level=error msg=Cluster initialization failed because one or more operators are not functioning properly. 09-19 10:56:47.536 level=error msg=The cluster should be accessible for troubleshooting as detailed in the documentation linked below, 09-19 10:56:47.537 level=error msg=https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html 09-19 10:56:47.537 level=error msg=The 'wait-for install-complete' subcommand can then be used to continue the installation 09-19 10:56:47.537 level=error msg=failed to initialize the cluster: Cluster operators authentication, console, control-plane-machine-set, kube-apiserver, machine-config are not available 09-19 10:56:47.537 [[1;31mERROR[0;39m] Installation failed with error code '6'. Aborting execution. 
oc get nodes NAME STATUS ROLES AGE VERSION jima41501-c646k-master-0 NotReady control-plane,master 3h35m v1.28.2+fde2a12 jima41501-c646k-master-1 Ready control-plane,master 3h35m v1.28.2+fde2a12 jima41501-c646k-master-2 Ready control-plane,master 3h35m v1.28.2+fde2a12 jima41501-c646k-worker-southcentralus1-x82cb Ready worker 3h22m v1.28.2+fde2a12 jima41501-c646k-worker-southcentralus2-jxbbt Ready worker 3h19m v1.28.2+fde2a12 jima41501-c646k-worker-southcentralus3-s4j6c Ready worker 3h18m v1.28.2+fde2a12 huirwang@huirwang-mac workspace % oc get co NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE authentication 4.15.0-0.nightly-2023-09-17-045811 False True True 3h31m WellKnownAvailable: The well-known endpoint is not yet available: kube-apiserver oauth endpoint https://10.0.0.7:6443/.well-known/oauth-authorization-server is not yet served and authentication operator keeps waiting (check kube-apiserver operator, and check that instances roll out successfully, which can take several minutes per instance) baremetal 4.15.0-0.nightly-2023-09-17-045811 True False False 3h30m cloud-controller-manager 4.15.0-0.nightly-2023-09-17-045811 True False False 3h34m cloud-credential 4.15.0-0.nightly-2023-09-17-045811 True False False 3h39m cluster-autoscaler 4.15.0-0.nightly-2023-09-17-045811 True False False 3h30m config-operator 4.15.0-0.nightly-2023-09-17-045811 True False False 3h31m console 4.15.0-0.nightly-2023-09-17-045811 False True False 3h20m DeploymentAvailable: 0 replicas available for console deployment... control-plane-machine-set 4.15.0-0.nightly-2023-09-17-045811 False True False 3h24m Missing 1 available replica(s) csi-snapshot-controller 4.15.0-0.nightly-2023-09-17-045811 True False False 3h30m dns 4.15.0-0.nightly-2023-09-17-045811 True True False 3h30m DNS "default" reports Progressing=True: "Have 5 available node-resolver pods, want 6." etcd 4.15.0-0.nightly-2023-09-17-045811 True True True 3h29m NodeControllerDegraded: The master nodes not ready: node "jima41501-c646k-master-0" not ready since 2023-09-19 02:13:06 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.) image-registry 4.15.0-0.nightly-2023-09-17-045811 True True False 3h19m Progressing: The registry is ready... ingress 4.15.0-0.nightly-2023-09-17-045811 True False False 3h19m insights 4.15.0-0.nightly-2023-09-17-045811 True False False 3h19m kube-apiserver 4.15.0-0.nightly-2023-09-17-045811 False True True 3h31m StaticPodsAvailable: 0 nodes are active; 3 nodes are at revision 0; 0 nodes have achieved new revision 8 kube-controller-manager 4.15.0-0.nightly-2023-09-17-045811 True True True 3h27m NodeControllerDegraded: The master nodes not ready: node "jima41501-c646k-master-0" not ready since 2023-09-19 02:13:06 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.) kube-scheduler 4.15.0-0.nightly-2023-09-17-045811 True True True 3h27m NodeControllerDegraded: The master nodes not ready: node "jima41501-c646k-master-0" not ready since 2023-09-19 02:13:06 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.) 
kube-storage-version-migrator 4.15.0-0.nightly-2023-09-17-045811 True False False 3h30m machine-api 4.15.0-0.nightly-2023-09-17-045811 True False False 3h17m machine-approver 4.15.0-0.nightly-2023-09-17-045811 True False False 3h30m machine-config 4.15.0-0.nightly-2023-09-17-045811 False False True 164m Cluster not available for [{operator 4.15.0-0.nightly-2023-09-17-045811}]: failed to apply machine config daemon manifests: error during waitForDaemonsetRollout: [context deadline exceeded, daemonset machine-config-daemon is not ready. status: (desired: 6, updated: 6, ready: 5, unavailable: 1)] marketplace 4.15.0-0.nightly-2023-09-17-045811 True False False 3h30m monitoring 4.15.0-0.nightly-2023-09-17-045811 True False False 3h15m network 4.15.0-0.nightly-2023-09-17-045811 True True False 3h31m DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes)... node-tuning 4.15.0-0.nightly-2023-09-17-045811 True True False 3h30m Working towards "4.15.0-0.nightly-2023-09-17-045811" openshift-apiserver 4.15.0-0.nightly-2023-09-17-045811 True True True 3h24m APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver () openshift-controller-manager 4.15.0-0.nightly-2023-09-17-045811 True True False 3h27m Progressing: deployment/controller-manager: updated replicas is 1, desired replicas is 3... openshift-samples 4.15.0-0.nightly-2023-09-17-045811 True False False 3h23m operator-lifecycle-manager 4.15.0-0.nightly-2023-09-17-045811 True False False 3h30m operator-lifecycle-manager-catalog 4.15.0-0.nightly-2023-09-17-045811 True False False 3h30m operator-lifecycle-manager-packageserver 4.15.0-0.nightly-2023-09-17-045811 True False False 3h25m service-ca 4.15.0-0.nightly-2023-09-17-045811 True False False 3h31m storage 4.15.0-0.nightly-2023-09-17-045811 True True False 3h30m AzureDiskCSIDriverOperatorCRProgressing: AzureDiskDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods... [systemd] Failed Units: 1 openshift-azure-routes.service [core@jima41501-c646k-master-0 ~]$ sudo -i [systemd] Failed Units: 1 openshift-azure-routes.service [root@jima41501-c646k-master-0 ~]# systemctl status openshift-azure-routes.service × openshift-azure-routes.service - Work around Azure load balancer hairpin Loaded: loaded (/etc/systemd/system/openshift-azure-routes.service; static) Active: failed (Result: exit-code) since Tue 2023-09-19 02:10:31 UTC; 3h 23min ago Duration: 55ms TriggeredBy: ● openshift-azure-routes.path Process: 13908 ExecStart=/bin/bash /opt/libexec/openshift-azure-routes.sh start (code=exited, status=1/FAILURE) Main PID: 13908 (code=exited, status=1/FAILURE) CPU: 77ms Sep 19 02:10:31 jima41501-c646k-master-0 systemd[1]: Started Work around Azure load balancer hairpin. Sep 19 02:10:31 jima41501-c646k-master-0 openshift-azure-routes[13908]: processing v4 vip 10.0.0.4 Sep 19 02:10:31 jima41501-c646k-master-0 openshift-azure-routes[13908]: /opt/libexec/openshift-azure-routes.sh: line 130: ovnkContaine> Sep 19 02:10:31 jima41501-c646k-master-0 systemd[1]: openshift-azure-routes.service: Main process exited, code=exited, status=1/FAILURE Sep 19 02:10:31 jima41501-c646k-master-0 systemd[1]: openshift-azure-routes.service: Failed with result 'exit-code'. 
4.13 failed in ci https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.13-e2e-azure-sdn/1703878138968150016/artifacts/e2e-azure-sdn/gather-extra/artifacts/oc_cmds/clusteroperators NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE authentication 4.13.0-0.nightly-2023-09-18-210322 False True True 55m WellKnownAvailable: The well-known endpoint is not yet available: kube-apiserver oauth endpoint https://10.0.0.6:6443/.well-known/oauth-authorization-server is not yet served and authentication operator keeps waiting (check kube-apiserver operator, and check that instances roll out successfully, which can take several minutes per instance) baremetal 4.13.0-0.nightly-2023-09-18-210322 True False False 54m cloud-controller-manager 4.13.0-0.nightly-2023-09-18-210322 True False False 56m cloud-credential 4.13.0-0.nightly-2023-09-18-210322 True False False 58m cluster-autoscaler 4.13.0-0.nightly-2023-09-18-210322 True False False 53m config-operator 4.13.0-0.nightly-2023-09-18-210322 True False False 55m console 4.13.0-0.nightly-2023-09-18-210322 False True False 45m DeploymentAvailable: 0 replicas available for console deployment... control-plane-machine-set 4.13.0-0.nightly-2023-09-18-210322 False True False 47m Missing 1 available replica(s) csi-snapshot-controller 4.13.0-0.nightly-2023-09-18-210322 True False False 54m dns 4.13.0-0.nightly-2023-09-18-210322 True True False 53m DNS "default" reports Progressing=True: "Have 5 available node-resolver pods, want 6." etcd 4.13.0-0.nightly-2023-09-18-210322 True True True 52m NodeControllerDegraded: The master nodes not ready: node "ci-op-pjxb081y-0c3e0-bxvlr-master-0" not ready since 2023-09-18 21:40:51 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.) image-registry 4.13.0-0.nightly-2023-09-18-210322 True True False 45m NodeCADaemonProgressing: The daemon set node-ca is deploying node pods... ingress 4.13.0-0.nightly-2023-09-18-210322 True False False 44m insights 4.13.0-0.nightly-2023-09-18-210322 True False False 47m kube-apiserver 4.13.0-0.nightly-2023-09-18-210322 False True True 53m StaticPodsAvailable: 0 nodes are active; 3 nodes are at revision 0; 0 nodes have achieved new revision 10 kube-controller-manager 4.13.0-0.nightly-2023-09-18-210322 True True True 51m NodeControllerDegraded: The master nodes not ready: node "ci-op-pjxb081y-0c3e0-bxvlr-master-0" not ready since 2023-09-18 21:40:51 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.) kube-scheduler 4.13.0-0.nightly-2023-09-18-210322 True True True 51m NodeControllerDegraded: The master nodes not ready: node "ci-op-pjxb081y-0c3e0-bxvlr-master-0" not ready since 2023-09-18 21:40:51 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.) kube-storage-version-migrator 4.13.0-0.nightly-2023-09-18-210322 True False False 54m machine-api 4.13.0-0.nightly-2023-09-18-210322 True False False 46m machine-approver 4.13.0-0.nightly-2023-09-18-210322 True False False 54m machine-config 4.13.0-0.nightly-2023-09-18-210322 False False True 31m Cluster not available for [{operator 4.13.0-0.nightly-2023-09-18-210322}]: failed to apply machine config daemon manifests: error during waitForDaemonsetRollout: [timed out waiting for the condition, daemonset machine-config-daemon is not ready. 
status: (desired: 6, updated: 6, ready: 5, unavailable: 1)] marketplace 4.13.0-0.nightly-2023-09-18-210322 True False False 53m monitoring 4.13.0-0.nightly-2023-09-18-210322 True False False 43m network 4.13.0-0.nightly-2023-09-18-210322 True True False 55m DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes)... node-tuning 4.13.0-0.nightly-2023-09-18-210322 True True False 53m Working towards "4.13.0-0.nightly-2023-09-18-210322" openshift-apiserver 4.13.0-0.nightly-2023-09-18-210322 True True True 44m APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver (3 containers are waiting in pending apiserver-66d764fbd6-r2s8d pod) openshift-controller-manager 4.13.0-0.nightly-2023-09-18-210322 True True False 54m Progressing: deployment/controller-manager: updated replicas is 1, desired replicas is 3... openshift-samples 4.13.0-0.nightly-2023-09-18-210322 True False False 47m operator-lifecycle-manager 4.13.0-0.nightly-2023-09-18-210322 True False False 54m operator-lifecycle-manager-catalog 4.13.0-0.nightly-2023-09-18-210322 True False False 54m operator-lifecycle-manager-packageserver 4.13.0-0.nightly-2023-09-18-210322 True False False 48m service-ca 4.13.0-0.nightly-2023-09-18-210322 True False False 55m storage 4.13.0-0.nightly-2023-09-18-210322 True True False 54m AzureDiskCSIDriverOperatorCRProgressing: AzureDiskDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods...
Expected results:
Installation succeeds
Additional info:
We suspect this is caused by PR https://github.com/openshift/machine-config-operator/pull/3878/files
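The truncated journal line ("line 130: ovnkContaine>") reads like an unbound-variable abort under set -u when the script runs on an SDN cluster with no ovnkube container. A minimal sketch of the kind of guard that would avoid it (hypothetical; the variable name is taken from the truncated log and the lookup is illustrative):
ovnkContainerID=$(crictl ps --name ovnkube -o json 2>/dev/null | jq -r '.containers[0].id // empty')
if [ -z "${ovnkContainerID:-}" ]; then
  exit 0 # SDN cluster: no OVN-specific route workaround needed
fi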
Description of problem:
BMH is showing powered off even when the node is up; this causes the customer's software to behave incorrectly due to the incorrect status on the BMH.
$ oc get bmh -n openshift-machine-api control-1-ru2 -o json | jq '.status|.operationalStatus,.poweredOn,.provisioning.state'
"OK"
false
"externally provisioned"
The following error can be seen:
2023-10-10T06:05:02.554453960Z {"level":"info","ts":1696917902.5544183,"logger":"provisioner.ironic","msg":"could not update node settings in ironic, busy","host":"openshift-machine-api~control-1-ru4"}
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Launch the cluster with OCP v4.12.32 on Lenovo servers 2. 3.
Actual results:
It gives a false report of the node's status
Expected results:
It should report the correct status of the node
Additional info:
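A quick way to watch for the stale power status across all hosts (a sketch using standard BMH status fields):
$ oc get bmh -n openshift-machine-api -o custom-columns=NAME:.metadata.name,STATUS:.status.operationalStatus,POWERED_ON:.status.poweredOn,STATE:.status.provisioning.state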
Description of problem:
In the implementation of METAL-163, support for the new Ironic Node field external_http_url was only added for floppy-based configuration images, not for the CD images that we use in OpenShift. This makes external_http_url a no-op.
Build02, a years-old cluster currently running 4.15.0-ec.2 with TechPreviewNoUpgrade, has been Available=False for days:
$ oc get -o json clusteroperator monitoring | jq '.status.conditions[] | select(.type == "Available")' { "lastTransitionTime": "2024-01-14T04:09:52Z", "message": "UpdatingMetricsServer: reconciling MetricsServer Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/metrics-server: context deadline exceeded", "reason": "UpdatingMetricsServerFailed", "status": "False", "type": "Available" }
Both pods had been having CA trust issues. We deleted one pod, and its replacement is happy:
$ oc -n openshift-monitoring get -l app.kubernetes.io/component=metrics-server pods NAME READY STATUS RESTARTS AGE metrics-server-9cc8bfd56-dd5tx 1/1 Running 0 136m metrics-server-9cc8bfd56-k2lpv 0/1 Running 0 36d
The young, happy pod has occasional node-removed noise, which is expected in this cluster with high levels of compute-node autoscaling:
$ oc -n openshift-monitoring logs --tail 3 metrics-server-9cc8bfd56-dd5tx E0117 17:16:13.492646 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.0.32.33:10250/metrics/resource\": dial tcp 10.0.32.33:10250: connect: connection refused" node="build0-gstfj-ci-builds-worker-b-srjk5" E0117 17:16:28.611052 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.0.32.33:10250/metrics/resource\": dial tcp 10.0.32.33:10250: connect: connection refused" node="build0-gstfj-ci-builds-worker-b-srjk5" E0117 17:16:56.898453 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.0.32.33:10250/metrics/resource\": context deadline exceeded" node="build0-gstfj-ci-builds-worker-b-srjk5"
While the old, sad pod is complaining about unknown authorities:
$ oc -n openshift-monitoring logs --tail 3 metrics-server-9cc8bfd56-k2lpv E0117 17:19:09.612161 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.0.0.3:10250/metrics/resource\": tls: failed to verify certificate: x509: certificate signed by unknown authority" node="build0-gstfj-m-2.c.openshift-ci-build-farm.internal" E0117 17:19:09.620872 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.0.32.90:10250/metrics/resource\": tls: failed to verify certificate: x509: certificate signed by unknown authority" node="build0-gstfj-ci-prowjobs-worker-b-cg7qd" I0117 17:19:14.538837 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
More details in the Additional details section, but the timeline seems to have been something like:
So addressing the metrics-server's change detection for /etc/tls/kubelet-serving-ca-bundle/ca-bundle.crt should resolve this use case. Triggering a container or pod restart would be an aggressive-but-sufficient mechanism, although loading the new data without rolling the process would be less invasive.
Version-Release number of selected component (if applicable):
4.15.0-ec.3, which has fast CA rotation; see discussion in API-1687.
How reproducible:
Unclear.
Steps to Reproduce:
Unclear.
Actual results:
metrics-server pods having trouble with CA trust when attempting to scrape nodes.
Expected results:
metrics-server pods successfully trusting kubelets when scraping nodes.
Additional info:
The monitoring operator sets up the metrics server with --kubelet-certificate-authority=/etc/tls/kubelet-serving-ca-bundle/ca-bundle.crt, which is the "Path to the CA to use to validate the Kubelet's serving certificates" and is mounted from the kubelet-serving-ca-bundle ConfigMap. But that mount point only contains openshift-kube-controller-manager-operator_csr-signer-signer@... CAs:
$ oc --as system:admin -n openshift-monitoring debug pod/metrics-server-9cc8bfd56-k2lpv -- cat /etc/tls/kubelet-serving-ca-bundle/ca-bundle.crt | while openssl x509 -noout -text; do :; done | grep '^Certificate:\|Issuer\|Subject:\|Not ' Starting pod/metrics-server-9cc8bfd56-k2lpv-debug-gtctn ... Removing debug pod ... Certificate: Issuer: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1701614554 Not Before: Dec 3 14:42:33 2023 GMT Not After : Feb 1 14:42:34 2024 GMT Subject: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1701614554 Certificate: Issuer: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1701614554 Not Before: Dec 20 03:16:35 2023 GMT Not After : Jan 19 03:16:36 2024 GMT Subject: CN = kube-csr-signer_@1703042196 Certificate: Issuer: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1704206554 Not Before: Jan 4 03:16:35 2024 GMT Not After : Feb 3 03:16:36 2024 GMT Subject: CN = kube-csr-signer_@1704338196 Certificate: Issuer: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1704206554 Not Before: Jan 2 14:42:34 2024 GMT Not After : Mar 2 14:42:35 2024 GMT Subject: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1704206554 unable to load certificate 137730753918272:error:0909006C:PEM routines:get_name:no start line:../crypto/pem/pem_lib.c:745:Expecting: TRUSTED CERTIFICATE
While actual kubelets seem to be using certs signed by kube-csr-signer_@1704338196 (which is one of the Subjects in /etc/tls/kubelet-serving-ca-bundle/ca-bundle.crt):
$ oc get -o wide -l node-role.kubernetes.io/master= nodes NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME build0-gstfj-m-0.c.openshift-ci-build-farm.internal Ready master 3y240d v1.28.3+20a5764 10.0.0.4 <none> Red Hat Enterprise Linux CoreOS 415.92.202311271112-0 (Plow) 5.14.0-284.41.1.el9_2.x86_64 cri-o://1.28.2-2.rhaos4.15.gite7be4e1.el9 build0-gstfj-m-1.c.openshift-ci-build-farm.internal Ready master 3y240d v1.28.3+20a5764 10.0.0.5 <none> Red Hat Enterprise Linux CoreOS 415.92.202311271112-0 (Plow) 5.14.0-284.41.1.el9_2.x86_64 cri-o://1.28.2-2.rhaos4.15.gite7be4e1.el9 build0-gstfj-m-2.c.openshift-ci-build-farm.internal Ready master 3y240d v1.28.3+20a5764 10.0.0.3 <none> Red Hat Enterprise Linux CoreOS 415.92.202311271112-0 (Plow) 5.14.0-284.41.1.el9_2.x86_64 cri-o://1.28.2-2.rhaos4.15.gite7be4e1.el9 $ oc --as system:admin -n openshift-monitoring debug pod/metrics-server-9cc8bfd56-k2lpv -- openssl s_client -connect 10.0.0.3:10250 -showcerts </dev/null Starting pod/metrics-server-9cc8bfd56-k2lpv-debug-ksl2k ... Can't use SSL_get_servername depth=0 O = system:nodes, CN = system:node:build0-gstfj-m-2.c.openshift-ci-build-farm.internal verify error:num=20:unable to get local issuer certificate verify return:1 depth=0 O = system:nodes, CN = system:node:build0-gstfj-m-2.c.openshift-ci-build-farm.internal verify error:num=21:unable to verify the first certificate verify return:1 depth=0 O = system:nodes, CN = system:node:build0-gstfj-m-2.c.openshift-ci-build-farm.internal verify return:1 CONNECTED(00000003) --- Certificate chain 0 s:O = system:nodes, CN = system:node:build0-gstfj-m-2.c.openshift-ci-build-farm.internal i:CN = kube-csr-signer_@1704338196 -----BEGIN CERTIFICATE----- MIIC5DCCAcygAwIBAgIQAbKVl+GS6s2H20EHAWl4WzANBgkqhkiG9w0BAQsFADAm MSQwIgYDVQQDDBtrdWJlLWNzci1zaWduZXJfQDE3MDQzMzgxOTYwHhcNMjQwMTE3 MDMxNDMwWhcNMjQwMjAzMDMxNjM2WjBhMRUwEwYDVQQKEwxzeXN0ZW06bm9kZXMx SDBGBgNVBAMTP3N5c3RlbTpub2RlOmJ1aWxkMC1nc3Rmai1tLTIuYy5vcGVuc2hp ZnQtY2ktYnVpbGQtZmFybS5pbnRlcm5hbDBZMBMGByqGSM49AgEGCCqGSM49AwEH A0IABFqT+UgohFAxJrGYQUeYsEhNB+ufFo14xYDedKBCeNzMhaC+5/I4UN1e1u2X PH7J4ncmH+M/LXI7v+YfEIG7cH+jgZ0wgZowDgYDVR0PAQH/BAQDAgeAMBMGA1Ud JQQMMAoGCCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwHwYDVR0jBBgwFoAU394ABuS2 9i0qss9AKk/mQ9lhJ88wRAYDVR0RBD0wO4IzYnVpbGQwLWdzdGZqLW0tMi5jLm9w ZW5zaGlmdC1jaS1idWlsZC1mYXJtLmludGVybmFshwQKAAADMA0GCSqGSIb3DQEB CwUAA4IBAQCiKelqlgK0OHFqDPdIR+RRdjXoCfFDa0JGCG0z60LYJV6Of5EPv0F/ vGZdM/TyGnPT80lnLCh2JGUvneWlzQEZ7LEOgXX8OrAobijiFqDZFlvVwvkwWNON rfucLQWDFLHUf/yY0EfB0ZlM8Sz4XE8PYB6BXYvgmUIXS1qkV9eGWa6RPLsOnkkb q/dTLE/tg8cz24IooDC8lmMt/wCBPgsq9AnORgNdZUdjCdh9DpDWCw0E4csSxlx2 H1qlH5TpTGKS8Ox9JAfdAU05p/mEhY9PEPSMfdvBZep1xazrZyQIN9ckR2+11Syw JlbEJmapdSjIzuuKBakqHkDgoq4XN0KM -----END CERTIFICATE----- --- Server certificate subject=O = system:nodes, CN = system:node:build0-gstfj-m-2.c.openshift-ci-build-farm.internal issuer=CN = kube-csr-signer_@1704338196 --- Acceptable client certificate CA names OU = openshift, CN = admin-kubeconfig-signer CN = openshift-kube-controller-manager-operator_csr-signer-signer@1699022534 CN = kube-csr-signer_@1700450189 CN = kube-csr-signer_@1701746196 CN = openshift-kube-controller-manager-operator_csr-signer-signer@1701614554 CN = openshift-kube-apiserver-operator_kube-apiserver-to-kubelet-signer@1691004449 CN = openshift-kube-apiserver-operator_kube-control-plane-signer@1702234292 CN = openshift-kube-apiserver-operator_kube-control-plane-signer@1699642292 OU = openshift, CN = kubelet-bootstrap-kubeconfig-signer CN = 
openshift-kube-apiserver-operator_node-system-admin-signer@1678905372 Requested Signature Algorithms: RSA-PSS+SHA256:ECDSA+SHA256:Ed25519:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA384:ECDSA+SHA512:RSA+SHA1:ECDSA+SHA1 Shared Requested Signature Algorithms: RSA-PSS+SHA256:ECDSA+SHA256:Ed25519:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA384:ECDSA+SHA512 Peer signing digest: SHA256 Peer signature type: ECDSA Server Temp Key: X25519, 253 bits --- SSL handshake has read 1902 bytes and written 383 bytes Verification error: unable to verify the first certificate --- New, TLSv1.3, Cipher is TLS_AES_128_GCM_SHA256 Server public key is 256 bit Secure Renegotiation IS NOT supported Compression: NONE Expansion: NONE No ALPN negotiated Early data was not sent Verify return code: 21 (unable to verify the first certificate) --- DONE Removing debug pod ... $ openssl x509 -noout -text <<EOF 2>/dev/null > -----BEGIN CERTIFICATE----- MIIC5DCCAcygAwIBAgIQAbKVl+GS6s2H20EHAWl4WzANBgkqhkiG9w0BAQsFADAm MSQwIgYDVQQDDBtrdWJlLWNzci1zaWduZXJfQDE3MDQzMzgxOTYwHhcNMjQwMTE3 MDMxNDMwWhcNMjQwMjAzMDMxNjM2WjBhMRUwEwYDVQQKEwxzeXN0ZW06bm9kZXMx SDBGBgNVBAMTP3N5c3RlbTpub2RlOmJ1aWxkMC1nc3Rmai1tLTIuYy5vcGVuc2hp ZnQtY2ktYnVpbGQtZmFybS5pbnRlcm5hbDBZMBMGByqGSM49AgEGCCqGSM49AwEH A0IABFqT+UgohFAxJrGYQUeYsEhNB+ufFo14xYDedKBCeNzMhaC+5/I4UN1e1u2X PH7J4ncmH+M/LXI7v+YfEIG7cH+jgZ0wgZowDgYDVR0PAQH/BAQDAgeAMBMGA1Ud JQQMMAoGCCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwHwYDVR0jBBgwFoAU394ABuS2 9i0qss9AKk/mQ9lhJ88wRAYDVR0RBD0wO4IzYnVpbGQwLWdzdGZqLW0tMi5jLm9w ZW5zaGlmdC1jaS1idWlsZC1mYXJtLmludGVybmFshwQKAAADMA0GCSqGSIb3DQEB CwUAA4IBAQCiKelqlgK0OHFqDPdIR+RRdjXoCfFDa0JGCG0z60LYJV6Of5EPv0F/ vGZdM/TyGnPT80lnLCh2JGUvneWlzQEZ7LEOgXX8OrAobijiFqDZFlvVwvkwWNON rfucLQWDFLHUf/yY0EfB0ZlM8Sz4XE8PYB6BXYvgmUIXS1qkV9eGWa6RPLsOnkkb q/dTLE/tg8cz24IooDC8lmMt/wCBPgsq9AnORgNdZUdjCdh9DpDWCw0E4csSxlx2 H1qlH5TpTGKS8Ox9JAfdAU05p/mEhY9PEPSMfdvBZep1xazrZyQIN9ckR2+11Syw JlbEJmapdSjIzuuKBakqHkDgoq4XN0KM -----END CERTIFICATE----- > EOF ... Issuer: CN = kube-csr-signer_@1704338196 Validity Not Before: Jan 17 03:14:30 2024 GMT Not After : Feb 3 03:16:36 2024 GMT Subject: O = system:nodes, CN = system:node:build0-gstfj-m-2.c.openshift-ci-build-farm.internal ...
The monitoring operator populates the openshift-monitoring kubelet-serving-ca-bundle ConfigMap using data from the openshift-config-managed kubelet-serving-ca ConfigMap, and that propagation is working, but does not contain the kube-csr-signer_ CA:
$ oc -n openshift-config-managed get -o json configmap kubelet-serving-ca | jq -r '.data["ca-bundle.crt"]' | while openssl x509 -noout -text; do :; done | grep '^Certificate:\|Issuer\|Subject:\|Not ' Certificate: Issuer: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1701614554 Not Before: Dec 3 14:42:33 2023 GMT Not After : Feb 1 14:42:34 2024 GMT Subject: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1701614554 Certificate: Issuer: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1701614554 Not Before: Dec 20 03:16:35 2023 GMT Not After : Jan 19 03:16:36 2024 GMT Subject: CN = kube-csr-signer_@1703042196 Certificate: Issuer: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1704206554 Not Before: Jan 4 03:16:35 2024 GMT Not After : Feb 3 03:16:36 2024 GMT Subject: CN = kube-csr-signer_@1704338196 Certificate: Issuer: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1704206554 Not Before: Jan 2 14:42:34 2024 GMT Not After : Mar 2 14:42:35 2024 GMT Subject: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1704206554 unable to load certificate 140531510617408:error:0909006C:PEM routines:get_name:no start line:../crypto/pem/pem_lib.c:745:Expecting: TRUSTED CERTIFICATE $ oc -n openshift-config-managed get -o json configmap kubelet-serving-ca | jq -r '.data["ca-bundle.crt"]' | sha1sum a32ab44dff8030c548087d70fea599b0d3fab8af - $ oc -n openshift-monitoring get -o json configmap kubelet-serving-ca-bundle | jq -r '.data["ca-bundle.crt"]' | sha1sum a32ab44dff8030c548087d70fea599b0d3fab8af -
Flipping over to the kubelet side, nothing in the machine-config operator's template is jumping out at me as a key/cert pair for serving on 10250. The kubelet seems to set up server certs via serverTLSBootstrap: true. But we don't seem to set the beta RotateKubeletServerCertificate, so I'm not clear on how these are supposed to rotate on the kubelet side. But there are CSRs from kubelets requesting serving certs:
$ oc get certificatesigningrequests | grep 'NAME\|kubelet-serving' NAME AGE SIGNERNAME REQUESTOR REQUESTEDDURATION CONDITION csr-8stgd 51m kubernetes.io/kubelet-serving system:node:build0-gstfj-ci-builds-worker-b-xkdw2 <none> Approved,Issued csr-blbjx 9m1s kubernetes.io/kubelet-serving system:node:build0-gstfj-ci-longtests-worker-b-5w9dz <none> Approved,Issued csr-ghxh5 64m kubernetes.io/kubelet-serving system:node:build0-gstfj-ci-builds-worker-b-sdwdn <none> Approved,Issued csr-hng85 33m kubernetes.io/kubelet-serving system:node:build0-gstfj-ci-longtests-worker-d-7d7h2 <none> Approved,Issued csr-hvqxz 24m kubernetes.io/kubelet-serving system:node:build0-gstfj-ci-builds-worker-b-fp6wb <none> Approved,Issued csr-vc52m 50m kubernetes.io/kubelet-serving system:node:build0-gstfj-ci-builds-worker-b-xlmt6 <none> Approved,Issued csr-vflcm 40m kubernetes.io/kubelet-serving system:node:build0-gstfj-ci-builds-worker-b-djpgq <none> Approved,Issued csr-xfr7d 51m kubernetes.io/kubelet-serving system:node:build0-gstfj-ci-builds-worker-b-8v4vk <none> Approved,Issued csr-zhzbs 51m kubernetes.io/kubelet-serving system:node:build0-gstfj-ci-builds-worker-b-rqr68 <none> Approved,Issued $ oc get -o json certificatesigningrequests csr-blbjx { "apiVersion": "certificates.k8s.io/v1", "kind": "CertificateSigningRequest", "metadata": { "creationTimestamp": "2024-01-17T19:20:43Z", "generateName": "csr-", "name": "csr-blbjx", "resourceVersion": "4719586144", "uid": "5f12d236-3472-485f-8037-3896f51a809c" }, "spec": { "groups": [ "system:nodes", "system:authenticated" ], "request": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQlh6Q0NBUVFDQVFBd1ZqRVZNQk1HQTFVRUNoTU1jM2x6ZEdWdE9tNXZaR1Z6TVQwd093WURWUVFERXpSegplWE4wWlcwNmJtOWtaVHBpZFdsc1pEQXRaM04wWm1vdFkya3RiRzl1WjNSbGMzUnpMWGR2Y210bGNpMWlMVFYzCk9XUjZNRmt3RXdZSEtvWkl6ajBDQVFZSUtvWkl6ajBEQVFjRFFnQUV5Y0dhSDMvZ3F4ZHNZWkdmQXovTEpoZVgKd1o0Z1VRbjB6TlZUenJncHpvd1VPOGR6NTN4UUZTOTRibm40NldlZFg3Q2xidUpVSUpUN2pCblV1WEdnZktCTQpNRW9HQ1NxR1NJYjNEUUVKRGpFOU1Ec3dPUVlEVlIwUkJESXdNSUlvWW5WcGJHUXdMV2R6ZEdacUxXTnBMV3h2CmJtZDBaWE4wY3kxM2IzSnJaWEl0WWkwMWR6bGtlb2NFQ2dBZ0F6QUtCZ2dxaGtqT1BRUURBZ05KQURCR0FpRUEKMHlRVzZQOGtkeWw5ZEEzM3ppQTJjYXVJdlhidTVhczNXcUZLYWN2bi9NSUNJUURycEQyVEtScHJOU1I5dExKTQpjZ0ZpajN1dVNieVJBcEJ5NEE1QldEZm02UT09Ci0tLS0tRU5EIENFUlRJRklDQVRFIFJFUVVFU1QtLS0tLQo=", "signerName": "kubernetes.io/kubelet-serving", "usages": [ "digital signature", "server auth" ], "username": "system:node:build0-gstfj-ci-longtests-worker-b-5w9dz" }, "status": { "certificate": 
"LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN6ekNDQWJlZ0F3SUJBZ0lSQUlGZ1NUd0ovVUJLaE1hWlE4V01KcEl3RFFZSktvWklodmNOQVFFTEJRQXcKSmpFa01DSUdBMVVFQXd3YmEzVmlaUzFqYzNJdGMybG5ibVZ5WDBBeE56QTBNek00TVRrMk1CNFhEVEkwTURFeApOekU1TVRVME0xb1hEVEkwTURJd016QXpNVFl6Tmxvd1ZqRVZNQk1HQTFVRUNoTU1jM2x6ZEdWdE9tNXZaR1Z6Ck1UMHdPd1lEVlFRREV6UnplWE4wWlcwNmJtOWtaVHBpZFdsc1pEQXRaM04wWm1vdFkya3RiRzl1WjNSbGMzUnoKTFhkdmNtdGxjaTFpTFRWM09XUjZNRmt3RXdZSEtvWkl6ajBDQVFZSUtvWkl6ajBEQVFjRFFnQUV5Y0dhSDMvZwpxeGRzWVpHZkF6L0xKaGVYd1o0Z1VRbjB6TlZUenJncHpvd1VPOGR6NTN4UUZTOTRibm40NldlZFg3Q2xidUpVCklKVDdqQm5VdVhHZ2ZLT0JrakNCanpBT0JnTlZIUThCQWY4RUJBTUNCNEF3RXdZRFZSMGxCQXd3Q2dZSUt3WUIKQlFVSEF3RXdEQVlEVlIwVEFRSC9CQUl3QURBZkJnTlZIU01FR0RBV2dCVGYzZ0FHNUxiMkxTcXl6MEFxVCtaRAoyV0VuenpBNUJnTlZIUkVFTWpBd2dpaGlkV2xzWkRBdFozTjBabW90WTJrdGJHOXVaM1JsYzNSekxYZHZjbXRsCmNpMWlMVFYzT1dSNmh3UUtBQ0FETUEwR0NTcUdTSWIzRFFFQkN3VUFBNElCQVFBRE5ad0pMdkp4WWNta2RHV08KUm5ocC9rc3V6akJHQnVHbC9VTmF0RjZScml3eW9mdmpVNW5Kb0RFbGlLeHlDQ2wyL1d5VXl5a2hMSElBK1drOQoxZjRWajIrYmZFd0IwaGpuTndxQThudFFabS90TDhwalZ5ZzFXM0VwR2FvRjNsZzRybDA1cXBwcjVuM2l4WURJClFFY2ZuNmhQUnlKN056dlFCS0RwQ09lbU8yTFllcGhqbWZGY2h5VGRZVGU0aE9IOW9TWTNMdDdwQURIM2kzYzYKK3hpMDhhV09LZmhvT3IybTVBSFBVN0FkTjhpVUV0M0dsYzI0SGRTLzlLT05tT2E5RDBSSk9DMC8zWk5sKzcvNAoyZDlZbnYwaTZNaWI3OGxhNk5scFB0L2hmOWo5TlNnMDN4OFZYRVFtV21zN29xY1FWTHMxRHMvWVJ4VERqZFphCnEwMnIKLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=", "conditions": [ { "lastTransitionTime": "2024-01-17T19:20:43Z", "lastUpdateTime": "2024-01-17T19:20:43Z", "message": "This CSR was approved by the Node CSR Approver (cluster-machine-approver)", "reason": "NodeCSRApprove", "status": "True", "type": "Approved" } ] } } $ oc get -o json certificatesigningrequests csr-blbjx | jq -r '.status.certificate | @base64d' | openssl x509 -noout -text | grep '^Certificate:\|Issuer\|Subject:\|Not ' Certificate: Issuer: CN = kube-csr-signer_@1704338196 Not Before: Jan 17 19:15:43 2024 GMT Not After : Feb 3 03:16:36 2024 GMT Subject: O = system:nodes, CN = system:node:build0-gstfj-ci-longtests-worker-b-5w9dz
So that's approved by cluster-machine-approver, but signerName: kubernetes.io/kubelet-serving is an upstream Kubernetes component documented here, and the signer is implemented by kube-controller-manager.
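Given the analysis above, an interim mitigation (hedged; it only clears the symptom until the rotation outpaces the pods again) is to roll the deployment so both pods remount the current bundle:
$ oc -n openshift-monitoring rollout restart deployment/metrics-server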
Description of problem:
Adding an audit configuration to a HyperShift hosted cluster does not work as expected.
Version-Release number of selected component (if applicable):
# oc get clusterversions.config.openshift.io NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.13.0-0.nightly-2023-05-04-090524 True False 15m Cluster version is 4.13.0-0.nightly-2023-05-04-090524
How reproducible:
Always
Steps to Reproduce:
1. Get the hypershift hosted cluster detail from the management cluster. # hostedcluster=$( oc get -n clusters hostedclusters -o json | jq -r .items[].metadata.name)
2. Apply an audit profile for the hypershift hosted cluster. # oc patch HostedCluster $hostedcluster -n clusters -p '{"spec": {"configuration": {"apiServer": {"audit": {"profile": "WriteRequestBodies"}}}}}' --type merge hostedcluster.hypershift.openshift.io/85ea85757a5a14355124 patched # oc get HostedCluster $hostedcluster -n clusters -ojson | jq .spec.configuration.apiServer.audit { "profile": "WriteRequestBodies" }
3. Check whether pods or the operator restart to apply the configuration changes. # oc get pods -l app=kube-apiserver -n clusters-${hostedcluster} NAME READY STATUS RESTARTS AGE kube-apiserver-7c98b66949-9z6rw 5/5 Running 0 36m kube-apiserver-7c98b66949-gp5rx 5/5 Running 0 36m kube-apiserver-7c98b66949-wmk8x 5/5 Running 0 36m # oc get pods -l app=openshift-apiserver -n clusters-${hostedcluster} NAME READY STATUS RESTARTS AGE openshift-apiserver-dc4c84ff4-566z9 3/3 Running 0 29m openshift-apiserver-dc4c84ff4-99zq9 3/3 Running 0 29m openshift-apiserver-dc4c84ff4-9xdrz 3/3 Running 0 30m
4. Check the generated audit log. # NOW=$(date -u "+%s"); echo "$NOW"; echo "$NOW" > now 1683711189 # kaspod=$(oc get pods -l app=kube-apiserver -n clusters-${hostedcluster} --no-headers -o=jsonpath={.items[0].metadata.name}) # oc logs $kaspod -c audit-logs -n clusters-${hostedcluster} > kas-audit.log # cat kas-audit.log | grep -iE '"verb":"(get|list|watch)","user":.*(requestObject|responseObject)' | jq -c 'select (.requestReceivedTimestamp | .[0:19] + "Z" | fromdateiso8601 > '"`cat now`)" | wc -l 0 # cat kas-audit.log | grep -iE '"verb":"(create|delete|patch|update)","user":.*(requestObject|responseObject)' | jq -c 'select (.requestReceivedTimestamp | .[0:19] + "Z" | fromdateiso8601 > '"`cat now`)" | wc -l 0
None of these counts should be zero. In the backend, the configuration should be applied, or the pod/operator should restart after the configuration changes.
Actual results:
Config changes are not applied in the backend; neither the operator nor the pods restart.
Expected results:
The configuration should be applied, and the pods and operator should restart after config changes.
Additional info:
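A quick way to tell whether the patch rolled anything at all (a sketch; it assumes the control-plane Deployment is named kube-apiserver, matching the pod names above): compare the deployment's generation with its observed generation and the replica sets before and after the patch.
$ oc -n clusters-${hostedcluster} get deploy kube-apiserver -o jsonpath='{.metadata.generation}{" "}{.status.observedGeneration}{"\n"}'
$ oc -n clusters-${hostedcluster} get rs -l app=kube-apiserver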
Please review the following PR: https://github.com/openshift/prometheus-operator/pull/242
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
When we remove the additionalTrustBundle CA of the mirror registry (user-ca-bundle) that was passed via install-config.yaml for an agent-based installation, MCO does not remove the certificate from the nodes.
$ oc version Client Version: 4.15.23 Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3 Server Version: 4.15.23 Kubernetes Version: v1.28.11+add48d0 $ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.15.23 True False 3h2m Cluster version is 4.15.23
How reproducible:
Always
Steps to Reproduce:
1. Create cluster with additionalTrustBundle CA in install-config
2. Locate the mirror registry CA certificate stored in the node's /etc/pki/ directory ~~~ cd /etc/pki/ca-trust/source/anchors [root@master1 anchors]# ls -la total 216 drwxr-xr-x. 2 root root 49 Sep 18 05:23 . drwxr-xr-x. 4 root root 80 Sep 18 05:20 .. -rw-------. 1 root root 220593 Sep 18 05:23 openshift-config-user-ca-bundle.crt ~~~
3. Back up and delete the CM (user-ca-bundle) ~~~ $ oc delete configmap/user-ca-bundle -n openshift-config configmap "user-ca-bundle" deleted ~~~
4. Observe whether any changes happen at the MCO/MCP level due to the same.
5. Switch to the node and check the same /etc/pki/ path to see if the CA is present or not
Actual results:
Certificate still present under "/etc/pki/ca-trust/source/anchors" on the nodes. No new MC got generated # cd /etc/pki/ca-trust/source/anchors [root@master1 anchors]# ls -la total 216 drwxr-xr-x. 2 root root 49 Sep 18 05:23 . drwxr-xr-x. 4 root root 80 Sep 18 05:20 .. -rw-------. 1 root root 220593 Sep 18 05:23 openshift-config-user-ca-bundle.crt [root@master1 anchors]# cat openshift-config-user-ca-bundle.crt | grep "MIID2TCCAsGgAwIBAgIUb1e2U0GXeW5qmTlgzE8SSDvht2YwDQYJKoZIhvcNAQEL" MIID2TCCAsGgAwIBAgIUb1e2U0GXeW5qmTlgzE8SSDvht2YwDQYJKoZIhvcNAQEL MIID2TCCAsGgAwIBAgIUb1e2U0GXeW5qmTlgzE8SSDvht2YwDQYJKoZIhvcNAQEL
Expected results:
A new MC should be created once user-ca-bundle has been removed, and the MC should roll out to the nodes. The certificate should be removed from the nodes.
Additional info:
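A quick check for whether the removal ever reached a rendered config (a sketch; the file name matches the path shown above):
$ oc get mc $(oc get mcp master -o jsonpath='{.spec.configuration.name}') -o yaml | grep -c openshift-config-user-ca-bundle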
Description of problem:
ovnkube-node fails to start on a customer cluster (see OHSS-26032); the error message doesn't state which step of the startup process (or which Service or other object defined on the cluster) blocks it.
Version-Release number of selected component (if applicable):
How reproducible:
Unknown. After a force rebuild of the OVN databases, ovnkube-node doesn't start. The issue seems to be with a headless service with internalTrafficPolicy: Local, which isn't allowed according to https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/2086-service-internal-traffic-policy/README.md#proposal
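A sketch for locating such offending objects on a cluster (standard Service fields only; the jq expression is illustrative):
$ oc get svc -A -o json | jq -r '.items[] | select(.spec.clusterIP == "None" and .spec.internalTrafficPolicy == "Local") | .metadata.namespace + "/" + .metadata.name'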
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-30135. The following is the description of the original issue:
—
Description of problem:
The installer now errors when attempting to use networkType: OpenShiftSDN, but the message still says "deprecated".
Version-Release number of selected component (if applicable):
4.15+
How reproducible:
100%
Steps to Reproduce:
1. Attempt to install 4.15+ with networkType: OpenShiftSDN Observe error in logs: time="2024-03-01T14:37:25Z" level=error msg="failed to fetch Master Machines: failed to load asset \"Install Config\": failed to create install config: invalid \"install-config.yaml\" file: networking.networkType: Invalid value: \"OpenShiftSDN\": networkType OpenShiftSDN is deprecated, please use OVNKubernetes"
Actual results:
Observe error in logs: time="2024-03-01T14:37:25Z" level=error msg="failed to fetch Master Machines: failed to load asset \"Install Config\": failed to create install config: invalid \"install-config.yaml\" file: networking.networkType: Invalid value: \"OpenShiftSDN\": networkType OpenShiftSDN is deprecated, please use OVNKubernetes"
Expected results:
A message more like: time="2024-03-01T14:37:25Z" level=error msg="failed to fetch Master Machines: failed to load asset \"Install Config\": failed to create install config: invalid \"install-config.yaml\" file: networking.networkType: Invalid value: \"OpenShiftSDN\": networkType OpenShiftSDN is not supported, please use OVNKubernetes"
Additional info:
See thread
This is a clone of issue OCPBUGS-12150. The following is the description of the original issue:
—
Description of problem:
As outlined here (https://docs.openshift.com/container-platform/4.12/scalability_and_performance/recommended-host-practices.html#etcd-defrag_recommended-host-practices), it is generally good practice to defragment etcd. Today, hosted clusters on HyperShift do not have automatic defragmentation enabled. This may lead to increased etcd sizes and poor performance.
Version-Release number of selected component (if applicable):
MC OCP: 4.12.11 HC OCP: 4.12.12
How reproducible:
100%
Steps to Reproduce:
1. Deploy cluster 2. Watch etcd for signs of defragmentation 3.
Actual results:
etcd defragmentation is not performed
Expected results:
etcd automatically defragments
Additional info:
Description of problem: Panic on machine-controller
2023-11-23T18:18:47.899851056Z I1123 18:18:47.899752 1 controller.go:115] "msg"="Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" "controller"="machine-controller" "name"="bogus-6121tjfqk-cpr4v" "namespace"="openshift-machine-api" "object"={"name":"bogus-6121tjfqk-cpr4v","namespace":"openshift-machine-api"} "reconcileID"="38050b3e-3313-4500-8955-59f6822fd650"
2023-11-23T18:18:47.901976792Z panic: runtime error: invalid memory address or nil pointer dereference [recovered]
2023-11-23T18:18:47.901976792Z panic: runtime error: invalid memory address or nil pointer dereference
2023-11-23T18:18:47.901976792Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x27fcb31]
2023-11-23T18:18:47.902001202Z
2023-11-23T18:18:47.902001202Z goroutine 261 [running]:
2023-11-23T18:18:47.902001202Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
2023-11-23T18:18:47.902001202Z     /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x1fa
2023-11-23T18:18:47.902013625Z panic({0x2ab4640, 0x4373ed0})
2023-11-23T18:18:47.902022923Z     /usr/lib/golang/src/runtime/panic.go:884 +0x213
2023-11-23T18:18:47.902043867Z github.com/openshift/machine-api-provider-openstack/pkg/machine.extractRootVolumeFromProviderSpec(...)
2023-11-23T18:18:47.902043867Z     /go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/machine/convert.go:211
2023-11-23T18:18:47.902053364Z github.com/openshift/machine-api-provider-openstack/pkg/machine.(*OpenstackClient).Delete(0xc0000bfab0, {0x3113ff0?, 0xc000605ec0?}, 0xc00065fd40)
2023-11-23T18:18:47.902062370Z     /go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/machine/actuator.go:335 +0x1b1
2023-11-23T18:18:47.902082577Z github.com/openshift/machine-api-operator/pkg/controller/machine.(*ReconcileMachine).Reconcile(0xc000304aa0, {0x3113ff0, 0xc000605ec0}, {{{0xc000d66a50?, 0x0?}, {0xc000d66a38?, 0xc00043cd48?}}})
2023-11-23T18:18:47.902117667Z     /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/github.com/openshift/machine-api-operator/pkg/controller/machine/controller.go:216 +0x1dee
2023-11-23T18:18:47.902139450Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x31181b8?, {0x3113ff0?, 0xc000605ec0?}, {{{0xc000d66a50?, 0xb?}, {0xc000d66a38?, 0x0?}}})
2023-11-23T18:18:47.902166210Z     /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119 +0xc8
2023-11-23T18:18:47.902186773Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0005488c0, {0x3113f48, 0xc000350550}, {0x2b9b6a0?, 0xc000475760?})
2023-11-23T18:18:47.902196557Z     /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316 +0x3ca
2023-11-23T18:18:47.902205655Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0005488c0, {0x3113f48, 0xc000350550})
2023-11-23T18:18:47.902214747Z     /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x1d9
2023-11-23T18:18:47.902223782Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
2023-11-23T18:18:47.902223782Z     /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x85
2023-11-23T18:18:47.902233237Z created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
2023-11-23T18:18:47.902242150Z     /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x587
The bogus machine bogus-6121tjfqk-cpr4v was created by the openstack-test case "[sig-installer][Suite:openshift/openstack] Bugfix bz_2073398: [Serial] MachineSet scale-in does not leak OpenStack ports", which had run earlier and passed.
Version-Release number of selected component (if applicable):
How reproducible: Observed once.
Additional info: must-gather provided on private comment
This is a clone of issue OCPBUGS-26415. The following is the description of the original issue:
—
Description of problem:
In the local setup, this error appears when creating a deployment with scaling in the git form page: `Deployment in version "v1" cannot be handled as a Deployment: json: cannot unmarshal string into Go struct field DeploymentSpec.spec.replicas of type int32`
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-01-05-154400
How reproducible:
Every time
Steps to Reproduce:
1. In the local setup, go to the git form page
2. Enter a git repo and select Deployment as the resource type
3. In scaling, enter the value '5' and click the Create button
Actual results:
Got this error: "Deployment in version "v1" cannot be handled as a Deployment: json: cannot unmarshal string into Go struct field DeploymentSpec.spec.replicas of type int32"
Expected results:
Deployment should be created
Additional info:
This happens with DeploymentConfig creation as well.
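The error message itself points at the cause: the form submits the replica count as a JSON string, while DeploymentSpec.spec.replicas is an int32. A minimal sketch of the difference (illustrative excerpt, not the console's actual payload):
# rejected: the count is sent as a string
spec:
  replicas: "5"
# accepted: replicas must be an integer
spec:
  replicas: 5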
Description of problem:
Using the web console on the RH Developer Sandbox, I created the most basic Knative Service (KSVC) using the suggested default image openshift/hello-openshift. I then tried to change the displayed icon using the web UI, and an error about probes was displayed. See attached images. The error has no relevance to the item changed.
Version-Release number of selected component (if applicable):
whatever the RH sandbox uses, this value is not displayed to users
How reproducible:
very
Steps to Reproduce:
1. Using the web console on the RH Developer Sandbox, create the most basic Knative Service (KSVC) using the default image openshift/hello-openshift.
2. Use the web UI to edit the KSVC sample and change the icon from an OpenShift logo to, for instance, a 3scale logo.
3. When saving from this form, an error is reported: admission webhook 'validation webhook.serving.knative.dev' denied the request: validation failed: must not set the field(s): spec.template.spec.containers[0].readiness.Probe
Actual results:
Expected results:
Either a failure message related to changing the icon, or the icon change to take effect
Additional info:
KSVC details as provided by the web console:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: sample
  namespace: agroom-dev
spec:
  template:
    spec:
      containers:
      - image: openshift/hello-openshift
This is a clone of issue OCPBUGS-25406. The following is the description of the original issue:
—
Description of problem:
On a 4.14.5 fast-channel cluster in ARO, after the upgrade, when the customer tried to add a new node, the MachineConfig was not applied and the node never joined the pool. This happens for every node and can only be remediated by SRE, not by the customer.
Version-Release number of selected component (if applicable):
4.14.5-candidate
How reproducible:
Every time a node is added to the cluster at this version.
Steps to Reproduce:
1. Install an ARO cluster
2. Upgrade it to 4.14 along the fast channel
3. Add a node
Actual results:
message: >-
  could not Create/Update MachineConfig: Operation cannot be fulfilled on machineconfigs.machineconfiguration.openshift.io "99-worker-generated-kubelet": the object has been modified; please apply your changes to the latest version and try again
status: 'False'
type: Failure
- lastTransitionTime: '2023-11-29T17:44:37Z'
Expected results:
Node is created and configured correctly.
Additional info:
MissingStaticPodControllerDegraded: static pod lifecycle failure - static pod: "kube-apiserver" in namespace: "openshift-kube-apiserver" for revision: 15 on node: "aro-cluster-REDACTED-master-0" didn't show up, waited: 4m45s
This is a clone of issue OCPBUGS-24421. The following is the description of the original issue:
—
Description of problem:
[vSphere-CSI-Driver-Operator] does not update the VSphereCSIDriverOperatorCRAvailable status in a timely manner
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-12-04-162702
How reproducible:
Always
Steps to Reproduce:
1. Set up a vSphere cluster with a 4.15 nightly build.
2. Back up the secret/vmware-vsphere-cloud-credentials to "vmware-cc.yaml".
3. Change the secret/vmware-vsphere-cloud-credentials password to an invalid value under ns/openshift-cluster-csi-drivers by oc edit.
4. Wait for the cluster storage operator to degrade and the driver controller pods to go into CrashLoopBackOff, then restore the backup secret "vmware-cc.yaml" by apply.
5. Observe the driver controller pods go back to Running; the cluster storage operator should return to healthy.
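A sketch of the backup and restore commands for steps 2 to 4 (names taken from the steps above):
$ oc -n openshift-cluster-csi-drivers get secret vmware-vsphere-cloud-credentials -o yaml > vmware-cc.yaml
$ oc -n openshift-cluster-csi-drivers edit secret vmware-vsphere-cloud-credentials   # set an invalid password
$ oc apply -f vmware-cc.yaml                                                         # restore the original secret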
Actual results:
In step 5: the driver controller pods go back to Running, but the cluster storage operator is stuck at Degraded=True for almost 1 hour.
$ oc get po
NAME READY STATUS RESTARTS AGE
vmware-vsphere-csi-driver-controller-664db7d497-b98vt 13/13 Running 0 16s
vmware-vsphere-csi-driver-controller-664db7d497-rtj49 13/13 Running 0 23s
vmware-vsphere-csi-driver-node-2krg6 3/3 Running 1 (3h4m ago) 3h5m
vmware-vsphere-csi-driver-node-2t928 3/3 Running 2 (3h16m ago) 3h16m
vmware-vsphere-csi-driver-node-45kb8 3/3 Running 2 (3h16m ago) 3h16m
vmware-vsphere-csi-driver-node-8vhg9 3/3 Running 1 (3h16m ago) 3h16m
vmware-vsphere-csi-driver-node-9fh9l 3/3 Running 1 (3h4m ago) 3h5m
vmware-vsphere-csi-driver-operator-5954476ddc-rkpqq 1/1 Running 2 (3h10m ago) 3h17m
vmware-vsphere-csi-driver-webhook-7b6b5d99f6-rxdt8 1/1 Running 0 3h16m
vmware-vsphere-csi-driver-webhook-7b6b5d99f6-skcbd 1/1 Running 0 3h16m
$ oc get co/storage -w
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
storage 4.15.0-0.nightly-2023-12-04-162702 False False True 8m39s VSphereCSIDriverOperatorCRAvailable: VMwareVSphereControllerAvailable: error logging into vcenter: ServerFaultCode: Cannot complete login due to an incorrect user name or password.
storage 4.15.0-0.nightly-2023-12-04-162702 True False False 0s
$ oc get co/storage
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
storage 4.15.0-0.nightly-2023-12-04-162702 True False False 3m41s
Expected results:
In step 5: after the driver controller pods go back to Running, the cluster storage operator should recover to a healthy status immediately.
Additional info:
Comparing with previous CI results, this issue seems to have appeared after 4.15.0-0.nightly-2023-11-25-110147.
Description of problem:
The TaskRun duration diagram on the "Metrics" tab of pipeline is set to only show 4 TaskRuns in the legend regardless of the number of TaskRuns on the diagram.
Expected results:
All TaskRuns should be displayed in the legend.
Upgrade to golang 1.20 for all assisted-installer components
Description of problem:
The automated case OCP-42340 was found failing in a CI job on version 4.14.0-ec.4, and the issue was then reproduced in 4.14.0-0.nightly-2023-08-22-221456.
Version-Release number of selected component (if applicable):
4.14.0-ec.4 4.14.0-0.nightly-2023-08-22-221456
How reproducible:
Always
Steps to Reproduce:
1. Deploy the egressrouter on baremetal with:
{
  "kind": "List",
  "apiVersion": "v1",
  "metadata": {},
  "items": [
    {
      "apiVersion": "network.operator.openshift.io/v1",
      "kind": "EgressRouter",
      "metadata": {
        "name": "egressrouter-42430",
        "namespace": "e2e-test-networking-egressrouter-l4xgx"
      },
      "spec": {
        "addresses": [
          {
            "gateway": "192.168.111.1",
            "ip": "192.168.111.55/24"
          }
        ],
        "mode": "Redirect",
        "networkInterface": {
          "macvlan": {
            "mode": "Bridge"
          }
        },
        "redirect": {
          "redirectRules": [
            {
              "destinationIP": "142.250.188.206",
              "port": 80,
              "protocol": "TCP"
            },
            {
              "destinationIP": "142.250.188.206",
              "port": 8080,
              "protocol": "TCP",
              "targetPort": 80
            },
            {
              "destinationIP": "142.250.188.206",
              "port": 8888,
              "protocol": "TCP",
              "targetPort": 80
            }
          ]
        }
      }
    }
  ]
}
% oc get pods -n e2e-test-networking-egressrouter-l4xgx -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
egress-router-cni-deployment-c4bff88cf-skv9j 1/1 Running 0 69m 10.131.0.26 worker-0 <none> <none>
2. Create a service which points to the egressrouter:
% oc get svc -n e2e-test-networking-egressrouter-l4xgx -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    creationTimestamp: "2023-08-23T05:58:30Z"
    name: ovn-egressrouter-multidst-svc
    namespace: e2e-test-networking-egressrouter-l4xgx
    resourceVersion: "50383"
    uid: 07341ff1-6df3-40a6-b27e-59102d56e9c1
  spec:
    clusterIP: 172.30.10.103
    clusterIPs:
    - 172.30.10.103
    internalTrafficPolicy: Cluster
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    ports:
    - name: con1
      port: 80
      protocol: TCP
      targetPort: 80
    - name: con2
      port: 5000
      protocol: TCP
      targetPort: 8080
    - name: con3
      port: 6000
      protocol: TCP
      targetPort: 8888
    selector:
      app: egress-router-cni
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
kind: List
metadata:
  resourceVersion: ""
3. Create a test pod to access the service, or curl the egressrouter IP:port directly:
oc rsh -n e2e-test-networking-egressrouter-l4xgx hello-pod1
~ $ curl 172.30.10.103:80 --connect-timeout 5
curl: (28) Connection timeout after 5001 ms
~ $ curl 10.131.0.26:80 --connect-timeout 5
curl: (28) Connection timeout after 5001 ms
$ curl 10.131.0.26:8080 --connect-timeout 5
curl: (28) Connection timeout after 5001 ms
Actual results:
The connection fails.
Expected results:
The connection succeeds.
Additional info:
Note: the issue didn't exist in 4.13. The test passed on the latest 4.13 nightly build 4.13.0-0.nightly-2023-08-11-101506:
08-23 15:26:16.955 passed: (1m3s) 2023-08-23T07:26:07 "[sig-networking] SDN ConnectedOnly-Author:huirwang-High-42340-Egress router redirect mode with multiple destinations."
Please review the following PR: https://github.com/openshift/cloud-provider-azure/pull/86
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/oc/pull/1544
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
The existing tables that have hard-coded PF5 classnames don't display table headers at mobile resolutions. This is because of the inclusion of `pf-m-grid-md` alongside `pf-v5-c-table`. We should remove `pf-m-grid-md` to preserve the functionality as it was prior to the PF5 upgrade.
Please review the following PR: https://github.com/openshift/cluster-ingress-operator/pull/1002
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
There is no clear error log when creating an STS cluster with a KMS key that does not have the installer role in its policy.
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. Prepare a KMS key with the aws command:
aws kms create-key --tags TagKey=Purpose,TagValue=Test --description "kms Key"
2. Create an STS cluster with the KMS key:
rosa create cluster --cluster-name ying-k1 --sts --role-arn arn:aws:iam::301721915996:role/ying16-Installer-Role --support-role-arn arn:aws:iam::301721915996:role/ying16-Support-Role --controlplane-iam-role arn:aws:iam::301721915996:role/ying16-ControlPlane-Role --worker-iam-role arn:aws:iam::301721915996:role/ying16-Worker-Role --operator-roles-prefix ying-k1-e2g3 --oidc-config-id 23ggvdh2jouranue87r5ujskp8hctisn --region us-west-2 --version 4.12.15 --replicas 2 --compute-machine-type m5.xlarge --machine-cidr 10.0.0.0/16 --service-cidr 172.30.0.0/16 --pod-cidr 10.128.0.0/14 --host-prefix 23 --kms-key-arn arn:aws:kms:us-west-2:301721915996:key/c60b5a31-1a5c-4d73-93ee-67586d0eb90d
Actual results:
It fails. Here is the install log: http://pastebin.test.redhat.com/1100008
Expected results:
There should be a detailed error message for a KMS key that has no installer role.
Additional info:
It succeeds if the installer role ARN is added to the KMS key policy:
{
  "Version": "2012-10-17",
  "Id": "key-default-1",
  "Statement": [
    {
      "Sid": "Enable IAM User Permissions",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::301721915996:role/ying16-Installer-Role",
          "arn:aws:iam::301721915996:root"
        ]
      },
      "Action": "kms:*",
      "Resource": "*"
    }
  ]
}
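A sketch of attaching such a policy to the key used above (assuming the policy JSON is saved as key-policy.json):
$ aws kms put-key-policy --key-id c60b5a31-1a5c-4d73-93ee-67586d0eb90d --policy-name default --policy file://key-policy.json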
Please review the following PR: https://github.com/openshift/cluster-api-provider-libvirt/pull/262
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/cluster-network-operator/pull/2006
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/telemeter/pull/480
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Multi-egress source route entries do not get properly updated with adminpolicybasedexternalroutes CR
Version-Release number of selected component (if applicable):
Upstream ovn-kubernetes commit c60963123d28075288a8c23d2796c2df89f54601
How reproducible (100%):
Create a served/application pod after creating the adminpolicybasedexternalroutes CR. The corresponding source route entries won't be added to the worker routing table.
Steps to Reproduce:
1. Create an ovn-kubernetes kind cluster:
./kind.sh --install-cni-plugins --disable-snat-multiple-gws --multi-network-enable
2. Create two namespaces:
$ cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: Namespace
metadata:
  name: frr
  labels:
    gws: "true"
spec: {}
---
apiVersion: v1
kind: Namespace
metadata:
  name: bar
  labels:
    multiple_gws: "true"
spec: {}
EOF
3. Create a network attachment definition:
$ cat <<EOF | kubectl apply -f -
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: internal-net
  namespace: frr
spec:
  config: |-
    {
      "cniVersion": "0.3.1",
      "name": "internal-net",
      "plugins": [
        {
          "type": "macvlan",
          "master": "breth0",
          "mode": "bridge",
          "ipam": {
            "type": "static"
          }
        },
        {
          "capabilities": {
            "mac": true,
            "ips": true
          },
          "type": "tuning"
        }
      ]
    }
EOF
4. Create the first dummy pod:
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: dummy1
  namespace: bar
spec:
  containers:
  - name: dummy
    image: centos
    command:
    - sleep
    - infinity
  nodeSelector:
    kubernetes.io/hostname: ovn-worker2
EOF
5. Create the AdminPolicyBasedExternalRoute CR:
$ cat <<EOF | kubectl apply -f -
apiVersion: k8s.ovn.org/v1
kind: AdminPolicyBasedExternalRoute
metadata:
  name: honeypotting
spec:
  ## gateway example
  from:
    namespaceSelector:
      matchLabels:
        multiple_gws: "true"
  nextHops:
    dynamic:
    - podSelector:
        matchLabels:
          gw: "true"
      bfdEnabled: true
      namespaceSelector:
        matchLabels:
          gws: "true"
      networkAttachmentName: frr/internal-net
EOF
6. Create the lb pod:
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: ext-gw
  labels:
    gw: "true"
  namespace: frr
  annotations:
    k8s.v1.cni.cncf.io/networks: '[{"name": "internal-net", "ips": ["172.18.0.10/16"]}]'
spec:
  containers:
  - name: frr
    image: centos
    command:
    - sleep
    - infinity
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: ovn-worker
EOF
7. Create a second dummy pod:
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: dummy2
  namespace: bar
spec:
  containers:
  - name: dummy
    image: centos
    command:
    - sleep
    - infinity
  nodeSelector:
    kubernetes.io/hostname: ovn-worker2
EOF
Actual results:
Only source route entries for the first dummy pod were created:
$ kubectl get po -o wide -n bar
dummy1 Running 10.244.1.3
dummy2 Running 10.244.1.4
$ POD=$(kubectl get pod -n ovn-kubernetes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | grep ovnkube-db-) ; kubectl exec -ti $POD -n ovn-kubernetes -c nb-ovsdb -- bash
[root@ovn-control-plane ~]# ovn-nbctl lr-route-list GR_ovn-worker2
IPv4 Routes
Route Table <main>:
10.244.1.3 172.18.0.10 src-ip exgw-rtoe-GR_ovn-worker2 ecmp-symmetric-reply bfd
169.254.169.0/29 169.254.169.4 dst-ip rtoe-GR_ovn-worker2
10.244.0.0/16 100.64.0.1 dst-ip
0.0.0.0/0 172.18.0.1 dst-ip rtoe-GR_ovn-worker
Expected results:
Source route entries for both dummy pods created:
[root@ovn-control-plane ~]# ovn-nbctl lr-route-list GR_ovn-worker2
IPv4 Routes
Route Table <main>:
10.244.1.3 172.18.0.10 src-ip exgw-rtoe-GR_ovn-worker2 ecmp-symmetric-reply bfd
10.244.1.4 172.18.0.10 src-ip exgw-rtoe-GR_ovn-worker2 ecmp-symmetric-reply bfd
169.254.169.0/29 169.254.169.4 dst-ip rtoe-GR_ovn-worker2
10.244.0.0/16 100.64.0.1 dst-ip
0.0.0.0/0 172.18.0.1 dst-ip rtoe-GR_ovn-worker
Additional info:
$ kubectl describe adminpolicybasedexternalroutes
...
Status:
  Last Transition Time: 2023-09-25T09:50:25Z
  Messages:
    Configured external gateway IPs: 172.18.0.10
  Status: Success
Events: <none>
This is a clone of issue OCPBUGS-39287. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-39286. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-39285. The following is the description of the original issue:
—
Description of problem: https://github.com/openshift/installer/pull/7727 changed the order of some playbooks, and we're now expected to run the network.yaml playbook before the metadata.json file is created. This isn't a problem with newer versions of Ansible, which happily ignore missing var_files; however, older Ansible versions fail with:
[cloud-user@installer-host ~]$ ansible-playbook -i "/home/cloud-user/ostest/inventory.yaml" "/home/cloud-user/ostest/network.yaml"
PLAY [localhost] *****************************************************************************************************************************************************************************************************************************
ERROR! vars file metadata.json was not found
Could not find file on the Ansible Controller. If you are using a module and expect the file to exist on the remote, see the remote_src option
This is a clone of issue OCPBUGS-48202. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-47769. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-47726. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-47527. The following is the description of the original issue:
—
Description of problem:
OWNERS file updated to include prabhakar and Moe as owners and reviewers
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is to facilitate easy backports via automation.
Please review the following PR: https://github.com/openshift/whereabouts-cni/pull/192
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/cluster-policy-controller/pull/131
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/machine-api-provider-gcp/pull/58
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
The "Overwriting current silence" information alert should have padding to be consistent with other alert messages.
Description of problem:
Cluster and user-workload alertmanager instances inadvertently become peered during upgrade
Version-Release number of selected component (if applicable):
How reproducible:
Infrequently: the customer observed this on 3 clusters out of 15.
Steps to Reproduce:
Deploy userworkload monitoring:
~~~
config.yaml: |
  enableUserWorkload: true
  prometheusK8s:
~~~
Deploy the user workload alertmanager:
~~~
name: user-workload-monitoring-config
namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    alertmanager:
      enabled: true
~~~
Upgrade the cluster and verify the state of the alertmanager clusters:
~~~
$ oc exec -n openshift-monitoring alertmanager-main-0 -- amtool cluster show -o json --alertmanager.url=http://localhost:9093
~~~
Actual results:
alertmanager shows 4 peers
Expected results:
There should be two separate clusters of 2 peers each.
Additional info:
Mitigation steps: scaling down one of the alertmanager statefulsets to 0 and then scaling it up again restores the expected configuration (i.e. 2 separate alertmanager clusters). The customer then added NetworkPolicies to prevent alertmanager gossip between namespaces.
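A hypothetical NetworkPolicy along the lines of what the customer added (9094 is the Alertmanager cluster gossip port; the selector is illustrative, and additional allow rules for scraping and the web port would likely be needed):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-alertmanager-gossip
  namespace: openshift-user-workload-monitoring
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: alertmanager
  policyTypes:
  - Ingress
  ingress:
  # allow gossip only from pods in the same namespace
  - from:
    - podSelector: {}
    ports:
    - protocol: TCP
      port: 9094
    - protocol: UDP
      port: 9094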
Please review the following PR: https://github.com/openshift/machine-api-provider-gcp/pull/72
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-26488. The following is the description of the original issue:
—
Description of problem:
CCO reports credsremoved mode in metrics when the cluster is actually in the default mode. See https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/47349/rehearse-47349-pull-ci-openshift-cloud-credential-operator-release-4.16-e2e-aws-qe/1744240905512030208 (OCP-31768).
Version-Release number of selected component (if applicable):
4.16
How reproducible:
Always.
Steps to Reproduce:
1. Create an AWS cluster with CCO in the default mode (ends up in mint)
2. Get the value of the cco_credentials_mode metric
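A quick way to cross-check the configured mode (a sketch; the metric itself is scraped from the CCO's metrics endpoint):
$ oc get cloudcredential cluster -o jsonpath='{.spec.credentialsMode}'
An empty value means the default mode is in effect.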
Actual results:
credsremoved
Expected results:
mint
Root cause:
The controller-runtime client used in metrics calculator (https://github.com/openshift/cloud-credential-operator/blob/77a68ad01e75162bfa04097b22f80d305c192439/pkg/operator/metrics/metrics.go#L77) is unable to GET the root credentials Secret (https://github.com/openshift/cloud-credential-operator/blob/77a68ad01e75162bfa04097b22f80d305c192439/pkg/operator/metrics/metrics.go#L184) since it is backed by a cache which only contains target Secrets requested by other operators (https://github.com/openshift/cloud-credential-operator/blob/77a68ad01e75162bfa04097b22f80d305c192439/pkg/cmd/operator/cmd.go#L164-L168).
This is a clone of issue OCPBUGS-36601. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-36260. The following is the description of the original issue:
—
The tooltip for a Pipeline when expression is not shown in the Pipeline visualization.
The when-expression tooltip is not shown on hover.
The when-expression tooltip should be shown on hover.
This is a clone of issue OCPBUGS-41551. The following is the description of the original issue:
—
Description of problem:
Bare Metal UPI cluster nodes lose communication with other nodes, and this affects pod communication on these nodes as well. The issue can be fixed with an OVN rebuild on the dbs of the nodes that are hitting the issue, but eventually the nodes will degrade again and lose communication again. Note that despite an OVN rebuild fixing the issue temporarily, Host Networking is set to True, so it's using the kernel routing table. Update: observed on vSphere with the routingViaHost: false, ipForwarding: global configuration as well.
Version-Release number of selected component (if applicable):
4.14.7, 4.14.30
How reproducible:
Can't reproduce locally but reproducible and repeatedly occurring in customer environment
Steps to Reproduce:
Identify a host node whose pods can't be reached from other hosts in default namespaces (tested via openshift-dns). Observe that curls to that peer pod consistently time out. Tcpdumps to the target pod show that packets are arriving and are acknowledged, but never route back to the client pod successfully (SYN/ACK seen at the pod network layer, not at geneve; so dropped before hitting the geneve tunnel).
Actual results:
Nodes will repeatedly degrade and lose communication despite fixing the issue with an OVN db rebuild (a db rebuild only provides hours/days of respite, no permanent resolution).
Expected results:
Nodes should not be losing communication, and even if they did, it should not happen repeatedly.
Additional info:
What's been tried so far
========================
- Multiple OVN rebuilds on different nodes (works but the node will eventually hit the issue again)
- Flushing the conntrack (doesn't work)
- Restarting nodes (doesn't work)
Data gathered
=============
- Tcpdump from all interfaces for dns-pods going to port 7777 (to segregate traffic)
- ovnkube-trace
- SOSreports of two nodes having communication issues before an OVN rebuild
- SOSreports of two nodes having communication issues after an OVN rebuild
- OVS trace dumps of br-int and br-ex
====
More data in nested comments below.
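A sketch of the kind of capture listed above (port 7777 comes from the description; the interface and output file are illustrative):
$ tcpdump -i any -nn -w /tmp/dns-pod-7777.pcap port 7777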
Please review the following PR: https://github.com/openshift/openstack-cinder-csi-driver-operator/pull/133
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-35235. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-34493. The following is the description of the original issue:
—
Description of problem:
Failed to deploy a baremetal cluster, as the cluster nodes are not introspected.
Version-Release number of selected component (if applicable):
4.15.15
How reproducible:
periodically
Steps to Reproduce:
1. Deploy a baremetal dualstack cluster with the provisioning network disabled
Actual results:
Cluster fails to deploy as ironic.service fails to start on the bootstrap node:
[root@api ~]# systemctl status ironic.service
○ ironic.service - Ironic baremetal deployment service
     Loaded: loaded (/etc/containers/systemd/ironic.container; generated)
     Active: inactive (dead)
May 27 08:01:05 api.kni-qe-4.lab.eng.rdu2.redhat.com systemd[1]: Dependency failed for Ironic baremetal deployment service.
May 27 08:01:05 api.kni-qe-4.lab.eng.rdu2.redhat.com systemd[1]: ironic.service: Job ironic.service/start failed with result 'dependency'.
Expected results:
ironic.service is started, nodes are introspected, and the cluster is deployed.
Additional info:
This is a clone of issue OCPBUGS-41709. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-39109. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-38011. The following is the description of the original issue:
—
Description of problem:
The web console seems to require setting a Project/namespace. However, in the CLI, RoleBinding objects can be created without a namespace with no issues.
$ oc describe rolebinding.rbac.authorization.k8s.io/monitor
Name: monitor
Labels: <none>
Annotations: <none>
Role:
Kind: ClusterRole
Name: view
Subjects:
Kind Name Namespace
---- ---- ---------
ServiceAccount monitor
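A YAML sketch that yields the RoleBinding described above (a reconstruction for illustration, not the exact manifest used; note the subject's namespace is simply omitted):
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: monitor
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
- kind: ServiceAccount
  name: monitor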
—
This is inconsistent with the dev console, causing confusion and making things cumbersome for developers and administrators.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Log in to the web console as a Developer.
2. Select a Project on the left.
3. Select the 'Project Access' tab.
4. Add access -> select Service Account in the dropdown.
Actual results:
The Save button is not active when no project is selected.
Expected results:
The Save button should be enabled even when no Project is selected, so that the RoleBinding can be created just as it is handled in the CLI.
Additional info:
Description of problem:
Role assignment for Azure AD Workload Identity performed by ccoctl does not provide an option to scope role assignments to a resource group containing customer vnet in a byo vnet installation workflow. https://docs.openshift.com/container-platform/4.13/installing/installing_azure/installing-azure-vnet.html
Version-Release number of selected component (if applicable):
4.14.0
How reproducible:
100%
Steps to Reproduce:
1. Create Azure resource group and vnet for OpenShift within that resource group. 2. Create Azure AD Workload Identity infrastructure with ccoctl. 3. Follow steps to configure existing vnet for installation setting networkResourceGroupName within the install config. 4. Attempt cluster installation.
Actual results:
Cluster installation fails.
Expected results:
Cluster installation succeeds.
Additional info:
ccoctl must be extended to accept a parameter specifying the network resource group name and scope relevant component role assignments to the network resource group in addition to the installation resource group.
This is a clone of issue OCPBUGS-32044. The following is the description of the original issue:
—
Description of problem:
We have an escalation for a customer case where, after upgrading to OCP 4.14, they started to see application traffic degradation that seems to be related to the new version of HAProxy, which changed from 2.2.24 to 2.6.13. The customer already verified that if the router pods use the old haproxy-router image from OCP 4.12, the issue disappears. What was observed is that router pods unexpectedly close an HTTP keep-alive connection, sending a FIN packet while the client is still sending HTTP requests.
Version-Release number of selected component (if applicable):
4.14.16 (HAProxy 2.6)
How reproducible:
Only on customer clusters.
Steps to Reproduce:
1. 2. 3.
Actual results:
Router pods are unexpectedly terminating Keep-alive connections.
Expected results:
Router pods should not terminate a keep-live connection when requests are still coming.
Additional info:
- Already tried: changing the hard-stop to 20m
- Already tried: changing the reload interval to the maximum (2m)
- Already tried: setting `no option idle-close-on-response` in the default section of the HAProxy configuration
- Already verified: content-length headers have a value greater than 0 in the HTTP requests
This is an extension of https://issues.redhat.com/browse/HOSTEDCP-190, in which we are adding container resource preservation to more hosted control plane components.
This is a clone of issue OCPBUGS-25125. The following is the description of the original issue:
—
Description of problem:
The `aws-ebs-csi-driver-node-` pods appear to be failing to deploy far too often in CI recently.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
in a statistically significant pattern
Steps to Reproduce:
1. run OCP test suite many times for it to matter
Actual results:
fail [github.com/openshift/origin/test/extended/authorization/scc.go:76]: 1 pods failed before test on SCC errors
Error creating: pods "aws-ebs-csi-driver-node-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[3]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[4]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[5]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, provider restricted-v2: .containers[0].privileged: Invalid value: true: Privileged containers are not allowed, provider restricted-v2: .containers[0].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[0].containers[0].hostPort: Invalid value: 10300: Host ports are not allowed to be used, provider restricted-v2: .containers[1].privileged: Invalid value: true: Privileged containers are not allowed, provider restricted-v2: .containers[1].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[1].containers[0].hostPort: Invalid value: 10300: Host ports are not allowed to be used, provider restricted-v2: .containers[2].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[2].containers[0].hostPort: Invalid value: 10300: Host ports are not allowed to be used, provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]
for DaemonSet.apps/v1/aws-ebs-csi-driver-node -n openshift-cluster-csi-drivers happened 4 times
Expected results:
Test pass
Additional info:
[sig-auth][Feature:SCC][Early] should not have pod creation failures during install [Suite:openshift/conformance/parallel]
This is not going to be pretty. Likely mostly a re-implementation given the way everything was coded to use regexes that depend on the old locator and keys in specific orders. We need a new way to define matchers that uses structured intervals.
We also have some very complex logic around hashing the message to get it into the locator. Possible duplication between watchevents/event.go and duplicated_events.go.
Will be quite delicate and probably very time consuming.
I tried upgrading a 4.14 SNO cluster from one nightly image to another and, while on AWS the upgrade works fine, it fails on GCP.
Cluster Network Operator successfully upgrades ovn-kubernetes but is stuck on the cloud network config controller, which is in CrashLoopBackOff state because it receives a wrong IP address from the name server when trying to reach the API server. The node IP is actually 10.0.0.3 and the name server returns 10.0.0.2, which I suspect is the bootstrap node IP, but that's only my guess.
Some relevant logs:
$ oc get co network
network 4.14.0-0.nightly-2023-08-15-200133 True True False 86m Deployment "/openshift-cloud-network-config-controller/cloud-network-config-controller" is not available (awaiting 1 nodes)
$ oc get pods -n openshift-ovn-kubernetes -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ovnkube-control-plane-844c8f76fb-q4tvp 2/2 Running 3 24m 10.0.0.3 ci-ln-rij2p1b-72292-xmzf4-master-0 <none> <none>
ovnkube-node-24kb7 10/10 Running 12 (13m ago) 25m 10.0.0.3 ci-ln-rij2p1b-72292-xmzf4-master-0 <none> <none>
$ oc get pods -n openshift-cloud-network-config-controller -o wide
openshift-cloud-network-config-controller cloud-network-config-controller-d65ccbc5b-dnt69 0/1 CrashLoopBackOff 15 (2m37s ago) 40m 10.128.0.141 ci-ln-rij2p1b-72292-xmzf4-master-0 <none> <none>
$ oc logs -n openshift-cloud-network-config-controller cloud-network-config-controller-d65ccbc5b-dnt69
W0816 11:06:00.666825 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
F0816 11:06:30.673952 1 main.go:345] Error building controller runtime client: Get "https://api-int.ci-ln-rij2p1b-72292.gcp-2.ci.openshift.org:6443/api?timeout=32s": dial tcp 10.0.0.2:6443: i/o timeout
I also get 10.0.0.2 if I run a DNS query from the node itself or from a pod:
dig api-int.ci-ln-zp7dbyt-72292.gcp-2.ci.openshift.org
...
;; ANSWER SECTION:
api-int.ci-ln-zp7dbyt-72292.gcp-2.ci.openshift.org. 60 IN A 10.0.0.2
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always.
Steps to Reproduce:
1. On clusterbot: launch 4.14 gcp,single-node
2. On a terminal: oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-08-15-200133 --allow-explicit-upgrade --force
Actual results:
name server returns 10.0.0.2, so CNCC fails to reach the API server
Expected results:
name server should return 10.0.0.3
Must-gather: https://drive.google.com/file/d/1MDbsMgIQz7dE6e76z4ad95dwaxbSNrJM/view?usp=sharing
I'm assigning this bug first to the network edge team for a first pass. Please do reassign it if necessary.
Description of problem:
A net-attach-def using "type: ovn-k8s-cni-overlay, topology: layer2" does not work in a hosted pod when using the Kubevirt provider. Note: as a general hosted multus sanity check, using a "type: bridge" NAD does work properly in a hosted pod, and both interfaces start as expected:
Normal AddedInterface 86s multus Add eth0 [10.133.0.21/23] from ovn-kubernetes
Normal AddedInterface 86s multus Add net1 [192.0.2.193/27] from default/bridge-net
Version-Release number of selected component (if applicable):
OCP 4.14.1 CNV 4.14.0-2385
How reproducible:
Reproduced in multiple attempts when using an OVN secondary network.
Steps to Reproduce:
1. Create the NAD on the hosted Kubevirt cluster:
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: l2-network
spec:
  config: |-
    {
      "cniVersion": "0.3.1",
      "name": "l2-network",
      "type": "ovn-k8s-cni-overlay",
      "topology": "layer2",
      "netAttachDefName": "default/l2-network"
    }
2. Create a hosted pod with that net annotation:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: '[{"name": "l2-network", "interface": "net1", "ips": ["192.0.2.22/24"]}]'
  name: debug-ovnl2-c
  namespace: default
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
    runAsNonRoot: true
    runAsUser: 1000
  containers:
  - name: debug-ovnl2-c
    command:
    - /usr/bin/bash
    - -x
    - -c
    - |
      sleep infinity
    image: quay.io/cloud-bulldozer/uperf:latest
    imagePullPolicy: Always
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
  nodeSelector:
    kubernetes.io/hostname: kv1-a8a5d7f1-9xwm4
3. The pod remains in ContainerCreating because it cannot create the net1 interface. The pod describe event log (the StdinData byte arrays, which just encode the multus CNI configuration, are omitted here for readability):
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m21s default-scheduler Successfully assigned default/debug-ovnl2-c to kv1-a8a5d7f1-9xwm4
Warning FailedCreatePodSandBox 2m20s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_debug-ovnl2-c_default_1b42bc5a-1148-49d8-a2d0-7689a46f59ea_0(1e2d9008074c3c5af5ccbb2e7e2e7ca2466395b642a1677db2dfadd35eb84b73): error adding pod default_debug-ovnl2-c to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: '&{ContainerID:1e2d9008074c3c5af5ccbb2e7e2e7ca2466395b642a1677db2dfadd35eb84b73 Netns:/var/run/netns/5da048e3-b534-481d-acc6-2ddc6a439586 IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=debug-ovnl2-c;K8S_POD_INFRA_CONTAINER_ID=1e2d9008074c3c5af5ccbb2e7e2e7ca2466395b642a1677db2dfadd35eb84b73;K8S_POD_UID=1b42bc5a-1148-49d8-a2d0-7689a46f59ea Path: StdinData:[byte array omitted]} ContainerID:"1e2d9008074c3c5af5ccbb2e7e2e7ca2466395b642a1677db2dfadd35eb84b73" Netns:"/var/run/netns/5da048e3-b534-481d-acc6-2ddc6a439586" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=debug-ovnl2-c;K8S_POD_INFRA_CONTAINER_ID=1e2d9008074c3c5af5ccbb2e7e2e7ca2466395b642a1677db2dfadd35eb84b73;K8S_POD_UID=1b42bc5a-1148-49d8-a2d0-7689a46f59ea" Path:"" ERRORED: error configuring pod [default/debug-ovnl2-c] networking: [default/debug-ovnl2-c/1b42bc5a-1148-49d8-a2d0-7689a46f59ea:l2-network]: error adding container to network "l2-network": CNI request failed with status 400: '[default/debug-ovnl2-c 1e2d9008074c3c5af5ccbb2e7e2e7ca2466395b642a1677db2dfadd35eb84b73 network l2-network NAD default/l2-network] [default/debug-ovnl2-c 1e2d9008074c3c5af5ccbb2e7e2e7ca2466395b642a1677db2dfadd35eb84b73 network l2-network NAD default/l2-network] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded ' '
Warning FailedCreatePodSandBox 19s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_debug-ovnl2-c_default_1b42bc5a-1148-49d8-a2d0-7689a46f59ea_0(48110f0ecc0979992108e4441ff06f50c0d90f527cbe0b8fe1ca18d5398b67eb): the same "failed to get pod annotation: timed out waiting for annotations: context deadline exceeded" error, repeated for sandbox 48110f0ecc0979992108e4441ff06f50c0d90f527cbe0b8fe1ca18d5398b67eb
Normal AddedInterface 18s (x3 over 4m20s) multus Add eth0 [10.133.0.21/23] from ovn-kubernetes
Actual results:
Pod cannot start
Expected results:
The pod can start with the additional "ovn-k8s-cni-overlay" network.
Additional info:
Slack thread: https://redhat-internal.slack.com/archives/C02UVQRJG83/p1698857051578159 I did confirm the same NAD and pod definition start fine on the management cluster.
Description of problem:
Version-Release number of selected component (if applicable):
OCP 4.13.0-0.nightly-2023-03-23-204038 ODF 4.13.0-121.stable
How reproducible:
Steps to Reproduce:
1. Installed ODF over OCP; everything was fine on the Installed Operators page.
2. Later, when the Installed Operators page was checked again, it crashed with an "Oh no! Something went wrong" error.
Actual results:
Installed Operators page crashes with "Oh no! Something went wrong." error
Expected results:
The Installed Operators page shouldn't crash. Component and stack trace logs from the console page: http://pastebin.test.redhat.com/1096522
Additional info:
Description of problem:
An infra object in some vsphere deployments can look like this:
~]$ oc get infrastructure cluster -o json | jq .status
{
  "apiServerInternalURI": "xxx",
  "apiServerURL": "xxx",
  "controlPlaneTopology": "HighlyAvailable",
  "etcdDiscoveryDomain": "",
  "infrastructureName": "xxx",
  "infrastructureTopology": "HighlyAvailable",
  "platform": "VSphere",
  "platformStatus": {
    "type": "VSphere"
  }
}
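A quick check for the condition above, i.e. the vsphere stanza missing from platformStatus (a sketch; empty output means the stanza is absent):
$ oc get infrastructure cluster -o jsonpath='{.status.platformStatus.vsphere}'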
If we attempt to run the MCO certificate regeneration procedure from https://access.redhat.com/articles/regenerating_cluster_certificates against such a cluster, it causes a panic.
Version-Release number of selected component (if applicable):
4.10.65 4.11.47 4.12.29 4.13.8 4.14.0 4.15
How reproducible:
100%
Steps to Reproduce:
1. Run procedure on cluster with above infra 2. 3.
Actual results:
panic
Expected results:
no panic
Additional info:
Please review the following PR: https://github.com/openshift/oauth-proxy/pull/265
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
In a recently merged PR, a number of API calls bypass the informer caches, causing excessive calls to the API server.
Done when:
- Change all Get() calls to use listers (see the sketch below)
- The API call metric should decrease
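A minimal sketch of the intended change using standard client-go informers (function and variable names here are illustrative, not the MCO's actual code):

    import (
        "context"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        corelisters "k8s.io/client-go/listers/core/v1"
    )

    // Before: every read is a round trip to the API server.
    func getPodDirect(client kubernetes.Interface, ns, name string) error {
        _, err := client.CoreV1().Pods(ns).Get(context.TODO(), name, metav1.GetOptions{})
        return err
    }

    // After: reads are served from the shared informer's local cache.
    func getPodFromLister(lister corelisters.PodLister, ns, name string) error {
        _, err := lister.Pods(ns).Get(name)
        return err
    }

    // Wiring (once, at controller startup):
    //   factory := informers.NewSharedInformerFactory(client, 0)  // k8s.io/client-go/informers
    //   podLister := factory.Core().V1().Pods().Lister()
    //   factory.Start(stopCh); factory.WaitForCacheSync(stopCh)

Because the lister only reads the local cache, repeated Get() calls no longer show up in the apiserver request metrics, which is what the "API call metric should decrease" item checks.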
Backport to 4.15 of OCPBUGS-35007 specifically for the cluster-version-operator
All workloads of the following namespaces need SCC pinning:
Description of problem:
CAPI E2Es failing to start in some CAPI provider's release branches.
Failing with the following error: `go: errors parsing go.mod: /tmp/tmp.ssf1LXKrim/go.mod:5: unknown directive: toolchain` https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-api/199/pull-ci-openshift-cluster-api-master-e2e-aws-capi-techpreview/1765512397532958720#1:build-log.txt%3A91-95
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is because the script that launches the e2e suite runs it from the `main` branch of the cluster-capi-operator (which has some backward-incompatible Go toolchain changes), rather than from the correctly matching release branch.
Description of problem:
ServiceMonitor CRs with invalid syntax in the proxyUrl attribute are not validated and rejected; instead, the invalid value breaks reloading and restarting of Prometheus.
Version-Release number of selected component (if applicable):
4.12.x 4.13.x 4.14.x
How reproducible:
always
Steps to Reproduce:
1. Inject custom ServiceMonitor CR with an invalid proxyUrl (example: 'http://xxx-${STAGE}.svc.cluster.local:80')
Complete CR
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: servicemonitor
namespace: xxxx
spec:
endpoints:
- path: /actuator/prometheus-text
port: http-web
proxyUrl: 'http://xxx-${STAGE}.svc.cluster.local:80'
relabelings:
- action: labeldrop
regex: pod
- action: drop
regex: destination
sourceLabels:
- reporter
- action: drop
regex: ^envoy_.*
sourceLabels:
- __name__
selector:
matchLabels:
monitored: prometheus
Actual results:
ts=2024-03-13T13:42:47.471Z caller=main.go:928 level=error msg="Error reloading config" err="couldn't load configuration (--config.file=\"/etc/prometheus/config_out/prometheus.env.yaml\"): parsing YAML file /etc/prometheus/config_out/prometheus.env.yaml: parse \"http://XXXX-${STAGE}.svc.cluster.local:80\": invalid character \"{\" in host name"
Expected results:
successful reload
Additional info:
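The reload failure comes from Go's net/url host validation, which the admission path could run up front. A minimal sketch (a hypothetical validation helper, not the actual prometheus-operator code):

    package main

    import (
        "fmt"
        "net/url"
    )

    // validateProxyURL rejects proxyUrl values that Prometheus itself cannot
    // parse, so the error surfaces at admission time rather than at config
    // reload time.
    func validateProxyURL(raw string) error {
        _, err := url.Parse(raw)
        return err
    }

    func main() {
        // Prints: parse "http://xxx-${STAGE}.svc.cluster.local:80":
        // invalid character "{" in host name
        fmt.Println(validateProxyURL("http://xxx-${STAGE}.svc.cluster.local:80"))
    }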
Description of problem:
New machines got stuck in Provisioned state when the customer tried to scale the machineset.
~~~
NAME PHASE TYPE REGION ZONE AGE
ocp4-ftf8t-worker-2-wn6lp Provisioned 44m
ocp4-ftf8t-worker-redhat-x78s5 Provisioned 44m
~~~
Upon checking the journalctl logs from these VMs, we noticed that it was failing with "no space left on the device" errors while pulling images.
To troubleshoot the issue further, we had to reset the root password in order to log in.
Once the root password was reset, we logged in to the system and checked the journalctl logs for failure errors.
We could see "no space left on device" errors for image pulls. Checking the df -h output, we could see that /dev/sda4 (/dev/mapper/coreos-luks-root-nocrypt), which is mounted on /sysroot, was 100% full.
Because images failed to pull, machine-config-daemon-firstboot.service could not complete. This prevented the node from updating to 4.12 and from joining the cluster.
The rest of the errors were side effects of the "no space left on device" error.
We could see that /dev/sda4 was correctly partitioned to 120 GiB. We compared it to a working system and the partition scheme matched.
The filesystem, however, was only 2.8 GiB instead of 120 GiB.
We manually extended the filesystem for / (xfs_growfs /), after which the / mount was resized to 120 GiB.
The node got rebooted once this step was performed and system came up fine with 4.12 Red Hat Coreos.
We waited for a while for the node to come up with kubelet and crio running, approved the certs and now the node is part of the cluster.
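For reference, a sketch of the manual workaround described above (run as root on the affected node; device names are per this report):

    df -h /sysroot     # filesystem reports only ~2.8G despite a 120G partition
    lsblk /dev/sda4    # confirm the partition itself is the full 120G
    xfs_growfs /       # grow the XFS filesystem to fill the partition
    df -h /sysroot     # should now report ~120G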
Later while checking the logs for RCA, we observed below errors from the logs which might help in determining why the sysroot mountpoint was not resized.
~~~
$ grep -i growfs sos_commands/logs/journalctl_no-pager_-since_-3days
Jun 12 10:37:30 ocp4-ftf8t-worker-2-wn6lp systemd[1]: ignition-ostree-growfs.service: Failed to load configuration: No such file or directory <---
Jun 12 10:37:30 ocp4-ftf8t-worker-2-wn6lp systemd[1]: ignition-ostree-growfs.service: Collecting.
~~~
Version-Release number of selected component (if applicable):
OCP 4.12.18.
IPI installation on RHV.
How reproducible:
Not able to reproduce the issue.
Steps to Reproduce:
1. 2. 3.
Actual results:
The /sysroot mountpoint was not resized to the actual size of the /dev/sda4 partition which further prevented the machine-config-daemon-firstboot.service from completing and the node was stuck at RHCOS version 4.6.
Currently, as a workaround, the customer has to manually resize the /sysroot mountpoint every time they add a new node to the cluster.
Expected results:
The /sysroot mountpoint should be automatically resized as a part of ignition-ostree-growfs.sh script.
Additional info:
The customer recently migrated from an old storage domain to a new one on RHV, in case that matters. They performed successful machineset scale-up tests with the new storage domain on OCP 4.11.33 (before upgrading OCP).
They started facing the issue with all machinesets (new and existing) only after upgrading OCP to 4.12.18.
Please review the following PR: https://github.com/openshift/multus-cni/pull/183
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-38399. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-38398. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-38174. The following is the description of the original issue:
—
Description of problem:
The prometheus operator fails to reconcile when proxy settings like no_proxy are set in the Alertmanager configuration secret.
Version-Release number of selected component (if applicable):
4.15.z and later
How reproducible:
Always when AlertmanagerConfig is enabled
Steps to Reproduce:
1. Enable UWM with AlertmanagerConfig:
   enableUserWorkload: true
   alertmanagerMain:
     enableUserAlertmanagerConfig: true
2. Edit the "alertmanager.yaml" key in the alertmanager-main secret (see attached configuration file)
3. Wait for a couple of minutes.
Actual results:
Monitoring ClusterOperator goes Degraded=True.
Expected results:
No error
Additional info:
The Prometheus operator logs show that it doesn't understand the proxy_from_environment field.
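A minimal sketch of an alertmanager.yaml fragment that would trigger this (the receiver and URL are placeholders; proxy_from_environment is the upstream Alertmanager http_config field the operator fails to parse):

    route:
      receiver: default
    receivers:
    - name: default
      webhook_configs:
      - url: "https://webhook.example.com/"
        http_config:
          proxy_from_environment: true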
Description of problem:
Pipelinerun task log switcher is stuck and is not loading the respective task logs when you switch from one task to another.
Version-Release number of selected component (if applicable):
4.15.0
How reproducible:
Always
Steps to Reproduce:
1. Create a pipeline with multiple tasks.
2. Start the pipeline and go to the logs page.
3. Switch between the tasks to see their logs.
Actual results:
Not able to click the task on the left-hand side, and the logs window shows a blank screen.
Expected results:
Should be able to switch between the tasks and selected task logs should be shown in the log window
Attached Video:
https://drive.google.com/file/d/1pPQm9YYyWZxfCwFnudviSCyqoPHn8D9x/view?usp=sharing
This is a clone of issue OCPBUGS-26933. The following is the description of the original issue:
—
Description of problem:
Console is overriding the status code of HTTP requests proxied to dynamic plugin services
Version-Release number of selected component (if applicable):
4.15.0-ec.2
How reproducible:
Always
Steps to Reproduce:
1. Create an OpenShift 4.15.0-ec.2 cluster or newer
2. Install ACM 2.9.1 from OperatorHub and create a MultiClusterHub operand
3. Expose the plugin service: oc -n multicluster-engine expose service console-mce-console
4. Set tls.termination to passthrough on route/console-mce-console
5. Compare responses from curling the proxy and the service directly.
Actual results:
$ curl -k -D - -H "Cookie: <REDACTED>" -H "If-Modified-Since: Thu, 07 Dec 2023 14:45:30 GMT" https://console-openshift-console.apps.kevin-415.dev02.red-chesterfield.com/api/plugins/mce/plugin-manifest.json
HTTP/1.1 200 OK
cache-control: no-cache
date: Wed, 10 Jan 2024 21:16:10 GMT
last-modified: Thu, 07 Dec 2023 14:45:30 GMT
referrer-policy: strict-origin-when-cross-origin
x-content-type-options: nosniff
x-dns-prefetch-control: off
x-frame-options: DENY
x-xss-protection: 1; mode=block
content-length: 0

$ curl -k -D - -H "If-Modified-Since: Thu, 07 Dec 2023 14:45:30 GMT" https://console-mce-console-multicluster-engine.apps.kevin-415.dev02.red-chesterfield.com/plugin/plugin-manifest.json
HTTP/2 304
cache-control: no-cache
last-modified: Thu, 07 Dec 2023 14:45:30 GMT
date: Wed, 10 Jan 2024 21:26:33 GMT
Expected results:
Response code of 304 should be returned by the proxy route, not changed to 200.
Additional info:
Introduced by https://github.com/openshift/console/pull/13272
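For comparison, a stock Go reverse proxy copies the upstream status code through untouched; a minimal sketch (not the console's actual proxy code; the upstream URL is a placeholder) of the expected behavior:

    package main

    import (
        "net/http"
        "net/http/httputil"
        "net/url"
    )

    func main() {
        // Placeholder upstream; the real console proxies to the plugin Service.
        upstream, _ := url.Parse("https://plugin-service.example.svc:9443")
        proxy := httputil.NewSingleHostReverseProxy(upstream)
        // httputil.ReverseProxy writes the upstream status verbatim, so a 304
        // from the plugin reaches the browser as a 304. Calling
        // w.WriteHeader(http.StatusOK) before delegating would clobber it.
        http.ListenAndServe(":8080", proxy)
    }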
This is a clone of issue OCPBUGS-29249. The following is the description of the original issue:
—
Observed during testing of candidate-4.15 image as of 2024-02-08.
This is an incomplete report as I haven't verified the reproducer yet or attempted to get a must-gather. I have observed this multiple times now, so I am confident it's a thing. I can't be confident that the procedure described here reliably reproduces it, or that all the described steps are required.
I have been using MCO to apply machine config to masters. This involves a rolling reboot of all masters.
During a rolling reboot I applied an update to CPMS. I observed the following sequence of events:
At this point there were only 2 nodes in the cluster:
and machines provisioning:
This is a clone of issue OCPBUGS-25744. The following is the description of the original issue:
—
Description of problem:
Deleting the node with the Ingress VIP using oc delete node causes a keepalived split-brain
Version-Release number of selected component (if applicable):
4.12, 4.14
How reproducible:
100%
Steps to Reproduce:
1. In an OpenShift cluster installed via vSphere IPI, check the node with the Ingress VIP.
2. Delete the node.
3. Check the discrepancy between machine objects and nodes. There will be more machines than nodes.
4. SSH to the deleted node and check that the VIP is still mounted and the keepalived pods are running.
5. Check that the VIP is also mounted on another worker.
6. SSH to the node and check that the VIP is still present.
Actual results:
The deleted node still has the VIP present and the ingress fails sometimes
Expected results:
The deleted node should not have the VIP present and the ingress should not fail.
Additional info:
This is a clone of issue OCPBUGS-27422. The following is the description of the original issue:
—
Description of problem:
Invalid memory address or nil pointer dereference in Cloud Network Config Controller
Version-Release number of selected component (if applicable):
4.12
How reproducible:
sometimes
Steps to Reproduce:
1. Happens by itself sometimes 2. 3.
Actual results:
Panic and pod restarts
Expected results:
Panics due to Invalid memory address or nil pointer dereference should not occur
Additional info:
E0118 07:54:18.703891 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 93 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x203c8c0?, 0x3a27b20})
    /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0003bd090?})
    /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75
panic({0x203c8c0, 0x3a27b20})
    /usr/lib/golang/src/runtime/panic.go:884 +0x212
github.com/openshift/cloud-network-config-controller/pkg/cloudprovider.(*Azure).AssignPrivateIP(0xc0001ce700, {0xc000696540, 0x10, 0x10}, 0xc000818ec0)
    /go/src/github.com/openshift/cloud-network-config-controller/pkg/cloudprovider/azure.go:146 +0xcf0
github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig.(*CloudPrivateIPConfigController).SyncHandler(0xc000986000, {0xc000896a90, 0xe})
    /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig/cloudprivateipconfig_controller.go:327 +0x1013
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem.func1(0xc000720d80, {0x1e640c0?, 0xc0003bd090?})
    /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:152 +0x11c
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem(0xc000720d80)
    /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:162 +0x46
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).runWorker(0xc000504ea0?)
    /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:113 +0x25
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?)
    /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:157 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x27b3220, 0xc000894480}, 0x1, 0xc0000aa540)
    /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:158 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
    /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:135 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?)
    /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:92 +0x25
created by github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).Run
    /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:99 +0x3aa
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x1a40b30]
goroutine 93 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0003bd090?})
    /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xd7
panic({0x203c8c0, 0x3a27b20})
    /usr/lib/golang/src/runtime/panic.go:884 +0x212
github.com/openshift/cloud-network-config-controller/pkg/cloudprovider.(*Azure).AssignPrivateIP(0xc0001ce700, {0xc000696540, 0x10, 0x10}, 0xc000818ec0)
    /go/src/github.com/openshift/cloud-network-config-controller/pkg/cloudprovider/azure.go:146 +0xcf0
github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig.(*CloudPrivateIPConfigController).SyncHandler(0xc000986000, {0xc000896a90, 0xe})
    /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig/cloudprivateipconfig_controller.go:327 +0x1013
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem.func1(0xc000720d80, {0x1e640c0?, 0xc0003bd090?})
    /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:152 +0x11c
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem(0xc000720d80)
    /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:162 +0x46
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).runWorker(0xc000504ea0?)
    /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:113 +0x25
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?)
    /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:157 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x27b3220, 0xc000894480}, 0x1, 0xc0000aa540)
    /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:158 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
    /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:135 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?)
    /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:92 +0x25
created by github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).Run
    /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:99 +0x3aa
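The stack trace points at a dereference in (*Azure).AssignPrivateIP (azure.go:146). A generic sketch of the kind of guard that prevents this class of panic (the type and field names are illustrative, not the controller's actual ones):

    package main

    import "fmt"

    type ipConfiguration struct {
        PrivateIPAddress *string // may be nil in responses from the cloud API
    }

    func assignPrivateIP(cfg *ipConfiguration) error {
        // Check every pointer returned by the cloud SDK before dereferencing.
        if cfg == nil || cfg.PrivateIPAddress == nil {
            return fmt.Errorf("instance has no private IP configuration")
        }
        fmt.Println("assigning next to", *cfg.PrivateIPAddress)
        return nil
    }

    func main() {
        fmt.Println(assignPrivateIP(nil)) // returns an error instead of panicking
    }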
Description of problem:
Filed upstream: https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/4629. Affects v1beta2 AWSMachines.
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Supply an additional tag with the key `Name` in AWS
2. Watch many EC2 instances get created
Actual results:
Expected results:
Only one EC2 instance gets created per AWSMachine (or the API does not allow tags with the key: 'Name' to be specified)
Additional info:
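A minimal sketch of the reproducer as a v1beta2 AWSMachine manifest (names and values are illustrative):

    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSMachine
    metadata:
      name: example-machine
    spec:
      instanceType: m5.large
      additionalTags:
        Name: my-custom-name   # a user-supplied Name tag clashes with the tag CAPA uses to identify its instance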
This is a clone of issue OCPBUGS-43555. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-43350. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42732. The following is the description of the original issue:
—
Description of problem:
The operator cannot successfully remove resources when the management state is set to Removed. It looks like the authorization error changes from bloberror.AuthorizationPermissionMismatch to bloberror.AuthorizationFailure after the storage account becomes private (networkAccess: Internal). This is caused by odd behavior either in the Azure SDK or in the Azure API itself. The easiest way to solve it is to also handle bloberror.AuthorizationFailure here: https://github.com/openshift/cluster-image-registry-operator/blob/master/pkg/storage/azure/azure.go?plain=1#L1145
The error condition is the following:
status:
  conditions:
  - lastTransitionTime: "2024-09-27T09:04:20Z"
    message: "Unable to delete storage container: DELETE https://imageregistrywxj927q6bpj.blob.core.windows.net/wxj-927d-jv8fc-image-registry-rwccleepmieiyukdxbhasjyvklsshhee\n--------------------------------------------------------------------------------\nRESPONSE 403: 403 This request is not authorized to perform this operation.\nERROR CODE: AuthorizationFailure\n--------------------------------------------------------------------------------\n\uFEFF<?xml version=\"1.0\" encoding=\"utf-8\"?><Error><Code>AuthorizationFailure</Code><Message>This request is not authorized to perform this operation.\nRequestId:ababfe86-301e-0005-73bd-10d7af000000\nTime:2024-09-27T09:10:46.1231255Z</Message></Error>\n--------------------------------------------------------------------------------\n"
    reason: AzureError
    status: Unknown
    type: StorageExists
  - lastTransitionTime: "2024-09-27T09:02:26Z"
    message: The registry is removed
    reason: Removed
    status: "True"
    type: Available
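A minimal sketch of the suggested handling, using the azblob SDK's bloberror helpers (the helper name here is illustrative; the real call site is azure.go in the image registry operator):

    import (
        "github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/bloberror"
    )

    // isAuthError treats both codes as "no access": once the storage account
    // becomes private, the service returns AuthorizationFailure instead of
    // AuthorizationPermissionMismatch for the same underlying condition.
    func isAuthError(err error) bool {
        return bloberror.HasCode(err,
            bloberror.AuthorizationPermissionMismatch,
            bloberror.AuthorizationFailure)
    }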
Version-Release number of selected component (if applicable):
4.18, 4.17, 4.16 (needs confirmation), 4.15 (needs confirmation)
How reproducible:
Always
Steps to Reproduce:
1. Get an Azure cluster
2. In the operator config, set networkAccess to Internal
3. Wait until the operator reconciles the change (watch networkAccess in status with `oc get configs.imageregistry/cluster -oyaml | yq '.status.storage'`)
4. In the operator config, set the management state to Removed: `oc patch configs.imageregistry/cluster -p '{"spec":{"managementState":"Removed"}}' --type=merge`
5. Watch the cluster operator conditions for the error
Actual results:
Expected results:
Additional info:
Description of problem:
The vSphere code references a Red Hat solution that has been retired in favour of the code being merged into the official documentation. https://github.com/openshift/machine-api-operator/blob/master/pkg/controller/vsphere/reconciler.go#L827
Version-Release number of selected component (if applicable):
4.11-4.13 + main
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
The UI presents a message with a solution customers cannot access:
Hardware lower than 15 is not supported, clone stopped. Detected machine template version is 13. Please update machine template: https://access.redhat.com/articles/6090681
Expected results:
Should referenced official documentation: https://docs.openshift.com/container-platform/4.12/updating/updating-hardware-on-nodes-running-on-vsphere.html
Additional info:
Please review the following PR: https://github.com/openshift/ironic-agent-image/pull/88
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Manifests will be removed from the CCO image, so we have to start using the CCA (cluster-config-api) image for bootstrap.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
KAS bootstrap container fails
Expected results:
KAS bootstrap container succeeds
Additional info:
This is a clone of issue OCPBUGS-29104. The following is the description of the original issue:
—
Description of problem:
Only customers have a break-glass certificate signer.
Version-Release number of selected component (if applicable):
4.16.0
How reproducible:
Always
Steps to Reproduce:
1. Create a CSR with any other signer chosen
2. It does not work
Actual results:
does not work
Expected results:
should work
Additional info:
This is a clone of issue OCPBUGS-24436. The following is the description of the original issue:
—
Description of problem:
The node-network-identity deployment should conform to hypershift control plane expectations that all applicable containers should have a liveness probe, and a readiness probe if it is an endpoint for a service.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
No liveness or readiness probes
Expected results:
Additional info:
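A minimal sketch of the kind of probes expected, as a Deployment container fragment (the path, port, and timings are assumptions, not the actual node-network-identity settings):

    containers:
    - name: node-network-identity
      livenessProbe:
        httpGet:
          path: /healthz
          port: 9443
          scheme: HTTPS
        initialDelaySeconds: 10
        periodSeconds: 10
      readinessProbe:
        httpGet:
          path: /readyz
          port: 9443
          scheme: HTTPS
        periodSeconds: 5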
Description of problem:
Attempting to add a Lenovo ThinkSystem SR675 V3 server to the NodePool, in order to later add it to a Hosted Control Planes OpenShift node pool. ACM attempts to reboot the server and checks whether VirtualMedia is ejected. Ironic checks whether /redfish/v1/Managers/1 has a VirtualMedia attribute and crashes.
2024-03-01 10:04:23.812 1 DEBUG sushy.connector [None req-1dc15b4e-24c3-49ff-a93f-5dfa55351560 - - - - - -] HTTP response for GET https://129.40.92.61:443/redfish/v1/Managers/1: status code: 200 _op /usr/lib/python3.9/site-packages/sushy/connector.py:283
2024-03-01 10:04:23.812 1 DEBUG sushy.resources.base [None req-1dc15b4e-24c3-49ff-a93f-5dfa55351560 - - - - - -] Received representation of Manager /redfish/v1/Managers/1: {'_actions': {'reset': {'allowed_values': ['GracefulRestart', 'ForceRestart'], 'operation_apply_time_support': None, 'target_uri': '/redfish/v1/Managers/1/Actions/Manager.Reset'}}, '_oem_vendors': ['Lenovo'], 'auto_dst_enabled': False, 'command_shell': {'connect_types_supported': ['SSH'], 'max_concurrent_sessions': 2, 'service_enabled': True}, 'description': 'This resource is used to represent a management subsystem for a Redfish implementation.', 'firmware_version': 'QGX314J 3.10 2023-09-15', 'graphical_console': {'connect_types_supported': ['KVMIP'], 'max_concurrent_sessions': 6, 'service_enabled': True}, 'identity': '1', 'links': {'oem_vendors': None}, 'manager_type': <ManagerType.BMC: 'BMC'>, 'model': 'Lenovo XClarity Controller 2', 'name': 'Manager', 'serial_console': {'connect_types_supported': ['IPMI', 'SSH'], 'max_concurrent_sessions': 2, 'service_enabled': True}, 'uuid': '6F1D76DE-BE9E-11EE-8CC4-0A3A88FFF8E0'} refresh /usr/lib/python3.9/site-packages/sushy/resources/base.py:694
2024-03-01 10:04:23.813 1 ERROR ironic.conductor.manager [None req-1dc15b4e-24c3-49ff-a93f-5dfa55351560 - - - - - -] Error in tear_down of node a16830d1-6ce2-44fa-ae6b-5a6fcad5ca14: The attribute VirtualMedia is missing from the resource /redfish/v1/Managers/1: sushy.exceptions.MissingAttributeError: The attribute VirtualMedia is missing from the resource /redfish/v1/Managers/1
2024-03-01 10:04:23.813 1 ERROR ironic.conductor.manager Traceback (most recent call last):
2024-03-01 10:04:23.813 1 ERROR ironic.conductor.manager   File "/usr/lib/python3.9/site-packages/ironic/conductor/manager.py", line 1083, in _do_node_tear_down
2024-03-01 10:04:23.813 1 ERROR ironic.conductor.manager     task.driver.deploy.clean_up(task)
2024-03-01 10:04:23.813 1 ERROR ironic.conductor.manager   File "/usr/lib/python3.9/site-packages/ironic_lib/metrics.py", line 60, in wrapped
2024-03-01 10:04:23.813 1 ERROR ironic.conductor.manager     result = f(*args, **kwargs)
2024-03-01 10:04:23.813 1 ERROR ironic.conductor.manager   File "/usr/lib/python3.9/site-packages/ironic/drivers/modules/agent_base.py", line 773, in clean_up
2024-03-01 10:04:23.813 1 ERROR ironic.conductor.manager     task.driver.boot.clean_up_ramdisk(task)
2024-03-01 10:04:23.813 1 ERROR ironic.conductor.manager   File "/usr/lib/python3.9/site-packages/ironic/drivers/modules/redfish/boot.py", line 638, in clean_up_ramdisk
2024-03-01 10:04:23.813 1 ERROR ironic.conductor.manager     self._eject_all(task)
2024-03-01 10:04:23.813 1 ERROR ironic.conductor.manager   File "/usr/lib/python3.9/site-packages/ironic/drivers/modules/redfish/boot.py", line 728, in _eject_all
2024-03-01 10:04:23.813 1 ERROR ironic.conductor.manager     _eject_vmedia(task, managers, sushy.VIRTUAL_MEDIA_CD)
2024-03-01 10:04:23.813 1 ERROR ironic.conductor.manager   File "/usr/lib/python3.9/site-packages/ironic/drivers/modules/redfish/boot.py", line 267, in _eject_vmedia
2024-03-01 10:04:23.813 1 ERROR ironic.conductor.manager     for v_media in manager.virtual_media.get_members():
2024-03-01 10:04:23.813 1 ERROR ironic.conductor.manager   File "/usr/lib/python3.9/site-packages/sushy/utils.py", line 233, in func_wrapper
2024-03-01 10:04:23.813 1 ERROR ironic.conductor.manager     cache_attr_val = res_accessor_method(res_selfie)
2024-03-01 10:04:23.813 1 ERROR ironic.conductor.manager   File "/usr/lib/python3.9/site-packages/sushy/resources/manager/manager.py", line 196, in virtual_media
2024-03-01 10:04:23.813 1 ERROR ironic.conductor.manager     self._conn, utils.get_sub_resource_path_by(self, 'VirtualMedia'),
2024-03-01 10:04:23.813 1 ERROR ironic.conductor.manager   File "/usr/lib/python3.9/site-packages/sushy/utils.py", line 105, in get_sub_resource_path_by
2024-03-01 10:04:23.813 1 ERROR ironic.conductor.manager     raise exceptions.MissingAttributeError(
2024-03-01 10:04:23.813 1 ERROR ironic.conductor.manager sushy.exceptions.MissingAttributeError: The attribute VirtualMedia is missing from the resource /redfish/v1/Managers/1
Version-Release number of selected component (if applicable):
OCP 4.14
How reproducible:
Everytime
Steps to Reproduce:
1. Add Lenovo ThinkSystem SR675 V3 / BMC Version 3.10 (Build ID: QGX314J) into the NodePool 2. 3.
Actual results:
The Host Inventory UI changes the node state to Provisioning first and then to Error.
Expected results:
Additional info:
Host is Provisioned
Description of problem:
Getting rate-limit issues and other failures while running the "test serverless function" tests
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
baremetal 4.14.0-rc.0 ipv6 SNO cluster: after logging in to the admin console as an admin user, there is no Observe menu in the left navigation bar (see picture: https://drive.google.com/file/d/13RAXPxtKhAElN9xf8bAmLJa0GI8pP0fH/view?usp=sharing); the monitoring-plugin status is Failed (see: https://drive.google.com/file/d/1YsSaGdLT4bMn-6E-WyFWbOpwvDY4t6na/view?usp=sharing); the error is
Failed to get a valid plugin manifest from /api/plugins/monitoring-plugin/ r: Bad Gateway
checked console logs, 9443: connect: connection refused
$ oc -n openshift-console logs console-6869f8f4f4-56mbj ... E0915 12:50:15.498589 1 handlers.go:164] GET request for "monitoring-plugin" plugin failed: Get "https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json": dial tcp [fd02::f735]:9443: connect: connection refused 2023/09/15 12:50:15 http: panic serving [fd01:0:0:1::2]:39156: runtime error: invalid memory address or nil pointer dereference goroutine 183760 [running]: net/http.(*conn).serve.func1() /usr/lib/golang/src/net/http/server.go:1854 +0xbf panic({0x3259140, 0x4fcc150}) /usr/lib/golang/src/runtime/panic.go:890 +0x263 github.com/openshift/console/pkg/plugins.(*PluginsHandler).proxyPluginRequest(0xc0003b5760, 0x2?, {0xc0009bc7d1, 0x11}, {0x3a41fa0, 0xc0002f6c40}, 0xb?) /go/src/github.com/openshift/console/pkg/plugins/handlers.go:165 +0x582 github.com/openshift/console/pkg/plugins.(*PluginsHandler).HandlePluginAssets(0xaa00000000000010?, {0x3a41fa0, 0xc0002f6c40}, 0xc0001f7500) /go/src/github.com/openshift/console/pkg/plugins/handlers.go:147 +0x26d github.com/openshift/console/pkg/server.(*Server).HTTPHandler.func23({0x3a41fa0?, 0xc0002f6c40?}, 0x7?) /go/src/github.com/openshift/console/pkg/server/server.go:604 +0x33 net/http.HandlerFunc.ServeHTTP(...) /usr/lib/golang/src/net/http/server.go:2122 github.com/openshift/console/pkg/server.authMiddleware.func1(0xc0001f7500?, {0x3a41fa0?, 0xc0002f6c40?}, 0xd?) /go/src/github.com/openshift/console/pkg/server/middleware.go:25 +0x31 github.com/openshift/console/pkg/server.authMiddlewareWithUser.func1({0x3a41fa0, 0xc0002f6c40}, 0xc0001f7500) /go/src/github.com/openshift/console/pkg/server/middleware.go:81 +0x46c net/http.HandlerFunc.ServeHTTP(0x5120938?, {0x3a41fa0?, 0xc0002f6c40?}, 0x7ffb6ea27f18?) /usr/lib/golang/src/net/http/server.go:2122 +0x2f net/http.StripPrefix.func1({0x3a41fa0, 0xc0002f6c40}, 0xc0001f7400) /usr/lib/golang/src/net/http/server.go:2165 +0x332 net/http.HandlerFunc.ServeHTTP(0xc001102c00?, {0x3a41fa0?, 0xc0002f6c40?}, 0xc000655a00?) /usr/lib/golang/src/net/http/server.go:2122 +0x2f net/http.(*ServeMux).ServeHTTP(0x34025e0?, {0x3a41fa0, 0xc0002f6c40}, 0xc0001f7400) /usr/lib/golang/src/net/http/server.go:2500 +0x149 github.com/openshift/console/pkg/server.securityHeadersMiddleware.func1({0x3a41fa0, 0xc0002f6c40}, 0x3305040?) /go/src/github.com/openshift/console/pkg/server/middleware.go:128 +0x3af net/http.HandlerFunc.ServeHTTP(0x0?, {0x3a41fa0?, 0xc0002f6c40?}, 0x11db52e?) /usr/lib/golang/src/net/http/server.go:2122 +0x2f net/http.serverHandler.ServeHTTP({0xc0008201e0?}, {0x3a41fa0, 0xc0002f6c40}, 0xc0001f7400) /usr/lib/golang/src/net/http/server.go:2936 +0x316 net/http.(*conn).serve(0xc0009b4120, {0x3a43e70, 0xc001223500}) /usr/lib/golang/src/net/http/server.go:1995 +0x612 created by net/http.(*Server).Serve /usr/lib/golang/src/net/http/server.go:3089 +0x5ed I0915 12:50:24.267777 1 handlers.go:118] User settings ConfigMap "user-settings-4b4c2f4d-159c-4358-bba3-3d87f113cd9b" already exist, will return existing data. I0915 12:50:24.267813 1 handlers.go:118] User settings ConfigMap "user-settings-4b4c2f4d-159c-4358-bba3-3d87f113cd9b" already exist, will return existing data. 
E0915 12:50:30.155515 1 handlers.go:164] GET request for "monitoring-plugin" plugin failed: Get "https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json": dial tcp [fd02::f735]:9443: connect: connection refused 2023/09/15 12:50:30 http: panic serving [fd01:0:0:1::2]:42990: runtime error: invalid memory address or nil pointer dereference
9443 port is Connection refused
$ oc -n openshift-monitoring get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES alertmanager-main-0 6/6 Running 6 3d22h fd01:0:0:1::564 sno-2 <none> <none> cluster-monitoring-operator-6cb777d488-nnpmx 1/1 Running 4 7d16h fd01:0:0:1::12 sno-2 <none> <none> kube-state-metrics-dc5f769bc-p97m7 3/3 Running 12 7d16h fd01:0:0:1::3b sno-2 <none> <none> monitoring-plugin-85bfb98485-d4g5x 1/1 Running 4 7d16h fd01:0:0:1::55 sno-2 <none> <none> node-exporter-ndnnj 2/2 Running 8 7d16h 2620:52:0:165::41 sno-2 <none> <none> openshift-state-metrics-78df59b4d5-j6r5s 3/3 Running 12 7d16h fd01:0:0:1::3a sno-2 <none> <none> prometheus-adapter-6f86f7d8f5-ttflf 1/1 Running 0 4h23m fd01:0:0:1::b10c sno-2 <none> <none> prometheus-k8s-0 6/6 Running 6 3d22h fd01:0:0:1::566 sno-2 <none> <none> prometheus-operator-7c94855989-csts2 2/2 Running 8 7d16h fd01:0:0:1::39 sno-2 <none> <none> prometheus-operator-admission-webhook-7bb64b88cd-bvq8m 1/1 Running 4 7d16h fd01:0:0:1::37 sno-2 <none> <none> thanos-querier-5bbb764599-vlztq 6/6 Running 6 3d22h fd01:0:0:1::56a sno-2 <none> <none> $ oc -n openshift-monitoring get svc monitoring-plugin NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE monitoring-plugin ClusterIP fd02::f735 <none> 9443/TCP 7d16h $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -v 'https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json' | jq * Trying fd02::f735... * TCP_NODELAY set * connect to fd02::f735 port 9443 failed: Connection refused * Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused * Closing connection 0 curl: (7) Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused command terminated with exit code 7
No such issue on another 4.14.0-rc.0 ipv4 cluster, but the issue reproduced on another 4.14.0-rc.0 ipv6 cluster.
4.14.0-rc.0 ipv4 cluster,
$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.14.0-rc.0 True False 20m Cluster version is 4.14.0-rc.0 $ oc -n openshift-monitoring get pod -o wide | grep monitoring-plugin monitoring-plugin-85bfb98485-nh428 1/1 Running 0 4m 10.128.0.107 ci-ln-pby4bj2-72292-l5q8v-master-0 <none> <none> $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k 'https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json' | jq ... { "name": "monitoring-plugin", "version": "1.0.0", "displayName": "OpenShift console monitoring plugin", "description": "This plugin adds the monitoring UI to the OpenShift web console", "dependencies": { "@console/pluginAPI": "*" }, "extensions": [ { "type": "console.page/route", "properties": { "exact": true, "path": "/monitoring", "component": { "$codeRef": "MonitoringUI" } } }, ...
Hit the "9443: Connection refused" issue on another 4.14.0-rc.0 ipv6 cluster (launched cluster-bot cluster: launch 4.14.0-rc.0 metal,ipv6) after logging in to the console
$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.14.0-rc.0 True False 44m Cluster version is 4.14.0-rc.0 $ oc -n openshift-monitoring get pod -o wide | grep monitoring-plugin monitoring-plugin-bd6ffdb5d-b5csk 1/1 Running 0 53m fd01:0:0:4::b worker-0.ostest.test.metalkube.org <none> <none> monitoring-plugin-bd6ffdb5d-vhtpf 1/1 Running 0 53m fd01:0:0:5::9 worker-2.ostest.test.metalkube.org <none> <none> $ oc -n openshift-monitoring get svc monitoring-plugin NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE monitoring-plugin ClusterIP fd02::402d <none> 9443/TCP 59m $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -v 'https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json' | jq * Trying fd02::402d... * TCP_NODELAY set * connect to fd02::402d port 9443 failed: Connection refused * Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused * Closing connection 0 curl: (7) Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused command terminated with exit code 7$ oc -n openshift-console get pod | grep console console-5cffbc7964-7ljft 1/1 Running 0 56m console-5cffbc7964-d864q 1/1 Running 0 56m$ oc -n openshift-console logs console-5cffbc7964-7ljft ... E0916 14:34:16.330117 1 handlers.go:164] GET request for "monitoring-plugin" plugin failed: Get "https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json": dial tcp [fd02::402d]:9443: connect: connection refused 2023/09/16 14:34:16 http: panic serving [fd01:0:0:4::2]:37680: runtime error: invalid memory address or nil pointer dereference goroutine 3985 [running]: net/http.(*conn).serve.func1() /usr/lib/golang/src/net/http/server.go:1854 +0xbf panic({0x3259140, 0x4fcc150}) /usr/lib/golang/src/runtime/panic.go:890 +0x263 github.com/openshift/console/pkg/plugins.(*PluginsHandler).proxyPluginRequest(0xc0008f6780, 0x2?, {0xc000665211, 0x11}, {0x3a41fa0, 0xc0009221c0}, 0xb?) /go/src/github.com/openshift/console/pkg/plugins/handlers.go:165 +0x582 github.com/openshift/console/pkg/plugins.(*PluginsHandler).HandlePluginAssets(0xfe00000000000010?, {0x3a41fa0, 0xc0009221c0}, 0xc000d8d600) /go/src/github.com/openshift/console/pkg/plugins/handlers.go:147 +0x26d github.com/openshift/console/pkg/server.(*Server).HTTPHandler.func23({0x3a41fa0?, 0xc0009221c0?}, 0x7?) /go/src/github.com/openshift/console/pkg/server/server.go:604 +0x33 net/http.HandlerFunc.ServeHTTP(...) /usr/lib/golang/src/net/http/server.go:2122 github.com/openshift/console/pkg/server.authMiddleware.func1(0xc000d8d600?, {0x3a41fa0?, 0xc0009221c0?}, 0xd?) /go/src/github.com/openshift/console/pkg/server/middleware.go:25 +0x31 github.com/openshift/console/pkg/server.authMiddlewareWithUser.func1({0x3a41fa0, 0xc0009221c0}, 0xc000d8d600) /go/src/github.com/openshift/console/pkg/server/middleware.go:81 +0x46c net/http.HandlerFunc.ServeHTTP(0xc000653830?, {0x3a41fa0?, 0xc0009221c0?}, 0x7f824506bf18?) /usr/lib/golang/src/net/http/server.go:2122 +0x2f net/http.StripPrefix.func1({0x3a41fa0, 0xc0009221c0}, 0xc000d8d500) /usr/lib/golang/src/net/http/server.go:2165 +0x332 net/http.HandlerFunc.ServeHTTP(0xc00007e800?, {0x3a41fa0?, 0xc0009221c0?}, 0xc000b2da00?) 
/usr/lib/golang/src/net/http/server.go:2122 +0x2f net/http.(*ServeMux).ServeHTTP(0x34025e0?, {0x3a41fa0, 0xc0009221c0}, 0xc000d8d500) /usr/lib/golang/src/net/http/server.go:2500 +0x149 github.com/openshift/console/pkg/server.securityHeadersMiddleware.func1({0x3a41fa0, 0xc0009221c0}, 0x3305040?) /go/src/github.com/openshift/console/pkg/server/middleware.go:128 +0x3af net/http.HandlerFunc.ServeHTTP(0x0?, {0x3a41fa0?, 0xc0009221c0?}, 0x11db52e?) /usr/lib/golang/src/net/http/server.go:2122 +0x2f net/http.serverHandler.ServeHTTP({0xc000db9b00?}, {0x3a41fa0, 0xc0009221c0}, 0xc000d8d500) /usr/lib/golang/src/net/http/server.go:2936 +0x316 net/http.(*conn).serve(0xc000653680, {0x3a43e70, 0xc000676f30}) /usr/lib/golang/src/net/http/server.go:1995 +0x612 created by net/http.(*Server).Serve /usr/lib/golang/src/net/http/server.go:3089 +0x5ed
Version-Release number of selected component (if applicable):
baremetal 4.14.0-rc.0 ipv6 sno cluster, $ token=`oc create token prometheus-k8s -n openshift-monitoring` $ $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=virt_platform' | jq { "status": "success", "data": { "resultType": "vector", "result": [ { "metric": { "__name__": "virt_platform", "baseboard_manufacturer": "Dell Inc.", "baseboard_product_name": "01J4WF", "bios_vendor": "Dell Inc.", "bios_version": "1.10.2", "container": "kube-rbac-proxy", "endpoint": "https", "instance": "sno-2", "job": "node-exporter", "namespace": "openshift-monitoring", "pod": "node-exporter-ndnnj", "prometheus": "openshift-monitoring/k8s", "service": "node-exporter", "system_manufacturer": "Dell Inc.", "system_product_name": "PowerEdge R750", "system_version": "Not Specified", "type": "none" }, "value": [ 1694785092.664, "1" ] } ] } }
How reproducible:
only seen on this cluster
Steps to Reproduce:
1. see the description 2. 3.
Actual results:
no Observe menu on admin console, monitoring-plugin is failed
Expected results:
no error
In multiple datacenter/zonal deployments, the csi driver seems to be crashing with - https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_vsphere-problem-detector/139/pull-ci-openshift-vsphere-problem-detector-master-e2e-vsphere-zones/1728081801684979712/artifacts/e2e-vsphere-zones/gather-extra/artifacts/pods/openshift-cluster-csi-drivers_vmware-vsphere-csi-driver-controller-574c8c86db-cs8gh_csi-driver.log
with error being:
{"level":"error","time":"2023-11-24T17:16:30.532383276Z","caller":"service/driver.go:203","msg":"failed to run the driver. Err: +failed to update cache with topology information. Error: failed to get vCenterInstance for vCenter Host: \"vcs8e-vc.ocp2.dev.cluster.com\". Error: virtual center was already registered","TraceId":"da5779b6-e99a-475b-b300-350dfa441f1e","stacktrace":"..."}
Link to failing build - https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_vsphere-problem-detector/139/pull-ci-openshift-vsphere-problem-detector-master-e2e-vsphere-zones/1728081801684979712
This is a clone of issue OCPBUGS-25362. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-18007. The following is the description of the original issue:
—
Description of problem:
When the TelemeterClientFailures alert fires, there's no runbook link explaining the meaning of the alert and what to do about it.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Check the TelemeterClientFailures alerting rule's annotations 2. 3.
Actual results:
No runbook_url annotation.
Expected results:
runbook_url annotation is present.
Additional info:
This is a consequence of a telemeter server outage that triggered questions from customers about the alert: https://issues.redhat.com/browse/OHSS-25947 https://issues.redhat.com/browse/OCPBUGS-17966 Also in relation to https://issues.redhat.com/browse/OCPBUGS-17797
Description of problem:
[ibm-vpc-block-csi-driver] xfs volume snapshot volume mount failed with "Filesystem has duplicate UUID"
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-26-132453
How reproducible:
Always
Steps to Reproduce:
1. Install an openshift cluster on ibmcloud
2. Create a pvc with the ibm-vpc-block csi storageclass and one pod consuming the pvc
3. Write some data to the pod's volume and sync
4. Create a volumesnapshot and wait for it to become ReadyToUse
5. Create a pvc restoring the volumesnapshot and create one pod consuming the restored pvc
Actual results:
In step 5, the volume mount failed:
07-27 21:36:08.572 Mounting command: mount
07-27 21:36:08.572 Mounting arguments: -t xfs -o defaults /dev/disk/by-id/virtio-0787-6ec22828-ec32-4 /var/lib/kubelet/plugins/kubernetes.io/csi/vpc.block.csi.ibm.io/ecef50d905ba489935099cad29a3773220fec45334e7546951706454894073e7/globalmount
07-27 21:36:08.572 Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/vpc.block.csi.ibm.io/ecef50d905ba489935099cad29a3773220fec45334e7546951706454894073e7/globalmount: wrong fs type, bad option, bad superblock on /dev/vde, missing codepage or helper program, or other error.
Check the dmesg ->
[14530.520622] XFS (vde): Filesystem has duplicate UUID a758102c-fbdd-41ef-b4de-60a546bf554b - can't mount
[14531.119703] XFS (vde): Filesystem has duplicate UUID a758102c-fbdd-41ef-b4de-60a546bf554b - can't mount
[14532.229388] XFS (vde): Filesystem has duplicate UUID a758102c-fbdd-41ef-b4de-60a546bf554b - can't mount
[14534.348809] XFS (vde): Filesystem has duplicate UUID a758102c-fbdd-41ef-b4de-60a546bf554b - can't mount
[14538.396705] XFS (vde): Filesystem has duplicate UUID a758102c-fbdd-41ef-b4de-60a546bf554b - can't mount
[14546.472831] XFS (vde): Filesystem has duplicate UUID a758102c-fbdd-41ef-b4de-60a546bf554b - can't mount
[14562.523028] XFS (vde): Filesystem has duplicate UUID a758102c-fbdd-41ef-b4de-60a546bf554b - can't mount
[14594.636819] XFS (vde): Filesystem has duplicate UUID a758102c-fbdd-41ef-b4de-60a546bf554b - can't mount
[14658.749442] XFS (vde): Filesystem has duplicate UUID a758102c-fbdd-41ef-b4de-60a546bf554b - can't mount
[14780.863678] XFS (vde): Filesystem has duplicate UUID a758102c-fbdd-41ef-b4de-60a546bf554b - can't mount
Expected results:
In step 5, the restored volume should mount successfully and the pod should become Running
Additional info:
This looks like a bug in the CSI driver: it mounts the restored volume without `-o nouuid` (see the sketch below).
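A sketch of the difference (the device and mountpoint are per this report's logs): an XFS volume restored from a snapshot carries the same filesystem UUID as its source, so a second mount on the same node needs nouuid.

    mount -t xfs -o defaults /dev/vde /mnt/restore          # fails: Filesystem has duplicate UUID
    mount -t xfs -o defaults,nouuid /dev/vde /mnt/restore   # succeeds: UUID check skipped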
Description of problem:
The HyperShift Operator does not guarantee that two request serving nodes will be labeled with the HCP's namespace-name. It is likely that it labels the nodes initially and then doesn't notice if the nodes get deleted by something else.
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Create an HCP with dedicated request serving nodes
2. Delete one of the request serving nodes (via deleting the node directly or its machine)
3. Observe that the replacement node does not have the required label for scheduling its request-serving pods
Actual results:
HCPs can exist without two nodes labeled with the HCP's name, causing the kube-apiserver pods to be unschedulable
❯ k get no -lhypershift.openshift.io/cluster=ocm-staging-26ljge23ub1112ve884u0opvkj2c4lpc-perf-rhcp-0012 NAME STATUS ROLES AGE VERSION ip-10-0-34-188.us-east-2.compute.internal Ready worker 9h v1.27.6+1648878
❯ k get po -n ocm-staging-26ljge23ub1112ve884u0opvkj2c4lpc-perf-rhcp-0012 -lapp=kube-apiserver -owide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES kube-apiserver-54854bcb7-v88dq 0/5 Pending 0 151m <none> <none> <none> <none> kube-apiserver-54854bcb7-x5jqt 5/5 Running 0 3h2m 10.128.236.6 ip-10-0-34-188.us-east-2.compute.internal <none> <none>
Expected results:
Every HCP has two nodes labeled with the HCP's name
❯ k get po -n ocm-staging-26ljip0ck3d2i1bejp2sipio4okhgttn-perf-rhcp-0017 -l app=kube-apiserver -owide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES kube-apiserver-5f85cd4b-l57qr 5/5 Running 0 169m 10.128.218.6 ip-10-0-114-35.us-east-2.compute.internal <none> <none> kube-apiserver-5f85cd4b-lqfsx 5/5 Running 0 169m 10.128.129.6 ip-10-0-59-232.us-east-2.compute.internal <none> <none>
❯ k get no -lhypershift.openshift.io/cluster=ocm-staging-26ljip0ck3d2i1bejp2sipio4okhgttn-perf-rhcp-0017 NAME STATUS ROLES AGE VERSION ip-10-0-114-35.us-east-2.compute.internal Ready worker 24h v1.27.6+1648878 ip-10-0-59-232.us-east-2.compute.internal Ready worker 5d2h v1.27.6+1648878
Additional info:
Because of the pin in the packages list, the ART pipeline is rebuilding packages all the time.
Unfortunately, we need to remove the strong pins and move back to relaxed ones.
Once that's done, we need to merge https://github.com/openshift-eng/ocp-build-data/pull/4097
The monitoring system for a Single Node OpenShift (SNO) cluster is triggering an alert named "HighOverallControlPlaneCPU" related to excessive control plane CPU utilization. However, this alert is misleading as it assumes a multi-node setup with high availability (HA) considerations, which do not apply to SNO deployments.
The customer is receiving MNO alerts in the SNO cluster. Below are the details:
The vDU with 2xRINLINE card is installed on the SNO node with OCP 4.14.14.
Used hardware: Airframe OE22 2U server CPU Intel(R) Xeon Intel(R) Xeon(R) Gold 6428N SPR-SP S3, (32 cores 64 threads) with 128GB memory.
After all vDU pods became running, a few minutes later the following alert was triggered:
"labels":
,
"annotations": {
"description": "Given three control plane nodes, the overall CPU utilization may only be about 2/3 of all available capacity.
This is because if a single control plane node fails, the remaining two must handle the load of the cluster in order to be HA.
If the cluster is using more than 2/3 of all capacity, if one control plane node fails, the remaining two are likely to fail when they take the load.
To fix this, increase the CPU and memory on your control plane nodes.",
"runbook_url": https://github.com/openshift/runbooks/blob/master/alerts/cluster-kube-apiserver-operator/ExtremelyHighIndividualControlPlaneCPU.md,
"summary": "CPU utilization across all three control plane nodes is higher than two control plane nodes can sustain;
a single control plane node outage may cause a cascading failure; increase available CPU."
The alert description is misleading since this cluster is SNO; there is no HA in this cluster.
Increasing CPU capacity in SNO cluster is not an option.
Although the CPU usage is high, this alarm is not correct.
MNO and SNO clusters should have separate alert descriptions.
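A hedged sketch of the requested split as PrometheusRule fragments (the expressions and the node-count gate are illustrative only, not the shipped rule):

    - alert: HighOverallControlPlaneCPU
      expr: <existing control-plane CPU expression> and on() count(kube_node_info) > 1
      annotations:
        description: "Given three control plane nodes ... (current HA wording)"
    - alert: HighOverallControlPlaneCPU
      expr: <existing control-plane CPU expression> and on() count(kube_node_info) == 1
      annotations:
        description: "Single-node control plane CPU utilization is high; there is no HA to preserve, but sustained saturation can destabilize the node."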
Description of problem:
[4.14][AWS EFS][HCP] should not support ARN mode installation in web console
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-09-20-033502
How reproducible:
Always
Steps to Reproduce:
1. Install a hypershift cluster with the below-mentioned details. Flexy template: aos-4_15/ipi-on-aws/versioned-installer-ovn-hypershift-ci
2. Install the AWS EFS operator from OperatorHub, which asks for an ARN value to add.
Actual results:
It asks for ARN value in web console
Expected results:
It should not ask for an ARN value, as this is currently not supported.
Additional info:
Epic: https://issues.redhat.com/browse/STOR-1347
Discussion:
1. https://redhat-internal.slack.com/archives/C01C8502FMM/p1695208537708359
2. https://redhat-internal.slack.com/archives/GK0DA0JR5/p1695305164755109
3. https://redhat-internal.slack.com/archives/CS05TR7BK/p1695357879885239
Attaching a screenshot: https://drive.google.com/file/d/11wjzz8-1kFDMKQ4Y2MWdJjjnfLaLmWD5/view?usp=sharing
This is a clone of issue OCPBUGS-25132. The following is the description of the original issue:
—
Description of problem:
The security-groups.yaml playbook runs the IPv6 security group rule creation tasks regardless of the os_subnet6 value. The when clause does not consider the os_subnet6 [1] value, so the tasks are always executed.
It works with:
- name: 'Create security groups for IPv6'
  block:
  - name: 'Create master-sg IPv6 rule "OpenShift API"'
    [...]
  when: os_subnet6 is defined
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-12-11-033133
How reproducible:
Always
Steps to Reproduce:
1. Don't set the os_subnet6 in the inventory file [2] (so it's not dual-stack)
2. Deploy 4.15 UPI by running the UPI playbooks
Actual results:
IPv6 security group rules are created
Expected results:
IPv6 security group rules shouldn't be created
Additional info:
[1] https://github.com/openshift/installer/blob/46fd66272538c350327880e1ed261b70401b406e/upi/openstack/security-groups.yaml#L375
[2] https://github.com/openshift/installer/blob/46fd66272538c350327880e1ed261b70401b406e/upi/openstack/inventory.yaml#L77
Please review the following PR: https://github.com/openshift/cloud-provider-ibm/pull/53
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Ironic image downstream lacks the configuration option added upstream
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
What
Via https://gitlab.cee.redhat.com/service/uhc-account-manager/-/merge_requests/4233, OCM has renamed subscription_labels to ocm_subscription, and some/many recording rules are likely to be affected, for example https://github.com/openshift/telemeter/blob/8f091e8e7ecd3052566bd9dd20eb6991abf762c5/jsonnet/telemeter/rules.libsonnet#L34
How
Update the rules.
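A hedged sketch of the shape of the update (the surrounding expression is illustrative; the real rules live in rules.libsonnet):

    # before: joins against the old OCM metric name
    expr: cluster_version * on (_id) group_left (managed) subscription_labels
    # after: the metric was renamed by OCM
    expr: cluster_version * on (_id) group_left (managed) ocm_subscription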
Please review the following PR: https://github.com/openshift/ibm-vpc-block-csi-driver-operator/pull/78
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-28579. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-26557. The following is the description of the original issue:
—
Description of problem:
ARO supplies a platform kubeletconfig to enable certain features; currently we use this to enable node sizing or autoSizingReserved. Customers want the ability to customize podPidsLimit, and we have directed them to configure a second kubeletconfig.
When these kubeletconfigs are rendered into machineconfigs, the order of their application is nondeterministic: the MCs are suffixed by an increasing serial number based on the order the kubeletconfigs were created. This makes it impossible for the customer to ensure their PIDs limit is applied while still allowing ARO to maintain our platform defaults.
We need a way of supplying platform defaults while still allowing the customer to make supported modifications in a way that does not risk being reverted during upgrades or other maintenance.
This issue has manifested in two different ways:
During an upgrade from 4.11.31 to 4.12.40, a cluster had the order of kubeletconfig rendered machine configs reverse. We think that in older versions, the initial kubeletconfig did not get an mc-name-suffix annotation applied, but rendered to "99-worker-generated-kubelet" (no suffix). The customer-provided kubeletconfig rendered to the suffix "-1". During the upgrade, MCO saw this as a new kubeletconfig and assigned it the suffix "-2", effectively reversing their order. See the RCS document https://docs.google.com/document/d/19LuhieQhCGgKclerkeO1UOIdprOx367eCSuinIPaqXA
ARO wants to make updates to the platform defaults. We are changing from a kubeletconfig "aro-limits" to a kubeletconfig "dynamic-node". We want to be able to do this while still keeping it as defaults and if the customer has created their own kubeletconfig, the customer's should still take precedence. What we see is that the creation of a new kubeletconfig regardless of source overrides all other kubeletconfigs, causing the customer to lose their customization.
Version-Release number of selected component (if applicable):
4.12.40+
ARO's older kubeletconfig "aro-limits":
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  labels:
    aro.openshift.io/limits: ""
  name: aro-limits
spec:
  kubeletConfig:
    evictionHard:
      imagefs.available: 15%
      memory.available: 500Mi
      nodefs.available: 10%
      nodefs.inodesFree: 5%
    systemReserved:
      memory: 2000Mi
  machineConfigPoolSelector:
    matchLabels:
      aro.openshift.io/limits: ""
ARO's newer kubeletconfig, "dynamic-node"
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: dynamic-node
spec:
  autoSizingReserved: true
  machineConfigPoolSelector:
    matchExpressions:
    - key: machineconfiguration.openshift.io/mco-built-in
      operator: Exists
Customer's desired kubeletconfig:
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  labels:
    arogcd.arogproj.io/instance: cluster-config
  name: default-pod-pids-limit
spec:
  kubeletConfig:
    podPidsLimit: 2000000
  machineConfigPoolSelector:
    matchExpressions:
    - key: pools.operator.machineconfiguration.io/worker
      operator: Exists
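For reference, a hedged sketch of how the resulting ordering can be inspected on a live cluster (the suffix annotation name comes from the description above; the output shown is illustrative):
$ oc get mc | grep generated-kubelet
99-worker-generated-kubelet
99-worker-generated-kubelet-1
$ oc get kubeletconfig dynamic-node -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/mc-name-suffix}'
The MC with the highest suffix wins, which is why creation order, not intent, currently decides precedence.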
Please review the following PR: https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/241
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Use centos stream to build libvirt images
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Hosted control plane clusters of OCP 4.15 are using default catalog sources (redhat-operators, certified-operators, community-operators and redhat-marketplace) pointing to 4.14, so 4.15 operators are not available, and this cannot be updated from within the guest cluster.
Version-Release number of selected component (if applicable):
4.15.0
How reproducible:
100%
Steps to Reproduce:
1. check the .spec.image of the default catalog sources in openshift-marketplace namespace
Actual results:
the default catalogs are pointing to :v4.14
Expected results:
they should point to :v4.15 instead
Additional info:
Description of problem:
Console crashes when clicked on "Sort by" table header on "Resources" tab of an Operand's instance page.
Version-Release number of selected component (if applicable):
4.13.0-0.ci-2022-11-07-202549
How reproducible:
100% (tested with 3 different Operands from 3 different Operators)
Steps to Reproduce:
1. Go to OperatorHub and install an Operator (e.g. Red Hat Integration - AMQ Streams)
2. After the Operator is installed, create an Operand instance (e.g. Kafka)
3. Wait until the Operand instance is created successfully, then go to the instance's Details page --> Resources tab (e.g. Installed Operators -> amqstreams.v2.2.0-2 -> Kafka details)
4. Click on any table header to sort the resource table
Actual results:
Console crashed
Expected results:
Resource table sorted accordingly.
Additional info:
I was testing this specifically with the "OLM copiedCSVsDisabled" feature; however, I could still reproduce this crash after setting that feature back to `false`, so I'm not sure whether it is related. I cross-checked with a 4.12 nightly and could not reproduce it there.
Description of the problem:
After installing a VSphere platform spoke from the infrastructure operator, deleting and re-creating the agentserviceconfig results in the assisted-service pod continually crashing and being unable to recover
How reproducible:
100%
Steps to reproduce:
1. Install a spoke cluster with platformType: VSphere
2. Delete and re-create the agentserviceconfig
Actual results:
The assisted-service pod panics due to accessing a nil pointer
Expected results:
The assisted-service pod starts correctly and the vsphere cluster can continue to be managed
Workaround:
Delete all of the cluster resources related to the VSphere spoke cluster
The Ingress Operator should use granular roles in its CredentialsRequest per CCO-249. A change to use granular roles merged after the release-4.15 branch cut. This change needs to be backported for 4.15.0.
Version-Release number of selected component (if applicable):
4.15.0
How reproducible:
Easily.
1. Launch an OCP 4.15 cluster on GCP.
2. Check the ingress operator's CredentialsRequest: oc get -n openshift-cloud-credential-operator credentialsrequests/openshift-ingress-gcp -o yaml
Actual results:
The CredentialsRequest uses a predefined role:
spec:
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: GCPProviderSpec
    predefinedRoles:
    - roles/dns.admin
Expected results:
The CredentialsRequest should specify the individual permissions that the operator requires:
spec:
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: GCPProviderSpec
    permissions:
    - dns.changes.create
    - dns.resourceRecordSets.create
    - dns.resourceRecordSets.update
    - dns.resourceRecordSets.delete
    - dns.resourceRecordSets.list
https://github.com/openshift/cluster-ingress-operator/pull/844 merged in the master branch for 4.16 and needs to be backported to the release-4.15 branch.
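A quick, hedged way to verify the backport on a 4.15 GCP cluster, based on the command above (the output shown is the expected post-fix state, abbreviated):
$ oc get -n openshift-cloud-credential-operator credentialsrequests/openshift-ingress-gcp -o jsonpath='{.spec.providerSpec.permissions}'
["dns.changes.create","dns.resourceRecordSets.create", ...]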
This is a clone of issue OCPBUGS-28590. The following is the description of the original issue:
—
Description of problem:
Facing error while creating manifests:
$ ./openshift-install create manifests --dir openshift-config
FATAL failed to fetch Master Machines: failed to generate asset "Master Machines": failed to create master machine objects: failed to create provider: unexpected end of JSON input
Using below document: https://docs.openshift.com/container-platform/4.14/installing/installing_gcp/installing-gcp-vpc.html#installation-gcp-config-yaml_installing-gcp-vpc
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-43467. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-35528. The following is the description of the original issue:
—
cluster-update-keys has some old Red Hat keys which are self-signed with SHA-1. The keys that we use have recently been re-signed with SHA-256. We don't rely on the self-signing to establish trust in the keys (that trust is established by baking a ConfigMap manifest into release images, where it can be read by the cluster-version operator), but we do need to avoid spooking the key-loading library. Currently, Go-1.22-built CVOs in FIPS mode fail to bootstrap,
like this aws-ovn-fips run > Artifacts > install artifacts:
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-fips/1800906552731766784/artifacts/e2e-aws-ovn-fips/ipi-install-install/artifacts/log-bundle-20240612161314.tar | tar -tvz | grep 'cluster-version.*log'
-rw-r--r-- core/core 54653 2024-06-12 09:13 log-bundle-20240612161314/bootstrap/containers/cluster-version-operator-bd9f61984afa844dcd284f68006ffc9548377c045eff840096c74bcdcbe5cca3.log
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-fips/1800906552731766784/artifacts/e2e-aws-ovn-fips/ipi-install-install/artifacts/log-bundle-20240612161314.tar | tar -xOz log-bundle-20240612161314/bootstrap/containers/cluster-version-operator-bd9f61984afa844dcd284f68006ffc9548377c045eff840096c74bcdcbe5cca3.log | grep GPG
I0612 16:06:15.952567       1 start.go:256] Failed to initialize from payload; shutting down: the config map openshift-config-managed/release-verification has an invalid key "verifier-public-key-redhat" that must be a GPG public key: openpgp: invalid data: tag byte does not have MSB set: openpgp: invalid data: tag byte does not have MSB set
E0612 16:06:15.952600       1 start.go:309] Collected payload initialization goroutine: the config map openshift-config-managed/release-verification has an invalid key "verifier-public-key-redhat" that must be a GPG public key: openpgp: invalid data: tag byte does not have MSB set: openpgp: invalid data: tag byte does not have MSB set
That's this code attempting to call ReadArmoredKeyRing, which fails with a currently-unlogged error (openpgp: invalid data: user ID self-signature invalid: openpgp: invalid signature: RSA verification failure) complaining about the SHA-1 signature, followed by a fallback to ReadKeyRing, which fails with the reported openpgp: invalid data: tag byte does not have MSB set.
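As a hedged aside, one way to confirm which digest a key's self-signature uses (assuming gpg is available; the file name is illustrative):
$ gpg --list-packets verifier-public-key-redhat.asc | grep 'digest algo'
# "digest algo 2" indicates SHA-1; "digest algo 8" indicates SHA-256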
To avoid these failures, we should:
Only 4.17 will use Go 1.22, so that's the only release that needs patching. But the changes would be fine to backport if we wanted.
100%.
1. Build the CVO with Go 1.22
2. Launch a FIPS cluster.
Fails to bootstrap, with the bootstrap CVO complaining, as shown in the Description of problem section.
Successful install
When we Load the AgentClusterInstall manifest from disk, we sometimes make changes to it.
e.g. after the fix for OCPBUGS-7495 we rewrite any lowercase platform name to mixed case, because for a while we required lowercase even when mixed case is correct.
In 4.14, we set userManagedNetworking to true when platform: none is used, even if the user didn't specify it in the ZTP manifests, because the controller in ZTP similarly defaults it.
However, these changes aren't taking effect, because they aren't passed through to the manifest that is included in the Agent ISO.
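A minimal sketch of a check for the defaulting described above, assuming the standard cluster-manifests layout (file name and field placement per the AgentClusterInstall spec; illustrative only):
$ grep -E 'platformType|userManagedNetworking' cluster-manifests/agent-cluster-install.yaml
  platformType: None
    userManagedNetworking: true
The bug is that these normalized values do not make it into the copy of the manifest embedded in the agent ISO.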
Description of problem: Multus should implement per node certificates via integration in the CNO
Description of problem:
Install 4.14 UPI cluster on azure stack hub, console could not be accessed outside cluster.

$ curl -L -k https://console-openshift-console.apps.jimawwt.installer.redhat.wwtatc.com -vv
*   Trying 10.255.96.76:443...
* connect to 10.255.96.76 port 443 failed: Connection timed out
* Failed to connect to console-openshift-console.apps.jimawwt.installer.redhat.wwtatc.com port 443: Connection timed out
* Closing connection 0
curl: (28) Failed to connect to console-openshift-console.apps.jimawwt.installer.redhat.wwtatc.com port 443: Connection timed out

Worker nodes are missing in public lb backend pool:

$ az network lb address-pool list --lb-name jimawwt-jhvtn -g jimawwt-jhvtn-rg
[
  {
    "backendIPConfigurations": [
      {
        "id": "/subscriptions/de7e09c3-b59a-4c7d-9c77-439c11b92879/resourceGroups/jimawwt-jhvtn-rg/providers/Microsoft.Network/networkInterfaces/jimawwt-jhvtn-master-1-nic/ipConfigurations/pipConfig",
        "resourceGroup": "jimawwt-jhvtn-rg"
      },
      {
        "id": "/subscriptions/de7e09c3-b59a-4c7d-9c77-439c11b92879/resourceGroups/jimawwt-jhvtn-rg/providers/Microsoft.Network/networkInterfaces/jimawwt-jhvtn-master-0-nic/ipConfigurations/pipConfig",
        "resourceGroup": "jimawwt-jhvtn-rg"
      },
      {
        "id": "/subscriptions/de7e09c3-b59a-4c7d-9c77-439c11b92879/resourceGroups/jimawwt-jhvtn-rg/providers/Microsoft.Network/networkInterfaces/jimawwt-jhvtn-master-2-nic/ipConfigurations/pipConfig",
        "resourceGroup": "jimawwt-jhvtn-rg"
      }
    ],
    "etag": "W/\"7a9d24a2-ff06-4108-9aac-a277595792e3\"",
    "id": "/subscriptions/de7e09c3-b59a-4c7d-9c77-439c11b92879/resourceGroups/jimawwt-jhvtn-rg/providers/Microsoft.Network/loadBalancers/jimawwt-jhvtn/backendAddressPools/jimawwt-jhvtn",
    "loadBalancingRules": [
      {
        "id": "/subscriptions/de7e09c3-b59a-4c7d-9c77-439c11b92879/resourceGroups/jimawwt-jhvtn-rg/providers/Microsoft.Network/loadBalancers/jimawwt-jhvtn/loadBalancingRules/api-public",
        "resourceGroup": "jimawwt-jhvtn-rg"
      },
      {
        "id": "/subscriptions/de7e09c3-b59a-4c7d-9c77-439c11b92879/resourceGroups/jimawwt-jhvtn-rg/providers/Microsoft.Network/loadBalancers/jimawwt-jhvtn/loadBalancingRules/a1a1c7bfe78c14a41a9149d42d698824-TCP-80",
        "resourceGroup": "jimawwt-jhvtn-rg"
      },
      {
        "id": "/subscriptions/de7e09c3-b59a-4c7d-9c77-439c11b92879/resourceGroups/jimawwt-jhvtn-rg/providers/Microsoft.Network/loadBalancers/jimawwt-jhvtn/loadBalancingRules/a1a1c7bfe78c14a41a9149d42d698824-TCP-443",
        "resourceGroup": "jimawwt-jhvtn-rg"
      }
    ],
    "name": "jimawwt-jhvtn",
    "provisioningState": "Succeeded",
    "resourceGroup": "jimawwt-jhvtn-rg"
  }
]

Similar bug OCPBUGS-14762 detected on Azure UPI. On installer side, we checked that public lb name and backendpool name for UPI are the same as ASH IPI.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-10-06-234925
How reproducible:
Always when installing Azure Stack UPI on 4.14
Steps to Reproduce:
1. Install UPI on Azure Stack Hub on 4.14
Actual results:
Worker nodes are missing in public lb backendpool
Expected results:
worker nodes are added into public lb backendpool and application can be accessed outside cluster
Additional info:
Issue is only detected on 4.14 azure stack hub UPI. It works on ASH IPI and 4.13/4.12 ASH UPI.
This is a clone of issue OCPBUGS-31050. The following is the description of the original issue:
—
Description of problem:
The install-config.yaml file lets a user set a server group policy for Control plane nodes, and one for Compute nodes, choosing from affinity, soft-affinity, anti-affinity, soft-anti-affinity. Installer will then create the server group if it doesn't exist. The server group policy defined in install-config for Compute nodes is ignored. The worker server group always has the same policy as the Control plane's.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. openshift-install create install-config
2. set Compute's serverGroupPolicy to soft-affinity in install-config.yaml
3. openshift-install create cluster
4. watch the server groups
Actual results:
both master and worker server groups have the default soft-anti-affinity policy
Expected results:
the worker server group should have soft-affinity as its policy
Additional info:
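For reference, a minimal install-config.yaml fragment matching the steps above (a sketch; field names per the OpenStack machine-pool schema):
controlPlane:
  platform:
    openstack:
      serverGroupPolicy: soft-anti-affinity
compute:
- name: worker
  platform:
    openstack:
      serverGroupPolicy: soft-affinity
With this input, the worker server group should be created with soft-affinity, which is exactly what the Actual results show is not happening.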
Description of problem:
While mirroring with the following command[1], it is observed that the command fails with error[2] as shown below:
~~~
[1] oc mirror --config=imageSet-config.yaml docker://<registry_url>:<Port>/<repository>
~~~
~~~
[2] error: error rebuilding catalog images from file-based catalogs: error regenerating the cache for <registry_url>:<Port>/<repository>/community-operator-index:v4.15: exit status 1
~~~
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Download the `oc mirror` v4.15.0 binary
2. Create ImageSet-config.yaml
3. Use the following command:
~~~
oc mirror --config=imageSet-config.yaml docker://<registry_url>:<Port>/<repository>
~~~
4. Observe the mentioned error
Actual results:
Command failed to complete with the mentioned error.
Expected results:
ICSP and mapping.txt file should be created.
Additional info:
This is a clone of issue OCPBUGS-30111. The following is the description of the original issue:
—
Trying to apply the following PAO configuration [^master-profile-pao.yaml]
100%
[labadmin@TestSrv testautomation]$ oc get profile -A
NAMESPACE                                NAME      TUNED                                       APPLIED   DEGRADED   AGE
openshift-cluster-node-tuning-operator   master0   openshift-node-performance-master-profile   True      False      3m21s
openshift-cluster-node-tuning-operator   master1   openshift-node-performance-master-profile   True      False      5d21h
openshift-cluster-node-tuning-operator   master2   openshift-node-performance-master-profile   True      False      5d22h
openshift-cluster-node-tuning-operator   worker0   openshift-node-performance-worker-profile   True      False      5d22h
openshift-cluster-node-tuning-operator   worker1   openshift-node-performance-worker-profile   True      False      5d22h
openshift-cluster-node-tuning-operator   worker2   openshift-node-performance-worker-profile   True      False      5d22h
[labadmin@TestSrv pao]$ oc get tuned -n openshift-cluster-node-tuning-operator openshift-node-performance-master-profile -o yaml
[^openshift-node-performance-master-profile.yaml]
[net]
type=net
devices_udev_regex=^INTERFACE=(?!!usb0)
channels=combined 10
nf_conntrack_hashsize=131072
This is a clone of issue OCPBUGS-38131. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-37753. The following is the description of the original issue:
—
discoverOpenIDURLs and checkOIDCPasswordGrantFlow fail if endpoints are private to the data plane.
The oauth server traffic was previously enabled to flow through the data plane so that private endpoints, e.g. LDAP, can be reached: https://issues.redhat.com/browse/HOSTEDCP-421
Fallback to the management cluster network was also enabled so that for public endpoints, e.g. GitHub, we are not blocked on having a data plane: https://issues.redhat.com/browse/OCPBUGS-8073
This issue is to enable the CPO OIDC checks to flow through the data plane and fall back to the management side, satisfying both cases above.
This would cover https://issues.redhat.com/browse/RFE-5638
In spyglass charts, rows sometimes require an additional field in the locator to make items appear on separate lines (node state is a good example, where we need OS update, phases, and not-ready all on separate lines; otherwise they would overlap and we wouldn't be able to see anything). This will also be useful for pod logs and similar.
Our goal is for origin to be able to add new intervals without requiring an update to the js (which will be in sippy) to get things to display properly. We need a way to differentiate structured intervals into separate rows within the same group.
Leaning towards row/foo in the locator, since the locator is what defines each row.
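A hedged illustration of the proposed shape (the row/ key is the proposal above; the names are invented):
node/ip-10-0-1-1 row/os-update
node/ip-10-0-1-1 row/phases
node/ip-10-0-1-1 row/not-ready
Intervals sharing everything but the row/ value would then render as separate lines within the same group.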
Description of problem:
When a workload includes a node selector term on the label kubernetes.io/arch and the allowed values do not include amd64, the autoscaler does not trigger a scale-out of a valid, non-amd64 machine set if its current replica count is 0 and (for 4.14+) no architecture capacity annotation is set (ref MIXEDARCH-129).
The issue is due to https://github.com/openshift/kubernetes-autoscaler/blob/f0ceeacfca57014d07f53211a034641d52d85cfd/cluster-autoscaler/cloudprovider/utils.go#L33
This bug should be considered at first on clusters having the same architecture for the control plane and the data plane.
In the case of multi-arch compute clusters, there is probably no alternative to letting the capacity annotation be properly set in the machine set, either manually or by the cloud provider actuator, as already discussed in the MIXEDARCH-129 work; otherwise we fall back to the control plane architecture.
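A hedged sketch of that capacity-annotation workaround, assuming the upstream cluster-autoscaler scale-from-zero annotations are honored (the annotation key is the upstream one; the machineset name is from the steps below):
$ oc -n openshift-machine-api annotate machineset/adistefa-a1-zn8pg-worker-f \
    capacity.cluster-autoscaler.kubernetes.io/labels=kubernetes.io/arch=arm64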
Version-Release number of selected component (if applicable):
- ARM64 IPI on GCP 4.14 - ARM64 IPI on Aws and Azure <=4.13 - In general, non-amd64 single-arch clusters supporting autoscale from 0
How reproducible:
Always
Steps to Reproduce:
1. Create an arm64 IPI cluster on GCP
2. Set one of the machinesets to have 0 replicas: oc scale -n openshift-machine-api machineset/adistefa-a1-zn8pg-worker-f --replicas=0
3. Deploy the default autoscaler
4. Deploy the machine autoscaler for the given machineset
5. Deploy a workload with node affinity to arm64-only nodes, large resource requests, and a large enough number of replicas.
Actual results:
From the pod events: pod didn't trigger scale-up: 1 node(s) didn't match Pod's node affinity/selector
Expected results:
The cluster autoscaler scales the machineset with 0 replicas in order to provide resources for the pending pods.
Additional info:
---
apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default
spec: {}
---
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: worker-us-east-1a
  namespace: openshift-machine-api
spec:
  minReplicas: 0
  maxReplicas: 12
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: adistefa-a1-zn8pg-worker-f
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: openshift-machine-api
  name: 'my-deployment'
  annotations: {}
spec:
  selector:
    matchLabels:
      app: name
  replicas: 3
  template:
    metadata:
      labels:
        app: name
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                - "arm64"
      containers:
      - name: container
        image: >-
          image-registry.openshift-image-registry.svc:5000/openshift/httpd:latest
        ports:
        - containerPort: 8080
          protocol: TCP
        env: []
        resources:
          requests:
            cpu: "2"
      imagePullSecrets: []
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
  paused: false
This is just a placeholder bug in 4.15.
the original bug ( https://issues.redhat.com/browse/OCPBUGS-20472 ) does not exist in 4.15 release.
===
Description of problem:
prow CI job: periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-4.14-upgrade-from-stable-4.13-aws-ipi-ovn-hypershift-replace-f7 failed in the step of upgrading the HCP image of the hosted cluster.
one failed job link: https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/pr-logs/pull/opens[…]-hypershift-replace-f7/1712338041915314176
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
* retrigger/rehearse the job, or
* create a 4.13 stable hosted cluster and upgrade it to 4.14 nightly manually
Actual results:
the upgrade failed using 4.14 nightly image for `hostedcluster`
Expected results:
upgrade for hostedcluster/nodepool successfully
Additional info:
We can get the dump file from the job artifacts.
Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/122
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-36290. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-14963. The following is the description of the original issue:
—
Description of problem:
When using IPI for IBM Cloud to create a Private BYON cluster, the installer attempts to fetch the VPC resource to verify whether it is already a PermittedNetwork for the DNS Services Zone. However, IBM Cloud currently lists a new VPC region, eu-es, which has not yet GA'd. This means that while eu-es is listed in the available VPC regions to search for resources, requests to it fail. Any attempt to use VPC regions alphabetically after eu-es (they appear to be returned in this order) fails due to requests made to eu-es. This includes eu-gb, us-east, and us-south, causing a golang panic.
Version-Release number of selected component (if applicable):
4.12
How reproducible:
100%
Steps to Reproduce:
1. Create IBM Cloud BYON resources in us-east or us-south
2. Attempt to create a Private BYON based cluster in us-east or us-south
Actual results:
DEBUG Fetching Common Manifests...
DEBUG Reusing previously-fetched Common Manifests
DEBUG Generating Terraform Variables...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x2bdb706]
goroutine 1 [running]:
github.com/openshift/installer/pkg/asset/installconfig/ibmcloud.(*Metadata).IsVPCPermittedNetwork(0xc000e89b80, {0x1a8b9918, 0xc00007c088}, {0xc0009d8678, 0x8})
	/go/src/github.com/openshift/installer/pkg/asset/installconfig/ibmcloud/metadata.go:175 +0x186
github.com/openshift/installer/pkg/asset/cluster.(*TerraformVariables).Generate(0x1dc55040, 0x5?)
	/go/src/github.com/openshift/installer/pkg/asset/cluster/tfvars.go:606 +0x3a5a
github.com/openshift/installer/pkg/asset/store.(*storeImpl).fetch(0xc000ca0d80, {0x1a8ab280, 0x1dc55040}, {0x0, 0x0})
	/go/src/github.com/openshift/installer/pkg/asset/store/store.go:227 +0x5fa
github.com/openshift/installer/pkg/asset/store.(*storeImpl).Fetch(0x7ffd948754cc?, {0x1a8ab280, 0x1dc55040}, {0x1dc32840, 0x8, 0x8})
	/go/src/github.com/openshift/installer/pkg/asset/store/store.go:77 +0x48
main.runTargetCmd.func1({0x7ffd948754cc, 0xb})
	/go/src/github.com/openshift/installer/cmd/openshift-install/create.go:261 +0x125
main.runTargetCmd.func2(0x1dc38800?, {0xc000ca0a80?, 0x3?, 0x3?})
	/go/src/github.com/openshift/installer/cmd/openshift-install/create.go:291 +0xe7
github.com/spf13/cobra.(*Command).execute(0x1dc38800, {0xc000ca0a20, 0x3, 0x3})
	/go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:876 +0x67b
github.com/spf13/cobra.(*Command).ExecuteC(0xc000bc8000)
	/go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:990 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
	/go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:918
main.installerMain()
	/go/src/github.com/openshift/installer/cmd/openshift-install/main.go:61 +0x2b0
main.main()
	/go/src/github.com/openshift/installer/cmd/openshift-install/main.go:38 +0xff
Expected results:
Successful Private cluster creation using BYON on IBM Cloud
Additional info:
IBM Cloud development has identified the issue and is working on a fix to all affected supported releases (4.12, 4.13, 4.14+)
When creating an Agent ISO for OCI, we should add the kernel argument console=ttyS0 to the ISO/PXE kargs.
CoreOS does not include a console kernel argument by default when using metal as the platform, because different hardware has different consoles and specifying one can cause booting to fail on some machines; it does include one on many cloud platforms, however. Since we know when the user is definitely using OCI (there are validations in assisted that ensure it) and we know the correct settings for OCI, we should set them up automatically.
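A hedged sketch of what this would look like on a booted OCI node once the karg is added (verification only, not the implementation):
$ tr ' ' '\n' < /proc/cmdline | grep console
console=ttyS0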
The SDN live migration cannot work properly in a cluster with specific configurations. CNO shall refuse to proceed with the live migration in such cases. We need to add pre-migration validation to CNO.
The live migration shall be blocked for clusters with the following configuration
This is a clone of issue OCPBUGS-35227. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-35069. The following is the description of the original issue:
—
Description of problem:
Reviewing https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&arch=amd64&baseEndTime=2024-02-28%2023%3A59%3A59&baseRelease=4.15&baseStartTime=2024-02-01%2000%3A00%3A00&capability=operator-conditions&component=Cloud%20Compute%20%2F%20Other%20Provider&confidence=95&environment=ovn%20no-upgrade%20amd64%20azure%20standard&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&network=ovn&pity=5&platform=azure&platform=azure&sampleEndTime=2024-06-05%2023%3A59%3A59&sampleRelease=4.15&sampleStartTime=2024-05-30%2000%3A00%3A00&testId=Operator%20results%3A6d9ee55972f66121016367d07d52f0a9&testName=operator%20conditions%20control-plane-machine-set&upgrade=no-upgrade&upgrade=no-upgrade&variant=standard&variant=standard, it appears that the Azure tests are failing frequently with "Told to stop trying". Check failed before until passed. Reviewing this, it appears that the rollout happened as expected, but the until function got a non-retryable error and exited, while the check saw that the deletion timestamp was set and the Machine went into Running, which caused it to fail. We should investigate why the until failed in this case; it should have seen the same machines and therefore should have seen a Running machine and passed.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-38376. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-38375. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-37782. The following is the description of the original issue:
—
Description of problem:
ci/prow/security is failing on google.golang.org/grpc/metadata
Version-Release number of selected component (if applicable):
4.15
How reproducible:
always
Steps to Reproduce:
1. run the ci/prow/security job on a 4.15 PR
Actual results:
Medium severity vulnerability found in google.golang.org/grpc/metadata
Expected results:
Additional info:
This is a clone of issue OCPBUGS-42720. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42164. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42143. The following is the description of the original issue:
—
Description of problem:
Another panic occurred in https://issues.redhat.com/browse/OCPBUGS-34877?focusedId=25580631&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-25580631, which should be fixed
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Actually this has the same root cause as https://issues.redhat.com/browse/OCPBUGS-9026, but I'd like to open a new one since the issue becomes very critical after ROSA made NLB the default in 4.14. An HCP (HyperShift) private cluster without infra nodes is the most serious victim, because it has worker nodes only and no workaround is available for it now. But if we think we can use the old bug to track the issue, then please close this one.
Version-Release number of selected component (if applicable):
4.14.1 HyperShift Private cluster
How reproducible:
100%
Steps to Reproduce:
1. create ROSA HCP (HyperShift) cluster
2. run qe-e2e-test on this cluster, or curl a route from one pod inside the cluster
Actual results:
1. co/console status is flapping since the route is intermittently accessible

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.1    True        False         4h56m   Error while reconciling 4.14.1: the cluster operator console is not available

2. check node and router pods running on both worker nodes

$ oc get node
NAME                          STATUS   ROLES    AGE    VERSION
ip-10-0-49-184.ec2.internal   Ready    worker   5h5m   v1.27.6+f67aeb3
ip-10-0-63-210.ec2.internal   Ready    worker   5h8m   v1.27.6+f67aeb3

$ oc -n openshift-ingress get pod -owide
NAME                              READY   STATUS    RESTARTS   AGE    IP           NODE                          NOMINATED NODE   READINESS GATES
router-default-86d569bf84-bq66f   1/1     Running   0          5h8m   10.130.0.7   ip-10-0-49-184.ec2.internal   <none>           <none>
router-default-86d569bf84-v54hp   1/1     Running   0          5h8m   10.128.0.9   ip-10-0-63-210.ec2.internal   <none>           <none>

3. check ingresscontroller LB setting, it uses an Internal NLB

spec:
  endpointPublishingStrategy:
    loadBalancer:
      dnsManagementPolicy: Managed
      providerParameters:
        aws:
          networkLoadBalancer: {}
          type: NLB
        type: AWS
      scope: Internal
    type: LoadBalancerService

4. continue to curl the route from a pod inside the cluster

$ oc rsh console-operator-86786df488-w6fks
Defaulted container "console-operator" out of: console-operator, conversion-webhook-server
sh-4.4$ curl https://console-openshift-console.apps.rosa.ci-rosa-h-d53b.ptk5.p3.openshiftapps.com -k -I
HTTP/1.1 200 OK
sh-4.4$ curl https://console-openshift-console.apps.rosa.ci-rosa-h-d53b.ptk5.p3.openshiftapps.com -k -I
Connection timed out
Expected results:
1. co/console should be stable, curl of the console route should always be OK.
2. qe-e2e-test should not fail
Additional info:
qe-e2e-test on the cluster: https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/pr-logs/pull/openshift_release/45369/rehearse-45369-periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-stable-aws-rosa-sts-hypershift-sec-guest-prod-private-link-full-f2/1724307074235502592
This is a clone of issue OCPBUGS-27366. The following is the description of the original issue:
—
To support external OIDC on hypershift, but not on self-managed, we need different schemas for the authentication CRD on a default-hypershift versus a default-self-managed. This requires us to change rendering so that it honors the clusterprofile.
Then we have to update the installer to match, then update hypershift, then update the manifests.
The final iteration (of 3) of the fix for OCPBUGS-4248 - https://github.com/openshift/cluster-baremetal-operator/pull/341 - uses the (IPv6) API VIP as the IP address for IPv6 BMCs to contact Apache to download the image to mount via virtualmedia.
Since Apache runs as part of the metal3 Deployment, it exists on only one node. There is no guarantee that the API VIP will land (or stay) on the same node, so this fails to work more often than not. Kube-proxy does not do anything to redirect traffic to pods with host networking enabled, such as the metal3 Deployment.
The IPv6 address is passed to the baremetal-operator, which has been split into its own Deployment since the first iteration of OCPBUGS-4228, in which we collected the IP address of the host from the deployed metal3 Pod. At the time that caused a circular dependency of the Deployment on its own Pod, but this would no longer be the case. However, a backport beyond 4.14 would require the Deployment split to also be backported.
Alternatively, ironic-proxy could be adapted to also proxy the images produced by ironic. This would be new functionality that would also need to be backported.
Finally, we could determine the host IP from inside the baremetal-operator container instead of from cluster-baremetal-operator. However, this approach has not been tried and would only work in backports because it relies on baremetal-operator continuing to run within same Pod as ironic.
DISCLAIMER: The code for measuring disruption in-cluster is extremely new, so we cannot be 100% confident that what we're seeing is real. However, the bug below demonstrates a problem that occurs in one very specific configuration while all others are unaffected, which gives us some confidence that the signal is real.
The total disruption is summed over a number of pods; the actual duration of the disruption is roughly the total divided by 14. The actual disruption appears to be about 12 minutes and hits all pods doing pod-to-host monitoring simultaneously.
Sample job: (taken from expanding the "Most Recent Runs" panel in grafana)
In the first spyglass chart for upgrade you can see the batch of disruption: 7:28:19 - 7:40:03
We do not have data prior to ovn interconnect landing, so we cannot say if this started at that time or not.
Description of problem:
Picked up 4.14-ec-4 (which uses cgroups v1 as the default) and tried to create a cluster with the following PerformanceProfile (and corresponding MCP) by placing them in the manifests folder,
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: clusterbotpp
spec:
  cpu:
    isolated: "1-3"
    reserved: "0"
  realTimeKernel:
    enabled: false
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/worker: ""
and,
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker
spec:
  machineConfigSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: worker
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""
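For context, a sketch of how these manifests are fed to the installer in this flow (standard manifests-dir usage; the file names are illustrative):
$ openshift-install create manifests --dir cluster
$ cp clusterbotpp_performanceprofile.yaml worker_mcp.yaml cluster/manifests/
$ openshift-install create cluster --dir cluster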
The cluster often fails to install because bootkube spends a lot of time chasing this error,
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: Created "clusterbotpp_kubeletconfig.yaml" kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: Failed to update status for the "clusterbotpp_kubeletconfig.yaml" kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n : Operation cannot be fulfilled on kubeletconfigs.machineconfiguration.openshift.io "performance-clusterbotpp": StorageError: invalid object, Code: 4, Key: /kubernetes.io/machineconfiguration.openshift.io/kubeletconfigs/performance-clusterbotpp, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 11f98d74-af1b-4a4c-9692-6dce56ee5cd9, UID in object meta:
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: [#1717] failed to create some manifests:
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: "clusterbotpp_kubeletconfig.yaml": failed to update status for kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n : Operation cannot be fulfilled on kubeletconfigs.machineconfiguration.openshift.io "performance-clusterbotpp": StorageError: invalid object, Code: 4, Key: /kubernetes.io/machineconfiguration.openshift.io/kubeletconfigs/performance-clusterbotpp, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 11f98d74-af1b-4a4c-9692-6dce56ee5cd9, UID in object meta:
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: Created "clusterbotpp_kubeletconfig.yaml" kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: Failed to update status for the "clusterbotpp_kubeletconfig.yaml" kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n : Operation cannot be fulfilled on kubeletconfigs.machineconfiguration.openshift.io "performance-clusterbotpp": StorageError: invalid object, Code: 4, Key: /kubernetes.io/machineconfiguration.openshift.io/kubeletconfigs/performance-clusterbotpp, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 597dfcf3-012d-4730-912a-78efabb920ba, UID in object meta:
This leads to worker nodes not getting ready in time, which leads to the installer marking the cluster installation failed. Ironically, even after the installer returns with failure, if you wait long enough I have (sometimes) observed that the cluster eventually reconciles and the worker nodes get provisioned.
I am attaching the installation logs from one such run with this issue.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Often
Steps to Reproduce:
1. Try to install a new cluster by placing a PerformanceProfile in the manifests folder
Actual results:
Cluster installation failed.
Expected results:
Cluster installation should succeed.
Additional info:
Also, I didn't observe this occurring in 4.13.9.
This is a clone of issue OCPBUGS-41929. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-41896. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-41776. The following is the description of the original issue:
—
Description of problem:
the section is: https://docs.openshift.com/container-platform/4.16/installing/installing_aws/ipi/installing-aws-vpc.html#installation-aws-arm-tested-machine-types_installing-aws-vpc
all tested ARM instances for 4.14+:
c6g.*
c7g.*
m6g.*
m7g.*
r8g.*
we need to ensure that the "Tested instance types for AWS on 64-bit ARM infrastructures" section is updated for 4.14+ in all sections that include it
Additional info:
This fix contains the following changes coming from updated version of kubernetes up to v1.28.11:
Changelog:
v1.28.11: https://github.com/kubernetes/kubernetes/blob/release-1.28/CHANGELOG/CHANGELOG-1.28.md#changelog-since-v12810
This is a clone of issue OCPBUGS-26541. The following is the description of the original issue:
—
Description of problem:
manifests are duplicated with cluster-config-api image
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
- Observed that after an upgrade to 4.13.30 (from 4.13.24), on all nodes/projects (replicated on two clusters that underwent the same upgrade), traffic routed from hostNetworked pods (router-default) calling backends intermittently times out / fails to reach its destination.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-openshift-ingress
  namespace: testing
spec:
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          policy-group.network.openshift.io/ingress: ""
  podSelector: {}
  policyTypes:
  - Ingress
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Upgrade cluster to 4.13.30
2. Apply test pod running basic HTTP instance at random port
3. Apply networkpolicy to allow-from-ingress and begin curl loop against target pod directly from ingressnode (or other worker node) at host chroot level (nodeIP).
4. Observe that curls time out intermittently; the reproducer curl loop is below (note the inclusion of the --connect-timeout flag to help the loop continue more rapidly without waiting for the full 2m connect timeout on a typical SYN failure).
$ while true; do curl --connect-timeout 5 --noproxy '*' -k -w "dnslookup: %{time_namelookup} | connect: %{time_connect} | appconnect: %{time_appconnect} | pretransfer: %{time_pretransfer} | starttransfer: %{time_starttransfer} | total: %{time_total} | size: %{size_download} | response: %{response_code}\n" -o /dev/null -s https://<POD>:<PORT>; done
Actual results:
- Traffic to all backends is dropped/degraded; this intermittent failure marks valid/healthy pods as unavailable due to the connection failures to the backends.
Expected results:
- traffic should not be impeded, especially when a networkpolicy that allows said traffic has been applied.
Additional info:
RCA UPDATE:
So the problem is that the host-network namespace is not labeled by the ingress controller, and if router pods are hostNetworked, a network policy with the `policy-group.network.openshift.io/ingress: ""` selector won't allow incoming connections. To reproduce, we need to run the ingress controller with `EndpointPublishingStrategy=HostNetwork` https://docs.openshift.com/container-platform/4.14/networking/nw-ingress-controller-endpoint-publishing-strategies.html and then check the openshift-host-network namespace labels with
oc get ns openshift-host-network --show-labels
# expected this
kubernetes.io/metadata.name=openshift-host-network,network.openshift.io/policy-group=ingress,policy-group.network.openshift.io/host-network=,policy-group.network.openshift.io/ingress=
# but before the fix you will see
kubernetes.io/metadata.name=openshift-host-network,policy-group.network.openshift.io/host-network=
Another way to verify this is the same problem (disruptive, only recommended for test environments) is to make CNO unmanaged
oc scale deployment cluster-version-operator -n openshift-cluster-version --replicas=0
oc scale deployment network-operator -n openshift-network-operator --replicas=0
and then label openshift-host-network namespace manually based on expected labels ^ and see if the problem disappears
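For example, a hedged sketch of that manual labeling, adding the two labels missing from the "before the fix" state shown above:
$ oc label ns openshift-host-network \
    network.openshift.io/policy-group=ingress \
    policy-group.network.openshift.io/ingress=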
Potentially affected versions (may need to reproduce to confirm)
4.16.0, 4.15.0, 4.14.0 since https://issues.redhat.com//browse/OCPBUGS-8070
4.13.30 https://issues.redhat.com/browse/OCPBUGS-22293
4.12.48 https://issues.redhat.com/browse/OCPBUGS-24039
Mitigation/support KCS:
https://access.redhat.com/solutions/7055050
This fix contains the following changes coming from updated version of kubernetes up to v1.28.14:
Changelog:
v1.28.14: https://github.com/kubernetes/kubernetes/blob/release-1.28/CHANGELOG/CHANGELOG-1.28.md#changelog-since-v12813
Description of problem:
When oc-mirror fails to bind the port for its local storage registry, it panics instead of exiting with an error.
Version-Release number of selected component (if applicable):
$ ./oc-mirror version
Logging to .oc-mirror.log
WARNING: This version information is deprecated and will be replaced with the output from --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.15.0-202311101707.p0.g1c8f538.assembly.stream-1c8f538", GitCommit:"1c8f538897c88011c51ab53ea5073547521f0676", GitTreeState:"clean", BuildDate:"2023-11-10T18:49:00Z", GoVersion:"go1.20.10 X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}
How reproducible:
always
Steps to Reproduce:
run command : oc-mirror --from file://out docker://localhost:5000/ocptest --v2 --config config.yaml --dest-tls-verify=false
Actual results:
oc-mirror --from file://out docker://localhost:5000/ocptest --v2 --config config.yaml --dest-tls-verify=false
--v2 flag identified, flow redirected to the oc-mirror v2 version. PLEASE DO NOT USE that. V2 is still under development and it is not ready to be used.
2023/11/15 13:04:47 [INFO] : mode diskToMirror
2023/11/15 13:04:47 [INFO] : local storage registry will log to /app1/1106/logs/registry.log
2023/11/15 13:04:47 [INFO] : starting local storage on :5000
panic: listen tcp :5000: bind: address already in use
goroutine 67 [running]:
github.com/openshift/oc-mirror/v2/pkg/cli.panicOnRegistryError(0x0?)
/go/src/github.com/openshift/oc-mirror/vendor/github.com/openshift/oc-mirror/v2/pkg/cli/executor.go:298 +0x4e
created by github.com/openshift/oc-mirror/v2/pkg/cli.(*ExecutorSchema).PrepareStorageAndLogs
/go/src/github.com/openshift/oc-mirror/vendor/github.com/openshift/oc-mirror/v2/pkg/cli/executor.go:286 +0x945
Expected results:
Should exit with error but not panic
Tracker issue for bootimage bump in 4.15. This issue should block issues which need a bootimage bump to fix.
The previous bump was OCPBUGS-29442.
Description of problem:
When multiple consecutive spaces are present in Pod logs, the spaces are collapsed and white-space is not retained when reviewing logs via the OpenShift Web Console. The white-space is retained when reviewing via the 'raw' output and via the `oc logs` command but the white-space is collapsed when reviewing via the `logs` panel in the OpenShift Web Console. This mangles the output of tables in the logs.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Everytime
Steps to Reproduce:
1. Create a Pod which outputs a table in the logs 2. Review the output table in the Pod logs via the OpenShift Web Console
Actual results:
The spaces in the table are collapsed
Expected results:
The table formatting should be maintained
Additional info:
- During testing, I added the `white-space: pre` styling to the log lines and this resolved the white-space issues. The log styling does not currently appear to be configured to retain white-space formatting.
- Tested on OCP 4.10.53 and 4.13.4; both have the issue.
Description of problem:
When the network type is Calico for a hosted cluster, the rbac policies that are laid down for CNO do not include permissions to deploy network-node-identity
Version-Release number of selected component (if applicable):
How reproducible: IBM Satellite environment
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Enable IPSec pre/post install on OVN IC cluster

$ oc patch networks.operator.openshift.io cluster --type=merge -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":{ }}}}}'
network.operator.openshift.io/cluster patched

ovn-ipsec containers complaining:

ovs-monitor-ipsec | ERR | Failed to import certificate into NSS. b'certutil: unable to open "/etc/openvswitch/keys/ipsec-cacert.pem" for reading (-5950, 2).\n'

$ oc rsh ovn-ipsec-d7rx9
Defaulted container "ovn-ipsec" out of: ovn-ipsec, ovn-keys (init)
sh-5.1# certutil -L -d /var/lib/ipsec/nss
Certificate Nickname                              Trust Attributes SSL,S/MIME,JAR/XPI
ovs_certkey_db961f9a-7de4-4f1d-a2fb-a8306d4079c5  u,u,u
sh-5.1# cat /var/log/openvswitch/libreswan.log
Aug 4 15:12:46.808394: Initializing NSS using read-write database "sql:/var/lib/ipsec/nss" Aug 4 15:12:46.837350: FIPS Mode: NO Aug 4 15:12:46.837370: NSS crypto library initialized Aug 4 15:12:46.837387: FIPS mode disabled for pluto daemon Aug 4 15:12:46.837390: FIPS HMAC integrity support [disabled] Aug 4 15:12:46.837541: libcap-ng support [enabled] Aug 4 15:12:46.837550: Linux audit support [enabled] Aug 4 15:12:46.837576: Linux audit activated Aug 4 15:12:46.837580: Starting Pluto (Libreswan Version 4.9 IKEv2 IKEv1 XFRM XFRMI esp-hw-offload FORK PTHREAD_SETSCHEDPRIO GCC_EXCEPTIONS NSS (IPsec profile) (NSS-KDF) DNSSEC SYSTEMD_WATCHDOG LABELED_IPSEC (SELINUX) SECCOMP LIBCAP_NG LINUX_AUDIT AUTH_PAM NETWORKMANAGER CURL(non-NSS) LDAP(non-NSS)) pid:147 Aug 4 15:12:46.837583: core dump dir: /run/pluto Aug 4 15:12:46.837585: secrets file: /etc/ipsec.secrets Aug 4 15:12:46.837587: leak-detective enabled Aug 4 15:12:46.837589: NSS crypto [enabled] Aug 4 15:12:46.837591: XAUTH PAM support [enabled] Aug 4 15:12:46.837604: initializing libevent in pthreads mode: headers: 2.1.12-stable (2010c00); library: 2.1.12-stable (2010c00) Aug 4 15:12:46.837664: NAT-Traversal support [enabled] Aug 4 15:12:46.837803: Encryption algorithms: Aug 4 15:12:46.837814: AES_CCM_16 {256,192,*128} IKEv1: ESP IKEv2: ESP FIPS aes_ccm, aes_ccm_c Aug 4 15:12:46.837820: AES_CCM_12 {256,192,*128} IKEv1: ESP IKEv2: ESP FIPS aes_ccm_b Aug 4 15:12:46.837826: AES_CCM_8 {256,192,*128} IKEv1: ESP IKEv2: ESP FIPS aes_ccm_a Aug 4 15:12:46.837831: 3DES_CBC [*192] IKEv1: IKE ESP IKEv2: IKE ESP FIPS NSS(CBC) 3des Aug 4 15:12:46.837837: CAMELLIA_CTR {256,192,*128} IKEv1: ESP IKEv2: ESP Aug 4 15:12:46.837843: CAMELLIA_CBC {256,192,*128} IKEv1: IKE ESP IKEv2: IKE ESP NSS(CBC) camellia Aug 4 15:12:46.837849: AES_GCM_16 {256,192,*128} IKEv1: ESP IKEv2: IKE ESP FIPS NSS(GCM) aes_gcm, aes_gcm_c Aug 4 15:12:46.837855: AES_GCM_12 {256,192,*128} IKEv1: ESP IKEv2: IKE ESP FIPS NSS(GCM) aes_gcm_b Aug 4 15:12:46.837861: AES_GCM_8 {256,192,*128} IKEv1: ESP IKEv2: IKE ESP FIPS NSS(GCM) aes_gcm_a Aug 4 15:12:46.837867: AES_CTR {256,192,*128} IKEv1: IKE ESP IKEv2: IKE ESP FIPS NSS(CTR) aesctr Aug 4 15:12:46.837872: AES_CBC {256,192,*128} IKEv1: IKE ESP IKEv2: IKE ESP FIPS NSS(CBC) aes Aug 4 15:12:46.837878: NULL_AUTH_AES_GMAC {256,192,*128} IKEv1: ESP IKEv2: ESP FIPS aes_gmac Aug 4 15:12:46.837883: NULL [] IKEv1: ESP IKEv2: ESP Aug 4 15:12:46.837889: CHACHA20_POLY1305 [*256] IKEv1: IKEv2: IKE ESP NSS(AEAD) chacha20poly1305 Aug 4 15:12:46.837892: Hash algorithms: Aug 4 15:12:46.837896: MD5 IKEv1: IKE IKEv2: NSS Aug 4 15:12:46.837901: SHA1 IKEv1: IKE IKEv2: IKE FIPS NSS sha Aug 4 15:12:46.837906: SHA2_256 IKEv1: IKE IKEv2: IKE FIPS NSS sha2, sha256 Aug 4 15:12:46.837910: SHA2_384 IKEv1: IKE IKEv2: IKE FIPS NSS sha384 Aug 4 15:12:46.837915: SHA2_512
IKEv1: IKE IKEv2: IKE FIPS NSS sha512 Aug 4 15:12:46.837919: IDENTITY IKEv1: IKEv2: FIPS Aug 4 15:12:46.837922: PRF algorithms: Aug 4 15:12:46.837927: HMAC_MD5 IKEv1: IKE IKEv2: IKE native(HMAC) md5 Aug 4 15:12:46.837931: HMAC_SHA1 IKEv1: IKE IKEv2: IKE FIPS NSS sha, sha1 Aug 4 15:12:46.837936: HMAC_SHA2_256 IKEv1: IKE IKEv2: IKE FIPS NSS sha2, sha256, sha2_256 Aug 4 15:12:46.837950: HMAC_SHA2_384 IKEv1: IKE IKEv2: IKE FIPS NSS sha384, sha2_384 Aug 4 15:12:46.837955: HMAC_SHA2_512 IKEv1: IKE IKEv2: IKE FIPS NSS sha512, sha2_512 Aug 4 15:12:46.837959: AES_XCBC IKEv1: IKEv2: IKE native(XCBC) aes128_xcbc Aug 4 15:12:46.837962: Integrity algorithms: Aug 4 15:12:46.837966: HMAC_MD5_96 IKEv1: IKE ESP AH IKEv2: IKE ESP AH native(HMAC) md5, hmac_md5 Aug 4 15:12:46.837984: HMAC_SHA1_96 IKEv1: IKE ESP AH IKEv2: IKE ESP AH FIPS NSS sha, sha1, sha1_96, hmac_sha1 Aug 4 15:12:46.837995: HMAC_SHA2_512_256 IKEv1: IKE ESP AH IKEv2: IKE ESP AH FIPS NSS sha512, sha2_512, sha2_512_256, hmac_sha2_512 Aug 4 15:12:46.837999: HMAC_SHA2_384_192 IKEv1: IKE ESP AH IKEv2: IKE ESP AH FIPS NSS sha384, sha2_384, sha2_384_192, hmac_sha2_384 Aug 4 15:12:46.838005: HMAC_SHA2_256_128 IKEv1: IKE ESP AH IKEv2: IKE ESP AH FIPS NSS sha2, sha256, sha2_256, sha2_256_128, hmac_sha2_256 Aug 4 15:12:46.838008: HMAC_SHA2_256_TRUNCBUG IKEv1: ESP AH IKEv2: AH Aug 4 15:12:46.838014: AES_XCBC_96 IKEv1: ESP AH IKEv2: IKE ESP AH native(XCBC) aes_xcbc, aes128_xcbc, aes128_xcbc_96 Aug 4 15:12:46.838018: AES_CMAC_96 IKEv1: ESP AH IKEv2: ESP AH FIPS aes_cmac Aug 4 15:12:46.838023: NONE IKEv1: ESP IKEv2: IKE ESP FIPS null Aug 4 15:12:46.838026: DH algorithms: Aug 4 15:12:46.838031: NONE IKEv1: IKEv2: IKE ESP AH FIPS NSS(MODP) null, dh0 Aug 4 15:12:46.838035: MODP1536 IKEv1: IKE ESP AH IKEv2: IKE ESP AH NSS(MODP) dh5 Aug 4 15:12:46.838039: MODP2048 IKEv1: IKE ESP AH IKEv2: IKE ESP AH FIPS NSS(MODP) dh14 Aug 4 15:12:46.838044: MODP3072 IKEv1: IKE ESP AH IKEv2: IKE ESP AH FIPS NSS(MODP) dh15 Aug 4 15:12:46.838048: MODP4096 IKEv1: IKE ESP AH IKEv2: IKE ESP AH FIPS NSS(MODP) dh16 Aug 4 15:12:46.838053: MODP6144 IKEv1: IKE ESP AH IKEv2: IKE ESP AH FIPS NSS(MODP) dh17 Aug 4 15:12:46.838057: MODP8192 IKEv1: IKE ESP AH IKEv2: IKE ESP AH FIPS NSS(MODP) dh18 Aug 4 15:12:46.838061: DH19 IKEv1: IKE IKEv2: IKE ESP AH FIPS NSS(ECP) ecp_256, ecp256 Aug 4 15:12:46.838066: DH20 IKEv1: IKE IKEv2: IKE ESP AH FIPS NSS(ECP) ecp_384, ecp384 Aug 4 15:12:46.838070: DH21 IKEv1: IKE IKEv2: IKE ESP AH FIPS NSS(ECP) ecp_521, ecp521 Aug 4 15:12:46.838074: DH31 IKEv1: IKE IKEv2: IKE ESP AH NSS(ECP) curve25519 Aug 4 15:12:46.838077: IPCOMP algorithms: Aug 4 15:12:46.838081: DEFLATE IKEv1: ESP AH IKEv2: ESP AH FIPS Aug 4 15:12:46.838085: LZS IKEv1: IKEv2: ESP AH FIPS Aug 4 15:12:46.838089: LZJH IKEv1: IKEv2: ESP AH FIPS Aug 4 15:12:46.838093: testing CAMELLIA_CBC: Aug 4 15:12:46.838096: Camellia: 16 bytes with 128-bit key Aug 4 15:12:46.838162: Camellia: 16 bytes with 128-bit key Aug 4 15:12:46.838201: Camellia: 16 bytes with 256-bit key Aug 4 15:12:46.838243: Camellia: 16 bytes with 256-bit key Aug 4 15:12:46.838280: testing AES_GCM_16: Aug 4 15:12:46.838284: empty string Aug 4 15:12:46.838319: one block Aug 4 15:12:46.838352: two blocks Aug 4 15:12:46.838385: two blocks with associated data Aug 4 15:12:46.838424: testing AES_CTR: Aug 4 15:12:46.838428: Encrypting 16 octets using AES-CTR with 128-bit key Aug 4 15:12:46.838464: Encrypting 32 octets using AES-CTR with 128-bit key Aug 4 15:12:46.838502: Encrypting 36 octets using AES-CTR with 128-bit key Aug 4 15:12:46.838541: 
Encrypting 16 octets using AES-CTR with 192-bit key Aug 4 15:12:46.838576: Encrypting 32 octets using AES-CTR with 192-bit key Aug 4 15:12:46.838613: Encrypting 36 octets using AES-CTR with 192-bit key Aug 4 15:12:46.838651: Encrypting 16 octets using AES-CTR with 256-bit key Aug 4 15:12:46.838687: Encrypting 32 octets using AES-CTR with 256-bit key Aug 4 15:12:46.838724: Encrypting 36 octets using AES-CTR with 256-bit key Aug 4 15:12:46.838763: testing AES_CBC: Aug 4 15:12:46.838766: Encrypting 16 bytes (1 block) using AES-CBC with 128-bit key Aug 4 15:12:46.838801: Encrypting 32 bytes (2 blocks) using AES-CBC with 128-bit key Aug 4 15:12:46.838841: Encrypting 48 bytes (3 blocks) using AES-CBC with 128-bit key Aug 4 15:12:46.838881: Encrypting 64 bytes (4 blocks) using AES-CBC with 128-bit key Aug 4 15:12:46.838928: testing AES_XCBC: Aug 4 15:12:46.838932: RFC 3566 Test Case 1: AES-XCBC-MAC-96 with 0-byte input Aug 4 15:12:46.839126: RFC 3566 Test Case 2: AES-XCBC-MAC-96 with 3-byte input Aug 4 15:12:46.839291: RFC 3566 Test Case 3: AES-XCBC-MAC-96 with 16-byte input Aug 4 15:12:46.839444: RFC 3566 Test Case 4: AES-XCBC-MAC-96 with 20-byte input Aug 4 15:12:46.839600: RFC 3566 Test Case 5: AES-XCBC-MAC-96 with 32-byte input Aug 4 15:12:46.839756: RFC 3566 Test Case 6: AES-XCBC-MAC-96 with 34-byte input Aug 4 15:12:46.839937: RFC 3566 Test Case 7: AES-XCBC-MAC-96 with 1000-byte input Aug 4 15:12:46.840373: RFC 4434 Test Case AES-XCBC-PRF-128 with 20-byte input (key length 16) Aug 4 15:12:46.840529: RFC 4434 Test Case AES-XCBC-PRF-128 with 20-byte input (key length 10) Aug 4 15:12:46.840698: RFC 4434 Test Case AES-XCBC-PRF-128 with 20-byte input (key length 18) Aug 4 15:12:46.840990: testing HMAC_MD5: Aug 4 15:12:46.840997: RFC 2104: MD5_HMAC test 1 Aug 4 15:12:46.841200: RFC 2104: MD5_HMAC test 2 Aug 4 15:12:46.841390: RFC 2104: MD5_HMAC test 3 Aug 4 15:12:46.841582: testing HMAC_SHA1: Aug 4 15:12:46.841585: CAVP: IKEv2 key derivation with HMAC-SHA1 Aug 4 15:12:46.842055: 8 CPU cores online Aug 4 15:12:46.842062: starting up 7 helper threads Aug 4 15:12:46.842128: started thread for helper 0 Aug 4 15:12:46.842174: helper(1) seccomp security disabled for crypto helper 1 Aug 4 15:12:46.842188: started thread for helper 1 Aug 4 15:12:46.842219: helper(2) seccomp security disabled for crypto helper 2 Aug 4 15:12:46.842236: started thread for helper 2 Aug 4 15:12:46.842258: helper(3) seccomp security disabled for crypto helper 3 Aug 4 15:12:46.842269: started thread for helper 3 Aug 4 15:12:46.842296: helper(4) seccomp security disabled for crypto helper 4 Aug 4 15:12:46.842311: started thread for helper 4 Aug 4 15:12:46.842323: helper(5) seccomp security disabled for crypto helper 5 Aug 4 15:12:46.842346: started thread for helper 5 Aug 4 15:12:46.842369: helper(6) seccomp security disabled for crypto helper 6 Aug 4 15:12:46.842376: started thread for helper 6 Aug 4 15:12:46.842390: using Linux xfrm kernel support code on #1 SMP PREEMPT_DYNAMIC Thu Jul 20 09:11:28 EDT 2023 Aug 4 15:12:46.842393: helper(7) seccomp security disabled for crypto helper 7 Aug 4 15:12:46.842707: selinux support is NOT enabled. 
Aug 4 15:12:46.842728: systemd watchdog not enabled - not sending watchdog keepalives Aug 4 15:12:46.843813: seccomp security disabled Aug 4 15:12:46.848083: listening for IKE messages Aug 4 15:12:46.848252: Kernel supports NIC esp-hw-offload Aug 4 15:12:46.848534: adding UDP interface ovn-k8s-mp0 10.129.0.2:500 Aug 4 15:12:46.848624: adding UDP interface ovn-k8s-mp0 10.129.0.2:4500 Aug 4 15:12:46.848654: adding UDP interface br-ex 169.254.169.2:500 Aug 4 15:12:46.848681: adding UDP interface br-ex 169.254.169.2:4500 Aug 4 15:12:46.848713: adding UDP interface br-ex 10.0.0.8:500 Aug 4 15:12:46.848740: adding UDP interface br-ex 10.0.0.8:4500 Aug 4 15:12:46.848767: adding UDP interface lo 127.0.0.1:500 Aug 4 15:12:46.848793: adding UDP interface lo 127.0.0.1:4500 Aug 4 15:12:46.848824: adding UDP interface lo [::1]:500 Aug 4 15:12:46.848853: adding UDP interface lo [::1]:4500 Aug 4 15:12:46.851160: loading secrets from "/etc/ipsec.secrets" Aug 4 15:12:46.851214: no secrets filename matched "/etc/ipsec.d/*.secrets" Aug 4 15:12:47.053369: loading secrets from "/etc/ipsec.secrets" sh-4.4# tcpdump -i any esp dropped privs to tcpdump tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes ^C 0 packets captured sh-5.1# ovn-nbctl --no-leader-only get nb_global . ipsec false
Version-Release number of selected component (if applicable):
openshift/cluster-network-operator#1874
How reproducible:
Always
Steps to Reproduce:
1. Install an OVN cluster and enable IPsec at runtime (see the sketch under Additional info below) 2. 3.
Actual results:
No ESP packets are seen across the nodes
Expected results:
ESP traffic should be seen across the nodes
Additional info:
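A minimal sketch of enabling IPsec at runtime and verifying it reached nbdb (the patch payload follows the documented ovnKubernetesConfig knob; the pod name is a placeholder):
$ oc patch networks.operator.openshift.io cluster --type=merge -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":{}}}}}'
$ oc -n openshift-ovn-kubernetes rsh <ovnkube-master-pod> ovn-nbctl --no-leader-only get nb_global . ipsec
The second command should print "true" once IPsec is enabled; in the capture above it still printed "false", which matches the missing ESP traffic.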
Please review the following PR: https://github.com/openshift/kubernetes-autoscaler/pull/260
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-23430. The following is the description of the original issue:
—
Description of problem:
On a hybrid cluster with Windows nodes and CoreOS nodes mixed, egressIP can no longer be applied to CoreOS nodes. QE testing profile: 53_IPI on AWS & OVN & WindowsContainer
Version-Release number of selected component (if applicable):
4.14.3
How reproducible:
Always
Steps to Reproduce:
1. Setup cluster with template aos-4_14/ipi-on-aws/versioned-installer-ovn-winc-ci 2. Label on coreOS node as egress node % oc describe node ip-10-0-59-132.us-east-2.compute.internal Name: ip-10-0-59-132.us-east-2.compute.internal Roles: worker Labels: beta.kubernetes.io/arch=amd64 beta.kubernetes.io/instance-type=m6i.xlarge beta.kubernetes.io/os=linux failure-domain.beta.kubernetes.io/region=us-east-2 failure-domain.beta.kubernetes.io/zone=us-east-2b k8s.ovn.org/egress-assignable= kubernetes.io/arch=amd64 kubernetes.io/hostname=ip-10-0-59-132.us-east-2.compute.internal kubernetes.io/os=linux node-role.kubernetes.io/worker= node.kubernetes.io/instance-type=m6i.xlarge node.openshift.io/os_id=rhcos topology.ebs.csi.aws.com/zone=us-east-2b topology.kubernetes.io/region=us-east-2 topology.kubernetes.io/zone=us-east-2b Annotations: cloud.network.openshift.io/egress-ipconfig: [{"interface":"eni-0c661bbdbb0dde54a","ifaddr":{"ipv4":"10.0.32.0/19"},"capacity":{"ipv4":14,"ipv6":15}}] csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-0629862832fff4ae3"} k8s.ovn.org/host-cidrs: ["10.0.59.132/19"] k8s.ovn.org/hybrid-overlay-distributed-router-gateway-ip: 10.129.2.13 k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac: 0a:58:0a:81:02:0d k8s.ovn.org/l3-gateway-config: {"default":{"mode":"shared","interface-id":"br-ex_ip-10-0-59-132.us-east-2.compute.internal","mac-address":"06:06:e2:7b:9c:45","ip-address... k8s.ovn.org/network-ids: {"default":"0"} k8s.ovn.org/node-chassis-id: fa1ac464-5744-40e9-96ca-6cdc74ffa9be k8s.ovn.org/node-gateway-router-lrp-ifaddr: {"ipv4":"100.64.0.7/16"} k8s.ovn.org/node-id: 7 k8s.ovn.org/node-mgmt-port-mac-address: a6:25:4e:55:55:36 k8s.ovn.org/node-primary-ifaddr: {"ipv4":"10.0.59.132/19"} k8s.ovn.org/node-subnets: {"default":["10.129.2.0/23"]} k8s.ovn.org/node-transit-switch-port-ifaddr: {"ipv4":"100.88.0.7/16"} k8s.ovn.org/remote-zone-migrated: ip-10-0-59-132.us-east-2.compute.internal k8s.ovn.org/zone-name: ip-10-0-59-132.us-east-2.compute.internal machine.openshift.io/machine: openshift-machine-api/wduan-debug-1120-vtxkp-worker-us-east-2b-z6wlc machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable machineconfiguration.openshift.io/currentConfig: rendered-worker-5a29871efb344f7e3a3dc51c42c21113 machineconfiguration.openshift.io/desiredConfig: rendered-worker-5a29871efb344f7e3a3dc51c42c21113 machineconfiguration.openshift.io/desiredDrain: uncordon-rendered-worker-5a29871efb344f7e3a3dc51c42c21113 machineconfiguration.openshift.io/lastAppliedDrain: uncordon-rendered-worker-5a29871efb344f7e3a3dc51c42c21113 machineconfiguration.openshift.io/lastSyncedControllerConfigResourceVersion: 22806 machineconfiguration.openshift.io/reason: machineconfiguration.openshift.io/state: Done volumes.kubernetes.io/controller-managed-attach-detach: true CreationTimestamp: Mon, 20 Nov 2023 09:46:53 +0800 Taints: <none> Unschedulable: false Lease: HolderIdentity: ip-10-0-59-132.us-east-2.compute.internal AcquireTime: <unset> RenewTime: Mon, 20 Nov 2023 14:01:05 +0800 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- MemoryPressure False Mon, 20 Nov 2023 13:57:33 +0800 Mon, 20 Nov 2023 09:46:53 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available DiskPressure False Mon, 20 Nov 2023 13:57:33 +0800 Mon, 20 Nov 2023 09:46:53 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure PIDPressure False Mon, 20 Nov 2023 13:57:33 +0800 Mon, 20 Nov 2023 09:46:53 +0800 
KubeletHasSufficientPID kubelet has sufficient PID available Ready True Mon, 20 Nov 2023 13:57:33 +0800 Mon, 20 Nov 2023 09:47:34 +0800 KubeletReady kubelet is posting ready status Addresses: InternalIP: 10.0.59.132 InternalDNS: ip-10-0-59-132.us-east-2.compute.internal Hostname: ip-10-0-59-132.us-east-2.compute.internal Capacity: cpu: 4 ephemeral-storage: 125238252Ki hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 16092956Ki pods: 250 Allocatable: cpu: 3500m ephemeral-storage: 114345831029 hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 14941980Ki pods: 250 System Info: Machine ID: ec21151a2a80230ce1e1926b4f8a902c System UUID: ec21151a-2a80-230c-e1e1-926b4f8a902c Boot ID: cf4b2e39-05ad-4aea-8e53-be669b212c4f Kernel Version: 5.14.0-284.41.1.el9_2.x86_64 OS Image: Red Hat Enterprise Linux CoreOS 414.92.202311150705-0 (Plow) Operating System: linux Architecture: amd64 Container Runtime Version: cri-o://1.27.1-13.1.rhaos4.14.git956c5f7.el9 Kubelet Version: v1.27.6+b49f9d1 Kube-Proxy Version: v1.27.6+b49f9d1 ProviderID: aws:///us-east-2b/i-0629862832fff4ae3 Non-terminated Pods: (21 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age --------- ---- ------------ ---------- --------------- ------------- --- openshift-cluster-csi-drivers aws-ebs-csi-driver-node-tlw5h 30m (0%) 0 (0%) 150Mi (1%) 0 (0%) 4h14m openshift-cluster-node-tuning-operator tuned-4fvgv 10m (0%) 0 (0%) 50Mi (0%) 0 (0%) 4h14m openshift-dns dns-default-z89zl 60m (1%) 0 (0%) 110Mi (0%) 0 (0%) 11m openshift-dns node-resolver-v9stn 5m (0%) 0 (0%) 21Mi (0%) 0 (0%) 4h14m openshift-image-registry image-registry-67b88dc677-76hfn 100m (2%) 0 (0%) 256Mi (1%) 0 (0%) 4h14m openshift-image-registry node-ca-hw62n 10m (0%) 0 (0%) 10Mi (0%) 0 (0%) 4h14m openshift-ingress-canary ingress-canary-9r9f8 10m (0%) 0 (0%) 20Mi (0%) 0 (0%) 4h13m openshift-ingress router-default-5957f4f4c6-tl9gs 100m (2%) 0 (0%) 256Mi (1%) 0 (0%) 4h18m openshift-machine-config-operator machine-config-daemon-h7fx4 40m (1%) 0 (0%) 100Mi (0%) 0 (0%) 4h14m openshift-monitoring alertmanager-main-1 9m (0%) 0 (0%) 120Mi (0%) 0 (0%) 4h12m openshift-monitoring monitoring-plugin-68995cb674-w2wr9 10m (0%) 0 (0%) 50Mi (0%) 0 (0%) 4h13m openshift-monitoring node-exporter-kbq8z 9m (0%) 0 (0%) 47Mi (0%) 0 (0%) 4h13m openshift-monitoring prometheus-adapter-54fc7b9c87-sg4vt 1m (0%) 0 (0%) 40Mi (0%) 0 (0%) 4h13m openshift-monitoring prometheus-k8s-1 75m (2%) 0 (0%) 1104Mi (7%) 0 (0%) 4h12m openshift-monitoring prometheus-operator-admission-webhook-84b7fffcdc-x8hsz 5m (0%) 0 (0%) 30Mi (0%) 0 (0%) 4h18m openshift-monitoring thanos-querier-59cbd86d58-cjkxt 15m (0%) 0 (0%) 92Mi (0%) 0 (0%) 4h13m openshift-multus multus-7gjnt 10m (0%) 0 (0%) 65Mi (0%) 0 (0%) 4h14m openshift-multus multus-additional-cni-plugins-gn7x9 10m (0%) 0 (0%) 10Mi (0%) 0 (0%) 4h14m openshift-multus network-metrics-daemon-88tf6 20m (0%) 0 (0%) 120Mi (0%) 0 (0%) 4h14m openshift-network-diagnostics network-check-target-kpv5v 10m (0%) 0 (0%) 15Mi (0%) 0 (0%) 4h14m openshift-ovn-kubernetes ovnkube-node-74nl9 80m (2%) 0 (0%) 1630Mi (11%) 0 (0%) 3h51m Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) Resource Requests Limits -------- -------- ------ cpu 619m (17%) 0 (0%) memory 4296Mi (29%) 0 (0%) ephemeral-storage 0 (0%) 0 (0%) hugepages-1Gi 0 (0%) 0 (0%) hugepages-2Mi 0 (0%) 0 (0%) Events: <none> % oc get node -l k8s.ovn.org/egress-assignable= NAME STATUS ROLES AGE VERSION ip-10-0-59-132.us-east-2.compute.internal Ready worker 4h14m v1.27.6+b49f9d1 3. Create egressIP object
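For step 3, a minimal sketch of the EgressIP object (the namespaceSelector is an assumed example; the IP matches the one under Actual results):
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip-1
spec:
  egressIPs:
  - 10.0.59.101
  namespaceSelector:
    matchLabels:
      env: qe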
Actual results:
% oc get egressip NAME EGRESSIPS ASSIGNED NODE ASSIGNED EGRESSIPS egressip-1 10.0.59.101 % oc get cloudprivateipconfig No resources found
Expected results:
The egressIP should be applied to the egress node
Additional info:
This is a clone of issue OCPBUGS-29605. The following is the description of the original issue:
—
On an ipv6-primary dualstack cluster, it is observed that the test:
"[sig-installer][Suite:openshift/openstack][lb][Serial] The Openstack platform should re-use an existing UDP Amphora LoadBalancer when new svc is created on Openshift with the proper annotation"
fails because the CCM considers the service "internal":
I0216 10:13:07.053922 1 loadbalancer.go:2113] "EnsureLoadBalancer" cluster="kubernetes" service="e2e-test-openstack-sprfn/udp-lb-shared2-svc" E0216 10:13:07.124915 1 controller.go:298] error processing service e2e-test-openstack-sprfn/udp-lb-shared2-svc (retrying with exponential backoff): failed to ensure load balancer: internal Service cannot share a load balancer I0216 10:13:07.125445 1 event.go:307] "Event occurred" object="e2e-test-openstack-sprfn/udp-lb-shared2-svc" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: internal Service cannot share a load balancer"
However, both LBs do not have the below annotation:
"service.beta.kubernetes.io/openstack-internal-load-balancer": "true"
Versions:
4.15.0-0.nightly-2024-02-14-052317
RHOS-16.2-RHEL-8-20230510.n.1
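For context, a minimal sketch of the Service the test creates to share an existing Amphora LB (the share annotation name is taken from cloud-provider-openstack as an assumption; the LB ID and selector are placeholders). Note that it does not carry the openstack-internal-load-balancer annotation quoted above, which is why treating it as internal is wrong:
apiVersion: v1
kind: Service
metadata:
  name: udp-lb-shared2-svc
  annotations:
    loadbalancer.openstack.org/load-balancer-id: "<existing-amphora-lb-id>"
spec:
  type: LoadBalancer
  ports:
  - protocol: UDP
    port: 8081
    targetPort: 8081
  selector:
    app: udp-lb-shared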
This is a clone of issue OCPBUGS-34166. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-32710. The following is the description of the original issue:
—
Description of problem:
When virtualHostedStyle is enabled with regionEndpoint set in config.imageregistry/cluster, the image registry fails to run. Errors thrown: time="2024-04-22T14:14:31.057192227Z" level=error msg="s3aws: RequestError: send request failed\ncaused by: Get \"https://s3-fips.us-west-1.amazonaws.com/ci-ln-67zbmzk-76ef8-4n6wb-image-registry-us-west-1-xjyfbabyboc?list-type=2&max-keys=1&prefix=\": dial tcp: lookup s3-fips.us-west-1.amazonaws.com on 172.30.0.10:53: no such host" go.version="go1.20.12 X:strictfipsruntime"
Version-Release number of selected component (if applicable):
4.14.18
How reproducible:
always
Steps to Reproduce:
1. $ oc get config.imageregistry/cluster -ojsonpath="{.status.storage}"|jq { "managementState": "Managed", "s3": { "bucket": "ci-ln-67zbmzk-76ef8-4n6wb-image-registry-us-west-1-xjyfbabyboc", "encrypt": true, "region": "us-west-1", "regionEndpoint": "https://s3-fips.us-west-1.amazonaws.com", "trustedCA": { "name": "" }, "virtualHostedStyle": true } } 2. Check registry pod $ oc get co image-registry NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE image-registry 4.15.5 True True True 79m Degraded: Registry deployment has timed out progressing: ReplicaSet "image-registry-b6c58998d" has timed out progressing
Actual results:
$ oc get pods image-registry-b6c58998d-m8pnb -oyaml| yq '.spec.containers[0].env' - name: REGISTRY_STORAGE_S3_REGIONENDPOINT value: https://s3-fips.us-west-1.amazonaws.com [...] - name: REGISTRY_STORAGE_S3_VIRTUALHOSTEDSTYLE value: "true" [...] $ oc logs image-registry-b6c58998d-m8pnb [...] time="2024-04-22T14:14:31.057192227Z" level=error msg="s3aws: RequestError: send request failed\ncaused by: Get \"https://s3-fips.us-west-1.amazonaws.com/ci-ln-67zbmzk-76ef8-4n6wb-image-registry-us-west-1-xjyfbabyboc?list-type=2&max-keys=1&prefix=\": dial tcp: lookup s3-fips.us-west-1.amazonaws.com on 172.30.0.10:53: no such host" go.version="go1.20.12 X:strictfipsruntime"
Expected results:
virtual hosted-style should work
Additional info:
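As a quick DNS sanity check (a sketch; node and bucket names are placeholders), resolve both the endpoint host and its virtual-hosted-style form from a node:
$ oc debug node/<node> -- chroot /host getent hosts s3-fips.us-west-1.amazonaws.com
$ oc debug node/<node> -- chroot /host getent hosts <bucket>.s3-fips.us-west-1.amazonaws.com
With virtualHostedStyle: true the registry addresses the bucket via the second form, so both names must resolve through the cluster DNS (172.30.0.10 in the error above).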
Please review the following PR: https://github.com/openshift/csi-driver-shared-resource-operator/pull/84
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
For the hcp resources "cloud-network-config-controller", "multus-admission-controller", and "ovnkube-control-plane", no `hypershift.openshift.io/hosted-control-plane:{hostedcluster resource namespace}-{cluster-name}` label is found on the pods
Version-Release number of selected component (if applicable):
4.14
How reproducible:
100%
Steps to Reproduce:
1. Create a hosted cluster 2. Check the labels of those resources, e.g. `$ oc get pod multus-admission-controller-7c677c745c-l4dbc -oyaml` to check its labels, or refer to testcase ocp-44988 (see also the audit one-liner under Additional info)
Actual results:
no expected label found
Expected results:
the pods have the label: `hypershift.openshift.io/hosted-control-plane:{hostedcluster resource namespace}-{cluster-name}`
Additional info:
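A quick way to audit the label across the hosted control plane pods (a sketch; <hcp-namespace> stands for {hostedcluster resource namespace}-{cluster-name}):
$ oc get pods -n <hcp-namespace> -L hypershift.openshift.io/hosted-control-plane
Pods missing the label show an empty value in the extra column.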
All metal jobs failed a bunch of tests with errors about looking up the thanos DNS record.
{ fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:106]: Failed to fetch alerting rules: unable to query https://thanos-querier-openshift-monitoring.apps.ostest.test.metalkube.org/api/v1/rules: Get "https://thanos-querier-openshift-monitoring.apps.ostest.test.metalkube.org/api/v1/rules": dial tcp: lookup thanos-querier-openshift-monitoring.apps.ostest.test.metalkube.org on 172.30.0.10:53: no such host: %!w(<nil>) Ginkgo exit error 1: exit with code 1}
[sig-instrumentation][Late] OpenShift alerting rules [apigroup:image.openshift.io] should link to an HTTP(S) location if the runbook_url annotation is defined [Suite:openshift/conformance/parallel]
Description of problem:
The problem was that the namespace handler on initial sync would delete all ports (because the logical port cache it got LSP UUIDs from wasn't populated yet) and all ACLs (they were simply set to nil). Even though both ports and ACLs are re-added by the corresponding handlers, this can cause disruption.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Create a namespace with at least 1 pod and an egress firewall in it
2. Pick any ovnkube-node pod and find the namespace port group UUID in nbdb by external_ids["name"]=<namespace name>, e.g. for the "test" namespace (a one-liner for this is sketched after the steps)
_uuid : 6142932d-4084-4bc3-bdcb-1990fc71891b acls : [ab2be619-1266-41c2-bb1d-1052cb4e1e97, b90a4b4a-ceee-41ee-a801-08c37a9bf3e7, d314fa8d-7b5a-40a5-b3d4-31091d7b9eae] external_ids : {name=test} name : a18007334074686647077 ports : [55b700e4-8176-42e7-97a6-8b32a82fefe5, cb71739c-ad6c-4436-8fd6-0643a5417c7d, d8644bf1-6bed-4db7-abf8-7aaab0625324]
3. Restart the chosen ovn-k pod
4. Check the logs on restart for an update that sets the chosen port group to zero ports and zero ACLs
Update operations generated as: [{Op:update Table:Port_Group Row:map[acls:{GoSet:[]} external_ids:{GoMap:map[name:test]} ports:{GoSet:[]}] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[where column _uuid == {6142932d-4084-4bc3-bdcb-1990fc71891b}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUID: UUIDName:}]
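For step 2, a one-liner to locate the namespace port group without dumping the whole table (a sketch using ovn-nbctl's generic database commands; "test" is the example namespace and the pod name is a placeholder):
$ oc -n openshift-ovn-kubernetes exec <ovnkube-master-pod> -c nbdb -- ovn-nbctl find Port_Group external_ids:name=test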
Actual results:
Expected results:
On restart the port group stays the same; no extra update with empty ports and ACLs is generated
Additional info:
Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.
Affected Platforms:
Is it an
If it is an internal RedHat testing failure:
If it is a CI failure:
If it is a customer / SD issue:
Description of problem:
Attempted upgrade of 3480 SNOs that were deployed from 4.13.11 to 4.14.0-rc.0 and 15 SNOs ended up stuck in partial upgrade because the cluster console operator was not available # cat 4.14.0-rc.0-partial.console | xargs -I % sh -c "echo -n '% '; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get clusterversion --no-headers" vm00255 version 4.13.11 True True 21h Unable to apply 4.14.0-rc.0: the cluster operator console is not available vm00320 version 4.13.11 True True 21h Unable to apply 4.14.0-rc.0: the cluster operator console is not available vm00327 version 4.13.11 True True 21h Unable to apply 4.14.0-rc.0: the cluster operator console is not available vm00405 version 4.13.11 True True 21h Unable to apply 4.14.0-rc.0: the cluster operator console is not available vm00705 version 4.13.11 True True 21h Unable to apply 4.14.0-rc.0: the cluster operator console is not available vm01224 version 4.13.11 True True 19h Unable to apply 4.14.0-rc.0: the cluster operator console is not available vm01310 version 4.13.11 True True 19h Unable to apply 4.14.0-rc.0: the cluster operator console is not available vm01320 version 4.13.11 True True 19h Unable to apply 4.14.0-rc.0: the cluster operator console is not available vm01928 version 4.13.11 True True 19h Unable to apply 4.14.0-rc.0: the cluster operator console is not available vm02052 version 4.13.11 True True 19h Unable to apply 4.14.0-rc.0: the cluster operator console is not available vm02588 version 4.13.11 True True 17h Unable to apply 4.14.0-rc.0: the cluster operator console is not available vm02704 version 4.13.11 True True 17h Unable to apply 4.14.0-rc.0: wait has exceeded 40 minutes for these operators: console vm02835 version 4.13.11 True True 17h Unable to apply 4.14.0-rc.0: the cluster operator console is not available vm03110 version 4.13.11 True True 15h Unable to apply 4.14.0-rc.0: the cluster operator console is not available vm03322 version 4.13.11 True True 15h Unable to apply 4.14.0-rc.0: wait has exceeded 40 minutes for these operators: console
Version-Release number of selected component (if applicable):
SNO OCP (managed clusters being upgraded) 4.13.11 upgraded to 4.14.0-rc.0 Hub OCP 4.13.12 ACM - 2.9.0-DOWNSTREAM-2023-09-07-04-47-52
How reproducible:
15 out of 3489 SNOs being upgraded; however, these represented 15 of the 41 partial upgrade failures (~36% of the failures)
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-25448. The following is the description of the original issue:
—
Description of problem:
When upgrading OCP 4.14.6 to 4.15.0-0.nightly-2023-12-13-032512, olm-operator pod always restarts, which blocks the cluster upgrading.
MacBook-Pro:~ jianzhang$ omg get clusterversion 2023-12-15 16:24:34.977 | WARNING | omg.utils.load_yaml:<module>:10 - yaml.CSafeLoader failed to load, using SafeLoader NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.14.6 True True 4h47m Working towards 4.15.0-0.nightly-2023-12-13-032512: 701 of 873 done (80% complete), waiting on operator-lifecycle-manager MacBook-Pro:~ jianzhang$ omg get pods 2023-12-15 16:47:36.383 | WARNING | omg.utils.load_yaml:<module>:10 - yaml.CSafeLoader failed to load, using SafeLoader NAME READY STATUS RESTARTS AGE catalog-operator-564b666f96-6nmq8 1/1 Running 1 1h59m collect-profiles-28375140-n9f2p 0/1 Succeeded 0 42m collect-profiles-28375155-sf2qj 0/1 Succeeded 0 27m collect-profiles-28375170-xkbxf 0/1 Succeeded 0 12m olm-operator-6bfd5f76bc-xb5lk 0/1 Running 27 1h59m package-server-manager-5b7969559f-68nn7 2/2 Running 0 1h59m packageserver-5ffcb95bff-fvvpx 1/1 Running 0 1h58m packageserver-5ffcb95bff-hgvxt 1/1 Running 0 1h58m MacBook-Pro:~ jianzhang$ omg logs olm-operator-6bfd5f76bc-xb5lk --previous 2023-12-15 16:23:02.300 | WARNING | omg.utils.load_yaml:<module>:10 - yaml.CSafeLoader failed to load, using SafeLoader 2023-12-13T23:38:05.452697228Z time="2023-12-13T23:38:05Z" level=info msg="log level info" 2023-12-13T23:38:05.452950096Z time="2023-12-13T23:38:05Z" level=info msg="TLS keys set, using https for metrics" 2023-12-13T23:38:05.515929950Z time="2023-12-13T23:38:05Z" level=info msg="found nonconforming items" gvr="rbac.authorization.k8s.io/v1, Resource=rolebindings" nonconforming=1 2023-12-13T23:38:05.588194624Z time="2023-12-13T23:38:05Z" level=info msg="found nonconforming items" gvr="/v1, Resource=services" nonconforming=1 2023-12-13T23:38:06.116654658Z time="2023-12-13T23:38:06Z" level=info msg="detected ability to filter informers" canFilter=false 2023-12-13T23:38:06.118496116Z time="2023-12-13T23:38:06Z" level=info msg="registering labeller" gvr="apps/v1, Resource=deployments" index=0 ... ... 2023-12-13T23:38:06.381370939Z time="2023-12-13T23:38:06Z" level=info msg="labeller complete" gvr="rbac.authorization.k8s.io/v1, Resource=clusterrolebindings" index=0 2023-12-13T23:38:06.381424190Z time="2023-12-13T23:38:06Z" level=info msg="starting clusteroperator monitor loop" monitor=clusteroperator 2023-12-13T23:38:06.381467749Z time="2023-12-13T23:38:06Z" level=info msg="detected that every object is labelled, exiting to re-start the process..."
Version-Release number of selected component (if applicable):
MacBook-Pro:~ jianzhang$ oc adm release info --commits registry.ci.openshift.org/ocp/release:4.15.0-0.nightly-2023-12-13-032512 |grep olm operator-lifecycle-manager https://github.com/openshift/operator-framework-olm b4d2b70c34e9654afe30cf724f1dc85a1ce5c683 operator-registry https://github.com/openshift/operator-framework-olm b4d2b70c34e9654afe30cf724f1dc85a1ce5c683
How reproducible:
always
Steps to Reproduce:
1. Rerun this prow job: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-4.15-upgrade-from-stable-4.14-ibmcloud-ipi-f28/
Actual results:
Cluster failed to upgrade due to olm pods crash.
Expected results:
Cluster upgraded successfully.
Additional info:
Description of problem:
After the PF5 upgrade, older components using PF4 dropdown menus had list-style bullets appear for unordered lists.
Version-Release number of selected component (if applicable):
How reproducible:
Metrics Plugin still uses PF4 components and styling
Additional info:
PatternFly removes list-style bullets or numbers from the <ul>/<ol> elements by default and then adds them where needed. The OCP console chose to override this because of the number of <ul>/<ol> elements in our codebase that expect the default bullets or numbers to be present.
Bug screenshots
https://drive.google.com/drive/folders/1rP6Ls1R2GJoTArHg0oild5SWIWvNaMUv
This is a clone of issue OCPBUGS-30119. The following is the description of the original issue:
—
Description of problem:
`ensureSigningCertKeyPair` and `ensureTargetCertKeyPair` always update the secret type. If the secret requires a metadata update, its previous content will not be retained.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Install a 4.6 cluster (or make sure installer-generated secrets have `type: SecretTypeTLS` instead of `type: kubernetes.io/tls`) 2. Run secret sync 3. Check secret contents
Actual results:
Secret was regenerated with new content
Expected results:
Existing content should be preserved; the content should not be modified
Additional info:
This causes an api-int CA update for clusters born in 4.6 or earlier.
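To check whether a cluster still carries secrets with the bad installer-generated type (a sketch; the jsonpath filter simply lists matching namespace/name pairs):
$ oc get secrets -A -o jsonpath='{range .items[?(@.type=="SecretTypeTLS")]}{.metadata.namespace}{"/"}{.metadata.name}{"\n"}{end}'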
We have a lab system that suddenly stopped being able to be deployed by the Assisted Installer; the BMH object is stuck in the "inspecting" state and has an event as follows:
Unexpected exception UnicodeDecodeError during processing: 'utf-8' codec can't decode byte 0xf7 in position 13: invalid start byte
Further investigation shows this is due to an error in processing an LLDP packet:
2023-11-22 18:27:04.675 1 ERROR ironic_inspector.process ESC[00m 2023-11-22 18:27:04.685 1 DEBUG ironic_inspector.node_cache [-] [node: f38df2c7-3a19-4f38-a3b7-385b2a971f53 state error] Executing fsm(error).process_event(error) fsm_event /usr/lib/python3.9/site-packages/ironic_inspector/node_cache.py:200ESC[00m 2023-11-22 18:27:04.686 1 DEBUG ironic_inspector.node_cache [-] [node: f38df2c7-3a19-4f38-a3b7-385b2a971f53 state error] Committing fields: {'finished_at': datetime.datetime(2023, 11, 22, 18, 27, 4, 681856), 'error': "Unexpected exception UnicodeDecodeError during processing: 'utf-8' codec can't decode byte 0xf7 in position 13: invalid start byte"} _commit /usr/lib/python3.9/site-packages/ironic_inspector/node_cache.py:142ESC[00m 2023-11-22 18:27:04.689 1 INFO ironic_inspector.process [-] [node: f38df2c7-3a19-4f38-a3b7-385b2a971f53 state error BMC 10.16.230.10] Ramdisk logs were stored in file f38df2c7-3a19-4f38-a3b7-385b2a971f53_20231122-182704.688370.tar.gzESC[00m 2023-11-22 18:27:04.689 1 ERROR ironic_inspector.utils [-] [node: f38df2c7-3a19-4f38-a3b7-385b2a971f53 state error BMC 10.16.230.10] Unexpected exception UnicodeDecodeError during processing: 'utf-8' codec can't decode byte 0xf7 in position 13: invalid start byte: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf7 in position 13: invalid start byteESC[00m 2023-11-22 18:27:04.689 1 DEBUG ironic_inspector.main [None req-e587f885-ce93-45c0-86d2-0012f5bd4431 - - - - - -] Returning error to client: Unexpected exception UnicodeDecodeError during processing: 'utf-8' codec can't decode byte 0xf7 in position 13: invalid start byte error_response /usr/lib/python3.9/site-packages/ironic_inspector/main.py:139ESC[00m
Reported upstream as: https://bugs.launchpad.net/ironic/+bug/2044793
Description of problem:
metal3-baremetal-operator-7ccb58f44b-xlnnd pod failed to start on the SNO baremetal dualstack cluster: Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 34m default-scheduler Successfully assigned openshift-machine-api/metal3-baremetal-operator-7ccb58f44b-xlnnd to sno.ecoresno.lab.eng.tlv2.redha t.com Warning FailedScheduling 34m default-scheduler 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are availabl e: 1 node(s) didn't have free ports for the requested pod ports.. Warning FailedCreatePodSandBox 34m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to add hostport mapping for sandbox k8s_metal3-baremetal-operator-7ccb58f44b-xlnnd_openshift-machine-api_5f6d8c69-a508-47f3-a6b1-7701b9d3617e_0(c4a8b353e3ec105d2bff2eb1670b82a0f226ac1088b739a256deb9dfae6ebe54): cannot open hostport 60000 for pod k8s _metal3-baremetal-operator-7ccb58f44b-xlnnd_openshift-machine-api_5f6d8c69-a508-47f3-a6b1-7701b9d3617e_0_: listen tcp4 :60000: bind: address already in use Warning FailedCreatePodSandBox 34m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to add hostport mapping for sandbox k8s_metal3-bare metal-operator-7ccb58f44b-xlnnd_openshift-machine-api_5f6d8c69-a508-47f3-a6b1-7701b9d3617e_0(9e6960899533109b02fbb569c53d7deffd1ac8185cef3d8677254f9ccf9387ff): cannot open hostport 60000 for pod k8s _metal3-baremetal-operator-7ccb58f44b-xlnnd_openshift-machine-api_5f6d8c69-a508-47f3-a6b1-7701b9d3617e_0_: listen tcp4 :60000: bind: address already in use
Version-Release number of selected component (if applicable):
4.14.0-rc.0
How reproducible:
so far once
Steps to Reproduce:
1. Deploy a disconnected baremetal SNO node with dualstack networking using the agent-based installer 2. 3.
Actual results:
metal3-baremetal-operator pod fails to start
Expected results:
metal3-baremetal-operator pod is running
Additional info:
Checking the ports on the node showed it was the `kube-apiserver` process bound to the port: tcp ESTAB 0 0 [::1]:60000 [::1]:2379 users:(("kube-apiserver",pid=43687,fd=455)) After rebooting the node, all pods started as expected
Please review the following PR: https://github.com/openshift/csi-external-resizer/pull/144
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/operator-framework-olm/pull/621
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This fix contains the following changes coming from the updated version of Kubernetes, up to v1.28.10:
Changelog:
v1.28.10: https://github.com/kubernetes/kubernetes/blob/release-1.28/CHANGELOG/CHANGELOG-1.28.md#changelog-since-v1289
This is a clone of issue OCPBUGS-36482. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-36406. The following is the description of the original issue:
—
Seen in a 4.15.19 cluster, the PrometheusOperatorRejectedResources alert was firing, but did not link a runbook, despite the runbook existing since MON-2358.
Seen in 4.15.19, but likely applies to all versions where the PrometheusOperatorRejectedResources alert exists.
Every time.
Check the cluster console at /monitoring/alertrules?rowFilter-alerting-rule-source=platform&name=PrometheusOperatorRejectedResources, and click through to the alert definition.
No mention of runbooks.
A Runbook section linking the runbook.
I haven't dug into the upstream/downstream sync process, but the runbook information likely needs to at least show up here, although that may or may not be the root location for injecting our canonical runbook into the upstream-sourced alert.
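For context, the console surfaces runbooks via a runbook_url annotation on the alerting rule, so the synced rule would need to carry something like the following (a sketch; the rule body is abbreviated and the URL assumes the openshift/runbooks layout):
- alert: PrometheusOperatorRejectedResources
  annotations:
    runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/PrometheusOperatorRejectedResources.md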
The pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-local-to-shared-gateway-mode-migration job started failing recently when the ovnkube-master daemonset would not finish rolling out after 360s. Taking the must-gather to debug (which happens a few minutes after the test failure), you can see that the daemonset is still not ready, so I believe that increasing the timeout is not the answer.
some debug info:
➜ static-kas git:(master) oc --kubeconfig=/tmp/kk get daemonsets -A NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE openshift-cluster-csi-drivers aws-ebs-csi-driver-node 6 6 6 6 6 kubernetes.io/os=linux 8h openshift-cluster-node-tuning-operator tuned 6 6 6 6 6 kubernetes.io/os=linux 8h openshift-dns dns-default 6 6 6 6 6 kubernetes.io/os=linux 8h openshift-dns node-resolver 6 6 6 6 6 kubernetes.io/os=linux 8h openshift-image-registry node-ca 6 6 6 6 6 kubernetes.io/os=linux 8h openshift-ingress-canary ingress-canary 3 3 3 3 3 kubernetes.io/os=linux 8h openshift-machine-api machine-api-termination-handler 0 0 0 0 0 kubernetes.io/os=linux,machine.openshift.io/interruptible-instance= 8h openshift-machine-config-operator machine-config-daemon 6 6 6 6 6 kubernetes.io/os=linux 8h openshift-machine-config-operator machine-config-server 3 3 3 3 3 node-role.kubernetes.io/master= 8h openshift-monitoring node-exporter 6 6 6 6 6 kubernetes.io/os=linux 8h openshift-multus multus 6 6 6 6 6 kubernetes.io/os=linux 9h openshift-multus multus-additional-cni-plugins 6 6 6 6 6 kubernetes.io/os=linux 9h openshift-multus network-metrics-daemon 6 6 6 6 6 kubernetes.io/os=linux 9h openshift-network-diagnostics network-check-target 6 6 6 6 6 beta.kubernetes.io/os=linux 9h openshift-ovn-kubernetes ovnkube-master 3 3 2 2 2 beta.kubernetes.io/os=linux,node-role.kubernetes.io/master= 9h openshift-ovn-kubernetes ovnkube-node 6 6 6 6 6 beta.kubernetes.io/os=linux 9h Name: ovnkube-master Selector: app=ovnkube-master Node-Selector: beta.kubernetes.io/os=linux,node-role.kubernetes.io/master= Labels: networkoperator.openshift.io/generates-operator-status=stand-alone Annotations: deprecated.daemonset.template.generation: 3 kubernetes.io/description: This daemonset launches the ovn-kubernetes controller (master) networking components. networkoperator.openshift.io/cluster-network-cidr: 10.128.0.0/14 networkoperator.openshift.io/hybrid-overlay-status: disabled networkoperator.openshift.io/ip-family-mode: single-stack release.openshift.io/version: 4.14.0-0.ci.test-2023-08-04-123014-ci-op-c6fp05f4-latest Desired Number of Nodes Scheduled: 3 Current Number of Nodes Scheduled: 3 Number of Nodes Scheduled with Up-to-date Pods: 2 Number of Nodes Scheduled with Available Pods: 2 Number of Nodes Misscheduled: 0 Pods Status: 3 Running / 0 Waiting / 0 Succeeded / 0 Failed Pod Template: Labels: app=ovnkube-master component=network kubernetes.io/os=linux openshift.io/component=network ovn-db-pod=true type=infra Annotations: networkoperator.openshift.io/cluster-network-cidr: 10.128.0.0/14 networkoperator.openshift.io/hybrid-overlay-status: disabled networkoperator.openshift.io/ip-family-mode: single-stack target.workload.openshift.io/management: {"effect": "PreferredDuringScheduling"} Service Account: ovn-kubernetes-controller
It seems there is one pod that is not coming up all the way, and that pod has two containers not ready (sbdb and nbdb). Logs from those containers are below:
➜ static-kas git:(master) oc --kubeconfig=/tmp/kk describe pod ovnkube-master-7qlm5 -n openshift-ovn-kubernetes | rg '^ [a-z].*:|Ready' northd: Ready: True nbdb: Ready: False kube-rbac-proxy: Ready: True sbdb: Ready: False ovnkube-master: Ready: True ovn-dbchecker: Ready: True ➜ static-kas git:(master) oc --kubeconfig=/tmp/kk logs ovnkube-master-7qlm5 -n openshift-ovn-kubernetes -c sbdb 2023-08-04T13:08:49.127480354Z + [[ -f /env/_master ]] 2023-08-04T13:08:49.127562165Z + trap quit TERM INT 2023-08-04T13:08:49.127609496Z + ovn_kubernetes_namespace=openshift-ovn-kubernetes 2023-08-04T13:08:49.127637926Z + ovndb_ctl_ssl_opts='-p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt' 2023-08-04T13:08:49.127637926Z + transport=ssl 2023-08-04T13:08:49.127645167Z + ovn_raft_conn_ip_url_suffix= 2023-08-04T13:08:49.127682687Z + [[ 10.0.42.108 == \: ]] 2023-08-04T13:08:49.127690638Z + db=sb 2023-08-04T13:08:49.127690638Z + db_port=9642 2023-08-04T13:08:49.127712038Z + ovn_db_file=/etc/ovn/ovnsb_db.db 2023-08-04T13:08:49.127854181Z + [[ ! ssl:10.0.102.2:9642,ssl:10.0.42.108:9642,ssl:10.0.74.128:9642 =~ .:10\.0\.42\.108:. ]] 2023-08-04T13:08:49.128199437Z ++ bracketify 10.0.42.108 2023-08-04T13:08:49.128237768Z ++ case "$1" in 2023-08-04T13:08:49.128265838Z ++ echo 10.0.42.108 2023-08-04T13:08:49.128493242Z + OVN_ARGS='--db-sb-cluster-local-port=9644 --db-sb-cluster-local-addr=10.0.42.108 --no-monitor --db-sb-cluster-local-proto=ssl --ovn-sb-db-ssl-key=/ovn-cert/tls.key --ovn-sb-db-ssl-cert=/ovn-cert/tls.crt --ovn-sb-db-ssl-ca-cert=/ovn-ca/ca-bundle.crt' 2023-08-04T13:08:49.128535253Z + CLUSTER_INITIATOR_IP=10.0.102.2 2023-08-04T13:08:49.128819438Z ++ date -Iseconds 2023-08-04T13:08:49.130157063Z 2023-08-04T13:08:49+00:00 - starting sbdb CLUSTER_INITIATOR_IP=10.0.102.2 2023-08-04T13:08:49.130170893Z + echo '2023-08-04T13:08:49+00:00 - starting sbdb CLUSTER_INITIATOR_IP=10.0.102.2' 2023-08-04T13:08:49.130170893Z + initialize=false 2023-08-04T13:08:49.130179713Z + [[ ! -e /etc/ovn/ovnsb_db.db ]] 2023-08-04T13:08:49.130318475Z + [[ false == \t\r\u\e ]] 2023-08-04T13:08:49.130406657Z + wait 9 2023-08-04T13:08:49.130493659Z + exec /usr/share/ovn/scripts/ovn-ctl -db-sb-cluster-local-port=9644 --db-sb-cluster-local-addr=10.0.42.108 --no-monitor --db-sb-cluster-local-proto=ssl --ovn-sb-db-ssl-key=/ovn-cert/tls.key --ovn-sb-db-ssl-cert=/ovn-cert/tls.crt --ovn-sb-db-ssl-ca-cert=/ovn-ca/ca-bundle.crt '-ovn-sb-log=-vconsole:info -vfile:off -vPATTERN:console:%D {%Y-%m-%dT%H:%M:%S.###Z} |%05N|%c%T|%p|%m' run_sb_ovsdb 2023-08-04T13:08:49.208399304Z 2023-08-04T13:08:49.208Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-sb.log 2023-08-04T13:08:49.213507987Z ovn-sbctl: unix:/var/run/ovn/ovnsb_db.sock: database connection failed (No such file or directory) 2023-08-04T13:08:49.224890005Z 2023-08-04T13:08:49Z|00001|reconnect|INFO|unix:/var/run/ovn/ovnsb_db.sock: connecting... 2023-08-04T13:08:49.224912156Z 2023-08-04T13:08:49Z|00002|reconnect|INFO|unix:/var/run/ovn/ovnsb_db.sock: connection attempt failed (No such file or directory) 2023-08-04T13:08:49.255474964Z 2023-08-04T13:08:49.255Z|00002|raft|INFO|local server ID is 7f92 2023-08-04T13:08:49.333342909Z 2023-08-04T13:08:49.333Z|00003|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 3.1.2 2023-08-04T13:08:49.348948944Z 2023-08-04T13:08:49.348Z|00004|reconnect|INFO|ssl:10.0.102.2:9644: connecting... 2023-08-04T13:08:49.349002565Z 2023-08-04T13:08:49.348Z|00005|reconnect|INFO|ssl:10.0.74.128:9644: connecting... 
2023-08-04T13:08:49.352510569Z 2023-08-04T13:08:49.352Z|00006|reconnect|INFO|ssl:10.0.102.2:9644: connected 2023-08-04T13:08:49.353870484Z 2023-08-04T13:08:49.353Z|00007|reconnect|INFO|ssl:10.0.74.128:9644: connected 2023-08-04T13:08:49.889326777Z 2023-08-04T13:08:49.889Z|00008|raft|INFO|server 2501 is leader for term 5 2023-08-04T13:08:49.890316765Z 2023-08-04T13:08:49.890Z|00009|raft|INFO|rejecting append_request because previous entry 5,1538 not in local log (mismatch past end of log) 2023-08-04T13:08:49.891199951Z 2023-08-04T13:08:49.891Z|00010|raft|INFO|rejecting append_request because previous entry 5,1539 not in local log (mismatch past end of log) 2023-08-04T13:08:50.225632838Z 2023-08-04T13:08:50Z|00003|reconnect|INFO|unix:/var/run/ovn/ovnsb_db.sock: connecting... 2023-08-04T13:08:50.225677739Z 2023-08-04T13:08:50Z|00004|reconnect|INFO|unix:/var/run/ovn/ovnsb_db.sock: connected 2023-08-04T13:08:50.227772827Z Waiting for OVN_Southbound to come up. 2023-08-04T13:08:55.716284614Z 2023-08-04T13:08:55.716Z|00011|raft|INFO|ssl:10.0.74.128:43498: learned server ID 3dff 2023-08-04T13:08:55.716323395Z 2023-08-04T13:08:55.716Z|00012|raft|INFO|ssl:10.0.74.128:43498: learned remote address ssl:10.0.74.128:9644 2023-08-04T13:08:55.724570375Z 2023-08-04T13:08:55.724Z|00013|raft|INFO|ssl:10.0.102.2:47804: learned server ID 2501 2023-08-04T13:08:55.724599466Z 2023-08-04T13:08:55.724Z|00014|raft|INFO|ssl:10.0.102.2:47804: learned remote address ssl:10.0.102.2:9644 2023-08-04T13:08:59.348572779Z 2023-08-04T13:08:59.348Z|00015|memory|INFO|32296 kB peak resident set size after 10.1 seconds 2023-08-04T13:08:59.348648190Z 2023-08-04T13:08:59.348Z|00016|memory|INFO|atoms:35959 cells:31476 monitors:0 n-weak-refs:749 raft-connections:4 raft-log:1543 txn-history:100 txn-history-atoms:7100 ➜ static-kas git:(master) oc --kubeconfig=/tmp/kk logs ovnkube-master-7qlm5 -n openshift-ovn-kubernetes -c nbdb 2023-08-04T13:08:48.779743434Z + [[ -f /env/_master ]] 2023-08-04T13:08:48.779743434Z + trap quit TERM INT 2023-08-04T13:08:48.779825516Z + ovn_kubernetes_namespace=openshift-ovn-kubernetes 2023-08-04T13:08:48.779825516Z + ovndb_ctl_ssl_opts='-p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt' 2023-08-04T13:08:48.779825516Z + transport=ssl 2023-08-04T13:08:48.779825516Z + ovn_raft_conn_ip_url_suffix= 2023-08-04T13:08:48.779825516Z + [[ 10.0.42.108 == \: ]] 2023-08-04T13:08:48.779825516Z + db=nb 2023-08-04T13:08:48.779825516Z + db_port=9641 2023-08-04T13:08:48.779825516Z + ovn_db_file=/etc/ovn/ovnnb_db.db 2023-08-04T13:08:48.779887606Z + [[ ! ssl:10.0.102.2:9641,ssl:10.0.42.108:9641,ssl:10.0.74.128:9641 =~ .:10\.0\.42\.108:. 
]] 2023-08-04T13:08:48.780159182Z ++ bracketify 10.0.42.108 2023-08-04T13:08:48.780167142Z ++ case "$1" in 2023-08-04T13:08:48.780172102Z ++ echo 10.0.42.108 2023-08-04T13:08:48.780314224Z + OVN_ARGS='--db-nb-cluster-local-port=9643 --db-nb-cluster-local-addr=10.0.42.108 --no-monitor --db-nb-cluster-local-proto=ssl --ovn-nb-db-ssl-key=/ovn-cert/tls.key --ovn-nb-db-ssl-cert=/ovn-cert/tls.crt --ovn-nb-db-ssl-ca-cert=/ovn-ca/ca-bundle.crt' 2023-08-04T13:08:48.780314224Z + CLUSTER_INITIATOR_IP=10.0.102.2 2023-08-04T13:08:48.780518588Z ++ date -Iseconds 2023-08-04T13:08:48.781738820Z 2023-08-04T13:08:48+00:00 - starting nbdb CLUSTER_INITIATOR_IP=10.0.102.2, K8S_NODE_IP=10.0.42.108 2023-08-04T13:08:48.781753021Z + echo '2023-08-04T13:08:48+00:00 - starting nbdb CLUSTER_INITIATOR_IP=10.0.102.2, K8S_NODE_IP=10.0.42.108' 2023-08-04T13:08:48.781753021Z + initialize=false 2023-08-04T13:08:48.781753021Z + [[ ! -e /etc/ovn/ovnnb_db.db ]] 2023-08-04T13:08:48.781816342Z + [[ false == \t\r\u\e ]] 2023-08-04T13:08:48.781936684Z + wait 9 2023-08-04T13:08:48.781974715Z + exec /usr/share/ovn/scripts/ovn-ctl -db-nb-cluster-local-port=9643 --db-nb-cluster-local-addr=10.0.42.108 --no-monitor --db-nb-cluster-local-proto=ssl --ovn-nb-db-ssl-key=/ovn-cert/tls.key --ovn-nb-db-ssl-cert=/ovn-cert/tls.crt --ovn-nb-db-ssl-ca-cert=/ovn-ca/ca-bundle.crt '-ovn-nb-log=-vconsole:info -vfile:off -vPATTERN:console:%D {%Y-%m-%dT%H:%M:%S.###Z} |%05N|%c%T|%p|%m' run_nb_ovsdb 2023-08-04T13:08:48.851644059Z 2023-08-04T13:08:48.851Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log 2023-08-04T13:08:48.852091247Z ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (No such file or directory) 2023-08-04T13:08:48.861365357Z 2023-08-04T13:08:48Z|00001|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting... 2023-08-04T13:08:48.861365357Z 2023-08-04T13:08:48Z|00002|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connection attempt failed (No such file or directory) 2023-08-04T13:08:48.875126148Z 2023-08-04T13:08:48.875Z|00002|raft|INFO|local server ID is c503 2023-08-04T13:08:48.911846610Z 2023-08-04T13:08:48.911Z|00003|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 3.1.2 2023-08-04T13:08:48.918864408Z 2023-08-04T13:08:48.918Z|00004|reconnect|INFO|ssl:10.0.102.2:9643: connecting... 2023-08-04T13:08:48.918934490Z 2023-08-04T13:08:48.918Z|00005|reconnect|INFO|ssl:10.0.74.128:9643: connecting... 2023-08-04T13:08:48.923439162Z 2023-08-04T13:08:48.923Z|00006|reconnect|INFO|ssl:10.0.102.2:9643: connected 2023-08-04T13:08:48.925166154Z 2023-08-04T13:08:48.925Z|00007|reconnect|INFO|ssl:10.0.74.128:9643: connected 2023-08-04T13:08:49.861650961Z 2023-08-04T13:08:49Z|00003|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting... 2023-08-04T13:08:49.861747153Z 2023-08-04T13:08:49Z|00004|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connected 2023-08-04T13:08:49.875272530Z 2023-08-04T13:08:49.875Z|00008|raft|INFO|server fccb is leader for term 6 2023-08-04T13:08:49.875302480Z 2023-08-04T13:08:49.875Z|00009|raft|INFO|rejecting append_request because previous entry 6,1732 not in local log (mismatch past end of log) 2023-08-04T13:08:49.876027164Z Waiting for OVN_Northbound to come up. 
2023-08-04T13:08:55.694760761Z 2023-08-04T13:08:55.694Z|00010|raft|INFO|ssl:10.0.74.128:57122: learned server ID d382 2023-08-04T13:08:55.694800872Z 2023-08-04T13:08:55.694Z|00011|raft|INFO|ssl:10.0.74.128:57122: learned remote address ssl:10.0.74.128:9643 2023-08-04T13:08:55.706904913Z 2023-08-04T13:08:55.706Z|00012|raft|INFO|ssl:10.0.102.2:43230: learned server ID fccb 2023-08-04T13:08:55.706931733Z 2023-08-04T13:08:55.706Z|00013|raft|INFO|ssl:10.0.102.2:43230: learned remote address ssl:10.0.102.2:9643 2023-08-04T13:08:58.919567770Z 2023-08-04T13:08:58.919Z|00014|memory|INFO|21944 kB peak resident set size after 10.1 seconds 2023-08-04T13:08:58.919643762Z 2023-08-04T13:08:58.919Z|00015|memory|INFO|atoms:8471 cells:7481 monitors:0 n-weak-refs:200 raft-connections:4 raft-log:1737 txn-history:72 txn-history-atoms:8165 ➜ static-kas git:(master)
This seems to happen very frequently now, but was not happening before roughly July 21st.
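When debugging this state, the raft status of the stuck databases can be dumped directly (a sketch; the pod name is the one from the describe output above):
$ oc -n openshift-ovn-kubernetes exec ovnkube-master-7qlm5 -c nbdb -- ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
$ oc -n openshift-ovn-kubernetes exec ovnkube-master-7qlm5 -c sbdb -- ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
This prints each server's role, term, and log indexes, which should make visible whether the follower logging "rejecting append_request ... (mismatch past end of log)" is still catching up or is wedged.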
In our CI, pre-submit jobs for IBM VPC CSI driver and its operator are failing with:
[sig-arch] events should not repeat pathologically for ns/openshift-cluster-csi-drivers expand_less0s{ 16 events happened too frequently event happened 25 times, something is wrong: ns/openshift-cluster-csi-drivers pod/ibm-vpc-block-csi-node-vck82 node/ci-op-jsqf19qs-00b5a-mjg8w-master-1 hmsg/99d84ba4c3 - pathological/true reason/FailedToRetrieveImagePullSecret Unable to retrieve some image pull secrets (bluemix-default-secret, bluemix-default-secret-regional, bluemix-default-secret-international, icr-io-secret); attempting to pull the image may not succeed. From: 06:44:57Z To: 06:44:58Z result=reject
Example:
Operator CI:
Driver CI:
The driver itself looks to be working, so it's probably just a transient but annoying error.
Issue 52 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624
Resource YAML view: clicking "Show tooltips" crashes the current page
Screenshot: https://drive.google.com/file/d/1lT3mUAPIm0ba5tNVDW3Ztz6Hgj4D1DFz/view?usp=drive_link
When using the monitoring plugin with the console dashboards plugin, if a custom datasource defined in a dashboard is not found, the default in-cluster Prometheus is used to fetch data. This gives the user the false impression that the custom dashboard is working when, in reality, it should fail.
The following components do not preserve their container resource requests/limits on reconciliation when modified by an external source:
The original change to add this resource preservation support doesn't appear to have accomplished the desired behavior for these specific components.
Description of problem:
monitoring-plugin cannot be started on an IPv6-disabled cluster as the pod listens on [::]:9443. monitoring-plugin should listen on [::]:9443 on an IPv6-enabled cluster and on 0.0.0.0:9443 on an IPv6-disabled cluster. $ oc logs monitoring-plugin-dc84478c-5rwmm 2023/10/14 13:42:41 [emerg] 1#0: socket() [::]:9443 failed (97: Address family not supported by protocol) nginx: [emerg] socket() [::]:9443 failed (97: Address family not supported
Version-Release number of selected component (if applicable):
4.14.0-rc.5
How reproducible:
Always
Steps to Reproduce:
1) disable ipv6 following https://access.redhat.com/solutions/5513111
cat <<EOF |oc create -f - apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: 99-openshift-machineconfig-master-kargs spec: kernelArguments: - ipv6.disable=1 EOF cat <<EOF |oc create -f - apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: worker name: 99-openshift-machineconfig-worker-kargs spec: kernelArguments: - ipv6.disable=1 EOF
2) Check the mcp status
3) Check the monitoring plugin pod status
Actual results:
1) The mcp is pending because the monitoring-plugin pod cannot be scheduled
$ oc get mcp |grep worker. worker rendered-worker-ba1d1b8306f65bc5ff53b0c05a54143f False True False 5 3 3 0 3h59m
$oc logs machine-config-controller-5b96788c69-j9d7k I1014 13:05:57.767217 1 drain_controller.go:350] Previous node drain found. Drain has been going on for 0.025260005567777778 hours I1014 13:05:57.767228 1 drain_controller.go:173] node anlim14-c6jbb-worker-b-rgqq5.c.openshift-qe.internal: initiating drain E1014 13:05:58.411241 1 drain_controller.go:144] WARNING: ignoring DaemonSet-managed ...... I1014 13:05:58.413116 1 drain_controller.go:144] evicting pod openshift-monitoring/monitoring-plugin-dc84478c-92xr4 E1014 13:05:58.422164 1 drain_controller.go:144] error when evicting pods/"monitoring-plugin-dc84478c-92xr4" -n "openshift-monitoring" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget. I1014 13:06:03.422338 1 drain_controller.go:144] evicting pod openshift-monitoring/monitoring-plugin-dc84478c-92xr4 E1014 13:06:03.433295 1 drain_controller.go:144] error when evicting pods/"monitoring-plugin-dc84478c-92xr4" -n "openshift-monitoring" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
2) The monitoring-plugin pod listens on [::], which is an invalid address on an IPv6-disabled cluster.
$oc extract cm/monitoring-plugin $cat nginx.conf error_log /dev/stdout info; events {} http { include /etc/nginx/mime.types; default_type application/octet-stream; keepalive_timeout 65; server { listen 9443 ssl; listen [::]:9443 ssl; ssl_certificate /var/cert/tls.crt; ssl_certificate_key /var/cert/tls.key; root /usr/share/nginx/html; } }
Expected results:
Monitoring-plugin listens on [::]:9443 on IPv6-enabled clusters.
Monitoring-plugin listens on 0.0.0.0:9443 on IPv6-disabled clusters.
Additional info:
The PR showing how logging fixed this issue (the sketch below follows the same approach): https://github.com/openshift/cluster-logging-operator/pull/2207/files#diff-dc6205a02c6c783e022ae0d4c726327bee4ef34cd1361541d1e3165ee7056b38R43
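A minimal sketch of the same approach for the monitoring-plugin (assumed entrypoint logic, not the actual plugin code): probe for kernel IPv6 support and only emit the second listen directive when it is present.
# emit the IPv6 listen line only when the kernel has IPv6 enabled
if [ -f /proc/net/if_inet6 ]; then
  listen_v6='listen [::]:9443 ssl;'
else
  listen_v6=''
fi
cat > /tmp/nginx.conf <<EOF
events {}
http {
  server {
    listen 9443 ssl;
    ${listen_v6}
    ssl_certificate /var/cert/tls.crt;
    ssl_certificate_key /var/cert/tls.key;
    root /usr/share/nginx/html;
  }
}
EOF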
This is a clone of issue OCPBUGS-25206. The following is the description of the original issue:
—
We need to reenable the e2e integration tests as soon as the operator is available again.
This is a clone of issue OCPBUGS-25654. The following is the description of the original issue:
—
Description of problem:
Permission-related errors appear in the capi, capg, and cluster-capi-operator logs
Version-Release number of selected component (if applicable):
4.16
How reproducible:
Always
Steps to Reproduce:
1. Install a tech preview cluster with the new PRs [https://issues.redhat.com/browse/OCPCLOUD-1718] 2. Run the ClusterInfrastructure regression suite. Example run - https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/ginkgo-test/219040/testReport/
Actual results:
Tests related to CCM and CPMS are failing
Expected results:
Tests pass
Additional info:
Analysis of the tests is done, and Joel has also helped with new commits to the MAPI PRs to fix MAPI-related issues, but other repos are still WIP.
Logs -
cluster-capi-operator errors:
[miyadav@miyadav ~]$ oc logs capi-controller-manager-74d65dd8f4-s5rlh --kubeconfig kk2 | grep -i denied [miyadav@miyadav ~]$ oc logs capi-controller-manager-74d65dd8f4-s5rlh --kubeconfig kk2 | grep -i error [miyadav@miyadav ~]$ oc logs cluster-capi-operator-66b7f99b9d-bbqxz --kubeconfig kk2 | grep -i error E1214 06:19:17.025379 1 kind.go:63] controller-runtime/source/EventHandler "msg"="if kind is a CRD, it should be installed before calling Start" "error"="failed to get restmapping: no matches for kind \"GCPCluster\" in group \"infrastructure.cluster.x-k8s.io\"" "kind"={"Group":"infrastructure.cluster.x-k8s.io","Kind":"GCPCluster"} E1214 06:19:17.025874 1 kind.go:68] controller-runtime/source/EventHandler "msg"="failed to get informer from cache" "error"="failed to get restmapping: failed to find API group \"cluster.x-k8s.io\"" E1214 06:19:17.072299 1 kind.go:63] controller-runtime/source/EventHandler "msg"="if kind is a CRD, it should be installed before calling Start" "error"="failed to get restmapping: no matches for kind \"GCPCluster\" in group \"infrastructure.cluster.x-k8s.io\"" "kind"={"Group":"infrastructure.cluster.x-k8s.io","Kind":"GCPCluster"} E1214 06:19:17.312724 1 kind.go:68] controller-runtime/source/EventHandler "msg"="failed to get informer from cache" "error"="failed to get restmapping: failed to find API group \"cluster.x-k8s.io\"" E1214 06:23:21.928322 1 leaderelection.go:327] error retrieving resource lock openshift-cluster-api/cluster-capi-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-cluster-api/leases/cluster-capi-operator-leader": dial tcp 172.30.0.1:443: connect: connection refused E1214 06:23:43.558393 1 controller.go:324] "msg"="Reconciler error" "error"="error during reconcile: failed to set conditions for CAPI Installer controller: Put \"https://172.30.0.1:443/apis/config.openshift.io/v1/clusteroperators/cluster-api/status\": dial tcp 172.30.0.1:443: connect: connection refused" "ClusterOperator"={"name":"cluster-api"} "controller"="clusteroperator" "controllerGroup"="config.openshift.io" "controllerKind"="ClusterOperator" "name"="cluster-api" "namespace"="" "reconcileID"="e36d1c19-dd22-4095-8d6b-50101f2bbefe" E1214 06:23:47.931676 1 leaderelection.go:327] error retrieving resource lock openshift-cluster-api/cluster-capi-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-cluster-api/leases/cluster-capi-operator-leader": dial tcp 172.30.0.1:443: connect: connection refused E1214 06:24:03.625555 1 controller.go:324] "msg"="Reconciler error" "error"="error during reconcile: error applying CAPI provider \"cluster-api\" components: error applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - clusterclasses.cluster.x-k8s.io\" at position 0: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/clusterclasses.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - clusters.cluster.x-k8s.io\" at position 1: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/clusters.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - machines.cluster.x-k8s.io\" at position 2: Get 
\"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/machines.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - machinesets.cluster.x-k8s.io\" at position 3: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/machinesets.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - machinedeployments.cluster.x-k8s.io\" at position 4: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/machinedeployments.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - machinepools.cluster.x-k8s.io\" at position 5: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/machinepools.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - clusterresourcesets.addons.cluster.x-k8s.io\" at position 6: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/clusterresourcesets.addons.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - clusterresourcesetbindings.addons.cluster.x-k8s.io\" at position 7: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/clusterresourcesetbindings.addons.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - machinehealthchecks.cluster.x-k8s.io\" at position 8: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/machinehealthchecks.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - extensionconfigs.runtime.cluster.x-k8s.io\" at position 9: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/extensionconfigs.runtime.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - ipaddresses.ipam.cluster.x-k8s.io\" at position 10: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/ipaddresses.ipam.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"apiextensions.k8s.io/v1/CustomResourceDefinition - ipaddressclaims.ipam.cluster.x-k8s.io\" at position 11: Get \"https://172.30.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/ipaddressclaims.ipam.cluster.x-k8s.io\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"rbac.authorization.k8s.io/v1/ClusterRoleBinding - capi-manager-rolebinding\" at position 12: Get \"https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings/capi-manager-rolebinding\": dial tcp 172.30.0.1:443: connect: connection refused\nerror applying CAPI provider component \"rbac.authorization.k8s.io/v1/ClusterRole - capi-manager-role\" at position 13: Get 
\"https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/clusterroles/capi-manager-role\": dial tcp 172.30.0.1:443: connect: connection refused" "ClusterOperator"={"name":"cluster-api"} "controller"="clusteroperator" "controllerGroup"="config.openshift.io" "controllerKind"="ClusterOperator" "name"="cluster-api" "namespace"="" "reconcileID"="973b6337-9db3-4543-aa4f-e417b016e32f" E1214 06:25:58.205862 1 leaderelection.go:327] error retrieving resource lock openshift-cluster-api/cluster-capi-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-cluster-api/leases/cluster-capi-operator-leader": dial tcp 172.30.0.1:443: connect: connection refused E1214 06:29:53.798600 1 leaderelection.go:327] error retrieving resource lock openshift-cluster-api/cluster-capi-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-cluster-api/leases/cluster-capi-operator-leader": dial tcp 172.30.0.1:443: connect: connection refused E1214 06:33:20.139517 1 leaderelection.go:327] error retrieving resource lock openshift-cluster-api/cluster-capi-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-cluster-api/leases/cluster-capi-operator-leader": dial tcp 172.30.0.1:443: connect: connection refused E1214 06:34:16.142400 1 leaderelection.go:327] error retrieving resource lock openshift-cluster-api/cluster-capi-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-cluster-api/leases/cluster-capi-operator-leader": dial tcp 172.30.0.1:443: i/o timeout E1214 06:45:15.546142 1 kubeconfig.go:81] KubeconfigController "msg"="Error reconciling kubeconfig" "error"="error generating kubeconfig: token can't be empty" "Secret"={"name":"cluster-capi-operator-secret","namespace":"openshift-cluster-api"} "controller"="secret" "controllerGroup"="" "controllerKind"="Secret" "name"="cluster-capi-operator-secret" "namespace"="openshift-cluster-api" "reconcileID"="910273fa-6f22-4326-a330-a235be2c6cc4" E1214 06:45:15.560795 1 controller.go:324] "msg"="Reconciler error" "error"="error generating kubeconfig: token can't be empty" "Secret"={"name":"cluster-capi-operator-secret","namespace":"openshift-cluster-api"} "controller"="secret" "controllerGroup"="" "controllerKind"="Secret" "name"="cluster-capi-operator-secret" "namespace"="openshift-cluster-api" "reconcileID"="910273fa-6f22-4326-a330-a235be2c6cc4" E1214 06:45:15.567938 1 kubeconfig.go:81] KubeconfigController "msg"="Error reconciling kubeconfig" "error"="error generating kubeconfig: token can't be empty" "Secret"={"name":"cluster-capi-operator-secret","namespace":"openshift-cluster-api"} "controller"="secret" "controllerGroup"="" "controllerKind"="Secret" "name"="cluster-capi-operator-secret" "namespace"="openshift-cluster-api" "reconcileID"="d6e13dc5-9b90-42f3-bcbd-c451bf4359a9"
CAPG errors:
[miyadav@miyadav ~]$ oc logs capg-controller-manager-6b54798bb9-x6vxk --kubeconfig kk2 | grep -i denied
E1214 07:26:10.892932 1 reconcile.go:152] "msg"="Error creating an instance" "error"="googleapi: Error 400: SERVICE_ACCOUNT_ACCESS_DENIED - The user does not have access to service account 'miyadav-1412v3-28f9k-w@openshift-qe.iam.gserviceaccount.com'. User: 'miyadav-1412-openshift-c-v5vsh@openshift-qe.iam.gserviceaccount.com'. Ask a project owner to grant you the iam.serviceAccountUser role on the service account" "GCPMachine"={"name":"gcp-machinetemplate-6pgrk","namespace":"openshift-cluster-api"} "controller"="gcpmachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="GCPMachine" "name"="gcp-machinetemplate-6pgrk" "namespace"="openshift-cluster-api" "reconcileID"="1cca1651-62b0-4939-b1fb-f7006dbef4eb" "zone"="us-central1-b"
E1214 07:26:10.892988 1 gcpmachine_controller.go:229] "msg"="Error reconciling instance resources" "error"="googleapi: Error 400: SERVICE_ACCOUNT_ACCESS_DENIED - The user does not have access to service account 'miyadav-1412v3-28f9k-w@openshift-qe.iam.gserviceaccount.com'. User: 'miyadav-1412-openshift-c-v5vsh@openshift-qe.iam.gserviceaccount.com'. Ask a project owner to grant you the iam.serviceAccountUser role on the service account" "GCPMachine"={"name":"gcp-machinetemplate-6pgrk","namespace":"openshift-cluster-api"} "controller"="gcpmachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="GCPMachine" "name"="gcp-machinetemplate-6pgrk" "namespace"="openshift-cluster-api" "reconcileID"="1cca1651-62b0-4939-b1fb-f7006dbef4eb"
E1214 07:26:10.911565 1 controller.go:324] "msg"="Reconciler error" "error"="googleapi: Error 400: SERVICE_ACCOUNT_ACCESS_DENIED - The user does not have access to service account 'miyadav-1412v3-28f9k-w@openshift-qe.iam.gserviceaccount.com'. User: 'miyadav-1412-openshift-c-v5vsh@openshift-qe.iam.gserviceaccount.com'. Ask a project owner to grant you the iam.serviceAccountUser role on the service account" "GCPMachine"={"name":"gcp-machinetemplate-6pgrk","namespace":"openshift-cluster-api"} "controller"="gcpmachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="GCPMachine" "name"="gcp-machinetemplate-6pgrk" "namespace"="openshift-cluster-api" "reconcileID"="1cca1651-62b0-4939-b1fb-f7006dbef4eb"
This is a clone of issue OCPBUGS-38196. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-37821. The following is the description of the original issue:
—
OpenShift Dedicated is in the process of developing an offering of GCP clusters that uses only short-lived credentials from the end user. For these clusters to be deployed, the pod running the OpenShift Installer needs to function with GCP credentials that fit the short-lived credential formats. This worked in prior Installer versions, such as 4.14, but was not an explicit requirement.
(This is a clone of OCPBUGS-24009 targeting 4.15)
TRT has picked up a somewhat rare but new failure coming out of the packageserver operator; it surfaces in this test. It appears to only be affecting Azure 4.14 -> 4.15 (i.e. minor) upgrades, roughly 5% of the time.
Examining job runs where this test failed in sippy we can see the error output is typically:
operator conditions operator-lifecycle-manager-packageserver: {Operator unavailable (ClusterServiceVersionNotSucceeded): ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: InstallComponentFailed, message: install strategy failed: clusterrolebindings.rbac.authorization.k8s.io "packageserver-service-system:auth-delegator" already exists Operator unavailable (ClusterServiceVersionNotSucceeded): ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: InstallComponentFailed, message: install strategy failed: clusterrolebindings.rbac.authorization.k8s.io "packageserver-service-system:auth-delegator" already exists}
or
{Operator unavailable (ClusterServiceVersionNotSucceeded): ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: InstallComponentFailed, message: install strategy failed: could not create service packageserver-service: services "packageserver-service" already exists Operator unavailable (ClusterServiceVersionNotSucceeded): ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: InstallComponentFailed, message: install strategy failed: could not create service packageserver-service: services "packageserver-service" already exists}
The failed job runs also indicate this problem appears to have started, or started occurring far more frequently, somewhere around Nov 14 - Nov 18. It's been very common since the 18th, happening multiple times a day.
Description of problem:
We upgraded our OpenShift cluster from 4.4.16 to 4.15.3 and multiple operators are now in "Failed" status with CSV conditions such as:
- NeedsReinstall installing: deployment changed old hash=5f6b8fc6f7, new hash=5hFv6Gemy1Zri3J9ulXfjG9qOzoFL8FMsLNcLR
- InstallComponentFailed install strategy failed: rolebindings.rbac.authorization.k8s.io "openshift-gitops-operator-controller-manager-service-auth-reader" already exists
All other failures refer to a similar "auth-reader" rolebinding that already exists.
Version-Release number of selected component (if applicable):
OpenShift 4.15.3
How reproducible:
Happened for several installed operators, but only on the one cluster we upgraded (our staging cluster)
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
All operators should be up-to-date
Additional info:
This may be related to https://github.com/operator-framework/operator-lifecycle-manager/pull/3159
This is a clone of issue OCPBUGS-24743. The following is the description of the original issue:
—
For many 4.y releases (since before 4.11, so including all minor versions that are still supported), CRI-O has wiped its stored images when it comes up after a node reboot and notices it has a new (minor?) version. This causes redundant pulls, as seen in this 4.11-to-4.12 update run:
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-azure-sdn-upgrade/1732741139229839360/artifacts/e2e-azure-sdn-upgrade/gather-extra/artifacts/nodes/ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4/journal | zgrep 'Starting update from rendered-\|crio-wipe\|Pulled image: registry.ci.openshift.org/ocp/4.12-2023-12-07-060628@sha256:3c3e67faf4b6e9e95bebb0462bd61c964170893cb991b5c4de47340a2f295dc2'
Dec 07 13:05:42.474144 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Succeeded.
Dec 07 13:05:42.481470 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Consumed 191ms CPU time
Dec 07 13:59:51.000686 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 crio[1498]: time="2023-12-07 13:59:51.000591203Z" level=info msg="Pulled image: registry.ci.openshift.org/ocp/4.12-2023-12-07-060628@sha256:3c3e67faf4b6e9e95bebb0462bd61c964170893cb991b5c4de47340a2f295dc2" id=a62bc972-67d7-401a-9640-884430bd16f1 name=/runtime.v1.ImageService/PullImage
Dec 07 14:00:55.745095 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 root[101294]: machine-config-daemon[99469]: Starting update from rendered-worker-ca36a33a83d49b43ed000fd422e09838 to rendered-worker-c0b3b4eadfe6cdfb595b97fa293a9204: &{osUpdate:true kargs:false fips:false passwd:false files:true units:true kernelType:false extensions:false}
Dec 07 14:05:33.274241 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Succeeded.
Dec 07 14:05:33.289605 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Consumed 216ms CPU time
Dec 07 14:14:50.277011 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 crio[1573]: time="2023-12-07 14:14:50.276961087Z" level=info msg="Pulled image: registry.ci.openshift.org/ocp/4.12-2023-12-07-060628@sha256:3c3e67faf4b6e9e95bebb0462bd61c964170893cb991b5c4de47340a2f295dc2" id=1a092fbd-7ffa-475a-b0b7-0ab115dbe173 name=/runtime.v1.ImageService/PullImage
The redundant pulls cost network and disk traffic, and avoiding them should make those update-initiated reboots quicker and cheaper. The lack of update-initiated wipes is not expected to cost much, because the Kubelet's old-image garbage collection should be along to clear out any no-longer-used images if disk space gets tight.
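To make the proposed behavior concrete, here is a minimal, hypothetical sketch (not the actual crio-wipe logic) of skipping the version-triggered wipe and relying on kubelet image garbage collection instead:

package main

import "fmt"

// shouldWipe sketches the argued-for decision: a version bump alone no
// longer forces a wipe, because kubelet's old-image GC will reclaim
// space if the disk gets tight. Only an unclean shutdown still does.
func shouldWipe(versionChanged, uncleanShutdown bool) bool {
	_ = versionChanged // deliberately ignored under the proposal
	return uncleanShutdown
}

func main() {
	fmt.Println(shouldWipe(true, false))  // false: keep the image cache across the update reboot
	fmt.Println(shouldWipe(false, true))  // true: an unclean shutdown still wipes
}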
At least 4.11. Possibly older 4.y; I haven't checked.
Every time.
1. Install a cluster.
2. Update to a release image with a different CRI-O (minor?) version.
3. Check logs on the nodes.
crio-wipe entries in the logs, with reports of target-release images being pulled before and after those wipes, as I quoted in the Description.
Target-release images pulled before the reboot, and found in the local cache if that image is needed again post-reboot.
This is a clone of issue OCPBUGS-25894. The following is the description of the original issue:
—
Description of problem:
The kube-apiserver operator is trying to delete a prometheus rule that does not exist, leading to a huge amount of unwanted audit logs. With the change introduced as part of BUG-2004585, the kube-apiserver SLO rules are split into two groups, kube-apiserver-slos-basic and kube-apiserver-slos-extended, but kube-apiserver-operator still tries to delete /apis/monitoring.coreos.com/v1/namespaces/openshift-kube-apiserver/prometheusrules/kube-apiserver-slos, which no longer exists in the cluster.
Version-Release number of selected component (if applicable):
4.12 4.13 4.14
How reproducible:
Its easy to reproduce
Steps to Reproduce:
1. Install a cluster with 4.12
2. Enable cluster logging
3. Forward the audit log to an internal or external log store using the config below:
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  pipelines:
  - name: all-to-default
    inputRefs:
    - infrastructure
    - application
    - audit
    outputRefs:
    - default
4. Check the audit logs in Kibana; they will show entries like the image below.
Actual results:
Kube-apiserver-operator is trying to delete a prometheus rule that does not exist in the cluster
Expected results:
If the rule is not present in the cluster, the operator should not attempt to delete it
Additional info:
Multiple vCenters and a wrong user/password in secret/vmware-vsphere-cloud-credentials cause the vSphere CSI driver controller pods to keep restarting
Description of problem:
When there are multiple vCenters in secret/vmware-vsphere-cloud-credentials in ns/openshift-cluster-csi-drivers (see bug https://issues.redhat.com/browse/OCPBUGS-20478), the vSphere CSI driver controller pods are constantly restarting.
vmware-vsphere-csi-driver-controller-545dc5679f-mdsjt 0/13 Pending 0 0s
vmware-vsphere-csi-driver-controller-545dc5679f-mdsjt 0/13 ContainerCreating 0 0s
vmware-vsphere-csi-driver-controller-587f78b9c7-br4gs 0/13 Terminating 0 3s
vmware-vsphere-csi-driver-controller-545dc5679f-mdsjt 0/13 Terminating 0 1s
vmware-vsphere-csi-driver-controller-587f78b9c7-9pfmp 0/13 Pending 0 0s
vmware-vsphere-csi-driver-controller-587f78b9c7-9pfmp 0/13 Pending 0 0s
vmware-vsphere-csi-driver-controller-587f78b9c7-9pfmp 0/13 ContainerCreating 0 0s
vmware-vsphere-csi-driver-controller-587f78b9c7-qdb89 12/13 Terminating 0 9s
vmware-vsphere-csi-driver-controller-b946b657-7t74p 13/13 Terminating 0 9s
vmware-vsphere-csi-driver-controller-545dc5679f-mdsjt 0/13 Terminating 0 3s
vmware-vsphere-csi-driver-controller-587f78b9c7-qdb89 0/13 Terminating 0 10s
vmware-vsphere-csi-driver-controller-545dc5679f-75wfm 12/13 Terminating 0 9s
vmware-vsphere-csi-driver-controller-587f78b9c7-9pfmp 0/13 ContainerCreating 0 2s
vmware-vsphere-csi-driver-controller-587f78b9c7-qdb89 0/13 Terminating 0 11s
vmware-vsphere-csi-driver-controller-587f78b9c7-qdb89 0/13 Terminating 0 11s
vmware-vsphere-csi-driver-controller-587f78b9c7-qdb89 0/13 Terminating 0 11s
vmware-vsphere-csi-driver-controller-545dc5679f-75wfm 0/13 Terminating 0 10s
vmware-vsphere-csi-driver-controller-545dc5679f-75wfm 0/13 Terminating 0 11s
vmware-vsphere-csi-driver-controller-545dc5679f-75wfm 0/13 Terminating 0 11s
vmware-vsphere-csi-driver-controller-545dc5679f-75wfm 0/13 Terminating 0 11s
$ oc get co storage
storage 4.14.0-0.nightly-2023-10-10-084534 False True False 15s VSphereCSIDriverOperatorCRAvailable: VMwareVSphereDriverControllerServiceControllerAvailable: Waiting for Deployment
$ oc logs -f deployment.apps/vmware-vsphere-csi-driver-controller --tail=500
{"level":"error","time":"2023-10-12T11:40:38.920487342Z","caller":"service/driver.go:189","msg":"failed to init controller. Error: ServerFaultCode: Cannot complete login due to an incorrect user name or password.","TraceId":"5e60e6c5-efeb-4080-888c-74182e4fb1f4","TraceId":"ec636d3d-1ddb-43a5-b9f7-8541dacff583","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/driver.go:189\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/driver.go:202\nmain.main\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:250"}
{"level":"info","time":"2023-10-12T11:40:38.920536779Z","caller":"service/driver.go:109","msg":"Configured: \"csi.vsphere.vmware.com\" with clusterFlavor: \"VANILLA\" and mode: \"controller\"","TraceId":"5e60e6c5-efeb-4080-888c-74182e4fb1f4","TraceId":"ec636d3d-1ddb-43a5-b9f7-8541dacff583"}
{"level":"error","time":"2023-10-12T11:40:38.920572294Z","caller":"service/driver.go:203","msg":"failed to run the driver. Err: +ServerFaultCode: Cannot complete login due to an incorrect user name or password.","TraceId":"5e60e6c5-efeb-4080-888c-74182e4fb1f4","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/driver.go:203\nmain.main\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:250"}
$ oc logs vmware-vsphere-csi-driver-operator-b4b8d5d56-f76pc
I1012 11:43:08.973130 1 event.go:298] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-cluster-csi-drivers", Name:"vmware-vsphere-csi-driver-operator", UID:"a8492b8c-8c13-4b15-aedc-6f3ced80618e", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'DeploymentUpdateFailed' Failed to update Deployment.apps/vmware-vsphere-csi-driver-controller -n openshift-cluster-csi-drivers: Operation cannot be fulfilled on deployments.apps "vmware-vsphere-csi-driver-controller": the object has been modified; please apply your changes to the latest version and try again
E1012 11:43:08.996554 1 base_controller.go:268] VMwareVSphereDriverControllerServiceController reconciliation failed: Operation cannot be fulfilled on deployments.apps "vmware-vsphere-csi-driver-controller": the object has been modified; please apply your changes to the latest version and try again
W1012 11:43:08.999148 1 driver_starter.go:206] CSI driver can only connect to one vcenter, more than 1 set of credentials found for CSI driver
W1012 11:43:09.390489 1 driver_starter.go:206] CSI driver can only connect to one vcenter, more than 1 set of credentials found for CSI driver
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-10-10-084534
How reproducible:
Always
Steps to Reproduce:
See Description
Actual results:
Storage CSI Driver pods are restarting
Expected results:
Storage CSI driver pods should not be restarting
This is a clone of issue OCPBUGS-31497. The following is the description of the original issue:
—
Description of problem:
[csi-snapshot-controller-operator] does not create a suitable Role and RoleBinding for csi-snapshot-webhook
Version-Release number of selected component (if applicable):
$ oc version
Client Version: 4.14.0-rc.0
Kustomize Version: v5.0.1
Server Version: 4.14.0-0.nightly-2024-03-28-004801
Kubernetes Version: v1.27.11+749fe1d
How reproducible:
Always
Steps to Reproduce:
1. Create an OpenShift cluster on AWS
2. Check the csi-snapshot-webhook logs for errors
Actual results:
In step 2:
$ oc logs csi-snapshot-webhook-76bf9bd758-cxr7g
I0328 08:02:58.016020 1 certwatcher.go:129] Updated current TLS certificate
W0328 08:02:58.029464 1 reflector.go:424] github.com/kubernetes-csi/external-snapshotter/client/v6/informers/externalversions/factory.go:117: failed to list *v1.VolumeSnapshotClass: volumesnapshotclasses.snapshot.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-cluster-storage-operator:default" cannot list resource "volumesnapshotclasses" in API group "snapshot.storage.k8s.io" at the cluster scope
E0328 08:02:58.029512 1 reflector.go:140] github.com/kubernetes-csi/external-snapshotter/client/v6/informers/externalversions/factory.go:117: Failed to watch *v1.VolumeSnapshotClass: failed to list *v1.VolumeSnapshotClass: volumesnapshotclasses.snapshot.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-cluster-storage-operator:default" cannot list resource "volumesnapshotclasses" in API group "snapshot.storage.k8s.io" at the cluster scope
W0328 08:02:58.888397 1 reflector.go:424] github.com/kubernetes-csi/external-snapshotter/client/v6/informers/externalversions/factory.go:117: failed to list *v1.VolumeSnapshotClass: volumesnapshotclasses.snapshot.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-cluster-storage-operator:default" cannot list resource "volumesnapshotclasses" in API group "snapshot.storage.k8s.io" at the cluster scope
Expected results:
In step 2, the csi-snapshot-webhook logs should contain no "cannot list resource" errors
Additional info:
The issue exists on 4.15 and 4.16 as well; in addition, since 4.15+ the webhook needs additional "VolumeGroupSnapshotClass" list permissions:
$ oc logs csi-snapshot-webhook-794b7b54d7-b8vl9
...
E0328 12:12:06.509158 1 reflector.go:147] github.com/kubernetes-csi/external-snapshotter/client/v6/informers/externalversions/factory.go:133: Failed to watch *v1alpha1.VolumeGroupSnapshotClass: failed to list *v1alpha1.VolumeGroupSnapshotClass: volumegroupsnapshotclasses.groupsnapshot.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-cluster-storage-operator:default" cannot list resource "volumegroupsnapshotclasses" in API group "groupsnapshot.storage.k8s.io" at the cluster scope
W0328 12:12:50.836582 1 reflector.go:535] github.com/kubernetes-csi/external-snapshotter/client/v6/informers/externalversions/factory.go:133: failed to list *v1alpha1.VolumeGroupSnapshotClass: volumegroupsnapshotclasses.groupsnapshot.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-cluster-storage-operator:default" cannot list resource "volumegroupsnapshotclasses" in API group "groupsnapshot.storage.k8s.io" at the cluster scope
...
Description of problem:
This is to track the SDN-specific issue in https://issues.redhat.com/browse/OCPBUGS-18389: 4.14 nightly has a higher pod ready latency compared to 4.14 ec4 and 4.13.z in the node-density (lite) test.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-09-11-201102
How reproducible:
Everytime
Steps to Reproduce:
1. Install an SDN cluster and scale up to 24 worker nodes; install 3 infra nodes and move the monitoring, ingress, and registry components to the infra nodes
2. Run the node-density (lite) test with 245 pods per node
3. Compare the pod ready latency to 4.13.z and 4.14 ec4
Actual results:
4.14 nightly has a higher pod ready latency compared to 4.14 ec4 and 4.13.10
Expected results:
4.14 should have similar pod ready latency compared to previous release
Additional info:
OCP Version | Flexy Id | Scale Ci Job | Grafana URL | Cloud | Arch Type | Network Type | Worker Count | PODS_PER_NODE | Avg Pod Ready (ms) | P99 Pod Ready (ms) | Must-gather |
4.14.0-ec.4 | 231559 | 292 | 087eb40c-6600-4db3-a9fd-3b959f4a434a | aws | amd64 | SDN | 24 | 245 | 2186 | 3256 | https://drive.google.com/file/d/1NInCiai7WWIIVT8uL-5KKeQl9CtQN_Ck/view?usp=drive_link |
4.14.0-0.nightly-2023-09-02-132842 | 231558 | 291 | 62404e34-672e-4168-b4cc-0bd575768aad | aws | amd64 | SDN | 24 | 245 | 58725 | 294279 | https://drive.google.com/file/d/1BbVeNrWzVdogFhYihNfv-99_q8oj6eCN/view?usp=drive_link |
With the new multus image provided by Dan Williams in https://issues.redhat.com/browse/OCPBUGS-18389, the 24-node SDN cluster's latency is similar to that without the fix.
% oc -n openshift-network-operator get deployment.apps/network-operator -o yaml | grep MULTUS_IMAGE -A 1
- name: MULTUS_IMAGE
  value: quay.io/dcbw/multus-cni:informer
% oc get pod -n openshift-multus -o yaml | grep image: | grep multus
image: quay.io/dcbw/multus-cni:informer
....
OCP Version | Flexy Id | Scale Ci Job | Grafana URL | Cloud | Arch Type | Network Type | Worker Count | PODS_PER_NODE | Avg Pod Ready (ms) | P99 Pod Ready (ms) | Must-gather |
4.14.0-0.nightly-2023-09-11-201102 quay.io/dcbw/multus-cni:informer | 232389 | 314 | f2c290c1-73ea-4f10-a797-3ab9d45e94b3 | aws | amd64 | SDN | 24 | 245 | 61234 | 311776 | https://drive.google.com/file/d/1o7JXJAd_V3Fzw81pTaLXQn1ms44lX6v5/view?usp=drive_link |
4.14.0-ec.4 | 231559 | 292 | 087eb40c-6600-4db3-a9fd-3b959f4a434a | aws | amd64 | SDN | 24 | 245 | 2186 | 3256 | https://drive.google.com/file/d/1NInCiai7WWIIVT8uL-5KKeQl9CtQN_Ck/view?usp=drive_link |
4.14.0-0.nightly-2023-09-02-132842 | 231558 | 291 | 62404e34-672e-4168-b4cc-0bd575768aad | aws | amd64 | SDN | 24 | 245 | 58725 | 294279 | https://drive.google.com/file/d/1BbVeNrWzVdogFhYihNfv-99_q8oj6eCN/view?usp=drive_link |
Zenghui Shi and Peng Liu requested modifying the multus-daemon-config ConfigMap to remove the readinessindicatorfile flag.
Steps:
Now the readinessindicatorfile flag is removed and all multus pods are restarted:
% oc get cm multus-daemon-config -n openshift-multus -o yaml | grep readinessindicatorfile -c
0
Test result: p99 is better than without the fix (removing readinessindicatorfile) but is still worse than ec4; avg is still bad.
OCP Version | Flexy Id | Scale Ci Job | Grafana URL | Cloud | Arch Type | Network Type | Worker Count | PODS_PER_NODE | Avg Pod Ready (ms) | P99 Pod Ready (ms) | Must-gather |
4.14.0-0.nightly-2023-09-11-201102 quay.io/dcbw/multus-cni:informer and remove readinessindicatorfile flag | 232389 | 316 | d7a754aa-4f52-49eb-80cf-907bee38a81b | aws | amd64 | SDN | 24 | 245 | 51775 | 105296 | https://drive.google.com/file/d/1h-3JeZXQRO-zsgWzen6aNDQfSDqoKAs2/view?usp=drive_link |
Zenghui Shi and Peng Liu requested setting logLevel to debug in addition to removing the readinessindicatorfile flag.
Edit the cm to change "logLevel": "verbose" to "debug" and restart all multus pods.
Now the logLevel is debug and all multus pods are restarted:
% oc get cm multus-daemon-config -n openshift-multus -o yaml | grep logLevel
"logLevel": "debug",
% oc get cm multus-daemon-config -n openshift-multus -o yaml | grep readinessindicatorfile -c
0
OCP Version | Flexy Id | Scale Ci Job | Grafana URL | Cloud | Arch Type | Network Type | Worker Count | PODS_PER_NODE | Avg Pod Ready (ms) | P99 Pod Ready (ms) | Must-gather |
4.14.0-0.nightly-2023-09-11-201102 quay.io/dcbw/multus-cni:informer and remove readinessindicatorfile flag and logLevel=debug | 232389 | 320 | 5d1d3e6a-bfa1-4a4b-bbfc-daedc5605f7d | aws | amd64 | SDN | 24 | 245 | 49586 | 105314 | https://drive.google.com/file/d/1p1PDbnqm0NlWND-komc9jbQ1PyQMeWcV/view?usp=drive_link |
Description of problem:
Agent-based Installer fails to deploy an HA cluster (3x masters, 2x workers) with OKD/FCOS when the network DNS server does not resolve the api-int.* endpoint. The latter is not required for HA deployments and is actually never mentioned in the OCP docs for Agent-based Installer. OCP is not affected at all.
Version-Release number of selected component (if applicable):
4.13 4.14 4.15
Michael Burke reviewed the plugin API documentation as part of https://github.com/openshift/openshift-docs/pull/53103. We should update the ts-doc comments in the openshift/console repo based on this review.
Description of problem:
After updating a CPMS CR with a non-existent network, a machine is stuck in the Provisioning state. Then, when updating the CPMS back to the previous configuration, the master Machine is stuck in the Deleting state. Logs from the machine api controller:
I0720 13:03:58.894171 1 controller.go:187] ostest-2pwfk-master-xwprn-0: reconciling Machine
I0720 13:03:58.902876 1 controller.go:231] ostest-2pwfk-master-xwprn-0: reconciling machine triggers delete
E0720 13:04:00.200290 1 controller.go:255] ostest-2pwfk-master-xwprn-0: failed to delete machine: filter matched no resources
E0720 13:04:00.200499 1 controller.go:329] "msg"="Reconciler error" "error"="filter matched no resources" "controller"="machine-controller" "name"="ostest-2pwfk-master-xwprn-0" "namespace"="openshift-machine-api" "object"={"name":"ostest-2pwfk-master-xwprn-0","namespace":"openshift-machine-api"} "reconcileID"="9ccb5885-4b9f-4190-95a2-1120f2566c52"
Version-Release number of selected component (if applicable):
OCP 4.14.0-0.nightly-2023-07-18-085740 RHOS-17.1-RHEL-9-20230712.n.1
How reproducible:
100%
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Issue 53 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624
Topology > Pod rings are missing for Deployments
Screenshot: https://drive.google.com/file/d/1RXCMKjvu2mdO2tQeHe-p5mLbINfmP5u4/view?usp=drive_link
Description of the problem:
Minor issue:
Testing API functions to add a manifest to a cluster, we noticed that for invalid file names we normally get
status=422,
reason="Unprocessable Entity",
however for the file name "sp ce.yaml"
we get 400, not 422, and a generic Bad Request entity:
Reason: Bad Request
HTTP response headers: HTTPHeaderDict(
)
HTTP response body:
It would be better to align this with the same exception we get, for example, when creating a file with an invalid file extension or a file name which already exists (422).
How reproducible:
Steps to reproduce:
1. Try to create a manifest with the name "sp ce.yaml" via the v2_create_cluster_manifest API call
2.
3.
Actual results:
Getting 400, Bad Request
Expected results:
422, reason="Unprocessable Entity"
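To make the expected alignment concrete, here is a minimal, hypothetical sketch (the function name and the accepted extensions are assumptions, not the assisted-service implementation) of validating the manifest file name up front so every invalid name, including one with an embedded space, yields the same 422:

package main

import (
	"fmt"
	"net/http"
	"strings"
)

// validateManifestName returns the HTTP status a create-manifest call
// should use: 422 for any invalid file name, never a generic 400.
func validateManifestName(name string) int {
	hasValidExt := strings.HasSuffix(name, ".yaml") ||
		strings.HasSuffix(name, ".yml") ||
		strings.HasSuffix(name, ".json")
	if strings.Contains(name, " ") || !hasValidExt {
		return http.StatusUnprocessableEntity // 422, same as other invalid names
	}
	return http.StatusCreated
}

func main() {
	fmt.Println(validateManifestName("sp ce.yaml"))    // 422, not 400
	fmt.Println(validateManifestName("manifest.yaml")) // 201
}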
CVO reporting:
Could not update service "openshift-cloud-controller-manager-operator/cloud-controller-manager-operator"
(111 of 613): resource may have been deleted
Reported by hypershift team who were first to notice: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1701279683347589
This is a clone of issue OCPBUGS-45334. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-43441. The following is the description of the original issue:
—
Description of problem:
The developer perspective dashboards page deduplicates data before showing it in the table.
How reproducible:
Steps to Reproduce:
1. Apply this dashboard yaml (https://drive.google.com/file/d/1PcErgAKqu95yFi5YDAM5LxaTEutVtbrs/view?usp=sharing)
2. Open the dashboard on the Admin console; it should list all the rows
3. Open the dashboard on the Developer console, selecting the openshift-kube-scheduler project
4. See that when varying Plugin values are available under the Execution Time table, they are combined per Pod in the developer perspective
Actual results:
The Developer Perspective Dashboards Table doesn't display all rows returned from a query.
Expected results:
The Developer Perspective Dashboards Table displays all rows returned from a query.
Additional info:
Admin Console: https://drive.google.com/file/d/1EIMYHBql0ql1zYiKlqOJh7hyqG-JFjla/view?usp=sharing
Developer Console: https://drive.google.com/file/d/1jk-Fxq9I6LDYzBGLFTUDDsGqERzwWJrl/view?usp=sharing
It works as expected on OCP <= 4.14
Causing payload rejection now.
https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/304/files caused it
From Trevor:
https://redhat-internal.slack.com/archives/CBZHF4DHC/p1701485079971669
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/27231/pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-ipv6/1730718245385670656
: [sig-arch] events should not repeat pathologically for ns/openshift-cloud-controller-manager-operator
{ 1 events happened too frequently
event happened 374 times, something is wrong: namespace/openshift-cloud-controller-manager-operator node/master-1.ostest.test.metalkube.org pod/cluster-cloud-controller-manager-operator-5b6b87b648-rzdbc hmsg/873af7a9ec - reason/BackOff Back-off pulling image "quay.io/openshift/origin-kube-rbac-proxy:4.2.0" From: 00:53:59Z To: 00:54:00Z result=reject }
4.2 is an old-sounding tag? Seems like not-a-flake, but still gathering data
Description of the problem:
Whenever creating an AgentClusterInstall without an imageSetRef, the assisted-service container crashes due to attempting to access a nil pointer
How reproducible:
100%
Steps to reproduce:
1. Create an AgentClusterInstall without an imageSetRef field
Actual results:
assisted-service container crashes
Expected results:
The AgentClusterInstall is updated with a SpecSynced error, or sufficient defaults are applied.
Additional Information:
This seems to be because there is no check for spec.ImageSetRef being nil in this function: https://github.com/openshift/assisted-service/blob/91fcb5bc822de96602657efd883ed419bbb64963/internal/controller/controllers/clusterdeployments_controller.go#L1439C3-L1439C3
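A minimal runnable sketch of the missing guard, using hypothetical stand-in types rather than the actual assisted-service structs:

package main

import "fmt"

type ImageSetRef struct{ Name string }

type AgentClusterInstallSpec struct {
	// ImageSetRef is optional in the API, so it can legitimately be nil.
	ImageSetRef *ImageSetRef
}

// releaseImageFor returns an error instead of dereferencing a nil
// ImageSetRef, so the controller can report SpecSynced=false rather
// than crash the container.
func releaseImageFor(spec AgentClusterInstallSpec) (string, error) {
	if spec.ImageSetRef == nil {
		return "", fmt.Errorf("spec.imageSetRef is not set; cannot look up a ClusterImageSet")
	}
	return "clusterimageset/" + spec.ImageSetRef.Name, nil
}

func main() {
	_, err := releaseImageFor(AgentClusterInstallSpec{})
	fmt.Println(err) // a SpecSynced-style error instead of a panic
}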
Please review the following PR: https://github.com/openshift/cluster-csi-snapshot-controller-operator/pull/160
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Issue was found when analyzing bug https://issues.redhat.com/browse/OCPBUGS-19817
Version-Release number of selected component (if applicable):
4.15.0-0.ci-2023-09-25-165744
How reproducible:
everytime
Steps to Reproduce:
The cluster is an IPsec cluster with the NS extension and the ipsec service enabled.
1. Enable east-west IPsec and wait for the cluster to settle
2. Disable IPsec and wait for the cluster to settle
You'll observe that the ipsec pods are deleted.
Actual results:
no pods
Expected results:
Pods should stay; see https://github.com/openshift/cluster-network-operator/blob/master/pkg/network/ovn_kubernetes.go#L314
// If IPsec is enabled for the first time, we start the daemonset. If it is
// disabled after that, we do not stop the daemonset but only stop IPsec.
//
// TODO: We need to do this as, by default, we maintain IPsec state on the
// node in order to maintain encrypted connectivity in the case of upgrades.
// If we only unrender the IPsec daemonset, we will be unable to cleanup
// the IPsec state on the node and the traffic will continue to be
// encrypted.
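A minimal sketch (hypothetical names, not the actual cluster-network-operator code) of the render decision that comment describes: once IPsec has ever been enabled, the daemonset stays rendered and only its mode changes:

package main

import "fmt"

// renderIPsecDaemonSet keeps the daemonset around once IPsec has ever
// been enabled, so the pods remain available to clean up IPsec state on
// the nodes; disabling IPsec only flips the mode they run in.
func renderIPsecDaemonSet(enabledNow, everEnabled bool) (render bool, mode string) {
	if !enabledNow && !everEnabled {
		return false, "" // IPsec never enabled: nothing to render
	}
	if enabledNow {
		return true, "encrypt"
	}
	return true, "cleanup" // keep the pods, stop IPsec itself
}

func main() {
	fmt.Println(renderIPsecDaemonSet(true, true))  // true encrypt
	fmt.Println(renderIPsecDaemonSet(false, true)) // true cleanup: pods should stay
}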
Additional info:
Description of problem:
OAuth-Proxy breaks when it's using Service Account as an oauth-client as documented in https://docs.openshift.com/container-platform/4.15/authentication/using-service-accounts-as-oauth-client.html
Version-Release number of selected component (if applicable):
4.15
How reproducible:
100%
Steps to Reproduce:
1. Install an OCP cluster without the ImageRegistry capability
2. Deploy an oauth-proxy that uses an SA as its OAuth2 client
3. Try to log in to the oauth-proxy using valid credentials
Actual results:
The login fails; the oauth-server logs:
2024-02-05T13:30:56.059910994Z E0205 13:30:56.059873 1 osinserver.go:91] internal error: system:serviceaccount:my-namespace:my-sa has no tokens
Expected results:
The login succeeds
Additional info:
Description of problem:
When CPUPartitioning is not set in install-config.yaml, a warning message is still generated: "WARNING CPUPartitioning: is ignored". This warning is incorrect, since the check is against "None" while the value is an empty string when not set, and it is also no longer relevant now that https://issues.redhat.com//browse/OCPBUGS-18876 has been fixed.
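A minimal illustration, with hypothetical names rather than the actual installer code, of why the warning fires: an unset field arrives as the empty string, which is not equal to "None":

package main

import "fmt"

type CPUPartitioningMode string

const CPUPartitioningNone CPUPartitioningMode = "None"

// warnIfIgnored shows the buggy check: "" != "None", so an unset value
// triggers the warning even though nothing was configured.
func warnIfIgnored(mode CPUPartitioningMode) {
	if mode != CPUPartitioningNone {
		fmt.Printf("WARNING CPUPartitioning: %s is ignored\n", mode)
	}
}

// warnIfIgnoredFixed treats both "unset" and "None" as no-ops.
func warnIfIgnoredFixed(mode CPUPartitioningMode) {
	if mode != "" && mode != CPUPartitioningNone {
		fmt.Printf("WARNING CPUPartitioning: %s is ignored\n", mode)
	}
}

func main() {
	warnIfIgnored("")      // fires spuriously for an unset field
	warnIfIgnoredFixed("") // silent, as expected
}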
Version-Release number of selected component (if applicable):
How reproducible:
Every time
Steps to Reproduce:
1. Create an install config with CPUPartitioning not set
2. Run "openshift-install agent create image --dir cluster-manifests/ --log-level debug"
Actual results:
See the output "WARNING CPUPartitioning: is ignored"
Expected results:
No warning
Additional info:
This is a clone of issue OCPBUGS-25687. The following is the description of the original issue:
—
Description of problem:
The [Jira:"Network / ovn-kubernetes"] monitor test pod-network-avalibility setup test is a frequent offender in the OpenStack CSI jobs. We're seeing it fail on 4.14 up to 4.16.
Example of failed job.
Example of successful job.
It seems like the 1 min timeout is too short and does not give enough time for the pods backing the service to come up.
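A hedged sketch of the kind of fix this suggests, using the wait helpers from k8s.io/apimachinery; the endpointsReady probe and the durations are illustrative assumptions, not the actual origin test code:

package main

import (
	"context"
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// endpointsReady is a stand-in for checking that the service has ready
// endpoints backing it.
func endpointsReady(ctx context.Context) bool {
	return true
}

func main() {
	// Poll every 5s for up to 3 minutes instead of giving up after 1 minute.
	err := wait.PollUntilContextTimeout(context.Background(), 5*time.Second, 3*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			return endpointsReady(ctx), nil
		})
	fmt.Println("setup finished, err:", err)
}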
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info:
Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.
Affected Platforms:
Is it an
If it is an internal RedHat testing failure:
If it is a CI failure:
If it is a customer / SD issue:
This is a clone of issue OCPBUGS-20249. The following is the description of the original issue:
—
Description of problem:
[Hypershift] default KAS PSA config should be consistent with OCP enforce: privileged
Version-Release number of selected component (if applicable):
Cluster version is 4.14.0-0.nightly-2023-10-08-220853
How reproducible:
Always
Steps to Reproduce:
1. Install an OCP cluster and the hypershift operator
2. Create a hosted cluster
3. Check the default kas config of the hosted cluster
Actual results:
The hosted cluster default kas PSA config enforce is 'restricted':
$ jq '.admission.pluginConfig.PodSecurity' < `oc extract cm/kas-config -n clusters-9cb7724d8bdd0c16a113 --confirm`
{
  "location": "",
  "configuration": {
    "kind": "PodSecurityConfiguration",
    "apiVersion": "pod-security.admission.config.k8s.io/v1beta1",
    "defaults": {
      "enforce": "restricted",
      "enforce-version": "latest",
      "audit": "restricted",
      "audit-version": "latest",
      "warn": "restricted",
      "warn-version": "latest"
    },
    "exemptions": {
      "usernames": [
        "system:serviceaccount:openshift-infra:build-controller"
      ]
    }
  }
}
Expected results:
The hosted cluster default kas PSA config enforce should be 'privileged' in https://github.com/openshift/hypershift/blob/release-4.13/control-plane-operator/controllers/hostedcontrolplane/kas/config.go#L93
Additional info:
References: OCPBUGS-8710
$ oc get mc 01-master-kubelet -o json | jq -r '.spec.config.systemd.units | .[] | select(.name=="kubelet.service") | .contents'
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target
Requires=crio.service kubelet-auto-node-size.service
After=network-online.target crio.service kubelet-auto-node-size.service
After=ostree-finalize-staged.service
[Service]
Type=notify
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
ExecStartPre=/bin/rm -f /var/lib/kubelet/memory_manager_state
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env
EnvironmentFile=/etc/node-sizing.env
ExecStart=/usr/local/bin/kubenswrapper \
    /usr/bin/kubelet \
      --config=/etc/kubernetes/kubelet.conf \
      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
      --kubeconfig=/var/lib/kubelet/kubeconfig \
      --container-runtime=remote \
      --container-runtime-endpoint=/var/run/crio/crio.sock \
      --runtime-cgroups=/system.slice/crio.service \
      --node-labels=node-role.kubernetes.io/control-plane,node-role.kubernetes.io/master,node.openshift.io/os_id=${ID} \
      --node-ip=${KUBELET_NODE_IP} \
      --minimum-container-ttl-duration=6m0s \
      --cloud-provider= \
      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
      \
      --hostname-override=${KUBELET_NODE_NAME} \
      --provider-id=${KUBELET_PROVIDERID} \
      --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
      --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4c0a1b82501a416df4b926801bc3aa378d2762d0570a0791c6675db1a3365c62 \
      --system-reserved=cpu=${SYSTEM_RESERVED_CPU},memory=${SYSTEM_RESERVED_MEMORY},ephemeral-storage=${SYSTEM_RESERVED_ES} \
      --v=${KUBELET_LOG_LEVEL}
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This fix contains the following changes coming from updated version of kubernetes up to v1.28.13:
Changelog:
v1.28.13: https://github.com/kubernetes/kubernetes/blob/release-1.28/CHANGELOG/CHANGELOG-1.28.md#changelog-since-v12812
Description of problem:
We want to understand our users, but the first page the user opens wasn't tracked.
Version-Release number of selected component (if applicable):
Saw this on Dev Sandbox with 4.10 and 4.11 with telemetry enabled
How reproducible:
Sometimes! Looks like a race condition and requires active telemetry
Steps to Reproduce:
1. Open the browser network inspector and filter for segment
2. Open the developer console
Actual results:
1-2 identity events are sent, but no page event
Expected results:
At least one identity event and at least one page event should be sent to segment
Additional info:
Please review the following PR: https://github.com/openshift/cluster-api-provider-alibaba/pull/44
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Got an undiagnosed panic: Undiagnosed panic detected in pod { pods/openshift-ovn-kubernetes_ovnkube-node-mtws2_ovnkube-controller_previous.log.gz:E0929 20:36:20.743430 5682 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*runtime._type)(0x1f9aaa0), concrete:(*runtime._type)(0x20da3e0), asserted:(*runtime._type)(0x22d0600), missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.NetworkAttachmentDefinition)} in this job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-ovn-upgrade/1707819503263420416
Version-Release number of selected component (if applicable):
4.15 ci payload: https://amd64.ocp.releases.ci.openshift.org/releasestream/4.15.0-0.ci/release/4.15.0-0.ci-2023-09-29-180633 https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/aggregated-gcp-ovn-upgrade-4.15-micro-release-openshift-release-analysis-aggregator/1707819513325555712
How reproducible:
This is the first time I noticed it on 4.15
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
When using a disconnected image registry which is hosted at a subdomain of the cluster domain, then Agent-based Installer fails to install a OKD/FCOS cluster. The rendezvous host starts bootkube.sh but fails because it cannot resolve the registry DNS name:
Oct 25 12:47:03 master-0 bootkube.sh[6462]: error: unable to read image virthost.ostest.test.metalkube.org:5000/localimages/local-release-image@sha256:76562238a20f2f4dd45770f00730e20425edd376d30d58d7dafb5d6f02b208c5: Get "https://virthost.ostest.test.metalkube.org:5000/v2/": dial tcp: lookup virthost.ostest.test.metalkube.org: no such host
Oct 25 12:47:03 master-0 systemd[1]: bootkube.service: Main process exited, code=exited, status=1/FAILURE
Oct 25 12:47:03 master-0 systemd[1]: bootkube.service: Failed with result 'exit-code'.
This hit OpenShift CI jobs 'okd-e2e-agent-compact-ipv4' and 'okd-e2e-agent-sno-ipv6' based on openshift-metal3/dev-scripts. An example would be an OCP cluster domain (which contains the cluster name) of `ostest.test.metalkube.org` and a disconnected image registry at `virthost.ostest.test.metalkube.org`.
Other diagnosis from the rendezvous host:
[core@master-0 ~]$ sudo podman pull virthost.ostest.test.metalkube.org:5000/localimages/local-release-image@sha256:76562238a20f2f4dd45770f00730e20425edd376d30d58d7dafb5d6f02b208c5
Trying to pull virthost.ostest.test.metalkube.org:5000/localimages/local-release-image@sha256:76562238a20f2f4dd45770f00730e20425edd376d30d58d7dafb5d6f02b208c5...
Error: initializing source docker://virthost.ostest.test.metalkube.org:5000/localimages/local-release-image@sha256:76562238a20f2f4dd45770f00730e20425edd376d30d58d7dafb5d6f02b208c5: pinging container registry virthost.ostest.test.metalkube.org:5000: Get "https://virthost.ostest.test.metalkube.org:5000/v2/": dial tcp: lookup virthost.ostest.test.metalkube.org: no such host
curl -u ocp-user:ocp-pass https://virthost.ostest.test.metalkube.org:5000/v2/_catalog
curl: (6) Could not resolve host: virthost.ostest.test.metalkube.org
[core@master-0 ~]$ dig +noall +answer virthost.ostest.test.metalkube.org
;; communications error to 127.0.0.1#53: connection refused
;; communications error to 127.0.0.1#53: connection refused
;; communications error to 127.0.0.1#53: connection refused
virthost.ostest.test.metalkube.org. 0 IN A 192.168.111.1
After stopping systemd-resolved:
[core@master-0 ~]$ curl -u ocp-user:ocp-pass https://virthost.ostest.test.metalkube.org:5000/v2/_catalog
{"repositories":["localimages/installer","localimages/local-release-image"]}
Report and diagnosis output above from Andrea Fasano.
Description of problem:
When creating Deployments/DeploymentConfigs and associated Shipwright Builds, the different decorators associated with the node in Topology are not visible
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Install the Pipelines and Shipwright operators
2. Create a deployment with build runs
3. Run the cluster in the local setup
4. Go to Topology, where the deployments are created
Actual results:
No decorator visible
Expected results:
Decorators should be visible
Additional info:
Metal team has filed: OCPBUGS-24328
Seems to be permafailing for several days now. First payload https://amd64.ocp.releases.ci.openshift.org/releasestream/4.15.0-0.nightly/release/4.15.0-0.nightly-2023-11-30-112918
Failure to bootstrap is quite hard to decipher for us.
The validation and status shown by "wait-for bootstrap-complete" are sometimes inadequate or difficult to decipher because of the number of lines printed out. The status and validation information is stored in the assisted-service database. agent-gather should query the database and log the status/status_info columns for the cluster and hosts into a separate log file. A simple glance at this file would make triaging easier and faster.
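A hedged sketch of what that query could look like; the DSN handling, table names, and output file are assumptions for illustration, not the actual agent-gather implementation:

package main

import (
	"database/sql"
	"fmt"
	"log"
	"os"

	_ "github.com/lib/pq"
)

func main() {
	// Assumed: the assisted-service Postgres DSN is available to the tool.
	db, err := sql.Open("postgres", os.Getenv("ASSISTED_DB_DSN"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	out, err := os.Create("cluster-host-status.log")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	// Dump status/status_info for the cluster and for each host.
	for _, q := range []string{
		`SELECT id, status, status_info FROM clusters`,
		`SELECT id, status, status_info FROM hosts`,
	} {
		rows, err := db.Query(q)
		if err != nil {
			log.Fatal(err)
		}
		for rows.Next() {
			var id, status, info string
			if err := rows.Scan(&id, &status, &info); err != nil {
				log.Fatal(err)
			}
			fmt.Fprintf(out, "%s\t%s\t%s\n", id, status, info)
		}
		rows.Close()
	}
}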
Description of the problem:
Right after installation, the hub cluster indicated two clusters:
Status of local-agent-cluster-cluster-deployment is Detached.
Also there is no information about Labels, Nodes and Add-ons.
How reproducible:
100%
Steps to reproduce:
1. Deploy OCP 4.14 x86_64
2. Open cluster management console
3. Open All clusters view
Actual results:
Status of local-agent-cluster-cluster-deployment is Detached.
Expected results:
Status of local-agent-cluster-cluster-deployment is Ready.
Please review the following PR: https://github.com/openshift/ironic-static-ip-manager/pull/40
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/azure-disk-csi-driver-operator/pull/98
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/agent-installer-utils/pull/29
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
In HyperShift 4.14, the konnectivity server is run inside the kube-apiserver pod. When this pod is deleted for any reason, the konnectivity server container can drop before the rest of the pod terminates, which can cause network connections to drop. The following preStop definition can be added to the container to ensure it stays alive long enough for the rest of the pod to clean up:
lifecycle:
  preStop:
    exec:
      command:
      - /bin/sh
      - -c
      - sleep 70
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-24526. The following is the description of the original issue:
—
Description of problem:
Snapshots taken to gather deprecation information from bundles are taken from the Subscription namespace instead of the CatalogSource namespace. That means that if the Subscription is in a different namespace, then no bundles will be present in the snapshot.
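A minimal sketch of the fix described above, with hypothetical stand-in types rather than the actual OLM structs: pick the snapshot namespace from the CatalogSource reference, not from the Subscription itself:

package main

import "fmt"

type Subscription struct {
	Namespace              string // where the Subscription lives
	CatalogSource          string
	CatalogSourceNamespace string // where the catalog's bundles live
}

// snapshotNamespace chooses where to snapshot bundles from. The buggy
// behavior used sub.Namespace; the catalog may live elsewhere.
func snapshotNamespace(sub Subscription) string {
	if sub.CatalogSourceNamespace != "" {
		return sub.CatalogSourceNamespace
	}
	return sub.Namespace
}

func main() {
	sub := Subscription{
		Namespace:              "my-app",
		CatalogSource:          "my-catalog",
		CatalogSourceNamespace: "olm",
	}
	fmt.Println(snapshotNamespace(sub)) // "olm", so deprecation entries are found
}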
How reproducible:
100%
Steps to Reproduce:
1. Create a CatalogSource with olm.deprecation entries
2. Create a Subscription in a different namespace targeting a package with deprecations
Actual results:
No Deprecation Conditions will be present.
Expected results:
Deprecation Conditions should be present.
This is a clone of issue OCPBUGS-25612. The following is the description of the original issue:
—
Description of problem:
Logs for PipelineRuns fetched from the Tekton Results API are not loading
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Navigate to the Log tab of a PipelineRun fetched from Tekton Results
2.
3.
Actual results:
Logs window is empty with a loading indicator
Expected results:
Logs should be shown
Additional info:
This is a clone of issue OCPBUGS-31695. The following is the description of the original issue:
—
Description of problem:
When trying to run the console in local development with auth, the run-bridge.sh script fails.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Follow the steps for local development of the console with authentication: https://github.com/openshift/console/tree/master?tab=readme-ov-file#openshift-with-authentication
2.
3.
Actual results:
The run-bridge.sh script fails with:
$ ./examples/run-bridge.sh
++ oc whoami --show-server
++ oc -n openshift-config-managed get configmap monitoring-shared-config -o 'jsonpath={.data.alertmanagerPublicURL}'
++ oc -n openshift-config-managed get configmap monitoring-shared-config -o 'jsonpath={.data.thanosPublicURL}'
+ ./bin/bridge --base-address=http://localhost:9000 --ca-file=examples/ca.crt --k8s-auth=openshift --k8s-mode=off-cluster --k8s-mode-off-cluster-endpoint=https://api.lprabhu-030420240903.devcluster.openshift.com:6443 --k8s-mode-off-cluster-skip-verify-tls=true --listen=http://127.0.0.1:9000 --public-dir=./frontend/public/dist --user-auth=openshift --user-auth-oidc-client-id=console-oauth-client --user-auth-oidc-client-secret-file=examples/console-client-secret --user-auth-oidc-ca-file=examples/ca.crt --k8s-mode-off-cluster-alertmanager=https://alertmanager-main-openshift-monitoring.apps.lprabhu-030420240903.devcluster.openshift.com --k8s-mode-off-cluster-thanos=https://thanos-querier-openshift-monitoring.apps.lprabhu-030420240903.devcluster.openshift.com
W0403 14:25:07.936281 49352 authoptions.go:99] Flag inactivity-timeout is set to less then 300 seconds and will be ignored!
F0403 14:25:07.936827 49352 main.go:539] Failed to create k8s HTTP client: failed to read token file "/var/run/secrets/kubernetes.io/serviceaccount/token": open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
Expected results:
Bridge runs fine
Additional info:
This is a clone of issue OCPBUGS-44039. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-38900. The following is the description of the original issue:
—
Description of problem:
When using SecureBoot, TuneD reports the following errors, as debugfs access is restricted:
tuned.utils.commands: Writing to file '/sys/kernel/debug/sched/migration_cost_ns' error: '[Errno 1] Operation not permitted: '/sys/kernel/debug/sched/migration_cost_ns''
tuned.plugins.plugin_scheduler: Error writing value '5000000' to 'migration_cost_ns'
This issue has been reported with the following tickets:
As this is a confirmed limitation of the NTO due to the TuneD component, we should document this as a limitation in the OpenShift Docs:
https://docs.openshift.com/container-platform/4.16/nodes/nodes/nodes-node-tuning-operator.html
Expected Outcome:
Description of problem:
On a baremetal 4.14.0-rc.0 IPv6 SNO cluster, logged in to the admin console as an admin user, there is no Observe menu on the left navigation bar (see picture: https://drive.google.com/file/d/13RAXPxtKhAElN9xf8bAmLJa0GI8pP0fH/view?usp=sharing). The monitoring-plugin status is Failed (see: https://drive.google.com/file/d/1YsSaGdLT4bMn-6E-WyFWbOpwvDY4t6na/view?usp=sharing); the error is
Failed to get a valid plugin manifest from /api/plugins/monitoring-plugin/ r: Bad Gateway
Checked the console logs; connections to port 9443 are refused:
$ oc -n openshift-console logs console-6869f8f4f4-56mbj
...
E0915 12:50:15.498589 1 handlers.go:164] GET request for "monitoring-plugin" plugin failed: Get "https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json": dial tcp [fd02::f735]:9443: connect: connection refused
2023/09/15 12:50:15 http: panic serving [fd01:0:0:1::2]:39156: runtime error: invalid memory address or nil pointer dereference
goroutine 183760 [running]:
net/http.(*conn).serve.func1()
    /usr/lib/golang/src/net/http/server.go:1854 +0xbf
panic({0x3259140, 0x4fcc150})
    /usr/lib/golang/src/runtime/panic.go:890 +0x263
github.com/openshift/console/pkg/plugins.(*PluginsHandler).proxyPluginRequest(0xc0003b5760, 0x2?, {0xc0009bc7d1, 0x11}, {0x3a41fa0, 0xc0002f6c40}, 0xb?)
    /go/src/github.com/openshift/console/pkg/plugins/handlers.go:165 +0x582
github.com/openshift/console/pkg/plugins.(*PluginsHandler).HandlePluginAssets(0xaa00000000000010?, {0x3a41fa0, 0xc0002f6c40}, 0xc0001f7500)
    /go/src/github.com/openshift/console/pkg/plugins/handlers.go:147 +0x26d
github.com/openshift/console/pkg/server.(*Server).HTTPHandler.func23({0x3a41fa0?, 0xc0002f6c40?}, 0x7?)
    /go/src/github.com/openshift/console/pkg/server/server.go:604 +0x33
net/http.HandlerFunc.ServeHTTP(...)
    /usr/lib/golang/src/net/http/server.go:2122
github.com/openshift/console/pkg/server.authMiddleware.func1(0xc0001f7500?, {0x3a41fa0?, 0xc0002f6c40?}, 0xd?)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:25 +0x31
github.com/openshift/console/pkg/server.authMiddlewareWithUser.func1({0x3a41fa0, 0xc0002f6c40}, 0xc0001f7500)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:81 +0x46c
net/http.HandlerFunc.ServeHTTP(0x5120938?, {0x3a41fa0?, 0xc0002f6c40?}, 0x7ffb6ea27f18?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.StripPrefix.func1({0x3a41fa0, 0xc0002f6c40}, 0xc0001f7400)
    /usr/lib/golang/src/net/http/server.go:2165 +0x332
net/http.HandlerFunc.ServeHTTP(0xc001102c00?, {0x3a41fa0?, 0xc0002f6c40?}, 0xc000655a00?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.(*ServeMux).ServeHTTP(0x34025e0?, {0x3a41fa0, 0xc0002f6c40}, 0xc0001f7400)
    /usr/lib/golang/src/net/http/server.go:2500 +0x149
github.com/openshift/console/pkg/server.securityHeadersMiddleware.func1({0x3a41fa0, 0xc0002f6c40}, 0x3305040?)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:128 +0x3af
net/http.HandlerFunc.ServeHTTP(0x0?, {0x3a41fa0?, 0xc0002f6c40?}, 0x11db52e?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.serverHandler.ServeHTTP({0xc0008201e0?}, {0x3a41fa0, 0xc0002f6c40}, 0xc0001f7400)
    /usr/lib/golang/src/net/http/server.go:2936 +0x316
net/http.(*conn).serve(0xc0009b4120, {0x3a43e70, 0xc001223500})
    /usr/lib/golang/src/net/http/server.go:1995 +0x612
created by net/http.(*Server).Serve
    /usr/lib/golang/src/net/http/server.go:3089 +0x5ed
I0915 12:50:24.267777 1 handlers.go:118] User settings ConfigMap "user-settings-4b4c2f4d-159c-4358-bba3-3d87f113cd9b" already exist, will return existing data.
I0915 12:50:24.267813 1 handlers.go:118] User settings ConfigMap "user-settings-4b4c2f4d-159c-4358-bba3-3d87f113cd9b" already exist, will return existing data.
E0915 12:50:30.155515 1 handlers.go:164] GET request for "monitoring-plugin" plugin failed: Get "https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json": dial tcp [fd02::f735]:9443: connect: connection refused
2023/09/15 12:50:30 http: panic serving [fd01:0:0:1::2]:42990: runtime error: invalid memory address or nil pointer dereference
Port 9443 refuses connections:
$ oc -n openshift-monitoring get pod -o wide
NAME                                                     READY   STATUS    RESTARTS   AGE     IP                  NODE    NOMINATED NODE   READINESS GATES
alertmanager-main-0                                      6/6     Running   6          3d22h   fd01:0:0:1::564     sno-2   <none>           <none>
cluster-monitoring-operator-6cb777d488-nnpmx             1/1     Running   4          7d16h   fd01:0:0:1::12      sno-2   <none>           <none>
kube-state-metrics-dc5f769bc-p97m7                       3/3     Running   12         7d16h   fd01:0:0:1::3b      sno-2   <none>           <none>
monitoring-plugin-85bfb98485-d4g5x                       1/1     Running   4          7d16h   fd01:0:0:1::55      sno-2   <none>           <none>
node-exporter-ndnnj                                      2/2     Running   8          7d16h   2620:52:0:165::41   sno-2   <none>           <none>
openshift-state-metrics-78df59b4d5-j6r5s                 3/3     Running   12         7d16h   fd01:0:0:1::3a      sno-2   <none>           <none>
prometheus-adapter-6f86f7d8f5-ttflf                      1/1     Running   0          4h23m   fd01:0:0:1::b10c    sno-2   <none>           <none>
prometheus-k8s-0                                         6/6     Running   6          3d22h   fd01:0:0:1::566     sno-2   <none>           <none>
prometheus-operator-7c94855989-csts2                     2/2     Running   8          7d16h   fd01:0:0:1::39      sno-2   <none>           <none>
prometheus-operator-admission-webhook-7bb64b88cd-bvq8m   1/1     Running   4          7d16h   fd01:0:0:1::37      sno-2   <none>           <none>
thanos-querier-5bbb764599-vlztq                          6/6     Running   6          3d22h   fd01:0:0:1::56a     sno-2   <none>           <none>

$ oc -n openshift-monitoring get svc monitoring-plugin
NAME                TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
monitoring-plugin   ClusterIP   fd02::f735   <none>        9443/TCP   7d16h

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -v 'https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json' | jq
*   Trying fd02::f735...
* TCP_NODELAY set
* connect to fd02::f735 port 9443 failed: Connection refused
* Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused
* Closing connection 0
curl: (7) Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused
command terminated with exit code 7
There is no such issue on a 4.14.0-rc.0 IPv4 cluster, but the issue reproduces on another 4.14.0-rc.0 IPv6 cluster.
On the 4.14.0-rc.0 IPv4 cluster:
$ oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-rc.0   True        False         20m     Cluster version is 4.14.0-rc.0

$ oc -n openshift-monitoring get pod -o wide | grep monitoring-plugin
monitoring-plugin-85bfb98485-nh428   1/1   Running   0   4m   10.128.0.107   ci-ln-pby4bj2-72292-l5q8v-master-0   <none>   <none>

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k 'https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json' | jq
...
{
  "name": "monitoring-plugin",
  "version": "1.0.0",
  "displayName": "OpenShift console monitoring plugin",
  "description": "This plugin adds the monitoring UI to the OpenShift web console",
  "dependencies": {
    "@console/pluginAPI": "*"
  },
  "extensions": [
    {
      "type": "console.page/route",
      "properties": {
        "exact": true,
        "path": "/monitoring",
        "component": {
          "$codeRef": "MonitoringUI"
        }
      }
    },
...
meet issue "9443: Connection refused" in 4.14.0-rc.0 ipv6 cluster(launched cluster-bot cluster: launch 4.14.0-rc.0 metal,ipv6) and login console
$ oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-rc.0   True        False         44m     Cluster version is 4.14.0-rc.0

$ oc -n openshift-monitoring get pod -o wide | grep monitoring-plugin
monitoring-plugin-bd6ffdb5d-b5csk   1/1   Running   0   53m   fd01:0:0:4::b   worker-0.ostest.test.metalkube.org   <none>   <none>
monitoring-plugin-bd6ffdb5d-vhtpf   1/1   Running   0   53m   fd01:0:0:5::9   worker-2.ostest.test.metalkube.org   <none>   <none>

$ oc -n openshift-monitoring get svc monitoring-plugin
NAME                TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
monitoring-plugin   ClusterIP   fd02::402d   <none>        9443/TCP   59m

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -v 'https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json' | jq
*   Trying fd02::402d...
* TCP_NODELAY set
* connect to fd02::402d port 9443 failed: Connection refused
* Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused
* Closing connection 0
curl: (7) Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused
command terminated with exit code 7

$ oc -n openshift-console get pod | grep console
console-5cffbc7964-7ljft   1/1   Running   0   56m
console-5cffbc7964-d864q   1/1   Running   0   56m

$ oc -n openshift-console logs console-5cffbc7964-7ljft
...
E0916 14:34:16.330117 1 handlers.go:164] GET request for "monitoring-plugin" plugin failed: Get "https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json": dial tcp [fd02::402d]:9443: connect: connection refused
2023/09/16 14:34:16 http: panic serving [fd01:0:0:4::2]:37680: runtime error: invalid memory address or nil pointer dereference
goroutine 3985 [running]:
net/http.(*conn).serve.func1()
    /usr/lib/golang/src/net/http/server.go:1854 +0xbf
panic({0x3259140, 0x4fcc150})
    /usr/lib/golang/src/runtime/panic.go:890 +0x263
github.com/openshift/console/pkg/plugins.(*PluginsHandler).proxyPluginRequest(0xc0008f6780, 0x2?, {0xc000665211, 0x11}, {0x3a41fa0, 0xc0009221c0}, 0xb?)
    /go/src/github.com/openshift/console/pkg/plugins/handlers.go:165 +0x582
github.com/openshift/console/pkg/plugins.(*PluginsHandler).HandlePluginAssets(0xfe00000000000010?, {0x3a41fa0, 0xc0009221c0}, 0xc000d8d600)
    /go/src/github.com/openshift/console/pkg/plugins/handlers.go:147 +0x26d
github.com/openshift/console/pkg/server.(*Server).HTTPHandler.func23({0x3a41fa0?, 0xc0009221c0?}, 0x7?)
    /go/src/github.com/openshift/console/pkg/server/server.go:604 +0x33
net/http.HandlerFunc.ServeHTTP(...)
    /usr/lib/golang/src/net/http/server.go:2122
github.com/openshift/console/pkg/server.authMiddleware.func1(0xc000d8d600?, {0x3a41fa0?, 0xc0009221c0?}, 0xd?)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:25 +0x31
github.com/openshift/console/pkg/server.authMiddlewareWithUser.func1({0x3a41fa0, 0xc0009221c0}, 0xc000d8d600)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:81 +0x46c
net/http.HandlerFunc.ServeHTTP(0xc000653830?, {0x3a41fa0?, 0xc0009221c0?}, 0x7f824506bf18?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.StripPrefix.func1({0x3a41fa0, 0xc0009221c0}, 0xc000d8d500)
    /usr/lib/golang/src/net/http/server.go:2165 +0x332
net/http.HandlerFunc.ServeHTTP(0xc00007e800?, {0x3a41fa0?, 0xc0009221c0?}, 0xc000b2da00?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.(*ServeMux).ServeHTTP(0x34025e0?, {0x3a41fa0, 0xc0009221c0}, 0xc000d8d500)
    /usr/lib/golang/src/net/http/server.go:2500 +0x149
github.com/openshift/console/pkg/server.securityHeadersMiddleware.func1({0x3a41fa0, 0xc0009221c0}, 0x3305040?)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:128 +0x3af
net/http.HandlerFunc.ServeHTTP(0x0?, {0x3a41fa0?, 0xc0009221c0?}, 0x11db52e?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.serverHandler.ServeHTTP({0xc000db9b00?}, {0x3a41fa0, 0xc0009221c0}, 0xc000d8d500)
    /usr/lib/golang/src/net/http/server.go:2936 +0x316
net/http.(*conn).serve(0xc000653680, {0x3a43e70, 0xc000676f30})
    /usr/lib/golang/src/net/http/server.go:1995 +0x612
created by net/http.(*Server).Serve
    /usr/lib/golang/src/net/http/server.go:3089 +0x5ed
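Both stack traces show the same pattern in the console's plugin proxy: the GET for the plugin manifest fails, yet the handler goes on to use the (nil) response. A minimal sketch of the defensive shape of a fix, assuming the nil dereference comes from using the response after a failed request; the function and variable names here are hypothetical, not the console's actual code:

```
// Sketch: if the upstream GET fails, resp is nil and must not be dereferenced.
// proxyPluginManifest and pluginURL are illustrative names.
package main

import (
	"fmt"
	"io"
	"net/http"
)

func proxyPluginManifest(w http.ResponseWriter, pluginURL string) {
	resp, err := http.Get(pluginURL)
	if err != nil {
		// Without this early return, using resp below would be a nil dereference.
		http.Error(w, fmt.Sprintf("failed to get plugin manifest: %v", err), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	w.WriteHeader(resp.StatusCode)
	io.Copy(w, resp.Body) // stream the manifest back to the console client
}
```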
Version-Release number of selected component (if applicable):
baremetal 4.14.0-rc.0 ipv6 sno cluster,

$ token=`oc create token prometheus-k8s -n openshift-monitoring`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=virt_platform' | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "virt_platform",
          "baseboard_manufacturer": "Dell Inc.",
          "baseboard_product_name": "01J4WF",
          "bios_vendor": "Dell Inc.",
          "bios_version": "1.10.2",
          "container": "kube-rbac-proxy",
          "endpoint": "https",
          "instance": "sno-2",
          "job": "node-exporter",
          "namespace": "openshift-monitoring",
          "pod": "node-exporter-ndnnj",
          "prometheus": "openshift-monitoring/k8s",
          "service": "node-exporter",
          "system_manufacturer": "Dell Inc.",
          "system_product_name": "PowerEdge R750",
          "system_version": "Not Specified",
          "type": "none"
        },
        "value": [
          1694785092.664,
          "1"
        ]
      }
    ]
  }
}
How reproducible:
Reproducible on IPv6 clusters
Steps to Reproduce:
1. See the description.
Actual results:
No Observe menu in the admin console; the monitoring-plugin is Failed.
Expected results:
No error; the Observe menu should be present.
We hacked up the vendoring of containernetworking/plugins in sdn a while back to patch a CVE (actually updating to the latest release of that module would have been too large a change), but the way we did it causes problems whenever anyone tries to update anything else (e.g. to fix a CVE in another package). We should simplify it.
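For illustration, this is the general shape of a pinned vendoring via a go.mod replace directive; the fork path and versions below are hypothetical, not the actual sdn go.mod contents:

```
// go.mod (hypothetical): a fork pinned via replace to pick up a single CVE fix.
// Any later dependency bump must still resolve against this pin, which is what
// makes unrelated updates painful.
module github.com/openshift/sdn

go 1.20

require github.com/containernetworking/plugins v0.8.6

replace github.com/containernetworking/plugins => github.com/example/plugins-fork v0.8.6-cve-fix
```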
Please review the following PR: https://github.com/openshift/cloud-credential-operator/pull/600
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
A GitHub project with a Containerfile instead of a Dockerfile is not recognized as a Buildah target, and the wizard falls through to templating it as a standard (language) project.
Version-Release number of selected component (if applicable):
Server Version: 4.13.18 Kubernetes Version: v1.26.9+c7606e7
How reproducible:
Always
Steps to Reproduce:
1. Create a git application with a Containerfile, e.g. https://github.com/cwilkers/jumble-c
2. Use the Developer view to add the app as a git repo
3. Observe that the build fails because the Containerfile is ignored
Actual results:
Build failure
Expected results:
The Buildah strategy picks up the Containerfile, which includes the HTML and other resources required by the app.
Additional info:
https://github.com/cwilkers/jumble-c
Description of problem:
There is no response when the user clicks on quick start items.
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-02-26-013420 browser 122.0.6261.69 (Official Build) (64-bit)
How reproducible:
always
Steps to Reproduce:
1. Go to the quick starts page by clicking "View all quick starts" on the Home -> Overview page.
2. Click on any quick start item to check its steps.
Actual results:
2. There is no response.
Expected results:
2. The quick start side panel with installation instructions should open.
Additional info:
The issue doesn't exist on Firefox 123.0 (64-bit).
This is a clone of issue OCPBUGS-36137. The following is the description of the original issue:
—
Description of problem:
A customer is deploying SNO with lvms-operator being installed during cluster installation using assisted-service. One of the deployments failed with the catalog-operator pod crashlooping.

NAME                               READY   STATUS             RESTARTS   AGE
catalog-operator-db9dff494-pqb68   0/1     CrashLoopBackOff   56         4h

The pod logs show a panic.

$ oc logs catalog-operator-db9dff494-pqb68 -n openshift-operator-lifecycle-manager
2024-05-16T13:24:46.709156999Z time="2024-05-16T13:24:46Z" level=info msg="log level info"
2024-05-16T13:24:46.709232085Z time="2024-05-16T13:24:46Z" level=info msg="TLS keys set, using https for metrics"
2024-05-16T13:24:46.709736948Z W0516 13:24:46.709618 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2024-05-16T13:24:46.709855179Z time="2024-05-16T13:24:46Z" level=info msg="Using in-cluster kube client config"
2024-05-16T13:24:46.710165923Z time="2024-05-16T13:24:46Z" level=info msg="Using in-cluster kube client config"
2024-05-16T13:24:46.710274657Z W0516 13:24:46.710268 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2024-05-16T13:24:46.711960302Z W0516 13:24:46.711831 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2024-05-16T13:24:46.720943025Z time="2024-05-16T13:24:46Z" level=info msg="connection established. cluster-version: v1.27.12+7bee54d"
2024-05-16T13:24:46.720943025Z time="2024-05-16T13:24:46Z" level=info msg="operator ready"
2024-05-16T13:24:46.720943025Z time="2024-05-16T13:24:46Z" level=info msg="starting informers..."
2024-05-16T13:24:46.720943025Z time="2024-05-16T13:24:46Z" level=info msg="informers started"
2024-05-16T13:24:46.720943025Z time="2024-05-16T13:24:46Z" level=info msg="waiting for caches to sync..."
2024-05-16T13:24:46.921220918Z time="2024-05-16T13:24:46Z" level=info msg="starting workers..."
2024-05-16T13:24:46.921869716Z time="2024-05-16T13:24:46Z" level=info msg="connection established. cluster-version: v1.27.12+7bee54d"
2024-05-16T13:24:46.921869716Z time="2024-05-16T13:24:46Z" level=info msg="operator ready"
2024-05-16T13:24:46.921869716Z time="2024-05-16T13:24:46Z" level=info msg="starting informers..."
2024-05-16T13:24:46.921869716Z time="2024-05-16T13:24:46Z" level=info msg="informers started"
2024-05-16T13:24:46.921869716Z time="2024-05-16T13:24:46Z" level=info msg="waiting for caches to sync..."
2024-05-16T13:24:46.922300604Z time="2024-05-16T13:24:46Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
2024-05-16T13:24:47.022696884Z time="2024-05-16T13:24:47Z" level=info msg="starting workers..."
2024-05-16T13:24:59.544398366Z panic: runtime error: invalid memory address or nil pointer dereference
2024-05-16T13:24:59.544398366Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1d761e6]
2024-05-16T13:24:59.544398366Z
2024-05-16T13:24:59.544398366Z goroutine 469 [running]:
2024-05-16T13:24:59.544398366Z github.com/operator-framework/operator-lifecycle-manager/pkg/controller/bundle.sortUnpackJobs.func1(0xc002bdca20?, 0x0?)
2024-05-16T13:24:59.544398366Z   /build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/bundle/bundle_unpacker.go:844 +0xc6
2024-05-16T13:24:59.544398366Z sort.insertionSort_func({0xc002b7cfb0?, 0xc0029fffe0?}, 0x0, 0x3)
2024-05-16T13:24:59.544398366Z   /usr/lib/golang/src/sort/zsortfunc.go:12 +0xb1
2024-05-16T13:24:59.544398366Z sort.pdqsort_func({0xc002b7cfb0?, 0xc0029fffe0?}, 0x7f07987eab38?, 0x18?, 0xc001e80000?)
2024-05-16T13:24:59.544398366Z   /usr/lib/golang/src/sort/zsortfunc.go:73 +0x2dd
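The trace points into the comparator used by sortUnpackJobs. A minimal sketch of a nil-safe comparator, assuming the panic comes from dereferencing a job whose condition is absent; the types here are illustrative stand-ins, not OLM's actual ones:

```
// Sketch: guard both sides of the comparison so a job with no condition
// sorts last instead of panicking.
package main

import (
	"sort"
	"time"
)

type unpackJob struct {
	failedCond *time.Time // nil when the job has no failed condition yet
}

func sortUnpackJobs(jobs []unpackJob) {
	sort.Slice(jobs, func(i, j int) bool {
		a, b := jobs[i].failedCond, jobs[j].failedCond
		if a == nil {
			return false
		}
		if b == nil {
			return true
		}
		return a.After(*b) // newest failure first
	})
}
```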
Version-Release number of selected component (if applicable):
4.14.22
How reproducible:
Only sometimes
Steps to Reproduce:
1. SNO cluster deployment using assisted-service
2. Provide the lvms-operator subscription, operatorgroup, and namespace YAMLs during installation
3. The pod crashes once the node boots after ignition
Actual results:
Pod crashed with panic
Expected results:
The pod should be running
Additional info:
As a ROSA customer, I want to enforce that my workloads follow AWS best-practices by using AWS Regionalized STS Endpoints instead of the global one.
As Red Hat, I would like to follow AWS best-practices by using AWS Regionalized STS Endpoints instead of the global one.
Per AWS docs:
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_enable-regions.html
AWS recommends using Regional AWS STS endpoints instead of the global endpoint to reduce latency, build in redundancy, and increase session token validity.
https://docs.aws.amazon.com/sdkref/latest/guide/feature-sts-regionalized-endpoints.html
All new SDK major versions releasing after July 2022 will default to regional. New SDK major versions might remove this setting and use regional behavior. To reduce future impact regarding this change, we recommend you start using regional in your application when possible.
Areas where HyperShift creates STS credentials use regionalized STS endpoints, e.g. https://github.com/openshift/hypershift/blob/ae1caa00ff3a2c2bfc1129f0168efc1e786d1d12/control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go#L1225-L1228
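As a sketch of what opting in to the regionalized endpoint looks like with aws-sdk-go v1 (the region value is illustrative):

```
// Sketch: resolve sts.<region>.amazonaws.com instead of the global endpoint.
package main

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/endpoints"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/sts"
)

func newRegionalSTSClient(region string) (*sts.STS, error) {
	sess, err := session.NewSession(&aws.Config{
		Region:              aws.String(region),
		STSRegionalEndpoint: endpoints.RegionalSTSEndpoint,
	})
	if err != nil {
		return nil, err
	}
	return sts.New(sess), nil
}
```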
Description of problem:
Without this fix, our e2e tests have to be retested many times over and still aren't guaranteed to succeed; when the CSI driver hits this issue, it doesn't seem to get out of it easily. There is no impact on functionality, but there are too many of these errors: snapshot controller failed to update ...xxxxx the object has been modified; please apply your changes to the latest version and try again
Version-Release number of selected component (if applicable):
How reproducible:
Almost every PR in oadp-operator has one or more of these "flake" errors, which prevents our e2e from succeeding and forces a retest.
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Please accept cherry-picks in https://github.com/openshift/csi-external-snapshotter/pull/140 https://github.com/openshift/csi-external-snapshotter/pull/141 https://github.com/openshift/csi-external-snapshotter/pull/142 https://github.com/openshift/csi-external-snapshotter/pull/143 to help us deal with the many flakes in our #forum-oadp e2e caused by CSI drivers failing to remove snapshot annotations.
clones https://github.com/kubernetes-csi/external-snapshotter/issues/748
Please backport this to all OCP versions that OpenShift API for Data Protection is supported and tested on, currently 4.12+
slack: https://redhat-internal.slack.com/archives/CBQHQFU0N/p1707342685875549
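For reference, the standard client-go answer to "the object has been modified" conflicts is to re-read the latest object inside a retry loop before updating. A minimal sketch, illustrated with a ConfigMap for brevity rather than the snapshotter's own types:

```
// Sketch: retry.RetryOnConflict re-runs the closure on 409 Conflict, and
// fetching a fresh copy each attempt keeps the resourceVersion current.
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

func clearAnnotation(ctx context.Context, c kubernetes.Interface, ns, name, key string) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		cm, err := c.CoreV1().ConfigMaps(ns).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		delete(cm.Annotations, key)
		_, err = c.CoreV1().ConfigMaps(ns).Update(ctx, cm, metav1.UpdateOptions{})
		return err
	})
}
```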
Description of problem:
Excessive permissions in web-console impersonating a user
Version-Release number of selected component (if applicable):
4.10.55
How reproducible:
When impersonating a specific user ('99GU8710') in an OCP 4.10.55 cluster, we are able to see pods and logs in the web console even though that user is unable to access them using the command line.
Steps to Reproduce:
1. Create a user with LDAP (example: new_user)
2. Don't give the user access to check pod logs for openshift-related namespaces (for example: new_user should not be able to see pod logs for openshift-apiserver)
3. Impersonate the user (new_user)
4. While impersonating, check openshift-apiserver pod logs (you will be able to see them)
5. Check the same logs from the command line as new_user; you won't be able to see them.
Actual results:
The `Impersonate the user` feature does not correctly enforce the impersonated user's permissions.
Expected results:
We should not be able to see pod logs if the impersonated user does not have permission to view them.
Additional info:
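One way to validate what the impersonated user may actually do is to ask the API server directly with a SubjectAccessReview before showing the logs; a minimal sketch (the helper name is hypothetical):

```
// Sketch: ask the API server whether the impersonated user may read pod logs.
package main

import (
	"context"

	authorizationv1 "k8s.io/api/authorization/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func canReadPodLogs(ctx context.Context, c kubernetes.Interface, user, ns string) (bool, error) {
	sar := &authorizationv1.SubjectAccessReview{
		Spec: authorizationv1.SubjectAccessReviewSpec{
			User: user, // the user being impersonated, e.g. "new_user"
			ResourceAttributes: &authorizationv1.ResourceAttributes{
				Namespace:   ns,
				Verb:        "get",
				Resource:    "pods",
				Subresource: "log",
			},
		},
	}
	resp, err := c.AuthorizationV1().SubjectAccessReviews().Create(ctx, sar, metav1.CreateOptions{})
	if err != nil {
		return false, err
	}
	return resp.Status.Allowed, nil
}
```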
This is a clone of issue OCPBUGS-22699. The following is the description of the original issue:
—
Description of problem:
New deployments of BM IPI using a provisioning network with IPv6 show the error: http://XXXX:XXXX:XXXX:XXXX::X:6180/images/ironic-python-agent.kernel... connection timed out (http://ipxe.org/4c0a6092)
Version-Release number of selected component (if applicable):
OpenShift 4.12.32. Also seen in OpenShift 4.14.0-rc.5 when adding new nodes.
How reproducible:
Very frequent
Steps to Reproduce:
1. Deploy the cluster on bare metal using the provided config.
Actual results:
Consistent failures, depending on the version of OCP used to deploy.
Expected results:
No error, successful deployment
Additional info:
Things checked while the bootstrap host is active and the installation information is still valid (and failing):

- Tried downloading the "ironic-python-agent.kernel" file from different places (bootstrap, bastion hosts, another provisioned host) and in all cases it worked:

[core@control-1-ru2 ~]$ curl -6 -v -o ironic-python-agent.kernel http://[XXXX:XXXX:XXXX:XXXX::X]:80/images/ironic-python-agent.kernel
* Trying XXXX:XXXX:XXXX:XXXX::X...
* TCP_NODELAY set
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Connected to XXXX:XXXX:XXXX:XXXX::X (xxxx:xxxx:xxxx:xxxx::x) port 80 (#0)
> GET /images/ironic-python-agent.kernel HTTP/1.1
> Host: [xxxx:xxxx:xxxx:xxxx::x]
> User-Agent: curl/7.61.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Fri, 27 Oct 2023 08:28:09 GMT
< Server: Apache
< Last-Modified: Thu, 26 Oct 2023 08:42:16 GMT
< ETag: "a29d70-6089a8c91c494"
< Accept-Ranges: bytes
< Content-Length: 10657136
<
{ [14084 bytes data]
100 10.1M  100 10.1M    0     0   597M      0 --:--:-- --:--:-- --:--:--  597M
* Connection #0 to host xxxx:xxxx:xxxx:xxxx::x left intact

This verifies some of the components, like the network setup and the httpd service running on the ironic pods.

- Also gathered a listing of the contents of the ironic pod running in podman, especially the shared directory. The contents of /shared/html/inspector.ipxe seem correct compared to a working installation, and all files look in place.

- Logs from the ironic container show the errors coming from the node being deployed; the curl log is included for comparison:

xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx - - [27/Oct/2023:08:19:55 +0000] "GET /images/ironic-python-agent.kernel HTTP/1.1" 400 226 "-" "iPXE/1.0.0+ (4bd064de)"
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx - - [27/Oct/2023:08:19:55 +0000] "GET /images/ironic-python-agent.kernel HTTP/1.1" 400 226 "-" "iPXE/1.0.0+ (4bd064de)"
xxxx:xxxx:xxxx:xxxx::x - - [27/Oct/2023:08:20:23 +0000] "GET /images/ironic-python-agent.kernel HTTP/1.1" 200 10657136 "-" "curl/7.61.1"
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx - - [27/Oct/2023:08:20:23 +0000] "GET /images/ironic-python-agent.kernel HTTP/1.1" 400 226 "-" "iPXE/1.0.0+ (4bd064de)"
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx - - [27/Oct/2023:08:20:23 +0000] "GET /images/ironic-python-agent.kernel HTTP/1.1" 400 226 "-" "iPXE/1.0.0+ (4bd064de)"

This looks like an issue with iPXE and IPv6.
This is a clone of issue OCPBUGS-27455. The following is the description of the original issue:
—
Problem Description:
Installed the Red Hat Quay Container Security Operator on the 4.13.25 cluster.
Below are my test results :
```
[sasakshi@sasakshi ~]$ oc version
Client Version: 4.12.7
Kustomize Version: v4.5.7
Server Version: 4.13.25
Kubernetes Version: v1.26.9+aa37255
[sasakshi@sasakshi ~]$ oc get csv -A | grep -i "quay" | tail -1
openshift container-security-operator.v3.10.2 Red Hat Quay Container Security Operator 3.10.2 container-security-operator.v3.10.1 Succeeded
[sasakshi@sasakshi ~]$ oc get subs -A
NAMESPACE NAME PACKAGE SOURCE CHANNEL
openshift-operators container-security-operator container-security-operator redhat-operators stable-3.10
[sasakshi@sasakshi ~]$ oc get imagemanifestvuln -A | wc -l
82
[sasakshi@sasakshi ~]$ oc get vuln --all-namespaces | wc -l
82
Console -> Administration -> Image Vulnerabilities: 82
Home -> Overview -> Status -> Image Vulnerabilities: 66
```
Observations from my testing:
Kindly refer to the attached screenshots for reference.
Documentation link referred:
This is a clone of issue OCPBUGS-10851. The following is the description of the original issue:
—
Currently, the plugin template gives you instructions for running the console using a container image, which is a lightweight way to do development and avoids the need to build the console source code from scratch. The image we reference uses a production build of React, however. This means that you aren't able to use the React browser plugin to debug your application.
We should look at alternatives that allow you to use React Developer Tools. Perhaps we can publish a different image that uses a development build. Or at least we need to better document building console locally instead of using an image to allow development builds.
Component Readiness has found a potential regression in [bz-networking][invariant] alert/OVNKubernetesResourceRetryFailure should not be at or above info.
Probability of significant regression: 96.30%
Sample (being evaluated) Release: 4.16
Start Time: 2024-04-29T00:00:00Z
End Time: 2024-05-06T23:59:59Z
Success Rate: 72.73%
Successes: 32
Failures: 12
Flakes: 0
Base (historical) Release: 4.15
Start Time: 2024-02-01T00:00:00Z
End Time: 2024-05-06T23:59:59Z
Success Rate: 85.20%
Successes: 236
Failures: 41
Flakes: 0
A cluster with 3 masters and 100 workers installed successfully.
An attempt to download the installation logs failed: nothing happened.
Error raised in the debugger console:
Access to XMLHttpRequest at 'https://api.stage.openshift.com/api/assisted-install/v2/clusters/c7d60db0-2997-4380-813d-b504134e9920/downloads/files-presigned?file_name=logs&logs_type=all' from origin 'https://qaprodauth.console.redhat.com' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.
src_bootstrap_tsx-src_moduleOverrides_unfetch_ts-webpack_sharing_consume_default_patternfly_r-31174d.fcbb79a89748b2f6.js:22320
GET https://api.stage.openshift.com/api/assisted-install/v2/clusters/c7d60db0-2997-4380-813d-b504134e9920/downloads/files-presigned?file_name=logs&logs_type=all
net::ERR_FAILED 504 (Gateway Timeout)
This happened on the following browsers:
Chrome 117.0.5938.92
Firefox 117.0.1 (64-bit)
See attached screenshots and logs from Assisted Service pod
I can successfully download installation logs from other clusters using the same browsers.
Steps to reproduce:
1. Install cluster with 103 nodes
2. Try download installation logs
Actual results:
Nothing happens and the above error is raised
Expected results:
Should download installation logs
This is a clone of issue OCPBUGS-47634. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-47506. The following is the description of the original issue:
—
Description of problem:
Bare Metal UPI cluster
Nodes lose communication with other nodes, which affects pod communication on those nodes as well. The issue can be fixed by rebuilding the OVN DBs on the affected nodes, but eventually the nodes degrade and lose communication again. Note that although an OVN rebuild fixes the issue temporarily, Host Networking is set to True, so the kernel routing table is in use. This cluster does not use IPsec.
Version-Release number of selected component (if applicable):
4.14.7, 4.14.30
How reproducible:
Can't reproduce locally, but reproducible and repeatedly occurring in the customer environment
Steps to Reproduce:
Identify a host node whose pods can't be reached from other hosts in default namespaces (tested via openshift-dns). Observe that curls to that peer pod consistently time out. TCP dumps to the target pod show that packets arrive and are acknowledged but never route back to the client pod successfully (SYN/ACK seen at the pod network layer, not at geneve; so dropped before hitting the geneve tunnel).
Actual results:
Nodes repeatedly degrade and lose communication despite fixing the issue with an OVN DB rebuild (the rebuild only provides days of respite, not a permanent resolution).
Expected results:
Nodes should not lose communication, and even if they did, it should not happen repeatedly
Additional info:
Description of problem:
Add the missing vulnerabilities column and Signed icon to the PAC repository PipelineRun list, matching what we have on the PipelineRuns list page.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
ConsoleExternalLogLink CRD: the test "ConsoleExternalLogLink CRD creates, displays, modifies, and deletes a new ConsoleExternalLogLink instance" fails with: AssertionError: Timed out retrying after 30000ms: Expected to find element: `[data-test-id=test-nubya-cell]`, but never found it.
Description of the problem:
When gathering manifests for a cluster from assisted-installer using assisted-test-infra any 'system generated' manifests are not listed.
How reproducible:
Look at any recently created triage ticket; you will notice that the `system-generated` manifests are missing.
Actual results:
Only user-generated manifests are shown by assisted-test-infra
Expected results:
System generated manifests as well as user generated manifests should be listed by assisted-test-infra
Description of problem:
If nmstatectl is not present, print "install nmstate" in the error message.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
100%
Steps to Reproduce:
1. 2. 3.
Actual results:
FATAL * failed to validate network yaml for host 0, failed to execute 'nmstatectl gc', error: exec: "nmstatectl": executable file not found in $PATH
Expected results:
FATAL * failed to validate network yaml for host 0, install nmstate package, exec: "nmstatectl": executable file not found in $PATH
Additional info:
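A minimal sketch of the friendlier error, assuming the binary is checked up front before being invoked (the helper name is hypothetical):

```
// Sketch: detect the missing binary and tell the user to install the package,
// instead of surfacing exec's raw "executable file not found" error alone.
package main

import (
	"fmt"
	"os/exec"
)

func checkNmstatectl() error {
	if _, err := exec.LookPath("nmstatectl"); err != nil {
		return fmt.Errorf("install nmstate package, %w", err)
	}
	return nil
}
```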
Description of problem:
The oc login --web command fails when used with a HyperShift guest cluster. The web console returns an error message stating that the client is not authorized to request a token using this method:
{ "error": "unauthorized_client", "error_description": "The client is not authorized to request a token using this method." }
Standalone OCP does not have this issue.
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-11-21-212406 4.14 4.15
How reproducible:
always
Steps to Reproduce:
1. Install a HyperShift guest cluster.
2. Configure any OpenID identity provider for the guest cluster, e.g. https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-62511
3. Execute the oc login --web $URL command.
4. After adding the openshift-cli-client OAuthClient manually, it works:

# cat oauth.yaml
apiVersion: oauth.openshift.io/v1
grantMethod: auto
kind: OAuthClient
metadata:
  name: openshift-cli-client
redirectURIs:
  - http://127.0.0.1/callback,http://[::1]/callback
respondWithChallenges: false

# oc create -f oauth.yaml
oauthclient.oauth.openshift.io/openshift-cli-client created

$ oc login --web $URL
Opening login URL in the default browser: https://oauth-clusters-hypershift-ci-28276.apps.xxxxxxxxxxxxxxxx.com:443/oauth/authorize?client_id=openshift-cli-client&code_challenge=mixnB73nR_yzL58e0lEd4soQH1sn0GjvWEfnX4PNrCg&code_challenge_method=S256&redirect_uri=http%3A%2F%2F127.0.0.1%3A45055%2Fcallback&response_type=code
Login successful.
Actual results:
Step 3: The web login process fails and redirects to an error page displaying the error message "error_description": "The client is not authorized to request a token using this method."
Expected results:
The OAuthClient 'openshift-cli-client' should not be missing on HyperShift guest clusters, so that the oc login --web $URL command works without any issues; OCP 4.13+ ships the 'openshift-cli-client' OAuthClient by default.
Additional info:
The issue can be tracked at the following URL: https://issues.redhat.com/browse/AUTH-444
Root Cause :
The default 'openshift-cli-client' OAuthClient is missing for HyperShift guest clusters.
OLM creates certs and secrets for operators that it installs. Those secrets need to have ownership annotations.
This is a clone of issue OCPBUGS-31106. The following is the description of the original issue:
—
Description of problem:
HyperShift control plane pods that support auditing (i.e. the Kubernetes API server, OpenShift API server, and OpenShift OAuth API server) maintain audit log files that may consume many GB of container ephemeral storage in a short period of time. We need to reduce the size of the logs in these containers by modifying audit-log-maxbackup and audit-log-maxsize. This should not change the functionality of the audit logs, since all we do is output to stdout in the containerd logs.
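As a sketch, an illustrative flag combination that caps on-disk audit log growth; the path and values are examples, not HyperShift's chosen defaults:

```
// Sketch: kube-apiserver-style audit rotation flags that bound ephemeral
// storage usage while the log is still streamed out of the container.
package main

import "fmt"

func auditArgs() []string {
	return []string{
		"--audit-log-path=/var/log/kube-apiserver/audit.log", // illustrative path
		"--audit-log-maxsize=10",                             // rotate once the active file reaches 10 MiB
		"--audit-log-maxbackup=1",                            // keep at most one rotated backup
	}
}

func main() { fmt.Println(auditArgs()) }
```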
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-41372. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-41371. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-38349. The following is the description of the original issue:
—
Description of problem:
When configuring an OpenID IDP that can only be accessed via the data plane, if the hostname of the provider can only be resolved by the data plane, reconciliation of the IDP fails.
Version-Release number of selected component (if applicable):
4.16
How reproducible:
always
Steps to Reproduce:
1. Configure an OpenID idp on a HostedCluster with a URL that points to a service in the dataplane (like https://keycloak.keycloak.svc)
Actual results:
The oauth server fails to be reconciled
Expected results:
The oauth server reconciles and functions properly
Additional info:
Follow up to OCPBUGS-37753
Description of problem:
Increase MAX_NODES_LIMIT to 300 for 4.16 and 200 for 4.15 so that users don't see the "Loading is taking longer than expected" alert on the Topology page.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. Create more than 100 nodes in a namespace
Additional info:
Payload https://amd64.ocp.releases.ci.openshift.org/releasestream/4.15.0-0.nightly/release/4.15.0-0.nightly-2023-10-26-222533 failed with no successful runs of techpreview-serial. All runs appear to have failed on:
[sig-arch] events should not repeat pathologically for ns/openshift-dns
{ 2 events happened too frequently

event happened 22 times, something is wrong: ns/openshift-dns service/dns-default hmsg/6f6ed749fd - pathological/true reason/TopologyAwareHintsDisabled Unable to allocate minimum required endpoints to each zone without exceeding overload threshold (4 endpoints, 2 zones), addressType: IPv4 From: 23:11:05Z To: 23:11:06Z result=reject

event happened 23 times, something is wrong: ns/openshift-dns service/dns-default hmsg/6f6ed749fd - pathological/true reason/TopologyAwareHintsDisabled Unable to allocate minimum required endpoints to each zone without exceeding overload threshold (4 endpoints, 2 zones), addressType: IPv4 From: 23:11:06Z To: 23:11:07Z result=reject }
Please review the following PR: https://github.com/openshift/alibaba-disk-csi-driver-operator/pull/61
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Issue 29 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624
The Form view and YAML view switches used to be aligned horizontally; now they are aligned vertically.
This happens at least on
Screenshot: https://drive.google.com/file/d/1nzFHCeorlVIMbwlnjzEc1fCW0GXQa1KT/view
The e2e-aws-serial-techpreview lane under openshift/api is failing:
shared-resource-csi-driver-operator fails with:
failed to list *v1.APIServer: apiservers.config.openshift.io is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:shared-resource-csi-driver-operator" cannot list resource "apiservers" in API group "config.openshift.io" at the cluster scope
Please review the following PR: https://github.com/openshift/cluster-etcd-operator/pull/1169
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Pipeline Name gets changed to "new-pipeline" on the Edit Pipeline YAML/Builder
Version-Release number of selected component (if applicable):
Openshift 4.15 Pipelines Operator: 1.12.1
How reproducible:
Always, when creating the tasks using YAML and then creating a Pipeline with those tasks. (NOT OBSERVED WHEN USING THE PIPELINE BUILDER)
Steps to Reproduce:
1. Create Task 1: https://tekton.dev/docs/getting-started/tasks/#create-and-run-a-basic-task
2. Create Task 2: https://tekton.dev/docs/getting-started/pipelines/#create-and-run-a-second-task
3. Create a Pipeline: https://tekton.dev/docs/getting-started/pipelines/#create-and-run-a-pipeline
4. Click "Edit Pipeline" from the Actions menu
Actual results:
The Pipeline name gets changed to "new-pipeline" in the Edit Pipeline YAML/Builder, and the Pipeline cannot be updated.
Expected results:
The pipeline name should not change.
Additional info:
Video : https://drive.google.com/file/d/19-dI8lSdH6tAZm3T8CQHw78P2AzdSIRv/view?usp=sharing
This is a clone of issue OCPBUGS-30968. The following is the description of the original issue:
—
Description of problem:
Enable KMS v2 in the ibmcloud KMS provider
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-25898. The following is the description of the original issue:
—
Description of problem:
PipelineRun logs page navigation breaks when navigating through the tasks on the PipelineRun Logs tab.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Navigate to the PipelineRun details page and select the Logs tab.
2. Navigate through the tasks of the PipelineRun
Actual results:
- The Details tab becomes active on selection of any task
- The Logs page goes empty on selecting the Logs tab again
- The last task is not selected for completed PipelineRuns
Expected results:
- The Logs tab should stay active while the user navigates tasks on the Logs tab
- The last task should be selected for completed PipelineRuns
Additional info:
This is a regression after a change in the tab-selection logic of the HorizontalNav component.
Video- https://drive.google.com/file/d/15fx9GWO2dRh4uaibRmZ4VTk4HFxQ7NId/view?usp=sharing
This is a clone of issue OCPBUGS-26197. The following is the description of the original issue:
—
Description of problem:
The PKI operator runs even when the annotation to turn off PKI is set on the hosted control plane.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-32117. The following is the description of the original issue:
—
Description of problem:
Multiple Output tabs are present on the PipelineRun details page if the dynamic Pipeline console-plugin is enabled.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-37936. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-37832. The following is the description of the original issue:
—
CCMs attempt direct connections when the management cluster on which the HCP runs is proxied and does not allow direct outbound connections.
Example from the AWS CCM
I0731 21:46:33.948466 1 event.go:389] "Event occurred" object="openshift-ingress/router-default" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: error listing AWS instances: \"WebIdentityErr: failed to retrieve credentials\\ncaused by: RequestError: send request failed\\ncaused by: Post \\\"https://sts.us-east-1.amazonaws.com/\\\": dial tcp 72.21.206.96:443: i/o timeout\""
Description of problem:
Critical alert rules do not have a runbook URL.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
This bug is being raised by the OpenShift Monitoring team as part of an effort to detect invalid alert rules in OCP.
1. Check the details of the MultipleDefaultStorageClasses alert rule.
Actual results:
The Alert Rule MultipleDefaultStorageClasses has Critical Severity, but does not have runbook_url annotation.
Expected results:
All critical alert rules must have a runbook_url annotation
Additional info:
Critical alerts must have a runbook; please refer to the style guide at https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#style-guide. The runbooks are located at github.com/openshift/runbooks. To resolve the bug:
- Add runbooks for the relevant alerts at github.com/openshift/runbooks
- Add the link to the runbook in the alert's 'runbook_url' annotation
- Remove the exception in the origin test, added in PR https://github.com/openshift/origin/pull/27933
This is a clone of issue OCPBUGS-30836. The following is the description of the original issue:
—
Description of problem:
When deploying a cluster on Power VS, you need to wait for a short period after the workspace is created to facilitate the network configuration. This period is ignored by the DHCP service.
Version-Release number of selected component (if applicable):
How reproducible:
Easily
Steps to Reproduce:
1. Deploy a cluster on Power VS with an installer-provisioned workspace
2. Observe that the terraform logs ignore the waiting period
Actual results:
Expected results:
Additional info:
Previously, in OCPBUGS-32105, we fixed a bug where a race between the assisted-installer and the assisted-installer-controller to mark a Node as Joined would result in 30+ minutes of (unlogged) retries by the former if the latter won. This was indistinguishable from the installation process hanging, and it would eventually time out.
This bug has been fixed, but we were unable to reproduce the circumstances that caused it.
However, a reproduction by the customer reveals another problem: we now correctly retry checking the control plane nodes for readiness if we encounter a conflict with another write from assisted-installer-controller. However, we never reload fresh data from assisted-service - data that would show the host has already been updated and thus prevent us from trying to update it again. Therefore, we continue to get a conflict on every retry. (This is at least now logged, so we can see what is happening.)
This also suggests a potential way to reproduce the problem: whenever one control plane node has booted to the point that the assisted-installer-controller is running before the second control plane node has booted to the point that the Node is marked as ready in the k8s API, there is a possibility of a race. There is in fact no need for the write from assisted-installer-controller to come in the narrow window between when assisted-installer reads vs. writes to the assisted-service API, because assisted-installer is always using a stale read.
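A minimal sketch of the retry shape this calls for: reload the host from assisted-service on every attempt so a stale read cannot produce a conflict loop. getHost/updateHost and the Host type are hypothetical stand-ins for the assisted-service client:

```
// Sketch: fresh read before each write attempt; bail out early if another
// writer (e.g. the controller) already made the update we wanted.
package main

import "errors"

var errConflict = errors.New("conflict")

type Host struct{ Joined bool }

func markJoined(getHost func() (Host, error), updateHost func(Host) error) error {
	for attempt := 0; attempt < 5; attempt++ {
		h, err := getHost() // never reuse a stale copy
		if err != nil {
			return err
		}
		if h.Joined {
			return nil // the other writer already won the race
		}
		h.Joined = true
		err = updateHost(h)
		if err == nil {
			return nil
		}
		if !errors.Is(err, errConflict) {
			return err
		}
	}
	return errConflict
}
```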
Description of problem:
The expected minimal permissions to access the tenancy port on the thanos-querier service in the openshift-monitoring namespace do not work; different permissions are needed instead, and GET requests need different permissions than POST requests.

I am trying to use the tenancy port on the thanos-querier service in the openshift-monitoring namespace. I want a pod to access these metrics, and thus I want to add only the minimal necessary permissions to that pod. From Slack discussions and the configuration for the thanos-querier (https://github.com/openshift/cluster-monitoring-operator/blob/release-4.11/assets/thanos-querier/kube-rbac-proxy-secret.yaml) one would expect that the needed permissions are:
```
rules:
  - verbs:
      - get
    apiGroups:
      - metrics.k8s.io/v1beta1
    resources:
      - pods
```
However, when binding such a role to a service account (and waiting a little bit for the update to propagate across the system), I get an error from inside its container:
```
sh-5.1$ curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" 'https://thanos-querier.openshift-monitoring.svc:9092/api/v1/query?namespace=app'
Forbidden (user=system:serviceaccount:app:default, verb=get, resource=pods, subresource=)
```
The error message suggests that the service account doesn't have the permissions needed. Changing the role's rules and waiting a little bit for the update to propagate across the system seems to fix this. Note the different `apiGroups`:
```
rules:
  - verbs:
      - get
    apiGroups:
      - ''
    resources:
      - pods
```
This results in successfully connecting to the tenancy port:
```
sh-5.1$ curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" 'https://thanos-querier.openshift-monitoring.svc:9092/api/v1/query?namespace=app&query=up'
{"status":"success","data":{"resultType":"vector","result":[]}}
```
A similar issue also affects POST requests to the tenancy port. One would expect the minimal needed permissions to be the same for GET and POST requests, but this is not the case: GET requests demand the verb `get` and POST requests demand the verb `create`. When using a service account with a role whose rules are:
```
rules:
  - verbs:
      - get
    apiGroups:
      - ''
    resources:
      - pods
```
I get this error for POST (note the -X GET/-X POST flags and the verb in the error output):
```
sh-4.4$ curl --cacert /etc/kubernetes/certs/service-ca/service-ca.crt -X GET -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" 'https://thanos-querier.openshift-monitoring.svc:9092/api/v1/query?namespace=clusters-dhurta-test-aws'
sh-4.4$
sh-4.4$ curl --cacert /etc/kubernetes/certs/service-ca/service-ca.crt -X POST -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" 'https://thanos-querier.openshift-monitoring.svc:9092/api/v1/query?namespace=clusters-dhurta-test-aws'
Forbidden (user=system:serviceaccount:clusters-dhurta-test-aws:cluster-version-operator, verb=create, resource=pods, subresource=)
```
Changing the rules to:
```
rules:
  - verbs:
      - get
      - create
    apiGroups:
      - ''
    resources:
      - pods
```
seems to fix the issue for POST:
```
sh-4.4$ curl --cacert /etc/kubernetes/certs/service-ca/service-ca.crt -X GET -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" 'https://thanos-querier.openshift-monitoring.svc:9092/api/v1/query?namespace=clusters-dhurta-test-aws'
sh-4.4$ curl --cacert /etc/kubernetes/certs/service-ca/service-ca.crt -X POST -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" 'https://thanos-querier.openshift-monitoring.svc:9092/api/v1/query?namespace=clusters-dhurta-test-aws'
```
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-20-215234
How reproducible:
2/2
Steps to Reproduce:
1. Use the Cluster Bot to launch a 4.14 nightly cluster (`launch 4.14 aws`)
2. Create a dummy namespace and launch an application inside the namespace.
3. Create a role in the namespace with the rules set to:
```
rules:
  - verbs:
      - get
    apiGroups:
      - metrics.k8s.io/v1beta1
    resources:
      - pods
```
4. Create a role binding and bind the role to the app's service account
5. Access the terminal inside the app's container
6. Access the tenancy port of thanos-querier using a POST request:
```
curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt -X POST -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" 'https://thanos-querier.openshift-monitoring.svc:9092/api/v1/query?namespace=$NAMESPACE'
```
Actual results:
When running the `curl` command the output is: Forbidden (user=system:serviceaccount:app:default, verb=create, resource=pods, subresource=)
Expected results:
Successfully connecting and receiving specified metrics. For example: {"status":"success","data":{"resultType":"vector","result":[]}}
Additional info:
I wasn't sure whether to mark this bug as a security related issue. I am marking this bug `Security Level: Red Hat Employee` because the bug is regarding the authorization to access user workload metrics.
This is a clone of issue OCPBUGS-31118. The following is the description of the original issue:
—
Description of problem:
Priority Class override for ignition-server deployment was accidentally ripped out when a new reconcileProxyDeployment() func was introduced.
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Create a cluster with the priority class override opted in
2. Override the priority class in the HC
3. Check the ignition server deployment's priority class
Actual results:
doesn't override priority class
Expected results:
overridden priority class
Additional info:
This is a clone of issue OCPBUGS-31319. The following is the description of the original issue:
—
$ oc logs --previous --timestamps -n openshift-console console-64df9b5bcb-8h8xk
2024-03-22T11:17:07.824396015Z I0322 11:17:07.824332 1 main.go:210] The following console plugins are enabled:
2024-03-22T11:17:07.824574844Z I0322 11:17:07.824558 1 main.go:212] - monitoring-plugin
2024-03-22T11:17:07.824613918Z W0322 11:17:07.824603 1 authoptions.go:99] Flag inactivity-timeout is set to less then 300 seconds and will be ignored!
2024-03-22T11:22:07.828873678Z I0322 11:22:07.828819 1 main.go:634] Binding to [::]:8443...
2024-03-22T11:22:07.828982852Z I0322 11:22:07.828967 1 main.go:636] using TLS
2024-03-22T11:22:07.833771847Z E0322 11:22:07.833726 1 asynccache.go:62] failed a caching attempt: Get "https://keycloak-keycloak.apps.xxxx/realms/master/.well-known/openid-configuration": tls: failed to verify certificate: x509: certificate signed by unknown authority
2024-03-22T11:22:10.831644728Z I0322 11:22:10.831598 1 metrics.go:128] serverconfig.Metrics: Update ConsolePlugin metrics...
2024-03-22T11:22:10.848238183Z I0322 11:22:10.848187 1 metrics.go:138] serverconfig.Metrics: Update ConsolePlugin metrics: &map[monitoring:map[enabled:1]] (took 16.490288ms)
2024-03-22T11:22:12.829744769Z I0322 11:22:12.829697 1 metrics.go:80] usage.Metrics: Count console users...
2024-03-22T11:22:13.236378460Z I0322 11:22:13.236318 1 metrics.go:156] usage.Metrics: Update console users metrics: 0 kubeadmin, 0 cluster-admins, 0 developers, 0 unknown/errors (took 406.580502ms)
The cause is that the HCCO is not copying the issuerCertificateAuthority configmap into the openshift-config namespace of the HC.
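A minimal sketch of the missing copy step, assuming a plain client-go copy is representative of what the HCCO reconciliation would do; the source namespace and name are placeholders:

```
// Sketch: copy the IDP's issuerCertificateAuthority ConfigMap into the guest
// cluster's openshift-config namespace so the console can verify the issuer.
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func copyCAConfigMap(ctx context.Context, c kubernetes.Interface, srcNS, name string) error {
	src, err := c.CoreV1().ConfigMaps(srcNS).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	dst := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Namespace: "openshift-config", Name: name},
		Data:       src.Data, // carries the CA bundle key
	}
	// A real reconciler would also handle the already-exists/update case.
	_, err = c.CoreV1().ConfigMaps("openshift-config").Create(ctx, dst, metav1.CreateOptions{})
	return err
}
```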
This is a clone of issue OCPBUGS-35056. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-34274. The following is the description of the original issue:
—
Description of problem:
AWS VPCs support a primary CIDR range and multiple secondary CIDR ranges: https://aws.amazon.com/about-aws/whats-new/2017/08/amazon-virtual-private-cloud-vpc-now-allows-customers-to-expand-their-existing-vpcs/
Let's pretend a VPC exists with:
and a hostedcontrolplane object like:
networking:
  ...
  machineNetwork:
  - cidr: 10.1.0.0/24
  ...
olmCatalogPlacement: management
platform:
  aws:
    cloudProviderConfig:
      subnet:
        id: subnet-b
      vpc: vpc-069a93c6654464f03
Even though all EC2 instances will be spun up in subnet-b (10.1.0.0/24), CPO will detect the CIDR range of the VPC as 10.0.0.0/24 (https://github.com/openshift/hypershift/blob/0d10c822912ed1af924e58ccb8577d2bb1fd68be/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go#L4755-L4765) and create security group rules only allowing inbound traffic from 10.0.0.0/24. This specifically prevents these EC2 instances from communicating with the VPC endpoint created by the awsendpointservice CR and reaching the hosted control plane pods.
Version-Release number of selected component (if applicable):
Reproduced on a 4.14.20 ROSA HCP cluster, but the version should not matter
How reproducible:
100%
Steps to Reproduce:
1. Create a VPC with at least one secondary CIDR block 2. Install a ROSA HCP cluster providing the secondary CIDR block as the machine CIDR range and selecting the appropriate subnets within the secondary CIDR range
Actual results:
* Observe that the default security group contains inbound security group rules allowing traffic from the VPC's primary CIDR block (not a CIDR range containing the cluster's worker nodes)
* As a result, the EC2 instances (worker nodes) fail to reach the ignition-server
Expected results:
The EC2 instances are able to reach the ignition-server and HCP pods
Additional info:
This bug seems like it could be fixed by using the machine CIDR range for the security group instead of the VPC CIDR range. Alternatively, we could duplicate rules for every secondary CIDR block, but the default AWS quota is 60 inbound security group rules/security group, so it's another failure condition to keep in mind if we go that route.
aws ec2 describe-vpcs output for a VPC with secondary CIDR blocks:

❯ aws ec2 describe-vpcs --region us-east-2 --vpc-id vpc-069a93c6654464f03
{
    "Vpcs": [
        {
            "CidrBlock": "10.0.0.0/24",
            "DhcpOptionsId": "dopt-0d1f92b25d3efea4f",
            "State": "available",
            "VpcId": "vpc-069a93c6654464f03",
            "OwnerId": "429297027867",
            "InstanceTenancy": "default",
            "CidrBlockAssociationSet": [
                {
                    "AssociationId": "vpc-cidr-assoc-0abbc75ac8154b645",
                    "CidrBlock": "10.0.0.0/24",
                    "CidrBlockState": {
                        "State": "associated"
                    }
                },
                {
                    "AssociationId": "vpc-cidr-assoc-098fbccc85aa24acf",
                    "CidrBlock": "10.1.0.0/24",
                    "CidrBlockState": {
                        "State": "associated"
                    }
                }
            ],
            "IsDefault": false,
            "Tags": [
                {
                    "Key": "Name",
                    "Value": "test"
                }
            ]
        }
    ]
}
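If the duplicate-rules route were taken, a minimal sketch of enumerating every associated CIDR block instead of only the primary vpc.CidrBlock, using aws-sdk-go v1 against the output shown above:

```
// Sketch: collect all associated CIDR blocks (primary and secondary) so
// security group rules built from the list cover secondary ranges too.
package main

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func vpcCIDRs(client *ec2.EC2, vpcID string) ([]string, error) {
	out, err := client.DescribeVpcs(&ec2.DescribeVpcsInput{
		VpcIds: []*string{aws.String(vpcID)},
	})
	if err != nil || len(out.Vpcs) == 0 {
		return nil, err
	}
	var cidrs []string
	for _, assoc := range out.Vpcs[0].CidrBlockAssociationSet {
		if assoc.CidrBlockState != nil && aws.StringValue(assoc.CidrBlockState.State) == ec2.VpcCidrBlockStateCodeAssociated {
			cidrs = append(cidrs, aws.StringValue(assoc.CidrBlock))
		}
	}
	return cidrs, nil
}
```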
Description of problem:
configure-ovs.sh breaks the primary interface config by leaving generated configs in /etc/NetworkManager/system-connections
Version-Release number of selected component (if applicable):
4.10.52 -> 4.11.46 -> OCP 4.12.27 IPI VSphere
How reproducible:
Reboot any node; the node will never become ready.
Steps to Reproduce:
1. Install and upgrade the cluster
2. Reboot worker nodes after the upgrade.
Actual results:
The primary interface never sends DHCP requests, and bad configs are left in /etc/NetworkManager/system-connections
Expected results:
No leftover configure-ovs configs, and the primary interface acquires an IP address using DHCP.
Additional info:
Workaround (only when using a single DHCP interface): rm /etc/NetworkManager/system-connections/*
Please review the following PR: https://github.com/openshift/cluster-bootstrap/pull/101
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Version-Release number of selected component (if applicable):
pre-merge testing or 4.14.0-0.nightly-2023-08-20-085537
How reproducible:
Always
Steps to Reproduce:
1. Label one worker node as an egress node and enable IP forwarding on it.
2. Create an egressip object; it gets assigned to the egress node:

oc get egressip
NAME         EGRESSIPS      ASSIGNED NODE                         ASSIGNED EGRESSIPS
egressip-1   172.22.0.100   worker-2.sriov.openshift-qe.sdn.com   172.22.0.100

oc get egressip -o yaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2023-08-11T03:46:19Z"
    generation: 7
    name: egressip-1
    resourceVersion: "169277"
    uid: 7692bea5-c072-41e5-aa7a-acfa737a5428
  spec:
    egressIPs:
    - 172.22.0.100
    namespaceSelector:
      matchLabels:
        name: qe
  status:
    items:
    - egressIP: 172.22.0.100
      node: worker-2.sriov.openshift-qe.sdn.com
kind: List
metadata:
  resourceVersion: ""

3. Create a namespace "test" and some pods in it; add a label to the namespace matching the egressIP object.
4. From a pod, access the bastion host.
Actual results:
Outgoing traffic timed out. From the bastion node, it didn't get a correct MAC for the egressIP:
? (172.22.0.100) at <incomplete> on sriovpr
The egressIP was not added to the secondary NIC on the egress node:
oc debug node/worker-2.sriov.openshift-qe.sdn.com
Temporary namespace openshift-debug-crpt9 is created for debugging node...
Starting pod/worker-2sriovopenshift-qesdncom-debug-s857l ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.111.25
If you don't see a command prompt, try pressing enter.
sh-4.4# ip a show enp1s0
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:32:ca:4e:a8:bf brd ff:ff:ff:ff:ff:ff
    inet 172.22.0.50/24 scope global enp1s0
       valid_lft forever preferred_lft forever
    inet6 fd00:1101::65fe:9a70:ab40:4c1a/128 scope global dynamic noprefixroute
       valid_lft 85269sec preferred_lft 85269sec
    inet6 fe80::232:caff:fe4e:a8bf/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
Expected results:
EgressIP works well on secondary NIC
Additional info:
Description of problem:
According to https://cloud.google.com/docs/authentication/provide-credentials-adc#local-key the default for application credentials is to set GOOGLE_APPLICATION_CREDENTIALS. Currently, this variable is missing from the list of environment variables that are checked.
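A minimal sketch of the requested lookup order, with GOOGLE_APPLICATION_CREDENTIALS checked first; the second variable name is an illustrative stand-in for whatever the component checks today:

package main

import (
	"fmt"
	"os"
)

// credentialsPath returns the first non-empty credentials location.
// GOOGLE_APPLICATION_CREDENTIALS is the ADC standard; GOOGLE_CREDENTIALS
// is an assumed stand-in for the existing fallback variables.
func credentialsPath() (string, bool) {
	for _, v := range []string{"GOOGLE_APPLICATION_CREDENTIALS", "GOOGLE_CREDENTIALS"} {
		if p := os.Getenv(v); p != "" {
			return p, true
		}
	}
	return "", false
}

func main() {
	if p, ok := credentialsPath(); ok {
		fmt.Println("using credentials file:", p)
	} else {
		fmt.Println("no credentials env var set; falling back to other ADC sources")
	}
}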
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/aws-ebs-csi-driver/pull/235
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Invalid CN is not bubbled up in the CR
Version-Release number of selected component (if applicable):
4.15.0-rc7
How reproducible:
always
Steps to Reproduce:
# generate a key with invalid CN
openssl genrsa -out myuser4.key 2048
openssl req -new -key myuser4.key -out myuser4.csr -subj "/CN=baduser/O=system:masters"
# get cert in the CSR
# apply the CSR
# Status remains in Accepted, but it is not Issued
% oc get csr | grep 29ecg6n5bkugrh6io4his24ser3bt16n-5-customer-break-glass-csr
29ecg6n5bkugrh6io4his24ser3bt16n-5-customer-break-glass-csr   4m29s   hypershift.openshift.io/ocm-integration-29ecg6n5bkugrh6io4his24ser3bt16n-ad-int1.customer-break-glass   system:admin   60m   Approved
# No status in the CSR
status:
  conditions:
  - lastTransitionTime: "2024-02-16T14:06:41Z"
    lastUpdateTime: "2024-02-16T14:06:41Z"
    message: The requisite approval resource exists.
    reason: ApprovalPresent
    status: "True"
    type: Approved
# pki controller shows the error
oc logs control-plane-pki-operator-bf6d75d5f-h95rf -n ocm-integration-29ecg6n5bkugrh6io4his24ser3bt16n-ad-int1 | grep "29ecg6n5bkugrh6io4his24ser3bt16n-5-customer-break-glass-csr"
I0216 14:06:41.842414 1 event.go:298] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"ocm-integration-29ecg6n5bkugrh6io4his24ser3bt16n-ad-int1", Name:"control-plane-pki-operator", UID:"b63dbaa9-18f7-4ee6-8473-8a38bdb6f2df", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'CertificateSigningRequestApproved' "29ecg6n5bkugrh6io4his24ser3bt16n-5-customer-break-glass-csr" in is approved
I0216 14:06:41.848623 1 event.go:298] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"ocm-integration-29ecg6n5bkugrh6io4his24ser3bt16n-ad-int1", Name:"control-plane-pki-operator", UID:"b63dbaa9-18f7-4ee6-8473-8a38bdb6f2df", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'CertificateSigningRequestInvalid' "29ecg6n5bkugrh6io4his24ser3bt16n-5-customer-break-glass-csr" is invalid: invalid certificate request: subject CommonName must begin with "system:customer-break-glass:"
Actual results:
Expected results:
The status in the CR should show the failure and the error.
Additional info:
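A minimal sketch of the CN validation the controller performs, to show where the resulting error could be copied into a CSR status condition (for example a Failed condition) instead of only being logged; the helper name is hypothetical:

package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"strings"
)

const requiredPrefix = "system:customer-break-glass:"

// validateCN parses a PEM-encoded CSR and returns the error that should be
// surfaced in the CSR's status conditions, not just in the operator log.
func validateCN(csrPEM []byte) error {
	block, _ := pem.Decode(csrPEM)
	if block == nil {
		return fmt.Errorf("no PEM block found")
	}
	req, err := x509.ParseCertificateRequest(block.Bytes)
	if err != nil {
		return fmt.Errorf("parsing certificate request: %w", err)
	}
	if !strings.HasPrefix(req.Subject.CommonName, requiredPrefix) {
		return fmt.Errorf("invalid certificate request: subject CommonName must begin with %q", requiredPrefix)
	}
	return nil
}

func main() {
	// Build a CSR with the invalid CN from the reproduction steps.
	key, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	der, _ := x509.CreateCertificateRequest(rand.Reader,
		&x509.CertificateRequest{Subject: pkix.Name{CommonName: "baduser"}}, key)
	csrPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE REQUEST", Bytes: der})
	fmt.Println(validateCN(csrPEM)) // prints the CN error that should land in status
}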
This is a clone of issue OCPBUGS-36411. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-35284. The following is the description of the original issue:
—
Description of problem:
Unable to restart baremetal node from OCP GUI in RHOCP 4.14
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Steps to Reproduce:
1. Install a cluster with baremetal IPI
2. Open the console
3. Try restarting a node from the GUI
Actual results:
The node is not restarted; nothing happens
Expected results:
Node should get restarted
Additional info:
Attaching screenshot
This is a clone of issue OCPBUGS-34139. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-32632. The following is the description of the original issue:
—
Description of problem:
In PR https://github.com/openshift/console/pull/13676 we worked on improving the performance of the PipelineRun list page, and https://issues.redhat.com/browse/OCPBUGS-32631 was created to improve the performance of the PLR list page further. Once this is complete, we have to improve the performance of the Pipeline list page by considering the points below (see the sketch after this list):
1. TaskRuns should not be fetched for all the PLRs.
2. Use pipelinerun.status.conditions.message to get the status of TaskRuns.
3. For any PLR, if the string pipelinerun.status.conditions.message has data about Task statuses, use that string instead of fetching TaskRuns.
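A hedged sketch of point 2, assuming the condition message keeps its current Tekton shape of "Tasks Completed: 3 (Failed: 0, Cancelled 0), Skipped: 1"; the parser below is illustrative, not console code:

package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// The message format is an assumption based on current Tekton output.
var countsRE = regexp.MustCompile(`Tasks Completed: (\d+) \(Failed: (\d+), Cancelled (\d+)\), Skipped: (\d+)`)

// taskCounts derives task totals from the PipelineRun condition message,
// avoiding a TaskRun fetch per PipelineRun on the list page.
func taskCounts(msg string) (completed, failed, cancelled, skipped int, ok bool) {
	m := countsRE.FindStringSubmatch(msg)
	if m == nil {
		return 0, 0, 0, 0, false
	}
	nums := make([]int, 4)
	for i, s := range m[1:] {
		nums[i], _ = strconv.Atoi(s)
	}
	return nums[0], nums[1], nums[2], nums[3], true
}

func main() {
	fmt.Println(taskCounts("Tasks Completed: 3 (Failed: 0, Cancelled 0), Skipped: 1"))
}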
The notification emails sent by Alertmanager contain a "Source" link to the deleted Thanos UI. As the link is currently inaccessible, it should be removed.
This is a clone of issue OCPBUGS-13106. The following is the description of the original issue:
—
Description of problem:
Various jobs are failing in e2e-gcp-operator due to the LoadBalancer-type Service not going "ready", which most likely means it is not getting an IP address. Tests affected so far:
- TestUnmanagedDNSToManagedDNSInternalIngressController
- TestScopeChange
- TestInternalLoadBalancerGlobalAccessGCP
- TestInternalLoadBalancer
- TestAllowedSourceRanges
For example, in TestInternalLoadBalancer, the load balancer never comes back ready:
operator_test.go:1454: Expected conditions: map[Admitted:True Available:True DNSManaged:True DNSReady:True LoadBalancerManaged:True LoadBalancerReady:True]
Current conditions: map[Admitted:True Available:False DNSManaged:True DNSReady:False Degraded:True DeploymentAvailable:True DeploymentReplicasAllAvailable:True DeploymentReplicasMinAvailable:True DeploymentRollingOut:False EvaluationConditionsDetected:False LoadBalancerManaged:True LoadBalancerProgressing:False LoadBalancerReady:False Progressing:False Upgradeable:True]
Note DNSReady:False and LoadBalancerReady:False.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
10% of the time
Steps to Reproduce:
1. Run e2e-gcp-operator many times until you see one of these failures
Actual results:
Test Failure
Expected results:
No test failure
Additional info:
Search.CI Links:
TestScopeChange
TestInternalLoadBalancerGlobalAccessGCP & TestInternalLoadBalancer
This does not seem related to https://issues.redhat.com/browse/OCPBUGS-6013. The DNS E2E tests actually pass this same condition check.
reason/DisruptionBegan request-audit-id/91e612b4-dd19-4783-ad62-46c55bbdaee4 backend-disruption-name/oauth-api-reused-connections connection/reused disruption/openshift-tests stopped responding to GET requests over reused connections: error running request: 500 Internal Server Error: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"etcdserver: leader changed","code":500}
Feels like there's something here we could dig into.
Most common on Azure.
May show up in search.ci as well to help find the jobs more easily?
This is a clone of issue OCPBUGS-27959. The following is the description of the original issue:
—
In a CI run of etcd-operator-e2e I've found the following panic in the operator logs:
E0125 11:04:58.158222 1 health.go:135] health check for member (ip-10-0-85-12.us-west-2.compute.internal) failed: err(context deadline exceeded)
panic: send on closed channel

goroutine 15608 [running]:
github.com/openshift/cluster-etcd-operator/pkg/etcdcli.getMemberHealth.func1()
	github.com/openshift/cluster-etcd-operator/pkg/etcdcli/health.go:58 +0xd2
created by github.com/openshift/cluster-etcd-operator/pkg/etcdcli.getMemberHealth
	github.com/openshift/cluster-etcd-operator/pkg/etcdcli/health.go:54 +0x2a5
Unfortunately the log file is incomplete. The operator recovered by restarting itself, but we should fix the panic nonetheless.
Job run for reference:
https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-etcd-operator/1186/pull-ci-openshift-cluster-etcd-operator-master-e2e-operator/1750466468031500288
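An illustrative reproduction of the failure mode and one common fix, not the operator's actual code: when the receiver gives up on a deadline and closes the result channel, any still-running health-check goroutine panics on send. Sizing the channel to the number of senders and never closing it avoids the panic:

package main

import (
	"fmt"
	"time"
)

// checkAll collects health results with a deadline. The buffered channel is
// the fix: late senders complete into the buffer instead of panicking on a
// closed channel, and the channel is simply garbage-collected afterwards.
func checkAll(members []string) []string {
	results := make(chan string, len(members)) // buffered: sends can never block or panic
	for _, m := range members {
		go func(m string) {
			time.Sleep(10 * time.Millisecond) // stand-in for the real health check
			results <- m + ": ok"
		}(m)
	}
	out := make([]string, 0, len(members))
	deadline := time.After(time.Second)
	for range members {
		select {
		case r := <-results:
			out = append(out, r)
		case <-deadline:
			return out // give up; do NOT close(results) while senders may be running
		}
	}
	return out
}

func main() {
	fmt.Println(checkAll([]string{"etcd-0", "etcd-1", "etcd-2"}))
}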
This is a clone of issue OCPBUGS-34158. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-33954. The following is the description of the original issue:
—
Description of problem:
An infra machine goes to Failed status:
2024-05-18 07:26:49.815 | NAMESPACE               NAME                          PHASE     TYPE     REGION      ZONE   AGE
2024-05-18 07:26:49.822 | openshift-machine-api   ostest-wgdc2-infra-0-4sqdh    Running   master   regionOne   nova   31m
2024-05-18 07:26:49.826 | openshift-machine-api   ostest-wgdc2-infra-0-ssx8j    Failed                                31m
2024-05-18 07:26:49.831 | openshift-machine-api   ostest-wgdc2-infra-0-tfkf5    Running   master   regionOne   nova   31m
2024-05-18 07:26:49.841 | openshift-machine-api   ostest-wgdc2-master-0         Running   master   regionOne   nova   38m
2024-05-18 07:26:49.847 | openshift-machine-api   ostest-wgdc2-master-1         Running   master   regionOne   nova   38m
2024-05-18 07:26:49.852 | openshift-machine-api   ostest-wgdc2-master-2         Running   master   regionOne   nova   38m
2024-05-18 07:26:49.858 | openshift-machine-api   ostest-wgdc2-worker-0-d5cdp   Running   worker   regionOne   nova   31m
2024-05-18 07:26:49.868 | openshift-machine-api   ostest-wgdc2-worker-0-jcxml   Running   worker   regionOne   nova   31m
2024-05-18 07:26:49.873 | openshift-machine-api   ostest-wgdc2-worker-0-t29fz   Running   worker   regionOne   nova   31m
Logs from the machine-controller show the error below:
2024-05-18T06:59:11.159013162Z I0518 06:59:11.158938 1 controller.go:156] ostest-wgdc2-infra-0-ssx8j: reconciling Machine
2024-05-18T06:59:11.159589148Z I0518 06:59:11.159529 1 recorder.go:104] events "msg"="Reconciled machine ostest-wgdc2-worker-0-jcxml" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ostest-wgdc2-worker-0-jcxml","uid":"245bac8e-c110-4bef-ac11-3d3751a93353","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"18617"} "reason"="Reconciled" "type"="Normal"
2024-05-18T06:59:12.749966746Z I0518 06:59:12.749845 1 controller.go:349] ostest-wgdc2-infra-0-ssx8j: reconciling machine triggers idempotent create
2024-05-18T07:00:00.487702632Z E0518 07:00:00.486365 1 leaderelection.go:332] error retrieving resource lock openshift-machine-api/cluster-api-provider-openstack-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-api-provider-openstack-leader": http2: client connection lost
2024-05-18T07:00:00.487702632Z W0518 07:00:00.486497 1 controller.go:351] ostest-wgdc2-infra-0-ssx8j: failed to create machine: error creating bootstrap for ostest-wgdc2-infra-0-ssx8j: Get "https://172.30.0.1:443/api/v1/namespaces/openshift-machine-api/secrets/worker-user-data": http2: client connection lost
2024-05-18T07:00:00.487702632Z I0518 07:00:00.486534 1 controller.go:391] Actuator returned invalid configuration error: error creating bootstrap for ostest-wgdc2-infra-0-ssx8j: Get "https://172.30.0.1:443/api/v1/namespaces/openshift-machine-api/secrets/worker-user-data": http2: client connection lost
2024-05-18T07:00:00.487702632Z I0518 07:00:00.486548 1 controller.go:404] ostest-wgdc2-infra-0-ssx8j: going into phase "Failed"
The openstack VM is not even created:
2024-05-18 07:26:50.911 | +--------------------------------------+-----------------------------+--------+---------------------------------------------------------------------------------------------------------------------+--------------------+--------+
2024-05-18 07:26:50.917 | | ID | Name | Status | Networks | Image | Flavor |
2024-05-18 07:26:50.924 | +--------------------------------------+-----------------------------+--------+---------------------------------------------------------------------------------------------------------------------+--------------------+--------+
2024-05-18 07:26:50.929 | | 3a1b9af6-d284-4da5-8ebe-434d3aa95131 | ostest-wgdc2-worker-0-jcxml | ACTIVE | StorageNFS=172.17.5.187; network-dualstack=192.168.192.185, fd2e:6f44:5dd8:c956:f816:3eff:fe3e:4e7c | ostest-wgdc2-rhcos | worker |
2024-05-18 07:26:50.935 | | 5c34b78a-d876-49fb-a307-874d3c197c44 | ostest-wgdc2-infra-0-tfkf5 | ACTIVE | network-dualstack=192.168.192.133, fd2e:6f44:5dd8:c956:f816:3eff:fee6:4410, fd2e:6f44:5dd8:c956:f816:3eff:fef2:930a | ostest-wgdc2-rhcos | master |
2024-05-18 07:26:50.941 | | d2025444-8e11-409d-8a87-3f1082814af1 | ostest-wgdc2-infra-0-4sqdh | ACTIVE | network-dualstack=192.168.192.156, fd2e:6f44:5dd8:c956:f816:3eff:fe82:ae56, fd2e:6f44:5dd8:c956:f816:3eff:fe86:b6d1 | ostest-wgdc2-rhcos | master |
2024-05-18 07:26:50.947 | | dcbde9ac-da5a-44c8-b64f-049f10b6b50c | ostest-wgdc2-worker-0-t29fz | ACTIVE | StorageNFS=172.17.5.233; network-dualstack=192.168.192.13, fd2e:6f44:5dd8:c956:f816:3eff:fe94:a2d2 | ostest-wgdc2-rhcos | worker |
2024-05-18 07:26:50.951 | | 8ad98adf-147c-4268-920f-9eb5c43ab611 | ostest-wgdc2-worker-0-d5cdp | ACTIVE | StorageNFS=172.17.5.217; network-dualstack=192.168.192.173, fd2e:6f44:5dd8:c956:f816:3eff:fe22:5cff | ostest-wgdc2-rhcos | worker |
2024-05-18 07:26:50.957 | | f01d6740-2954-485d-865f-402b88789354 | ostest-wgdc2-master-2 | ACTIVE | StorageNFS=172.17.5.177; network-dualstack=192.168.192.198, fd2e:6f44:5dd8:c956:f816:3eff:fe1f:3c64 | ostest-wgdc2-rhcos | master |
2024-05-18 07:26:50.963 | | d215a70f-760d-41fb-8e30-9f3106dbaabe | ostest-wgdc2-master-1 | ACTIVE | StorageNFS=172.17.5.163; network-dualstack=192.168.192.152, fd2e:6f44:5dd8:c956:f816:3eff:fe4e:67b6 | ostest-wgdc2-rhcos | master |
2024-05-18 07:26:50.968 | | 53fe495b-f617-412d-9608-47cd355bc2e5 | ostest-wgdc2-master-0 | ACTIVE | StorageNFS=172.17.5.170; network-dualstack=192.168.192.193, fd2e:6f44:5dd8:c956:f816:3eff:febd:a836 | ostest-wgdc2-rhcos | master |
2024-05-18 07:26:50.975 | +--------------------------------------+-----------------------------+--------+---------------------------------------------------------------------------------------------------------------------+--------------------+--------+
Version-Release number of selected component (if applicable):
RHOS-17.1-RHEL-9-20240123.n.1 4.15.0-0.nightly-2024-05-16-091947
Additional info:
Must-gather link provided on private comment.
This is a clone of issue OCPBUGS-41824. The following is the description of the original issue:
—
Description of problem:
The kubeconfigs for the DNS Operator and the Ingress Operator are managed by Hypershift, but they should be managed only by the cloud service provider. This can leave the kubeconfig/certificate invalid in cases where the cloud service provider further manages the kubeconfig (for example, CA rotation).
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/csi-driver-shared-resource/pull/144
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/service-ca-operator/pull/226
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
After a manual crash of an OCP node, the OSPD VM running on that node is stuck in Terminating state
Version-Release number of selected component (if applicable):
OCP 4.12.15
osp-director-operator.v1.3.0
kubevirt-hyperconverged-operator.v4.12.5
How reproducible:
Log in to an OCP 4.12.15 node running a VM and manually crash the master node. After reboot, the VM stays in Terminating state.
Steps to Reproduce:
1. ssh core@masterX
2. sudo su
3. echo c > /proc/sysrq-trigger
Actual results:
After reboot, the VM stays in Terminating state.
$ omc get node|sed -e 's/modl4osp03ctl/model/g' | sed -e 's/telecom.tcnz.net/aaa.bbb.ccc/g'
NAME                  STATUS   ROLES                         AGE   VERSION
model01.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08
model02.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08
model03.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08
$ omc get pod -n openstack
NAME                                                         READY   STATUS         RESTARTS   AGE
openstack-provision-server-7b79fcc4bd-x8kkz                  2/2     Running        0          8h
openstackclient                                              1/1     Running        0          7h
osp-director-operator-controller-manager-5896b5766b-sc7vm    2/2     Running        0          8h
osp-director-operator-index-qxxvw                            1/1     Running        0          8h
virt-launcher-controller-0-9xpj7                             1/1     Running        0          20d
virt-launcher-controller-1-5hj9x                             1/1     Running        0          20d
virt-launcher-controller-2-vhd69                             0/1     NodeAffinity   0          43d
$ omc describe pod virt-launcher-controller-2-vhd69 |grep Status:
Status: Terminating (lasts 37h)
$ xsos sosreport-xxxx/|grep time
...
Boot time: Wed Nov 22 01:44:11 AM UTC 2023
Uptime: 8:27, 0 users
Expected results:
VM restart automatically OR does not stay in Terminating state
Additional info:
The issue has been seen two times. The first time, a kernel crash occurred and the associated VM on the node went into Terminating state. The second time, we tried to reproduce the issue by manually crashing the kernel and got the same result: the VM running on the OCP node stays in Terminating state.
This is a clone of issue OCPBUGS-29466. The following is the description of the original issue:
—
Description of problem:
The design doc for ImageDigestMirrorSet states: "ImageContentSourcePolicy CRD will be marked as deprecated and will be supported during all of 4.x. Update and coexistence of ImageDigestMirrorSet/ImageTagMirrorSet and ImageContentSourcePolicy is supported. We encourage users to move to IDMS while supporting both in the cluster, but will not remove ICSP in OCP 4.x."
see: https://github.com/openshift/machine-config-operator/blob/master/docs/ImageMirrorSetDesign.md#goals
see also: https://github.com/openshift/enhancements/blob/master/enhancements/api-review/add-new-CRD-ImageDigestMirrorSet-and-ImageTagMirrorSet-to-config.openshift.io.md#update-the-implementation-for-migration-path for the rationale behind it.
But the hypershift-operator reads ImageContentSourcePolicy only if no ImageDigestMirrorSet exists on the cluster, see: https://github.com/openshift/hypershift/blob/main/support/globalconfig/imagecontentsource.go#L101-L102
Version-Release number of selected component (if applicable):
4.14, 4.15, 4.16
How reproducible:
100%
Steps to Reproduce:
1. Set both an ImageContentSourcePolicy and ImageDigestMirrorSet with different content on the management cluster 2. 3.
Actual results:
The hypershift-operator consumes only the ImageDigestMirrorSet content, ignoring the ImageContentSourcePolicy content.
Expected results:
since both ImageDigestMirrorSet and ImageContentSourcePolicy (although deprecated) are still supported on the management cluster, the hypershift-operator should align.
Additional info:
Currently, oc-mirror (v1) generates only imageContentSourcePolicy.yaml without any imageDigestMirrorSet.yaml equivalent, breaking the hypershift disconnected scenario on clusters where an IDMS is already present for other reasons.
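A minimal sketch of the expected behavior, merging rules from both resources instead of preferring IDMS exclusively; the mirrorRule type is a simplified stand-in for the real API types:

package main

import "fmt"

type mirrorRule struct {
	Source  string
	Mirrors []string
}

// mergeMirrorSources uses both ICSP and IDMS rules, rather than ignoring
// ICSP whenever any IDMS exists on the management cluster.
func mergeMirrorSources(icsp, idms []mirrorRule) []mirrorRule {
	merged := map[string][]string{}
	for _, r := range append(append([]mirrorRule{}, icsp...), idms...) {
		merged[r.Source] = append(merged[r.Source], r.Mirrors...)
	}
	out := make([]mirrorRule, 0, len(merged))
	for src, mirrors := range merged {
		out = append(out, mirrorRule{Source: src, Mirrors: mirrors})
	}
	return out
}

func main() {
	icsp := []mirrorRule{{Source: "registry.redhat.io/foo", Mirrors: []string{"mirror.local/foo"}}}
	idms := []mirrorRule{{Source: "quay.io/bar", Mirrors: []string{"mirror.local/bar"}}}
	fmt.Println(mergeMirrorSources(icsp, idms)) // both sources survive the merge
}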
Description of problem:
Creation of a second hostedcluster in the same namespace fails with the error "failed to set secret's owner reference" in the status of the second hostedcluster's yaml.
~~~
conditions:
- lastTransitionTime: "2024-04-02T06:57:18Z"
  message: 'failed to reconcile the CLI secrets: failed to set secret''s owner reference'
  observedGeneration: 1
  reason: ReconciliationError
  status: "False"
  type: ReconciliationSucceeded
~~~
Note that the hosted control plane namespace is still different for both clusters. The customer is just following the doc https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.9/html/clusters/cluster_mce_overview#creating-a-hosted-cluster-bm for both clusters, and only the hostedcluster CR is created in the same namespace.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Steps to Reproduce:
1. Create a hostedcluster as per the doc https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.9/html/clusters/cluster_mce_overview#creating-a-hosted-cluster-bm
2. Create another hostedcluster in the same namespace where the first hostedcluster was created.
3. The second hostedcluster fails to proceed with the said error.
Actual results:
The hostedcluster creation fails
Expected results:
The hostedcluster creation should succeed
Additional info:
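One plausible mechanism, offered as an assumption rather than a confirmed root cause: controller-runtime permits a single controller owner per object, so the second HostedCluster's attempt to claim a shared CLI secret would fail with an AlreadyOwnedError. The ConfigMaps below stand in for HostedClusters:

package main

import (
	"errors"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes/scheme"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

func main() {
	secret := &corev1.Secret{ObjectMeta: metav1.ObjectMeta{Name: "cli-secret", Namespace: "clusters"}}
	ownerA := &corev1.ConfigMap{ObjectMeta: metav1.ObjectMeta{Name: "hc-a", Namespace: "clusters"}}
	ownerB := &corev1.ConfigMap{ObjectMeta: metav1.ObjectMeta{Name: "hc-b", Namespace: "clusters"}}

	// First hostedcluster claims the secret as its controller owner.
	_ = controllerutil.SetControllerReference(ownerA, secret, scheme.Scheme)
	// Second hostedcluster in the same namespace tries the same and fails.
	err := controllerutil.SetControllerReference(ownerB, secret, scheme.Scheme)

	var already *controllerutil.AlreadyOwnedError
	fmt.Println(errors.As(err, &already)) // true: a second controller owner is rejected
}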
This is a clone of issue OCPBUGS-33428. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
The unit tests don't cover the scenario where hosts are provided without any interfaces in agent-config.yaml
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
No unit test
Expected results:
A valid unit test which tests the error message "at least one interface must be defined for each node"
Additional info:
Description of problem:
The secret/vmware-vsphere-cloud-credentials in ns/openshift-cluster-csi-drivers is not synced correctly when updating secret/vsphere-creds in ns/kube-system
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-10-10-084534
How reproducible:
Always
Steps to Reproduce:
$ oc -n kube-system get secret vsphere-creds -o yaml
apiVersion: v1
data:
  vcenter.devqe.ibmc.devcluster.openshift.com.password: xxx
  vcenter.devqe.ibmc.devcluster.openshift.com.username: xxx
kind: Secret
metadata:
  annotations:
    cloudcredential.openshift.io/mode: passthrough
...
Same for the secret/vmware-vsphere-cloud-credentials in ns/openshift-cluster-csi-drivers:
$ oc -n openshift-cluster-csi-drivers get secret vmware-vsphere-cloud-credentials -o yaml
apiVersion: v1
data:
  vcenter.devqe.ibmc.devcluster.openshift.com.password: xxx
  vcenter.devqe.ibmc.devcluster.openshift.com.username: xxx
kind: Secret
metadata:
  annotations:
    cloudcredential.openshift.io/credentials-request: openshift-cloud-credential-operator/openshift-vmware-vsphere-csi-driver-operator
…
$ oc -n kube-system get secret vsphere-creds -o yaml
apiVersion: v1
data:
  vcsa2-qe.vmware.devcluster.openshift.com.password: xxx
  vcsa2-qe.vmware.devcluster.openshift.com.username: xxx
(Updated to vcsa2-qe)
There are two vcenter entries in vmware-vsphere-cloud-credentials:
$ oc -n openshift-cluster-csi-drivers get secret vmware-vsphere-cloud-credentials -o yaml
apiVersion: v1
data:
  vcenter.devqe.ibmc.devcluster.openshift.com.password: xxx
  vcenter.devqe.ibmc.devcluster.openshift.com.username: xxx
  vcsa2-qe.vmware.devcluster.openshift.com.password: xxx
  vcsa2-qe.vmware.devcluster.openshift.com.username: xxx
(devqe and vcsa2-qe)
$ oc -n kube-system get secret vsphere-creds -o yaml
apiVersion: v1
data:
  vcenter.devqe.ibmc.devcluster.openshift.com.password: xxx
  vcenter.devqe.ibmc.devcluster.openshift.com.username: xxx
(Updated back to devqe)
Still two vcenter entries in vmware-vsphere-cloud-credentials:
$ oc -n openshift-cluster-csi-drivers get secret vmware-vsphere-cloud-credentials -o yaml
apiVersion: v1
data:
  vcenter.devqe.ibmc.devcluster.openshift.com.password: xxx
  vcenter.devqe.ibmc.devcluster.openshift.com.username: xxx
  vcsa2-qe.vmware.devcluster.openshift.com.password: xxx
  vcsa2-qe.vmware.devcluster.openshift.com.username: xxx
(devqe and vcsa2-qe)
Actual results:
The secret/vmware-vsphere-cloud-credentials is not synced correctly
Expected results:
The secret/vmware-vsphere-cloud-credentials should be synced correctly
Additional info:
The vSphere CSI driver controller pods are crash looping.
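A minimal sketch of the sync semantics the expected result implies: replace the target secret's data wholesale with the source's instead of merging keys, so entries for a removed vCenter don't linger. The helper below is illustrative, not the operator's actual sync code:

package main

import "fmt"

// syncSecretData rebuilds the target data map from the source, so stale
// vCenter keys in vmware-vsphere-cloud-credentials disappear after
// vsphere-creds is updated. A key-by-key merge would keep them forever.
func syncSecretData(source map[string][]byte) map[string][]byte {
	target := make(map[string][]byte, len(source))
	for k, v := range source {
		target[k] = append([]byte(nil), v...) // copy values defensively
	}
	return target
}

func main() {
	src := map[string][]byte{
		"vcsa2-qe.vmware.devcluster.openshift.com.username": []byte("user"),
		"vcsa2-qe.vmware.devcluster.openshift.com.password": []byte("pass"),
	}
	synced := syncSecretData(src)
	fmt.Println(len(synced)) // 2: only the current source keys survive
}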
Please review the following PR: https://github.com/openshift/azure-disk-csi-driver/pull/64
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/cloud-provider-kubevirt/pull/28
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-36186. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-33539. The following is the description of the original issue:
—
Description of problem:
The VirtualizedTable component in the console dynamic plugin SDK doesn't have a default sorting column. We need a default sorting column for list pages.
https://github.com/openshift/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#virtualizedtable
This is a clone of issue OCPBUGS-1735. The following is the description of the original issue:
—
Description of problem:
When setting up a cluster on vSphere, sometimes a machine is powered off and stuck in the "Provisioning" phase; this triggers a new machine creation, which reports the error "failed to Create machine: The name 'jima-ipi-27-d97wp-worker-7qn9b' already exists"
Version-Release number of selected component (if applicable):
4.12.0-0.ci.test-2022-09-26-235306-ci-ln-vh4qjyk-latest
How reproducible:
Sometimes; encountered two times
Steps to Reproduce:
1. Setup a vsphere cluster 2. 3.
Actual results:
Cluster installation failed, machine stuck in Provisioning status.
$ oc get machine
NAME                             PHASE          TYPE   REGION   ZONE   AGE
jima-ipi-27-d97wp-master-0       Running                               4h
jima-ipi-27-d97wp-master-1       Running                               4h
jima-ipi-27-d97wp-master-2       Running                               4h
jima-ipi-27-d97wp-worker-7qn9b   Provisioning                          3h56m
jima-ipi-27-d97wp-worker-dsqd2   Running                               3h56m
$ oc edit machine jima-ipi-27-d97wp-worker-7qn9b
status:
  conditions:
  - lastTransitionTime: "2022-09-27T01:27:29Z"
    status: "True"
    type: Drainable
  - lastTransitionTime: "2022-09-27T01:27:29Z"
    message: Instance has not been created
    reason: InstanceNotCreated
    severity: Warning
    status: "False"
    type: InstanceExists
  - lastTransitionTime: "2022-09-27T01:27:29Z"
    status: "True"
    type: Terminable
  lastUpdated: "2022-09-27T01:27:29Z"
  phase: Provisioning
  providerStatus:
    conditions:
    - lastTransitionTime: "2022-09-27T01:36:09Z"
      message: The name 'jima-ipi-27-d97wp-worker-7qn9b' already exists.
      reason: MachineCreationSucceeded
      status: "False"
      type: MachineCreation
    taskRef: task-11363480
$ govc vm.info /SDDC-Datacenter/vm/jima-ipi-27-d97wp/jima-ipi-27-d97wp-worker-7qn9b
Name:         jima-ipi-27-d97wp-worker-7qn9b
Path:         /SDDC-Datacenter/vm/jima-ipi-27-d97wp/jima-ipi-27-d97wp-worker-7qn9b
UUID:         422cb686-6585-f05a-af13-b2acac3da294
Guest name:   Red Hat Enterprise Linux 8 (64-bit)
Memory:       16384MB
CPU:          8 vCPU(s)
Power state:  poweredOff
Boot time:    <nil>
IP address:
Host:         10.3.32.8
I0927 01:44:42.568599 1 session.go:91] No existing vCenter session found, creating new session
I0927 01:44:42.633672 1 session.go:141] Find template by instance uuid: 9535891b-902e-410c-b9bb-e6a57aa6b25a
I0927 01:44:42.641691 1 reconciler.go:270] jima-ipi-27-d97wp-worker-7qn9b: already exists, but was not powered on after clone, requeue
I0927 01:44:42.641726 1 controller.go:380] jima-ipi-27-d97wp-worker-7qn9b: reconciling machine triggers idempotent create
I0927 01:44:42.641732 1 actuator.go:66] jima-ipi-27-d97wp-worker-7qn9b: actuator creating machine
I0927 01:44:42.659651 1 reconciler.go:935] task: task-11363480, state: error, description-id: VirtualMachine.clone
I0927 01:44:42.659684 1 reconciler.go:951] jima-ipi-27-d97wp-worker-7qn9b: Updating provider status
E0927 01:44:42.659696 1 actuator.go:57] jima-ipi-27-d97wp-worker-7qn9b error: jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: The name 'jima-ipi-27-d97wp-worker-7qn9b' already exists.
I0927 01:44:42.659762 1 machine_scope.go:101] jima-ipi-27-d97wp-worker-7qn9b: patching machine
I0927 01:44:42.660100 1 recorder.go:103] events "msg"="jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: The name 'jima-ipi-27-d97wp-worker-7qn9b' already exists." "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"jima-ipi-27-d97wp-worker-7qn9b","uid":"9535891b-902e-410c-b9bb-e6a57aa6b25a","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"17614"} "reason"="FailedCreate" "type"="Warning"
W0927 01:44:42.688562 1 controller.go:382] jima-ipi-27-d97wp-worker-7qn9b: failed to create machine: jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: The name 'jima-ipi-27-d97wp-worker-7qn9b' already exists.
E0927 01:44:42.688651 1 controller.go:326] "msg"="Reconciler error" "error"="jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: The name 'jima-ipi-27-d97wp-worker-7qn9b' already exists." "controller"="machine-controller" "name"="jima-ipi-27-d97wp-worker-7qn9b" "namespace"="openshift-machine-api" "object"={"name":"jima-ipi-27-d97wp-worker-7qn9b","namespace":"openshift-machine-api"} "reconcileID"="d765f02c-bd54-4e6c-88a4-c578f16c7149"
...
I0927 03:18:45.118110 1 actuator.go:66] jima-ipi-27-d97wp-worker-7qn9b: actuator creating machine
E0927 03:18:45.131676 1 actuator.go:57] jima-ipi-27-d97wp-worker-7qn9b error: jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: ServerFaultCode: The object 'vim.Task:task-11363480' has already been deleted or has not been completely created
I0927 03:18:45.131725 1 machine_scope.go:101] jima-ipi-27-d97wp-worker-7qn9b: patching machine
I0927 03:18:45.131873 1 recorder.go:103] events "msg"="jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: ServerFaultCode: The object 'vim.Task:task-11363480' has already been deleted or has not been completely created" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"jima-ipi-27-d97wp-worker-7qn9b","uid":"9535891b-902e-410c-b9bb-e6a57aa6b25a","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"17614"} "reason"="FailedCreate" "type"="Warning"
W0927 03:18:45.150393 1 controller.go:382] jima-ipi-27-d97wp-worker-7qn9b: failed to create machine: jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: ServerFaultCode: The object 'vim.Task:task-11363480' has already been deleted or has not been completely created
E0927 03:18:45.150492 1 controller.go:326] "msg"="Reconciler error" "error"="jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: ServerFaultCode: The object 'vim.Task:task-11363480' has already been deleted or has not been completely created" "controller"="machine-controller" "name"="jima-ipi-27-d97wp-worker-7qn9b" "namespace"="openshift-machine-api" "object"={"name":"jima-ipi-27-d97wp-worker-7qn9b","namespace":"openshift-machine-api"} "reconcileID"="5d92bc1d-2f0d-4a0b-bb20-7f2c7a2cb5af"
I0927 03:18:45.150543 1 controller.go:187] jima-ipi-27-d97wp-worker-dsqd2: reconciling Machine
Expected results:
Machine is created successfully.
Additional info:
machine-controller log: http://file.rdu.redhat.com/~zhsun/machine-controller.log
Description of problem:
As the original PR has been merged, opening this new bug to track the issue in the docs.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Sheet format issue for 'useActiveColumns', 'K8sGetResource', 'k8sDeleteResource', 'k8sListResource', 'K8sUpdateResource' and 'k8sPatchResource'. Attached: https://drive.google.com/file/d/1NgitSi9mgB3zluqmp8eza4DhFOVY-Pt9/view?usp=drive_link
2. The text 'code' is not highlighted in 'getGroupVersionKindForModel'. Attached: https://drive.google.com/file/d/1sVxXdlIBxKxxokZX2iorJOER7ILGByzm/view?usp=drive_link
3. Incorrect </br> setting in 'ErrorBoundaryFallbackPage'. https://drive.google.com/file/d/1ubhcFb68kDwL-wKsknP1Hb0fos480OnA/view?usp=drive_link
4. Several links marked with label {@link}: ListPageCreate, useK8sModel, k8sGetResource, k8sDeleteResource, k8sListResource, k8sListResourceItems, YAMLEditor
Actual results:
Expected results:
Additional info:
Impacted code lines:
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L616-L619
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L1277-L1283
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L1404-L1410
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L1434-L1437
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L1335-L1341
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L1365-L1370
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L1528
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L2157
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L698
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L1035
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L1452
https://github.com/Mylanos/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#L2480
Description of problem:
CNO assumes only the master and worker machine config pools are present on the cluster. While running CI with 24 nodes, it was found that two more pools, infra and workload, are present. These pools must also be taken into consideration while rolling out the ipsec machine config (see the sketch after the pool listing below).
# omg get mcp
NAME       CONFIG                                               UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
infra      rendered-infra-52f7615d8c841e7570b7ab6cbafecac8      True      False      False      3              3                   3                     0                      38m
master     rendered-master-fbb5d8e1337d1244d30291ffe3336e45     True      False      False      3              3                   3                     0                      1h10m
worker     rendered-worker-52f7615d8c841e7570b7ab6cbafecac8     False     True       False      24             12                  12                    0                      1h10m
workload   rendered-workload-52f7615d8c841e7570b7ab6cbafecac8   True      False      False      0              0                   0                     0                      38m
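A minimal sketch of the fix direction, enumerating every MachineConfigPool instead of hardcoding master and worker; the machineconfiguration client import path is assumed from openshift/client-go:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"

	mcfgclient "github.com/openshift/client-go/machineconfiguration/clientset/versioned"
)

// poolNames lists all MachineConfigPools so custom pools like infra and
// workload are tracked during the ipsec MachineConfig rollout, not just
// the two pools CNO currently assumes.
func poolNames(cfg *rest.Config) ([]string, error) {
	client, err := mcfgclient.NewForConfig(cfg)
	if err != nil {
		return nil, err
	}
	pools, err := client.MachineconfigurationV1().MachineConfigPools().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(pools.Items))
	for _, p := range pools.Items {
		names = append(names, p.Name)
	}
	return names, nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		fmt.Println("not running in a cluster:", err)
		return
	}
	fmt.Println(poolNames(cfg))
}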
This is a clone of issue OCPBUGS-46407. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-46281. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42679. The following is the description of the original issue:
—
Description of problem:
In 4.14, libreswan runs as a containerized process inside a pod. SOS reports and must-gathers are not collecting libreswan logs and xfrm information from the nodes, which makes the debugging process heavier. This should be fixed by working with the sos-report team OR by changing our must-gather scripts in 4.14 alone. From 4.15, libreswan is a systemd process running on the host, so the swan logs are gathered in the sos report. For 4.14, especially during escalations, gathering individual node data over and over is becoming painful for IPsec. We need to ensure all the data required to debug IPsec is collected in one place.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-31901. The following is the description of the original issue:
—
Description of problem:
After upgrading the dynamic console plugin to PF5, the modal is not rendered correctly. The header and footer of the modal are not displayed.
Version-Release number of selected component (if applicable):
"@openshift-console/dynamic-plugin-sdk": "^1.1.0", "@patternfly/patternfly": "^5.2.0", "@patternfly/react-charts": "^7.2.0", "@patternfly/react-core": "^5.2.0", "@patternfly/react-icons": "^5.2.0", "@patternfly/react-table": "^5.2.0", "@patternfly/react-topology": "^5.2.0",
Steps to Reproduce:
1. Include Modal component in a PF5 dynamic console plugin 2. Render the modal component
Actual results:
The header and the footer of the modal are not displayed
Expected results:
The modal is rendered correctly
Additional info:
This issue is related to the next PatternFly modal component (currently in beta) introduced in PatternFly version 5.2.0. As a temporary workaround, downgrading the PatternFly library to version 5.1.x fixes the issue.
Description of problem:
`oc adm upgrade` silently accepts an incorrect subcommand without doing or notifying anything.
This is due to the `default` case in `run()`, which catches all incorrect subcommands and runs the default path instead.
Version-Release number of selected component (if applicable): 4.10 and current
How reproducible:
Use any incorrect subcommand with `oc adm upgrade`.
Example: `oc adm upgrade incorrect-subcommand`
Steps to Reproduce:
1. run `oc adm upgrade incorrect-subcommand`
Actual results:
oc prints the cluster upgrade status
Expected results:
oc should error out saying incorrect subcommand
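A hedged sketch of the fix shape, not oc's actual code: the default branch should reject unexpected positional arguments before falling through to the status-printing path. The "channel" case here is only an example subcommand:

package main

import (
	"fmt"
	"os"
)

// run mirrors the reported bug's shape: a switch over the subcommand with a
// default branch. Erroring out on unrecognized arguments prevents silently
// printing the upgrade status instead.
func run(args []string) error {
	sub := ""
	if len(args) > 0 {
		sub = args[0]
	}
	switch sub {
	case "channel":
		fmt.Println("would update the channel")
	case "":
		fmt.Println("would print the cluster upgrade status")
	default:
		return fmt.Errorf("unknown subcommand %q; valid subcommands are: channel", sub)
	}
	return nil
}

func main() {
	if err := run(os.Args[1:]); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}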
This is a clone of issue OCPBUGS-28665. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-28856. The following is the description of the original issue:
—
Description of problem:
When using the modal dialogs in a hook as part of an actions provider hook (e.g. useApplicationsActionsProvider), the console will throw an error, since the console framework passes null objects as part of the render cycle. According to Jon Jackson, the console should be safe from null objects, but it looks like the code for useDeleteModal and getGroupVersionKindForresource is not safe.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Use one of the modal APIs in an actions provider hook 2. 3.
Actual results:
Caught error in a child component: TypeError: Cannot read properties of undefined (reading 'split')
    at i (main-chunk-9fbeef79a…d3a097ed.min.js:1:1)
    at u (main-chunk-9fbeef79a…d3a097ed.min.js:1:1)
    at useApplicationActionsProvider (useApplicationActionsProvider.tsx:23:43)
    at ApplicationNavPage (ApplicationDetails.tsx:38:67)
    at na (vendors~main-chunk-8…87b.min.js:174297:1)
    at Hs (vendors~main-chunk-8…87b.min.js:174297:1)
    at Sc (vendors~main-chunk-8…87b.min.js:174297:1)
    at Cc (vendors~main-chunk-8…87b.min.js:174297:1)
    at _c (vendors~main-chunk-8…87b.min.js:174297:1)
    at pc (vendors~main-chunk-8…87b.min.js:174297:1)
Expected results:
Works with no error
Additional info:
This is a clone of issue OCPBUGS-25862. The following is the description of the original issue:
—
At 17:26:09, the cluster is happily upgrading nodes:
An update is in progress for 57m58s: Working towards 4.14.1: 734 of 859 done (85% complete), waiting on machine-config
At 17:26:54, the upgrade starts to reboot master nodes and COs get noisy (this one specifically is OCPBUGS-20061)
An update is in progress for 58m50s: Unable to apply 4.14.1: the cluster operator control-plane-machine-set is not available
~Two minutes later, at 17:29:07, CVO starts to shout about waiting on operators for over 40 minutes despite not indicating anything is wrong earlier:
An update is in progress for 1h1m2s: Unable to apply 4.14.1: wait has exceeded 40 minutes for these operators: etcd, kube-apiserver
This is only because these operators go briefly Degraded during the master reboot (which they shouldn't, but that is a different story). CVO computes its 40 minutes against the time when it first started to upgrade the given operator, so it:
1. Upgrades etcd / KAS very early in the upgrade, noting the time when it started to do that
2. These two COs upgrade successfully and the upgrade proceeds
3. Eventually the cluster starts rebooting masters and etcd/KAS go Degraded
4. CVO compares the current time against the noted time, discovers it's more than 40 minutes, and starts warning about it (a toy model of this timer logic follows below)
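A toy model of the described timer logic, assuming the per-operator start times live only in memory and are never cleared on success (names are illustrative):

package main

import (
	"fmt"
	"time"
)

// tracker records when each operator's upgrade began. Because the start
// time is never cleared after the operator finishes upgrading, a brief
// Degraded blip during a later master reboot trips the 40-minute check.
type tracker struct {
	started map[string]time.Time
}

func (t *tracker) waitingTooLong(op string, now time.Time) bool {
	start, ok := t.started[op]
	return ok && now.Sub(start) > 40*time.Minute
}

func main() {
	t := &tracker{started: map[string]time.Time{
		"etcd": time.Now().Add(-60 * time.Minute), // upgraded an hour ago, briefly Degraded now
	}}
	fmt.Println(t.waitingTooLong("etcd", time.Now())) // true: the spurious warning fires
}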
all
Not entirely deterministic:
1. the upgrade must go for 40m+ between upgrading etcd and upgrading nodes
2. the upgrade must reboot a master that is not running CVO (otherwise there will be a new CVO instance without the saved times; they are only saved in memory)
1. Watch oc adm upgrade during the upgrade
Spurious "waiting for over 40m" message pops out of the blue
CVO simply says "waiting up to 40m on", and this eventually goes away as the node comes up and etcd leaves Degraded.
This fix contains the following changes coming from updated version of kubernetes up to v1.28.6:
Changelog:
v1.28.6: https://github.com/kubernetes/kubernetes/blob/release-1.28/CHANGELOG/CHANGELOG-1.28.md#changelog-since-v1285
Please review the following PR: https://github.com/openshift/images/pull/149
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/oauth-proxy/pull/269
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
On clusters with a large number of services with externalIPs, or services of type LoadBalancer, ovnkube-node initialization can take up to 50 minutes.
The problem is that after a node reboot done by MCO, the unschedulable taint is removed from the node, so the API allocates pods to that node that get stuck in ContainerCreating, and other nodes continue to go down for reboot, making the workloads unavailable (if no PDB exists for the workload to protect it).
This is a clone of issue OCPBUGS-34803. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-32186. The following is the description of the original issue:
—
Description of problem:
The self-managed hypershift CLI (hcp) reports an inaccurate supported OCP version. For example, if I have a hypershift-operator deployed which supports OCP v4.14 and I build the hcp CLI from the latest source code, when I execute "hcp -v" the CLI tool reports the following:
$ hcp -v
hcp version openshift/hypershift: 02bf7af8789f73c7b5fc8cc0424951ca63441649. Latest supported OCP: 4.16.0
This makes it appear that the hcp CLI is capable of deploying OCP v4.16.0, when the backend is actually limited to v4.14.0. The CLI needs to indicate what the server is capable of deploying. Otherwise it appears that v4.16.0 would be deployable in this scenario, but the backend would not allow that.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
100%
Steps to Reproduce:
1. Download an hcp client that does not match the hypershift-operator backend
2. Execute 'hcp -v'
3. The reported "Latest supported OCP" is not representative of the version the hypershift-operator actually supports
Actual results:
Expected results:
hcp cli reports a latest OCP version that is representative of what the deployed hypershift operator is capable of deploying.
Additional info:
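A minimal sketch of the server-side lookup the expected behavior implies, assuming the hypershift-operator publishes its supported versions in a ConfigMap; the "hypershift" namespace, "supported-versions" ConfigMap name, and data key are assumptions for illustration:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// serverSupportedOCP asks the deployed hypershift-operator what it supports
// instead of printing a constant compiled into the CLI binary.
func serverSupportedOCP(cfg *rest.Config) (string, error) {
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return "", err
	}
	cm, err := client.CoreV1().ConfigMaps("hypershift").Get(context.TODO(), "supported-versions", metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	return cm.Data["supported-versions"], nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		fmt.Println("not running in a cluster:", err)
		return
	}
	fmt.Println(serverSupportedOCP(cfg))
}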
This is a clone of issue OCPBUGS-42244. The following is the description of the original issue:
—
Description of problem:
Network policy doesn't work properly during SDN live migration. During the migration, while the two CNI plugins are running in parallel, cross-CNI traffic will be denied by the ACLs generated for the network policy.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to reproduce:
1. Deploy a cluster with openshift-sdn
2. Create testpods in 2 different namespaces, z1 and z2.
3. In namespace z1, create a network policy that allows traffic from z2.
4. Trigger SDN live migration
5. Monitor the accessibility between the pods in Z1 and Z2.
Actual results:
When the pods in z1 and z2 on different nodes are using different CNI, the traffic is denied.
Expected results:
The traffic shall be allowed regardless of the CNI utilized by either pod.
Additional info:
This is a clone of issue OCPBUGS-27853. The following is the description of the original issue:
—
Description of problem:
Upgrading OCP from 4.14.7 to a 4.15.0 nightly build failed on the Provider cluster, which is part of a provider-client setup.
Platform: IBM Cloud Bare Metal cluster.
Steps done:
Step 1:
$ oc patch clusterversions/version -p '{"spec":{"channel":"stable-4.15"}}' --type=merge
clusterversion.config.openshift.io/version patched
Step 2:
$ oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.15.0-0.nightly-2024-01-18-050837 --allow-explicit-upgrade --force
warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Requesting update to release image registry.ci.openshift.org/ocp/release:4.15.0-0.nightly-2024-01-18-050837
The cluster was not upgraded successfully.
$ oc get clusteroperator | grep -v "4.15.0-0.nightly-2024-01-18-050837 True False False"
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.15.0-0.nightly-2024-01-18-050837 True False True 111s APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-oauth-apiserver ()...
console 4.15.0-0.nightly-2024-01-18-050837 False False False 111s RouteHealthAvailable: console route is not admitted
dns 4.15.0-0.nightly-2024-01-18-050837 True True False 12d DNS "default" reports Progressing=True: "Have 4 available DNS pods, want 5.\nHave 5 available node-resolver pods, want 6."
etcd 4.15.0-0.nightly-2024-01-18-050837 True False True 12d EtcdEndpointsDegraded: EtcdEndpointsController can't evaluate whether quorum is safe: etcd cluster has quorum of 2 and 2 healthy members which is not fault tolerant: [{Member:ID:14147288297306253147 name:"baremetal2-06.qe.rh-ocs.com" peerURLs:"https://52.116.161.167:2380" clientURLs:"https://52.116.161.167:2379" Healthy:false Took: Error:create client failure: failed to make etcd client for endpoints [https://52.116.161.167:2379]: context deadline exceeded} {Member:ID:15369339084089827159 name:"baremetal2-03.qe.rh-ocs.com" peerURLs:"https://52.116.161.164:2380" clientURLs:"https://52.116.161.164:2379" Healthy:true Took:9.617293ms Error:<nil>} {Member:ID:17481226479420161008 name:"baremetal2-04.qe.rh-ocs.com" peerURLs:"https://52.116.161.165:2380" clientURLs:"https://52.116.161.165:2379" Healthy:true Took:9.090133ms Error:<nil>}]...
image-registry 4.15.0-0.nightly-2024-01-18-050837 True True False 12d Progressing: All registry resources are removed...
machine-config 4.14.7 True True True 7d22h Unable to apply 4.15.0-0.nightly-2024-01-18-050837: error during syncRequiredMachineConfigPools: [context deadline exceeded, failed to update clusteroperator: [client rate limiter Wait returned an error: context deadline exceeded, MachineConfigPool master has not progressed to latest configuration: controller version mismatch for rendered-master-9b7e02d956d965d0906def1426cb03b5 expected eaab8f3562b864ef0cc7758a6b19cc48c6d09ed8 has 7649b9274cde2fb50a61a579e3891c8ead2d79c5: 0 (ready 0) out of 3 nodes are updating to latest configuration rendered-master-34b4781f1a0fe7119765487c383afbb3, retrying]]
monitoring 4.15.0-0.nightly-2024-01-18-050837 False True True 7m54s UpdatingUserWorkloadPrometheus: client rate limiter Wait returned an error: context deadline exceeded, UpdatingUserWorkloadThanosRuler: waiting for ThanosRuler object changes failed: waiting for Thanos Ruler openshift-user-workload-monitoring/user-workload: context deadline exceeded
network 4.15.0-0.nightly-2024-01-18-050837 True True False 12d DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 2 nodes)...
node-tuning 4.15.0-0.nightly-2024-01-18-050837 True True False 98m Working towards "4.15.0-0.nightly-2024-01-18-050837"
$ oc get machineconfigpool
NAME     CONFIG                                              UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-9b7e02d956d965d0906def1426cb03b5    False     True       True       3              0                   0                     1                      12d
worker   rendered-worker-4f54b43e9f934f0659761929f55201a1    False     True       True       3              1                   1                     1                      12d
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.7    True        True          120m    Unable to apply 4.15.0-0.nightly-2024-01-18-050837: an unknown error has occurred: MultipleErrors
$ oc get nodes
NAME                          STATUS                       ROLES                         AGE   VERSION
baremetal2-01.qe.rh-ocs.com   Ready                        worker                        12d   v1.27.8+4fab27b
baremetal2-02.qe.rh-ocs.com   Ready                        worker                        12d   v1.27.8+4fab27b
baremetal2-03.qe.rh-ocs.com   Ready                        control-plane,master,worker   12d   v1.27.8+4fab27b
baremetal2-04.qe.rh-ocs.com   Ready                        control-plane,master,worker   12d   v1.27.8+4fab27b
baremetal2-05.qe.rh-ocs.com   Ready                        worker                        12d   v1.28.5+c84a6b8
baremetal2-06.qe.rh-ocs.com   Ready,SchedulingDisabled     control-plane,master,worker   12d   v1.27.8+4fab27b
----------------------------------------------------
During the efforts to bring the cluster back to a good state, these steps were done:
The node baremetal2-06.qe.rh-ocs.com was uncordoned.
Tried to upgrade using the command:
$ oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.15.0-0.nightly-2024-01-22-051500 --allow-explicit-upgrade --force --allow-upgrade-with-warnings=true
warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
warning: --allow-upgrade-with-warnings is bypassing: the cluster is already upgrading:
  Reason: ClusterOperatorsDegraded
  Message: Unable to apply 4.15.0-0.nightly-2024-01-18-050837: wait has exceeded 40 minutes for these operators: etcd, kube-apiserver
Requesting update to release image registry.ci.openshift.org/ocp/release:4.15.0-0.nightly-2024-01-22-051500
Upgrade to 4.15.0-0.nightly-2024-01-22-051500 also was not successful. Node baremetal2-01.qe.rh-ocs.com was drained manually to see if that works. Some clusteroperators stayed on the previous version. Some moved to Degraded state.
$ oc get machineconfigpool
NAME     CONFIG                                              UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-9b7e02d956d965d0906def1426cb03b5    False     True       False      3              1                   1                     0                      13d
worker   rendered-worker-4f54b43e9f934f0659761929f55201a1    False     True       True       3              1                   1                     1                      13d
$ oc get pdb -n openshift-storage
NAME                                              MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
rook-ceph-mds-ocs-storagecluster-cephfilesystem   1               N/A               1                     11d
rook-ceph-mon-pdb                                 N/A             1                 1                     11d
rook-ceph-osd                                     N/A             1                 1                     3h17m
$ oc rsh rook-ceph-tools-57fd4d4d68-p2psh ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME                             STATUS  REWEIGHT  PRI-AFF
-1         5.23672  root default
-5         1.74557      host baremetal2-01-qe-rh-ocs-com
 1    ssd  0.87279          osd.1                             up   1.00000  1.00000
 4    ssd  0.87279          osd.4                             up   1.00000  1.00000
-7         1.74557      host baremetal2-02-qe-rh-ocs-com
 3    ssd  0.87279          osd.3                             up   1.00000  1.00000
 5    ssd  0.87279          osd.5                             up   1.00000  1.00000
-3         1.74557      host baremetal2-05-qe-rh-ocs-com
 0    ssd  0.87279          osd.0                             up   1.00000  1.00000
 2    ssd  0.87279          osd.2                             up   1.00000  1.00000
OCP must-gather logs - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/hcp414-aaa/hcp414-aaa_20240112T084548/logs/must-gather-ibm-bm2-provider/must-gather.local.1079362865726528648/
Version-Release number of selected component (if applicable):
Initial versions:
OCP 4.14.7
ODF 4.14.4-5.fusion-hci
OpenShift Virtualization: kubevirt-hyperconverged-operator.4.16.0-380
Local Storage: local-storage-operator.v4.14.0-202312132033
OpenShift Data Foundation Client: ocs-client-operator.v4.14.4-5.fusion-hci
How reproducible:
Reporting the first occurrence of the issue.
Steps to Reproduce:
1. On a Provider-client HCI setup , upgrade provider cluster to a nightly build of OCP
Actual results:
OCP upgrade not successful. Some operators become degraded. The worker machineconfigpool has 1 degraded machine count.
Expected results:
OCP upgrade from 4.14.7 to the nightly build should succeed.
Additional info:
There are 3 hosted clients present
Description of problem:
The script refactoring from https://github.com/openshift/cluster-etcd-operator/pull/1057 introduced a regression. Since the static pod list variable was renamed, it is now empty and won't restore the non-etcd pod yamls anymore.
Version-Release number of selected component (if applicable):
4.14 and later
How reproducible:
always
Steps to Reproduce:
1. Create a cluster
2. Restore using cluster-restore.sh
Actual results:
The apiserver and other static pods are not immediately restored. The script only outputs this log:
removing previous backup /var/lib/etcd-backup/member
Moving etcd data-dir /var/lib/etcd/member to /var/lib/etcd-backup
starting restore-etcd static pod
Expected results:
The non-etcd static pods should be immediately restored by moving them into the manifest directory again. You can see this in the log output:
Moving etcd data-dir /var/lib/etcd/member to /var/lib/etcd-backup
starting restore-etcd static pod
starting kube-apiserver-pod.yaml static-pod-resources/kube-apiserver-pod-7/kube-apiserver-pod.yaml
starting kube-controller-manager-pod.yaml static-pod-resources/kube-controller-manager-pod-7/kube-controller-manager-pod.yaml
starting kube-scheduler-pod.yaml static-pod-resources/kube-scheduler-pod-8/kube-scheduler-pod.yaml
Additional info:
Description of problem:
Since moving to a dynamic plugin, the monitoring UI will not work when running locally unless some extra steps are taken. Bridge must be configured to use this plugin, which needs to be running alongside it. Our readme doesn't include this information or instructions.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. Read the readme
Actual results:
The readme does not include instructions for running monitoring locally
Expected results:
The readme includes instructions for running monitoring locally
Please review the following PR: https://github.com/openshift/image-registry/pull/387
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-25453. The following is the description of the original issue:
—
Description of problem:
CPMS is supported in 4.15 on the vSphere platform when TechPreviewNoUpgrade is enabled, but after building the cluster with no failure domains (or a single failure domain) set in install-config, there were three duplicated failure domains.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-12-11-033133
How reproducible:
Install a cluster with TP enabled and don't set a failure domain (or set a single failure domain) in install-config.
Steps to Reproduce:
1. Do not configure a failure domain in install-config (or set a single failure domain). 2. Install a cluster with TP enabled 3. Check the CPMS with the command: oc get controlplanemachineset -oyaml
Actual results:
Duplicated failure domains:
failureDomains:
  platform: VSphere
  vsphere:
  - name: generated-failure-domain
  - name: generated-failure-domain
  - name: generated-failure-domain
metadata:
  labels:
Expected results:
The failure domain should not be duplicated when a single failure domain is set in install-config. No failure domain should exist when no failure domain is set in install-config.
Additional info:
This is a clone of issue OCPBUGS-26566. The following is the description of the original issue:
—
Description of problem:
When the user clicks 'Cancel' on any Secret creation page, it doesn't return to the Secrets list page
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2024-01-06-062415
How reproducible:
Always
Steps to Reproduce:
1. Go to Create Key/value secret|Image pull secret|Source secret|Webhook secret|FromYaml page eg:/k8s/ns/default/secrets/~new/generic 2. Click Cancel button 3.
Actual results:
The page does not go back to Secrets list page eg: /k8s/ns/default/core~v1~Secret
Expected results:
The page should go back to the Secrets list page
Additional info:
This is a clone of issue OCPBUGS-37939. The following is the description of the original issue:
—
Description of problem:
ovnkube-master-b5dwz 5/6 CrashLoopBackOff 15 (4m49s ago) 75m ovnkube-master-dm6g5 5/6 CrashLoopBackOff 15 (3m50s ago) 72m ovnkube-master-lzltc 5/6 CrashLoopBackOff 16 (31s ago) 76m
Relevant logs :
1 ovnkube.go:369] failed to start network controller manager: failed to start default network controller: failed to sync address sets on controller init: failed to transact address set sync ops: error in transact with ops [{Op:insert Table:Address_Set Row:map[addresses:{GoSet:[172.21.4.58 172.30.113.119 172.30.113.93 172.30.140.204 172.30.184.23 172.30.20.1 172.30.244.26 172.30.250.254 172.30.29.56 172.30.39.131 172.30.54.87 172.30.54.93 172.30.70.9]} external_ids:{GoMap:map[direction:ingress gress-index:0 ip-family:v4 ...]} log:false match:ip4.src == {$a10011776377603330168, $a10015887742824209439, $a10026019104056290237, $a10029515256826812638, $a5952808452902781817, $a10084011578527782670, $a10086197949337628055, $a10093706521660045086, $a10096260576467608457, $a13012332091214445736, $a10111277808835218114, $a10114713358929465663, $a101155018460287381, $a16191032114896727480, $a14025182946114952022, $a10127722282178953052, $a4829957937622968220, $a10131833063630260035, $a3533891684095375041, $a7785003721317615588, $a10594480726457361847, $a10147006001458235329, $a12372228123457253136, $a10016996505620670018, $a10155660392008449200, $a10155926828030234078, $a15442683337083171453, $a9765064908646909484, $a7550609288882429832, $a11548830526886645428, $a10204075722023637394, $a10211228835433076965, $a5867828639604451547, $a10222049254704513272, $a13856077787103972722, $a11903549070727627659,.... (this is a very long list of ACL)
Description of problem:
If pipefail is active in a bash script, use of a pipe ( | ) can hide the actual error of the ip command when it fails with an exit code different from 1: with pipefail the pipeline reports the exit status of the rightmost failing command, so a downstream command exiting 1 masks ip's real exit code.
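A minimal reproduction of the masking, using a stand-in function in place of ip (the function name is illustrative):
~~~
#!/bin/bash
set -o pipefail

# Stand-in for 'ip' failing with exit code 2 (e.g. an object/usage error):
fake_ip() { echo "some output"; return 2; }

fake_ip | grep -q "no-such-match"
echo "pipeline exit code: $?"          # prints 1 (grep's code); ip's 2 is hidden

# Re-run and capture PIPESTATUS immediately to see every command's code:
fake_ip | grep -q "no-such-match"
codes=("${PIPESTATUS[@]}")
echo "per-command exit codes: ${codes[*]}"   # prints: 2 1
~~~
Inspecting PIPESTATUS (or avoiding the pipe for the ip invocation) is the usual way to surface the original exit code.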
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-33202. The following is the description of the original issue:
—
Description of problem:
When navigating to the Pipelines list page from the Search menu in the Dev perspective, the Pipelines list page crashes
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1.Install Pipelines Operator 2.Go to Developer perspective 3.Go to search menu, select Pipeline
Actual results:
The page crashes
Expected results:
The page should not crash and should show the Pipelines list page
Additional info:
Please review the following PR: https://github.com/openshift/platform-operators/pull/91
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Recently we bumped the hyperkube image [1] to use both RHEL 9 builder and base images. In order to keep things consistent, we tried to do the same with the "pause" image [2], however, that caused mass failures in payload jobs [3] due to a mismatch with ART [4], which still builds that image with RHEL 8. As a result, we decided to keep builder & base images for "pause" in RHEL 8, as this work was not required for the kube 1.28 bump nor the FIPS issue we were addressing. However, for the sake of consistency, eventually it'd be good to bump the "pause" builder & base images to RHEL 9. [1] https://github.com/openshift/kubernetes/blob/6ab54b8d9a0ea02856efd3835b6f9df5da9ce115/openshift-hack/images/hyperkube/Dockerfile.rhel#L1 [2] https://github.com/openshift/kubernetes/blob/6ab54b8d9a0ea02856efd3835b6f9df5da9ce115/build/pause/Dockerfile.Rhel#L1 [3] https://github.com/openshift/kubernetes/blob/6ab54b8d9a0ea02856efd3835b6f9df5da9ce115/build/pause/Dockerfile.Rhel#L1 [4] https://github.com/openshift-eng/ocp-build-data/blob/openshift-4.15/images/openshift-enterprise-pod.yml
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
Builder & base images for "pause" are RHEL 8.
Expected results:
Builder & base images for "pause" are RHEL 9.
Additional info:
Description of problem:
When any object is created from YAML with an empty editor window, the application crashes.
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. Navigate to Virtualization -> VirtualMachines 2. Open "Create VirtualMachine" menu 3. Select "With YAML" 4. Clear the editor content 5. Click "Create" button
Actual results:
The application crashes
Expected results:
User is notified about invalid/empty editor content.
Additional info:
The same happens in 4.13
This is a clone of issue OCPBUGS-25940. The following is the description of the original issue:
—
Description of problem:
New spot VMs fail to be created by machinesets defining providerSpec.value.spotVMOptions in Azure regions without Availability Zones. Azure-controller logs the error: Azure Spot Virtual Machine is not supported in Availability Set. A new availabilitySet is created for each machineset in non-zonal regions, but this only works with normal nodes. Spot VMs and availabilitySets are incompatible as per Microsoft docs for this error: You need to choose to either use an Azure Spot Virtual Machine or use a VM in an availability set, you can't choose both. From: https://learn.microsoft.com/en-us/azure/virtual-machines/error-codes-spot
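For reference, a minimal hypothetical excerpt of the MachineSet fields involved; `spotVMOptions: {}` is what requests Spot capacity, and in a non-zonal region the controller places the machines in an availability set, producing the conflict above:
~~~
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
spec:
  template:
    spec:
      providerSpec:
        value:
          spotVMOptions: {}   # request Azure Spot VMs (optionally set maxPrice)
          # In a region without Availability Zones, machines go into an
          # auto-created availabilitySet, which Azure rejects for Spot VMs.
~~~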
Version-Release number of selected component (if applicable):
n/a
How reproducible:
Always
Steps to Reproduce:
1. Follow the instructions to create a machineset to provision spot VMs: https://docs.openshift.com/container-platform/4.12/machine_management/creating_machinesets/creating-machineset-azure.html#machineset-creating-non-guaranteed-instance_creating-machineset-azure 2. New machines will be in Failed state: $ oc get machines -A NAMESPACE NAME PHASE TYPE REGION ZONE AGE openshift-machine-api mabad-test-l5x58-worker-southindia-spot-c4qr5 Failed 7m17s openshift-machine-api mabad-test-l5x58-worker-southindia-spot-dtzsn Failed 7m17s openshift-machine-api mabad-test-l5x58-worker-southindia-spot-tzrhw Failed 7m28s 3. Events in the failed machines show errors creating spot VMs with availabilitySets: Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning FailedCreate 28s azure-controller InvalidConfiguration: failed to reconcile machine "mabad-test-l5x58-worker-southindia-spot-dx78z": failed to create vm mabad-test-l5x58-worker-southindia-spot-dx78z: failure sending request for machine mabad-test-l5x58-worker-southindia-spot-dx78z: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code="OperationNotAllowed" Message="Azure Spot Virtual Machine is not supported in Availability Set. For more information, see http://aka.ms/AzureSpot/errormessages."
Actual results:
Machines stay in Failed state and nodes are not created
Expected results:
Machines get created and new spot VM nodes added to the cluster.
Additional info:
This problem was identified from a customer alert in an ARO cluster. ICM for ref (requires b- MSFT account): https://portal.microsofticm.com/imp/v3/incidents/incident/455463992/summary
This is a clone of issue OCPBUGS-38290. The following is the description of the original issue:
—
Description of problem:
OLM still checks the deleted CatalogSource in openshift-marketplace
Version-Release number of selected component (if applicable):
4.13
How reproducible:
not always
Steps to Reproduce:
In daily CI we met this issue several times. For example: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.13-amd64-nightly-gcp-ipi-sdn-p1-f7/1632127504539979776/artifacts/gcp-ipi-sdn-p1-f7/openshift-extended-test/build-log.txt
prometheus-dependency1-cs has been deleted, but many Subscriptions fail to install due to ErrorPreventedResolution:
"message": "failed to populate resolver cache from source prometheus-dependency1-cs/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: lookup prometheus-dependency1-cs.openshift-marketplace.svc on 172.30.0.10:53: no such host\"",
"reason": "ErrorPreventedResolution",
"status": "True",
"type": "ResolutionFailed"
2023-03-04T22:35:00.761837299Z time="2023-03-04T22:35:00Z" level=info msg="removed client for deleted catalogsource" source="{prometheus-dependency1-cs openshift-marketplace}"
2023-03-04T22:39:38.039489890Z E0304 22:39:38.039410 1 queueinformer_operator.go:298] sync "e2e-test-olm-a-fa98jfef-sxnxr" failed: failed to populate resolver cache from source prometheus-dependency1-cs/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup prometheus-dependency1-cs.openshift-marketplace.svc on 172.30.0.10:53: no such host"
Actual results:
The deleted CatalogSource impacts Subscription installation.
Expected results:
The deleted CatalogSource should not impact Subscription installation.
Additional info:
This is a clone of issue OCPBUGS-29956. The following is the description of the original issue:
—
Description of problem:
CredentialsRequest for Azure AD workload identity contains unnecessary permissions under `virtualMachines/extensions`. Specifically write and delete.
Version-Release number of selected component (if applicable):
4.14.0+
How reproducible:
Every time
Steps to Reproduce:
1. Create a cluster without the CredentialsRequest permissions mentioned 2. Scale machineset 3. See no permission errors
Actual results:
We have unnecessary permissions, but still no errors
Expected results:
Still no permission errors after removal.
Additional info:
RHCOS doesn't leverage virtual machine extensions. It appears as though the code path is dead.
Description of problem:
A 4.15 control plane can't create a 4.14 node pool due to a payload issue
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Create a Hosted Cluster in 4.15 2. Create a Node Pool in 4.14 3. The node pool is stuck in provisioning
Actual results:
No node pool is created
Expected results:
The node pool is created, as we support N-2 versions there
Additional info:
Possibly linked to OCPBUGS-26757
Description of problem: In an environment with the following zones, topology was disabled while it should be enabled by default
$ openstack availability zone list --compute
+-----------+-------------+
| Zone Name | Zone Status |
+-----------+-------------+
| AZ-0      | available   |
| AZ-1      | available   |
| AZ-2      | available   |
+-----------+-------------+
$ openstack availability zone list --volume
+-----------+-------------+
| Zone Name | Zone Status |
+-----------+-------------+
| nova      | available   |
| AZ-0      | available   |
| AZ-1      | available   |
| AZ-2      | available   |
+-----------+-------------+
We have a check that verifies the number of zones is identical for compute and volumes. This check should be removed. We want, however, to ensure that for every compute zone there is a matching volume zone.
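A sketch of the suggested replacement check in Go (function and variable names are illustrative, not the installer's actual code): drop the count-equality requirement and instead require each compute zone to appear among the volume zones.
~~~
package main

import "fmt"

// validateZones returns an error unless every compute availability zone
// has a matching volume availability zone. Extra volume zones (such as
// the default "nova" zone) are allowed.
func validateZones(computeZones, volumeZones []string) error {
	volumes := make(map[string]struct{}, len(volumeZones))
	for _, z := range volumeZones {
		volumes[z] = struct{}{}
	}
	for _, z := range computeZones {
		if _, ok := volumes[z]; !ok {
			return fmt.Errorf("compute zone %q has no matching volume zone", z)
		}
	}
	return nil
}

func main() {
	err := validateZones([]string{"AZ-0", "AZ-1", "AZ-2"},
		[]string{"nova", "AZ-0", "AZ-1", "AZ-2"})
	fmt.Println(err) // <nil>: the environment above should pass
}
~~~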
Description of problem:
The configured accessTokenInactivityTimeout under tokenConfig in HostedCluster doesn't have any effect. 1. The value is not updated in the oauth-openshift configmap. 2. The hostedcluster allows the user to set an accessTokenInactivityTimeout value < 300s, whereas in the management cluster the value must be >= 300s.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Always
Steps to Reproduce:
1. Install a fresh 4.13 hypershift cluster 2. Configure accessTokenInactivityTimeout as below: $ oc edit hc -n clusters ... spec: configuration: oauth: identityProviders: ... tokenConfig: accessTokenInactivityTimeout: 100s ... 3. Check the hcp: $ oc get hcp -oyaml ... tokenConfig: accessTokenInactivityTimeout: 1m40s ... 4. Login to guest cluster with testuser-1 and get the token $ oc login https://a8890bba21c9b48d4a05096eee8d4edd-738276775c71fb8f.elb.us-east-2.amazonaws.com:6443 -u testuser-1 -p xxxxxxx $ TOKEN=`oc whoami -t` $ oc login --token="$TOKEN" WARNING: Using insecure TLS client config. Setting this option is not supported! Logged into "https://a8890bba21c9b48d4a05096eee8d4edd-738276775c71fb8f.elb.us-east-2.amazonaws.com:6443" as "testuser-1" using the token provided. You don't have any projects. You can try to create a new project, by running oc new-project <projectname>
Actual results:
1. The hostedcluster allows the user to set a value < 300s for accessTokenInactivityTimeout, which is not possible on the management cluster. 2. The value is not updated in the oauth-openshift configmap:
$ oc get cm oauth-openshift -oyaml -n clusters-hypershift-ci-25785
...
tokenConfig:
  accessTokenMaxAgeSeconds: 86400
  authorizeTokenMaxAgeSeconds: 300
...
3. Login doesn't fail even if the user is inactive for more than the configured accessTokenInactivityTimeout seconds.
Expected results:
Login fails if the user is not active within the accessTokenInactivityTimeout seconds.
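For reference, a minimal HostedCluster excerpt with a value the platform should accept (300s is the minimum enforced on standalone clusters; the field path matches the steps above):
~~~
spec:
  configuration:
    oauth:
      tokenConfig:
        accessTokenInactivityTimeout: 300s   # values below 300s should be rejected
~~~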
Description:
High-volume PipelineRun/TaskRun logs do not auto-scroll to the bottom of the page.
Steps to reproduce:
1. Create a PipelineRun that produces high-volume log output
2. Navigate to the logs page
Video - https://drive.google.com/file/d/17Dc0ME6KYtkyQmW96lT8J_tMfT-dBRbb/view?usp=drive_link
This is a clone of issue OCPBUGS-43930. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-43929. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-38425. The following is the description of the original issue:
—
Description of problem:
When a HostedCluster is upgraded to a new minor version, its OLM catalog imagestreams are not updated to use the tag corresponding to the new minor version.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. Create a HostedCluster (4.15.z) 2. Upgrade the HostedCluster to a new minor version (4.16.z)
Actual results:
OLM catalog imagestreams remain at the previous version (4.15)
Expected results:
OLM catalog imagestreams are updated to new minor version (4.16)
Additional info:
This is a clone of issue OCPBUGS-36932. The following is the description of the original issue:
—
Description of problem:
The customer defines a proxy in their HostedCluster resource definition. The variables are propagated to some pods but not to the oauth one:
oc describe pod kube-apiserver-5f5dbf78dc-8gfgs | grep PROX
HTTP_PROXY: http://ocpproxy.corp.example.com:8080
HTTPS_PROXY: http://ocpproxy.corp.example.com:8080
NO_PROXY: .....
oc describe pod oauth-openshift-6d7b7c79f8-2cf99| grep PROX
HTTP_PROXY: socks5://127.0.0.1:8090
HTTPS_PROXY: socks5://127.0.0.1:8090
ALL_PROXY: socks5://127.0.0.1:8090
NO_PROXY: kube-apiserver
apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
...
spec:
autoscaling: {}
clusterID: 9c8db607-b291-4a72-acc7-435ec23a72ea
configuration:
.....
proxy:
httpProxy: http://ocpproxy.corp.example.com:8080
httpsProxy: http://ocpproxy.corp.example.com:8080
Version-Release number of selected component (if applicable): 4.14
This is a clone of issue OCPBUGS-46508. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-45186. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-45130. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-44373. The following is the description of the original issue:
—
Description of problem:
The installation on AWS fails when the SCP has the value for AssociatePublicIpAddress set to False. The IAM user is not able to create new EC2 instances, i.e. the worker nodes are not created; the bootstrap and master nodes, however, do get created. The below logs can be observed in the machine-api controller logs:
2024/10/31 16:05:28 failed to create instance: UnauthorizedOperation: You are not authorized to perform this operation. User: arn:aws:sts::<account-id>:assumed-role/<role-name> is not authorized to perform: ec2:RunInstances on resource: arn:aws:ec2:ap-southeast-1:<account-id>:network-interface/* with an explicit deny in a service control policy. Encoded authorization failure message: <encoded-message>
Version-Release number of selected component (if applicable):
4.17
How reproducible:
Always
Steps to Reproduce:
1. Set the value of AssociatePublicIpAddress: False inside SCP. 2. Perform a normal IPI aws installation with IAM user which has the above SCP applied. 3. Observe that the workers are not getting created.
Actual results:
Expected results:
Additional info:
Description of problem:
Azure Stack Hub doesn't support Azure-file yet (from https://learn.microsoft.com/en-us/azure-stack/user/azure-stack-acs-differences?view=azs-2206), so we should not install Azure-file-CSI-Driver on it.
$ oc get infrastructures cluster -o json | jq .status.platformStatus.azure { "armEndpoint": "https://management.mtcazs.wwtatc.com", "cloudName": "AzureStackCloud", "networkResourceGroupName": "wduan-0516b-ash-rs7gh-rg", "resourceGroupName": "wduan-0516b-ash-rs7gh-rg" } $ oc get clustercsidrivers file.csi.azure.com NAME AGE file.csi.azure.com 45m $ oc get sc azurefile-csi NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE azurefile-csi file.csi.azure.com Delete Immediate true 47m $ oc describe pvc mydep-pvc-02 Warning ProvisioningFailed <invalid> file.csi.azure.com_wduan-0516b-ash-rs7gh-master-1_19c3f203-70a7-4d7f-afcc-22665adff5fe failed to provision volume with StorageClass "azurefile-csi": rpc error: code = Internal desc = failed to ensure storage account: failed to create storage account f0f49c11984fb413a958286, error: &{false 400 0001-01-01 00:00:00 +0000 UTC { "code": "StorageAccountInvalidKind", "message": "The requested storage account kind is invalid in this location.", "target": "StorageAccount" }}
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-05-11-225357
How reproducible:
Always
Steps to Reproduce:
See Description
Actual results:
Azure-file-CSI-Driver is installed on Azure Stack Hub
Expected results:
Azure-file-CSI-Driver should not be installed on Azure Stack Hub
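A sketch of the kind of guard the operator could apply (illustrative names; the cloud name comes from the infrastructure status shown above):
~~~
package main

import "fmt"

// shouldInstallAzureFile reports whether the azure-file CSI driver should
// be installed; Azure Stack Hub (cloudName "AzureStackCloud") lacks Azure
// File support, so it is excluded.
func shouldInstallAzureFile(cloudName string) bool {
	return cloudName != "AzureStackCloud"
}

func main() {
	fmt.Println(shouldInstallAzureFile("AzureStackCloud"))  // false
	fmt.Println(shouldInstallAzureFile("AzurePublicCloud")) // true
}
~~~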
Description of problem:
Observation from the CIS v1.4 PDF: 1.1.9 Ensure that the Container Network Interface file permissions are set to 600 or more restrictive. "Container Network Interface provides various networking options for overlay networking. You should consult their documentation and restrict their respective file permissions to maintain the integrity of those files. Those files should be writable by only the administrators on the system." To conform with the CIS benchmarks, the /var/lib/cni/networks/openshift-sdn files in all sdn pods should be updated to 600.
$ for i in $(oc get pods -n openshift-sdn -l app=sdn -oname); do oc exec -n openshift-sdn $i -- find /var/lib/cni/networks/openshift-sdn -type f -exec stat -c %a {} \;; done
Defaulted container "sdn" out of: sdn, kube-rbac-proxy
644 644 644 644 644 644 644 644 644 644 644 644 644
Defaulted container "sdn" out of: sdn, kube-rbac-proxy
644 644 644 644 644 644 644 644 644 644 644 644 644
Defaulted container "sdn" out of: sdn, kube-rbac-proxy
644 644 644 644 644 644 644 644 644 644 644 644
Defaulted container "sdn" out of: sdn, kube-rbac-proxy
644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644
Defaulted container "sdn" out of: sdn, kube-rbac-proxy
644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644
Defaulted container "sdn" out of: sdn, kube-rbac-proxy
644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644 644
/gr_replace>
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-20-215234
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
The file permissions for /var/lib/cni/networks/openshift-sdn files in all sdn pods is 644
Expected results:
The file permissions for /var/lib/cni/networks/openshift-sdn files in all sdn pods should be updated to 600
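A hedged remediation sketch, run per node or via the same oc exec loop shown above (a manual workaround illustration, not the product fix):
~~~
# Tighten the CNI allocation files to 600, then re-check:
find /var/lib/cni/networks/openshift-sdn -type f -exec chmod 600 {} \;
find /var/lib/cni/networks/openshift-sdn -type f -exec stat -c '%a %n' {} \;
~~~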
Additional info:
This is a clone of issue OCPBUGS-27737. The following is the description of the original issue:
—
Description of problem:
Failed to install OCP on the below LZ/WLZ zones. The common point in the below regions is that each of them has only one type of zone, LZ or WLZ: e.g. in af-south-1 only LZ is available (no WLZ), and in ap-northeast-2 only WLZ is available (no LZ). Failed regions/zones:
af-south-1 ['af-south-1-los-1a']
failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[1].platform.aws: Internal error: getting Local Zones: unable to retrieve Wavelength Zone names: no zones with type wavelength-zone in af-south-1
ap-south-1 ['ap-south-1-ccu-1a', 'ap-south-1-del-1a']
level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[1].platform.aws: Internal error: getting Local Zones: unable to retrieve Wavelength Zone names: no zones with type wavelength-zone in ap-south-1
ap-southeast-1 ['ap-southeast-1-bkk-1a', 'ap-southeast-1-mnl-1a']
level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[1].platform.aws: Internal error: getting Local Zones: unable to retrieve Wavelength Zone names: no zones with type wavelength-zone in ap-southeast-1
me-south-1 ['me-south-1-mct-1a']
level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[1].platform.aws: Internal error: getting Local Zones: unable to retrieve Wavelength Zone names: no zones with type wavelength-zone in me-south-1
ap-southeast-2 ['ap-southeast-2-akl-1a', 'ap-southeast-2-per-1a']
level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[1].platform.aws: Internal error: getting Local Zones: unable to retrieve Wavelength Zone names: no zones with type wavelength-zone in ap-southeast-2
eu-north-1 ['eu-north-1-cph-1a', 'eu-north-1-hel-1a']
level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[1].platform.aws: Internal error: getting Local Zones: unable to retrieve Wavelength Zone names: no zones with type wavelength-zone in eu-north-1
ap-northeast-2 ['ap-northeast-2-wl1-cjj-wlz-1', 'ap-northeast-2-wl1-sel-wlz-1']
level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[1].platform.aws: Internal error: getting Local Zones: unable to retrieve Local Zone names: no zones with type local-zone in ap-northeast-2
ca-central-1 ['ca-central-1-wl1-yto-wlz-1']
level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[1].platform.aws: Internal error: getting Local Zones: unable to retrieve Local Zone names: no zones with type local-zone in ca-central-1
eu-west-2 ['eu-west-2-wl1-lon-wlz-1', 'eu-west-2-wl1-man-wlz-1', 'eu-west-2-wl2-man-wlz-1']
level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[1].platform.aws: Internal error: getting Local Zones: unable to retrieve Local Zone names: no zones with type local-zone in eu-west-2
Version-Release number of selected component (if applicable):
4.15.0-rc.3-x86_64
How reproducible:
Steps to Reproduce:
1) install OCP on above regions/zones
Actual results:
See description.
Expected results:
Don't check LZ availability when installing OCP in a WLZ. Don't check WLZ availability when installing OCP in an LZ.
Additional info:
This is a clone of issue OCPBUGS-44277. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-44276. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-44163. The following is the description of the original issue:
—
Description of problem:
We identified a regression where we can no longer get oauth tokens for HyperShift v4.16 clusters via the OpenShift web console. v4.16.10 works fine, but once clusters are patched to v4.16.16 (or are created at that version) they fail to get the oauth token. This is due to this faulty PR: https://github.com/openshift/hypershift/pull/4496. The oauth openshift deployment was changed and affected the IBM Cloud code path. We need this endpoint to change back to using `socks5`.
Bug (diff of the deployment's proxy env values):
< value: socks5://127.0.0.1:8090
---
> value: http://127.0.0.1:8092
98c98
< value: socks5://127.0.0.1:8090
---
> value: http://127.0.0.1:8092
Fix: Change http://127.0.0.1:8092 back to socks5://127.0.0.1:8090
Version-Release number of selected component (if applicable):
4.16.16
How reproducible:
Every time.
Steps to Reproduce:
1. Create a ROKS v4.16.16 HyperShift-based cluster. 2. Navigate to the OpenShift web console. 3. Click the IAM#<username> menu in the top right. 4. Click 'Copy login command'. 5. Click 'Display token'.
Actual results:
Error getting token: Post "https://example.com:31335/oauth/token": http: server gave HTTP response to HTTPS client
Expected results:
The oauth token should be successfully displayed.
Additional info:
Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/104
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
network-tools -h error: You must be logged in to the server (Unauthorized) error: You must be logged in to the server (Unauthorized) Usage: network-tools [command]
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
The Google Cloud CLI deprecated Python 3.5-3.7 starting with 448.0.0, causing release CI jobs to fail with: ERROR: gcloud failed to load. You are running gcloud with Python 3.6, which is no longer supported by gcloud. Pin the specified version to 447.0.0. job link: https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-o[…]cp-upi-f28-destructive/1719562110486188032
Many jobs are failing because route53 is throttling us during cluster creation.
We need to make external-dns make fewer calls.
The theoretical minimum is (see the worked example after this list):
list zones - 1 call
list zone records - (# of records / 100) calls
create 3 records per HC - 1-3 calls depending on how they are batched
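Worked example under the counts above (numbers illustrative): a zone with 500 records costs 1 list-zones call plus ceil(500/100) = 5 list-records calls, and the 3 records for a HostedCluster fit in a single batched change, so 7 calls total instead of 3 separate creates plus repeated listing. A sketch of the batching with the AWS CLI (the zone ID and record names are placeholders):
~~~
# One ChangeBatch submits all three HostedCluster records in a single API call
aws route53 change-resource-record-sets \
  --hosted-zone-id Z0000000EXAMPLE \
  --change-batch '{
    "Changes": [
      {"Action": "CREATE", "ResourceRecordSet": {"Name": "api.hc.example.com", "Type": "CNAME", "TTL": 300, "ResourceRecords": [{"Value": "lb.example.com"}]}},
      {"Action": "CREATE", "ResourceRecordSet": {"Name": "api-int.hc.example.com", "Type": "CNAME", "TTL": 300, "ResourceRecords": [{"Value": "lb.example.com"}]}},
      {"Action": "CREATE", "ResourceRecordSet": {"Name": "*.apps.hc.example.com", "Type": "CNAME", "TTL": 300, "ResourceRecords": [{"Value": "lb.example.com"}]}}
    ]
  }'
~~~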
Description of problem:
When creating an ingresscontroller with empty spec (or where spec.domain clashes with an existing IC), the ingresscontroller's status shows Admitted as "False" and reason is "Invalid". However, "route_controller_metrics_routes_per_shard" metric shows the shard in the Observe tab of the web-console. When the invalid ingresscontroller is deleted, the "route_controller_metrics_routes_per_shard" metric does not clear the row corresponding to the deleted invalid IC.
Version-Release number of selected component (if applicable):
4.12.0-ec5
How reproducible:
Always
Steps to Reproduce:
1. Create the invalid IC with the following spec: apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: ic-invalid namespace: openshift-ingress-operator spec: {} 2. Check the status of the IC: $ oc get ingresscontroller -n openshift-ingress-operator ic-invalid -oyaml apiVersion: operator.openshift.io/v1 kind: IngressController metadata: annotations: kubectl.kubernetes.io/last-applied-configuration: | {"apiVersion":"operator.openshift.io/v1","kind":"IngressController","metadata":{"annotations":{},"name":"ic-invalid","namespace":"openshift-ingress-operator"},"spec":{}} creationTimestamp: "2022-11-11T12:53:41Z" generation: 1 name: ic-invalid namespace: openshift-ingress-operator resourceVersion: "97453" uid: 96eae28e-bb14-447e-822f-602f3a3bb378 spec: httpEmptyRequestsPolicy: Respond status: availableReplicas: 0 conditions: - lastTransitionTime: "2022-11-11T12:53:41Z" message: 'conflicts with: default' reason: Invalid status: "False" type: Admitted domain: apps.arsen-cluster1.devcluster.openshift.com endpointPublishingStrategy: loadBalancer: dnsManagementPolicy: Managed providerParameters: aws: classicLoadBalancer: connectionIdleTimeout: 0s type: Classic type: AWS scope: External type: LoadBalancerService observedGeneration: 1 selector: "" 3. Check the "route_metrics_controller_routes_per_shard" metric on the web-console 4. Delete the IC 5. Check the "route_metrics_controller_routes_per_shard" metric again on the web-console
Actual results:
As shown in the attached screenshot, "route_metrics_controller_routes_per_shard" metric adds one row for the invalid IC. This is not cleared even when the IC is deleted.
Expected results:
The "route_metrics_controller_routes_per_shard" metric should not add metric for invalid ICs. Additionally, when the invalid IC is deleted the metric should clear the corresponding row.
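A minimal client_golang sketch of the expected cleanup (the metric and label names mirror the report; the delete hook itself is illustrative). Deleting the label values drops the stale series:
~~~
package metrics

import "github.com/prometheus/client_golang/prometheus"

// Registered elsewhere with the controller's registry.
var routesPerShard = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "route_metrics_controller_routes_per_shard",
		Help: "Routes admitted per ingress controller (shard).",
	},
	[]string{"shard_name"},
)

// Called when an IngressController is deleted (or was never admitted):
// dropping the series removes the stale row from query results.
func onIngressControllerDeleted(name string) {
	routesPerShard.DeleteLabelValues(name)
}
~~~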
Additional info:
This is a clone of issue OCPBUGS-48159. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-47768. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-47725. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-47526. The following is the description of the original issue:
—
Description of problem:
OWNERS file updated to include prabhakar and Moe as owners and reviewers
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is to facilitate easy backports via automation
Please review the following PR: https://github.com/openshift/hypershift/pull/3017
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
A 4.13 cluster installed with baselineCapabilitySet: None and additionalEnabledCapabilities: ['NodeTuning', 'CSISnapshot']: an upgrade to 4.14 causes the previously disabled Console to become ImplicitlyEnabled (in contrast with newly added 4.14 capabilities, which are expected to be enabled implicitly in this case). 'ImplicitlyEnabledCapabilities':
{
  "lastTransitionTime": "2023-10-09T19:08:29Z",
  "message": "The following capabilities could not be disabled: Console, ImageRegistry, MachineAPI",
  "reason": "CapabilitiesImplicitlyEnabled",
  "status": "True",
  "type": "ImplicitlyEnabledCapabilities"
}
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-10-08-220853
How reproducible:
100%
Steps to Reproduce:
as described above
Additional info:
the root cause appears to be https://github.com/openshift/cluster-kube-apiserver-operator/pull/1542 more info in https://redhat-internal.slack.com/archives/CB48XQ4KZ/p1696940380413289
Description: If tokenConfig.accessTokenInactivityTimeout is set to less than 300s, the accessTokenInactivityTimeout doesn't work in the hosted cluster, whereas in the management cluster we get the below error when trying to set the timeout < 300s:
spec.tokenConfig.accessTokenInactivityTimeout: Invalid value: v1.Duration{Duration:100000000000}: the minimum acceptable token timeout value is 300 seconds*
Steps to reproduce the issue:
1. Install a fresh 4.15 hypershift cluster 2. Configure accessTokenInactivityTimeout as below: $ oc edit hc -n clusters ... spec: configuration: oauth: identityProviders: ... tokenConfig: accessTokenInactivityTimeout: 100s ... 3. Wait for the oauth pods to redeploy and check the oauth cm for updated accessTokenInactivityTimeout value: $ oc get cm oauth-openshift -oyaml -n clusters-hypershift-ci-xxxxx ... tokenConfig: accessTokenInactivityTimeout: 1m40s ... 4. Login to guest cluster with testuser-1 and get the token $ oc login https://a889<...>:6443 -u testuser-1 -p xxxxxxx $ TOKEN=`oc whoami -t`
Actual result:
Wait for 100s and try login with the TOKEN $ oc login --token="$TOKEN" WARNING: Using insecure TLS client config. Setting this option is not supported! Logged into "https://a889<...>:6443" as "testuser-1" using the token provided. You don't have any projects. You can try to create a new project, by running oc new-project <projectname>
Expected result:
1. Login fails if the user is not active within the accessTokenInactivityTimeout seconds.
2. In the management cluster, we get the below error when trying to set the timeout to less than 300s:
spec.tokenConfig.accessTokenInactivityTimeout: Invalid value: v1.Duration{Duration:100000000000}: the minimum acceptable token timeout value is 300 seconds*
Implement the same validation in the hosted cluster.
This is a clone of issue OCPBUGS-27247. The following is the description of the original issue:
—
Description of problem:
In a UPI cluster there are no MachineSet or Machine resources; when the user visits the Machines and MachineSets list pages, we show only the plain text 'Not found'
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2024-01-16-113018
How reproducible:
Always
Steps to Reproduce:
1. setup UPI cluster 2. goes to MachineSets and Machines list page, check the empty state message
Actual results:
2. We just show the plain 'Not found' text
Expected results:
2. For other resources we show richer text, 'No <resourcekind> found', so we should also show 'No Machines found' and 'No MachineSets found' on these pages
Additional info:
This is a clone of issue OCPBUGS-29304. The following is the description of the original issue:
—
Description of problem:
Sometimes the prometheus-operator's informer will be stuck because it receives objects that can't be converted to *v1.PartialObjectMetadata.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Not always
Steps to Reproduce:
1. Unknown 2. 3.
Actual results:
prometheus-operator logs show errors like 2024-02-09T08:29:35.478550608Z level=warn ts=2024-02-09T08:29:35.478491797Z caller=klog.go:108 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go:110: failed to list *v1.PartialObjectMetadata: Get \"https://172.30.0.1:443/api/v1/secrets?resourceVersion=29022\": dial tcp 172.30.0.1:443: connect: connection refused" 2024-02-09T08:29:35.478592909Z level=error ts=2024-02-09T08:29:35.478541608Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go:110: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: Get \"https://172.30.0.1:443/api/v1/secrets?resourceVersion=29022\": dial tcp 172.30.0.1:443: connect: connection refused"
Expected results:
No error
Additional info:
The bug has been introduced in v0.70.0 by https://github.com/prometheus-operator/prometheus-operator/pull/5993 so it only affects 4.16 and 4.15.
This is a clone of issue OCPBUGS-29484. The following is the description of the original issue:
—
The egress-router implementations under https://github.com/openshift/images/tree/master/egress have unit tests alongside the implementations, within the same repository, but the repository does not have a CI job to run those unit tests. We do not have any tests for egress-router in https://github.com/openshift/origin. This means that we are effectively lacking CI test coverage for egress-router.
All versions.
100%.
1. Open a PR in https://github.com/openshift/images and check which CI jobs are run on it.
2. Check the job definitions in https://github.com/openshift/release/blob/master/ci-operator/jobs/openshift/images/openshift-images-master-presubmits.yaml.
There are "ci/prow/e2e-aws", "ci/prow/e2e-aws-upgrade", and "ci/prow/images" jobs defined, but no "ci/prow/unit" job.
There should be a "ci/prow/unit" job, and this job should run the unit tests that are defined in the repository.
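A sketch of what the missing entry would typically look like in the repo's ci-operator configuration (the exact command depends on how the repository's tests are invoked):
~~~
tests:
- as: unit
  commands: go test ./...
  container:
    from: src
~~~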
The lack of a CI job came up on https://github.com/openshift/images/pull/162.
This is a clone of issue OCPBUGS-30774. The following is the description of the original issue:
—
Description of problem:
When the installer gathers a log bundle after failure (either automatically or with gather bootstrap), the installer fails to return serial console logs if an SSH connection to the bootstrap node is refused. Even if the serial console logs were collected, the installer exits on error if ssh connection is refused: time="2024-03-09T20:59:26Z" level=info msg="Pulling VM console logs" time="2024-03-09T20:59:26Z" level=debug msg="Search for matching instances by tag in us-west-1 matching aws.Filter{\"kubernetes.io/cluster/ci-op-4ygffz3q-be93e-jnn92\":\"owned\"}" time="2024-03-09T20:59:26Z" level=debug msg="Search for matching instances by tag in us-west-1 matching aws.Filter{\"openshiftClusterID\":\"2f9d8822-46fd-4fcd-9462-90c766c3d158\"}" time="2024-03-09T20:59:27Z" level=debug msg="Attemping to download console logs for ci-op-4ygffz3q-be93e-jnn92-bootstrap" Instance=i-0413f793ffabe9339 time="2024-03-09T20:59:27Z" level=debug msg="Download complete" Instance=i-0413f793ffabe9339 time="2024-03-09T20:59:27Z" level=debug msg="Attemping to download console logs for ci-op-4ygffz3q-be93e-jnn92-master-0" Instance=i-0ab5f920818366bb8 time="2024-03-09T20:59:27Z" level=debug msg="Download complete" Instance=i-0ab5f920818366bb8 time="2024-03-09T20:59:27Z" level=debug msg="Attemping to download console logs for ci-op-4ygffz3q-be93e-jnn92-master-2" Instance=i-0b93963476818535d time="2024-03-09T20:59:27Z" level=debug msg="Download complete" Instance=i-0b93963476818535d time="2024-03-09T20:59:28Z" level=debug msg="Attemping to download console logs for ci-op-4ygffz3q-be93e-jnn92-master-1" Instance=i-0797728e092bfbeef time="2024-03-09T20:59:28Z" level=debug msg="Download complete" Instance=i-0797728e092bfbeef time="2024-03-09T20:59:28Z" level=info msg="Pulling debug logs from the bootstrap machine" time="2024-03-09T20:59:28Z" level=debug msg="Added /tmp/bootstrap-ssh3643557583 to installer's internal agent" time="2024-03-09T20:59:28Z" level=debug msg="Added /tmp/.ssh/ssh-privatekey to installer's internal agent" time="2024-03-09T21:01:39Z" level=error msg="Attempted to gather debug logs after installation failure: failed to connect to the bootstrap machine: dial tcp 13.57.212.80:22: connect: connection timed out" from: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_api/1788/pull-ci-openshift-api-master-e2e-aws-ovn/1766560949898055680 We can see the console logs were downloaded, they should be saved in the log bundle.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Failed install where SSH to bootstrap node fails. https://github.com/openshift/installer/pull/8137 provides a potential reproducer 2. 3.
Actual results:
Expected results:
Additional info:
Error handling needs to be reworked here: https://github.com/openshift/installer/blob/master/cmd/openshift-install/gather.go#L160-L190
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
The argument has been deprecated in the v0.14.0 release:
https://github.com/brancz/kube-rbac-proxy/releases/tag/v0.14.0
When setting up router sharding with `endpointPublishingStrategy: Private` in an OCP 4.13.11 BareMetal cluster, the restricted-readonly scc is added to the router pods, causing them to CrashLoopBackOff:
~~~
$ oc get pod -n openshift-ingress router-spinque-xxx -oyaml | grep -i scc
openshift.io/scc: restricted-readonly <<<
$ oc get pod -n openshift-ingress router-spinque-xxxj -oyaml | grep -i scc
openshift.io/scc: restricted-readonly <<<<
$ oc get pod -n openshift-ingress router-spinque-xxx -oyaml | grep -i scc
openshift.io/scc: restricted-readonly <<<<
~~~
~~~
router-spinque-xxx 0/1 CrashLoopBackOff 27 2h
router-spinque-xxx 0/1 CrashLoopBackOff 27 2h
router-spinque-xxx 0/1 CrashLoopBackOff 27 2h
~~~
Please find the must-gather as well as the sos-report from one of the nodes in the case 03624389 in supportshell
—
The following scc config can be used to reproduce this issue on any platform:
~~~
allowPrivilegeEscalation: true
allowedCapabilities: []
apiVersion: security.openshift.io/v1
defaultAddCapabilities: null
fsGroup:
  type: MustRunAs
groups:
- system:authenticated
kind: SecurityContextConstraints
metadata:
  name: bad-router
priority: 0
readOnlyRootFilesystem: true
requiredDropCapabilities:
- KILL
- MKNOD
- SETUID
- SETGID
runAsUser:
  type: MustRunAsRange
seLinuxContext:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
users: []
volumes:
- configMap
- downwardAPI
- emptyDir
- persistentVolumeClaim
- projected
- secret
~~~
Save the above yaml as bad-router-scc.yaml then apply it to your cluster:
$ oc apply -f bad-router-scc.yaml
Force the restart of router pods, such as by deleting one:
$ oc delete pod router-default-6465854689-gvjhs
The newly started pod(s) should be running but not ready, with the bad-router scc:
~~~
$ oc get pods
NAME                              READY   STATUS    RESTARTS   AGE
router-default-6465854689-7x558   0/1     Running   0          49s
$ oc get pod router-default-6465854689-7x558 -o yaml|grep scc
openshift.io/scc: bad-router
~~~
If you wait long enough, it will restart multiple times, and eventually enter the CrashLoopBackOff state
This is a clone of issue OCPBUGS-25943. The following is the description of the original issue:
—
Description of problem:
Adding a test case: when the openshift.io/image-tags quota is exceeded, creating new image references in the project is forbidden
Version-Release number of selected component (if applicable):
4.16
pr - https://github.com/openshift/origin/pull/28464
This is a clone of issue OCPBUGS-27385. The following is the description of the original issue:
—
In a 4.16.0-ec.1 cluster, scaling up a MachineSet with publicIP:true fails with:
$ oc -n openshift-machine-api get -o json machines.machine.openshift.io | jq -r '.items[] | select(.status.phase == "Failed") | .status.providerStatus.conditions[].message' | sort | uniq -c 1 googleapi: Error 403: Required 'compute.subnetworks.useExternalIp' permission for 'projects/openshift-gce-devel-ci-2/regions/us-central1/subnetworks/ci-ln-q4d8y8t-72292-msmgw-worker-subnet', forbidden
Seen in 4.16.0-ec.1. Not noticed in 4.15.0-ec.3. Fix likely needs a backport to 4.15 to catch up with OCPBUGS-26406.
Seen in the wild in a cluster after updating from 4.15.0-ec.3 to 4.16.0-ec.1. Reproduced in Cluster Bot on the first attempt, so likely very reproducible.
launch 4.16.0-ec.1 gcp Cluster Bot cluster (logs).
$ oc adm upgrade Cluster version is 4.16.0-ec.1 Upstream: https://api.integration.openshift.com/api/upgrades_info/graph Channel: candidate-4.16 (available channels: candidate-4.16) No updates available. You may still upgrade to a specific release image with --to-image or wait for new updates to be available. $ oc -n openshift-machine-api get machinesets NAME DESIRED CURRENT READY AVAILABLE AGE ci-ln-q4d8y8t-72292-msmgw-worker-a 1 1 1 1 60m ci-ln-q4d8y8t-72292-msmgw-worker-b 1 1 1 1 60m ci-ln-q4d8y8t-72292-msmgw-worker-c 1 1 1 1 60m ci-ln-q4d8y8t-72292-msmgw-worker-f 0 0 60m $ oc -n openshift-machine-api get -o json machinesets | jq -c '.items[].spec.template.spec.providerSpec.value.networkInterfaces' | sort | uniq -c 4 [{"network":"ci-ln-q4d8y8t-72292-msmgw-network","subnetwork":"ci-ln-q4d8y8t-72292-msmgw-worker-subnet"}] $ oc -n openshift-machine-api edit machineset ci-ln-q4d8y8t-72292-msmgw-worker-f # add publicIP $ oc -n openshift-machine-api get -o json machineset ci-ln-q4d8y8t-72292-msmgw-worker-f | jq -c '.spec.template.spec.providerSpec.value.networkInterfaces' [{"network":"ci-ln-q4d8y8t-72292-msmgw-network","publicIP":true,"subnetwork":"ci-ln-q4d8y8t-72292-msmgw-worker-subnet"}] $ oc -n openshift-machine-api scale --replicas 1 machineset ci-ln-q4d8y8t-72292-msmgw-worker-f $ sleep 300 $ oc -n openshift-machine-api get -o json machines.machine.openshift.io | jq -r '.items[] | select(.status.phase == "Failed") | .status.providerStatus.conditions[].message' | sort | uniq -c
1 googleapi: Error 403: Required 'compute.subnetworks.useExternalIp' permission for 'projects/openshift-gce-devel-ci-2/regions/us-central1/subnetworks/ci-ln-q4d8y8t-72292-msmgw-worker-subnet', forbidden
Successfully created machines.
I would expect the CredentialsRequest to ask for this permission, but it doesn't seem to. The old roles/compute.admin includes it, and it probably just needs to be added explicitly. Not clear how many other permissions might also need explicit listing.
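A hedged sketch of where the permission would be declared, using the cloud-credential-operator's GCPProviderSpec (the resource name and surrounding fields are illustrative, not the actual machine-api manifest):
~~~
apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: openshift-machine-api-gcp
  namespace: openshift-cloud-credential-operator
spec:
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: GCPProviderSpec
    predefinedRoles:
    - roles/compute.instanceAdmin.v1
    permissions:
    - compute.subnetworks.useExternalIp   # the permission missing for publicIP: true
  secretRef:
    name: gcp-cloud-credentials
    namespace: openshift-machine-api
~~~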
This is a clone of issue OCPBUGS-32105. The following is the description of the original issue:
—
After performing an Agent-Based Installation on bare metal, the master node which was initially the rendezvous host is not joining the cluster.
Checking the podman containers on this node, we see that the 'assisted-installer' container exited with code 143 after the second master was detected as ready:
2024-04-01T15:21:14.677437000Z time="2024-04-01T15:21:14Z" level=info msg="Found 1 ready master nodes" 2024-04-01T15:21:19.684831000Z time="2024-04-01T15:21:19Z" level=info msg="Found a new ready master node <second-master> with id <master-id>"
podman pods status:
$ podman ps -a CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 20b338ab8906 localhost/podman-pause:4.4.1-1707368644 16 hours ago Up 16 hours d2b97e733b33-infra 0876c611f655 quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:27c5328e1d9a0d7db874c6e52efae631ab3c29a3d4da50c50b2e783dcb784128 /bin/bash start_d... 16 hours ago Up 16 hours assisted-db a9a116bed3a7 quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:27c5328e1d9a0d7db874c6e52efae631ab3c29a3d4da50c50b2e783dcb784128 /assisted-service 16 hours ago Up 16 hours service 0afbe44c2cf2 quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:27c5328e1d9a0d7db874c6e52efae631ab3c29a3d4da50c50b2e783dcb784128 /usr/local/bin/ag... 16 hours ago Exited (0) 16 hours ago apply-host-config 45da1bdf2440 quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b3daca74ad515845d5f8dcf384f0e51d58751a2785414edc3f20969a6fc0403 next_step_runner ... 16 hours ago Up 16 hours next-step-runner 8d1306b0ea3a quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:79e97d8cbd27e2c7402f7e016de97ca2b1f4be27bd52a981a27e7a2132be1ef4 --role bootstrap ... 16 hours ago Exited (143) 15 hours ago assisted-installer 8b0cc08890b4 quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f44844c4024dfa35688eac52e5e3d1540311771c4a24fef1ba4a6dccecc0e55 start --node-name... 16 hours ago Exited (0) 16 hours ago hungry_varahamihira 4916c14b9f7e registry.redhat.io/rhel9/support-tools:latest /usr/bin/bash 34 seconds ago Up 34 seconds toolbox-core
crio pods status:
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD 03b89032db0bc 98fc664e8c2aa859c10ec8ea740b083c7c85925d75506bcb85c6c9c640945c36 13 seconds ago Exited etcd 182 5d42cdad70890 etcd-bootstrap-member-<failed-master-name>.local 01008c6e32e5a quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6b38d75b297fa52d1ba29af0715cec2430cd5fda1a608ed0841a09c55c292fb3 16 hours ago Running coredns 0 5f8736b856a0c coredns-<failed-master-name> 5e00e89ebef34 quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2e119d0d9f8470dd634a62329d2670602c5f169d0d9bbe5ad25cee07e716c94b 16 hours ago Exited render-config 0 5f8736b856a0c coredns-<failed-master-name> f5098d5d27a39 quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2e119d0d9f8470dd634a62329d2670602c5f169d0d9bbe5ad25cee07e716c94b 16 hours ago Running keepalived-monitor 0 4fb91cefa8a9e keepalived-<failed-master-name> a1e9d4c8cf477 quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d24879d39e10fcf00a7c28ab23de1d6cf0c433a1234ff34880f12642b75d4512 16 hours ago Running keepalived 0 4fb91cefa8a9e keepalived-<failed-master-name> de21bc99f0d3f quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8c74c57f91f0f7ed26bb62f58c7b84c55750e51947fd6cc5711fa18f30b9f68c 16 hours ago Running etcdctl 0 5d42cdad70890 etcd-bootstrap-member-<failed-master-name>
This is a clone of issue OCPBUGS-24587. The following is the description of the original issue:
—
Installed some operators. After some time, ResolutionFailed conditions show up:
$ kubectl get subscription.operators -A -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,ResolutionFailed:.status.conditions[?(@.type=="ResolutionFailed")].status,MSG:.status.conditions[?(@.type=="ResolutionFailed")].message'
NAMESPACE                   NAME                                                                          ResolutionFailed   MSG
infra-sso                   rhbk-operator                                                                 True               [failed to populate resolver cache from source redhat-marketplace/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.67.215:50051: connect: connection refused", failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.14.92:50051: connect: connection refused"]
metallb-system              metallb-operator-sub                                                          True               [failed to populate resolver cache from source redhat-marketplace/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.67.215:50051: connect: connection refused", failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.14.92:50051: connect: connection refused"]
multicluster-engine         multicluster-engine                                                           True               [failed to populate resolver cache from source redhat-marketplace/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.67.215:50051: connect: connection refused", failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.14.92:50051: connect: connection refused"]
open-cluster-management     acm-operator-subscription                                                     True               [failed to populate resolver cache from source redhat-marketplace/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.67.215:50051: connect: connection refused", failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.14.92:50051: connect: connection refused"]
openshift-cnv               kubevirt-hyperconverged                                                       True               [failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.14.92:50051: connect: connection refused", failed to populate resolver cache from source certified-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.202.255:50051: connect: connection refused"]
openshift-gitops-operator   openshift-gitops-operator                                                     True               [failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.14.92:50051: connect: connection refused", failed to populate resolver cache from source certified-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.202.255:50051: connect: connection refused"]
openshift-local-storage     local-storage-operator                                                        True               [failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.14.92:50051: connect: connection refused", failed to populate resolver cache from source certified-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.202.255:50051: connect: connection refused"]
openshift-nmstate           kubernetes-nmstate-operator                                                   <none>             <none>
openshift-operators         devworkspace-operator-fast-redhat-operators-openshift-marketplace            <none>             <none>
openshift-operators         external-secrets-operator                                                     <none>             <none>
openshift-operators         web-terminal                                                                  <none>             <none>
openshift-storage           lvms                                                                          <none>             <none>
openshift-storage           mcg-operator-stable-4.14-redhat-operators-openshift-marketplace              <none>             <none>
openshift-storage           ocs-operator-stable-4.14-redhat-operators-openshift-marketplace              <none>             <none>
openshift-storage           odf-csi-addons-operator-stable-4.14-redhat-operators-openshift-marketplace   <none>             <none>
openshift-storage           odf-operator                                                                  <none>             <none>
In the package server logs you can see that at one point the catalog source is not available; after a while the catalog source is available again, but the error doesn't disappear from the subscription.
Package server logs:
time="2023-12-05T14:27:09Z" level=warning msg="error getting bundle stream" action="refresh cache" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.30.37.69:50051: connect: connection refused\"" source="{redhat-operators openshift-marketplace}" time="2023-12-05T14:27:09Z" level=info msg="updating PackageManifest based on CatalogSource changes: {community-operators openshift-marketplace}" action="sync catalogsource" address="community-operators.openshift-marketplace.svc:50051" name=community-operators namespace=openshift-marketplace time="2023-12-05T14:28:26Z" level=info msg="updating PackageManifest based on CatalogSource changes: {redhat-marketplace openshift-marketplace}" action="sync catalogsource" address="redhat-marketplace.openshift-marketplace.svc:50051" name=redhat-marketplace namespace=openshift-marketplace time="2023-12-05T14:30:23Z" level=info msg="updating PackageManifest based on CatalogSource changes: {certified-operators openshift-marketplace}" action="sync catalogsource" address="certified-operators.openshift-marketplace.svc:50051" name=certified-operators namespace=openshift-marketplace time="2023-12-05T14:35:56Z" level=info msg="updating PackageManifest based on CatalogSource changes: {certified-operators openshift-marketplace}" action="sync catalogsource" address="certified-operators.openshift-marketplace.svc:50051" name=certified-operators namespace=openshift-marketplace time="2023-12-05T14:37:28Z" level=info msg="updating PackageManifest based on CatalogSource changes: {community-operators openshift-marketplace}" action="sync catalogsource" address="community-operators.openshift-marketplace.svc:50051" name=community-operators namespace=openshift-marketplace time="2023-12-05T14:37:28Z" level=info msg="updating PackageManifest based on CatalogSource changes: {redhat-operators openshift-marketplace}" action="sync catalogsource" address="redhat-operators.openshift-marketplace.svc:50051" name=redhat-operators namespace=openshift-marketplace time="2023-12-05T14:39:40Z" level=info msg="updating PackageManifest based on CatalogSource changes: {redhat-marketplace openshift-marketplace}" action="sync catalogsource" address="redhat-marketplace.openshift-marketplace.svc:50051" name=redhat-marketplace namespace=openshift-marketplace time="2023-12-05T14:46:07Z" level=info msg="updating PackageManifest based on CatalogSource changes: {certified-operators openshift-marketplace}" action="sync catalogsource" address="certified-operators.openshift-marketplace.svc:50051" name=certified-operators namespace=openshift-marketplace time="2023-12-05T14:47:37Z" level=info msg="updating PackageManifest based on CatalogSource changes: {redhat-operators openshift-marketplace}" action="sync catalogsource" address="redhat-operators.openshift-marketplace.svc:50051" name=redhat-operators namespace=openshift-marketplace time="2023-12-05T14:48:21Z" level=info msg="updating PackageManifest based on CatalogSource changes: {community-operators openshift-marketplace}" action="sync catalogsource" address="community-operators.openshift-marketplace.svc:50051" name=community-operators namespace=openshift-marketplace time="2023-12-05T14:49:53Z" level=info msg="updating
4.14.3
1. Install an operator, for example metallb 2. Wait until the catalog pod is not available at least once. 3. ResolutionFailed doesn't disappear anymore
ResolutionFailed doesn't disappear from the subscription anymore.
ResolutionFailed disappears from the subscription.
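A minimal sketch of the expected behavior, using apimachinery's generic condition helpers; OLM's actual Subscription condition type differs, so treat this purely as an illustration of "clear the stale condition once resolution succeeds again":
```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Conditions as they might look after a transient catalog outage.
	conditions := []metav1.Condition{{
		Type:    "ResolutionFailed",
		Status:  metav1.ConditionTrue,
		Reason:  "ErrorPreventedResolution",
		Message: "failed to populate resolver cache ...",
	}}

	// On the next successful resolution the condition should be removed
	// (or flipped to False) instead of being left in place forever.
	meta.RemoveStatusCondition(&conditions, "ResolutionFailed")
	fmt.Println(len(conditions)) // 0
}
```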
Please review the following PR: https://github.com/openshift/cluster-storage-operator/pull/398
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/cluster-policy-controller/pull/143
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-27159. The following is the description of the original issue:
—
This is a continuation of OCPBUGS-23342; now the vmware-vsphere-csi-driver-operator cannot connect to vCenter at all. Tested using invalid credentials.
The operator ends up with no Progressing condition during upgrade from 4.11 to 4.12, and cluster-storage-operator interprets it as Progressing=true.
This is a clone of issue OCPBUGS-28548. The following is the description of the original issue:
—
Description of problem:
In https://github.com/openshift/release/pull/47648 ecr-credentials-provider is built in CI and later included in RHCOS. In order to make it work on OKD it needs to be included in the payload, so that OKD machine-os can extract the RPM and install it on the host.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Ref: OCPBUGS-25662
Description of problem:
With the fix for BZ 2079803 [1] we have introduced a backup trigger on every z-release (instead of every y-release). Sadly we have not updated the CVO [2] logic along with it, which effectively stops the upgrade until a snapshot was taken. Currently we have a split state machine (thanks Trevor). Today we have this for minor updates:
1. User bumps ClusterVersion spec asking for a minor update.
2. CVO checks for a recent etcd backup. Until it is available, we refuse to accept the retarget request.
3. Once the etcd backup is available (assuming no other precondition issues), we accept the retarget and start updating.
While for patch updates:
1. User bumps ClusterVersion spec asking for a patch update.
2. CVO accepts the retarget, sets status.desired, and starts in on the update.
In the patch-update case, it might be that the CEO takes a snapshot while the upgrade is already running (race condition). This creates an inconsistent snapshot, which on restore would just re-attempt to execute the (botched) upgrade.
[1] https://github.com/openshift/cluster-etcd-operator/pull/835
[2] https://github.com/openshift/cluster-version-operator/blob/master/pkg/payload/precondition/clusterversion/etcdbackup.go#L76-L77
Version-Release number of selected component (if applicable):
any OCP > 4.10
How reproducible:
almost always (race condition between CEO and CVO)
Steps to Reproduce:
1. trigger a z-upgrade 2. observe when the etcd backup is taken, it might happen after the upgrade is already in progress
Actual results:
The snapshot that was created contains parts of the newly upgraded OCP (CVO CRD or any other operator state).
Expected results:
The snapshot should not contain any information that could come through with the z-upgrade.
Additional info:
Either the CVO should also wait on z-upgrades to ensure the snapshots are consistently on a pre-upgrade state, or we revert the z-stream upgrade behavior again.
—
William Caban and our team decided to entirely remove the controller.
W. Trevor King to drop the requirement in CVO.
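For reference, a minimal sketch of the direction discussed above (before the decision to remove the controller), assuming a blang/semver-style version type; this is illustrative, not the shipped precondition code:
```go
import "github.com/blang/semver/v4"

// backupRequired reports whether a retarget should wait for a fresh etcd
// backup. The pre-fix CVO logic only gated minor-version retargets, which
// let z-stream (patch) retargets race with the etcd-operator snapshot.
func backupRequired(current, target semver.Version) bool {
	return current.Major != target.Major ||
		current.Minor != target.Minor ||
		current.Patch != target.Patch
}
```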
Please review the following PR: https://github.com/openshift/aws-ebs-csi-driver/pull/245
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/operator-framework/operator-marketplace/pull/539
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
When executing oc mirror using an oci path, you can end up in an error state when the destination is a file://<path> destination (i.e. mirror to disk).
Version-Release number of selected component (if applicable):
4.14.2
How reproducible:
always
Steps to Reproduce:
At IBM we use the ibm-pak tool to generate an OCI catalog, but this bug is reproducible using a simple skopeo copy. Once you've copied the image locally you can move it around using file system copy commands to test this in different ways.
1. Make a directory structure like this to simulate how ibm-pak creates its own catalogs. The problem seems to be related to the path you use, so this represents the failure case:
mkdir -p /root/.ibm-pak/data/publish/latest/catalog-oci/manifest-list
2. Make a location where the local storage will live:
mkdir -p /root/.ibm-pak/oc-mirror-storage
3. Next, copy the image locally using skopeo:
skopeo copy docker://icr.io/cpopen/ibm-zcon-zosconnect-catalog@sha256:8d28189637b53feb648baa6d7e3dd71935656a41fd8673292163dd750ef91eec oci:///root/.ibm-pak/data/publish/latest/catalog-oci/manifest-list --all --format v2s2
4. You can copy the OCI catalog content to a location where things will work properly so you can see a working example:
cp -r /root/.ibm-pak/data/publish/latest/catalog-oci/manifest-list /root/ibm-zcon-zosconnect-catalog
5. You'll need an ISC... I've included both the oci references in the example (the commented out one works, but the oci:///root/.ibm-pak/data/publish/latest/catalog-oci/manifest-list reference fails).
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
mirror:
  operators:
  - catalog: oci:///root/.ibm-pak/data/publish/latest/catalog-oci/manifest-list
  #- catalog: oci:///root/ibm-zcon-zosconnect-catalog
    packages:
    - name: ibm-zcon-zosconnect
      channels:
      - name: v1.0
    full: true
    targetTag: 27ba8e
    targetCatalog: ibm-catalog
storageConfig:
  local:
    path: /root/.ibm-pak/oc-mirror-storage
6. Run oc mirror (remember the ISC has oci refs for good and bad scenarios). You may want to change your working directory to different locations between running the good/bad examples.
oc mirror --config /root/.ibm-pak/data/publish/latest/image-set-config.yaml file://zcon --dest-skip-tls --max-per-registry=6
Actual results:
Logging to .oc-mirror.log
Found: zcon/oc-mirror-workspace/src/publish
Found: zcon/oc-mirror-workspace/src/v2
Found: zcon/oc-mirror-workspace/src/charts
Found: zcon/oc-mirror-workspace/src/release-signatures
error: ".ibm-pak/data/publish/latest/catalog-oci/manifest-list/kubebuilder/kube-rbac-proxy@sha256:db06cc4c084dd0253134f156dddaaf53ef1c3fb3cc809e5d81711baa4029ea4c" is not a valid image reference: invalid reference format
Expected results:
Simple example where things were working with the oci:///root/ibm-zcon-zosconnect-catalog reference (this was executed in the same workspace so no new images were detected).
Logging to .oc-mirror.log
Found: zcon/oc-mirror-workspace/src/publish
Found: zcon/oc-mirror-workspace/src/v2
Found: zcon/oc-mirror-workspace/src/charts
Found: zcon/oc-mirror-workspace/src/release-signatures
3 related images processed in 668.063974ms
Writing image mapping to zcon/oc-mirror-workspace/operators.1700092336/manifests-ibm-zcon-zosconnect-catalog/mapping.txt
No new images detected, process stopping
Additional info:
I debugged the error that happened and captured one of the instances where the ParseReference call fails. This is only for reference to help narrow down the issue.
github.com/openshift/oc/pkg/cli/image/imagesource.ParseReference (/root/go/src/openshift/oc-mirror/vendor/github.com/openshift/oc/pkg/cli/image/imagesource/reference.go:111)
github.com/openshift/oc-mirror/pkg/image.ParseReference (/root/go/src/openshift/oc-mirror/pkg/image/image.go:79)
github.com/openshift/oc-mirror/pkg/cli/mirror.(*MirrorOptions).addRelatedImageToMapping (/root/go/src/openshift/oc-mirror/pkg/cli/mirror/fbc_operators.go:194)
github.com/openshift/oc-mirror/pkg/cli/mirror.(*OperatorOptions).plan.func3 (/root/go/src/openshift/oc-mirror/pkg/cli/mirror/operator.go:575)
golang.org/x/sync/errgroup.(*Group).Go.func1 (/root/go/src/openshift/oc-mirror/vendor/golang.org/x/sync/errgroup/errgroup.go:75)
runtime.goexit (/usr/local/go/src/runtime/asm_amd64.s:1594)
Also, I wanted to point out that because we use a period in the path (i.e. .ibm-pak) I wonder if that's causing the issue? This is just a guess and something to consider.
*FOLLOWUP* ... I just removed the period from ".ibm-pak" and that seemed to make the error go away.
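The leading-dot guess is consistent with the OCI/docker reference grammar: repository path and domain components must start with an alphanumeric character. A small standalone check (using the distribution reference library, which is equivalent to the parser oc-mirror vendors) shows the same failure:
```go
package main

import (
	"fmt"

	"github.com/distribution/reference"
)

func main() {
	digest := "sha256:db06cc4c084dd0253134f156dddaaf53ef1c3fb3cc809e5d81711baa4029ea4c"
	for _, name := range []string{
		".ibm-pak/data/kube-rbac-proxy@" + digest, // leading dot: invalid reference format
		"ibm-pak/data/kube-rbac-proxy@" + digest,  // same path without the dot: parses fine
	} {
		_, err := reference.ParseNormalizedNamed(name)
		fmt.Printf("%s -> err: %v\n", name, err)
	}
}
```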
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
The agent-tui interface for editing the network config for the Agent ISO at boot time only runs on the graphical console (tty1). It's difficult to run two copies, so this gives the most value for now.
Although tty1 always exists, OCI only has a serial console available (assuming it is enabled - see OCPBUGS-19092), so the user doesn't see anything on the console while agent-tui is running (and in fact the systemd progress output is suspended for the duration).
Network configuration of any kind is rarely needed in the cloud, anyway. So on OCI specifically we are mostly slowing boot down by 20s for no real reason. We should disable agent-tui in this case, either by disabling the service or simply by not adding the binary to the ISO image.
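A sketch of the simpler of the two options, disabling the service; the unit name and the platform check here are placeholders for illustration, not an agreed implementation:
```
# Hypothetical sketch: skip the interactive TUI on OCI, where only a
# serial console exists. PLATFORM and the unit name are assumptions.
if [ "${PLATFORM:-}" = "oci" ]; then
    systemctl disable --now agent-tui.service
fi
```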
Please review the following PR: https://github.com/openshift/cluster-version-operator/pull/970
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
cherry-pick of https://github.com/openshift/cluster-image-registry-operator/pull/955
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-31082. The following is the description of the original issue:
—
Description of problem:
When adding parameters to a pipeline there is an error when trying to save. It seems a resource[] section is added; this doesn't happen when using YAML resources and the oc client. Discussed with Vikram Raj.
Version-Release number of selected component (if applicable):
4.14.12
How reproducible:
Always
Steps to Reproduce:
1.Create a pipeline 2.Add a parameter 3.Save the pipeline
Actual results:
Error shown
Expected results:
Save successful
Additional info:
Please review the following PR: https://github.com/openshift/oauth-apiserver/pull/90
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/coredns/pull/95
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-31444. The following is the description of the original issue:
—
Description of problem:
The konnectivity-agent on the data plane needs to resolve its proxy-server-url to connect to the control plane's konnectivity server. Also, these agents are using the default dnsPolicy, which is ClusterFirst. This creates a dependency on CoreDNS: if CoreDNS is misconfigured or down, agents won't be able to connect to the server, and all konnectivity-related traffic goes down (blocks updates, webhooks, logs, etc). The correction would be to use dnsPolicy: Default in the konnectivity-agent daemonset on the data plane, so it would use the name resolution configuration from the node. This makes sure that the konnectivity-agent's proxy-server-url can be resolved even if CoreDNS is down or misconfigured (a sketch of the change is included under Additional info below). The konnectivity-agent control plane deployment shall not change, as it still needs to use CoreDNS because a ClusterIP Service is configured as proxy-server-url there.
Version-Release number of selected component (if applicable):
4.14, 4.15
How reproducible:
Break coreDNS configuration
Steps to Reproduce:
1. Put an invalid forwarder to the dns.operator/default to fail upstream DNS resolving 2. Rollout restart the konnectivity-agent daemonset in kube-system
Actual results:
kubectl log is failing
Expected results:
kubectl log is working
Additional info:
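A minimal sketch of the daemonset change described above; only the relevant field is shown, and the metadata is illustrative:
```
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: konnectivity-agent
  namespace: kube-system
spec:
  template:
    spec:
      # Default (the node's resolv.conf) instead of ClusterFirst, so the
      # proxy-server-url resolves even when CoreDNS is down or misconfigured.
      dnsPolicy: Default
```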
This is a clone of issue OCPBUGS-32028. The following is the description of the original issue:
—
==== This Jira covers only haproxy component ====
Description of problem:
Pods running in the namespace openshift-vsphere-infra are very verbose, printing as INFO messages that should be DEBUG. This excess of verbosity has an impact on CRI-O, on the node, and also on the Logging system. For instance, with 71 nodes, the number of logs coming from this namespace in 1 month was 450,000,000, meaning 1TB of logs written to disk on the node by CRI-O, read by the Red Hat log collector and stored in the Log Store. Added to the impact on performance, it has a financial impact for the storage needed. Examples of logs that fit DEBUG better than INFO:
```
/// For keepalived, 4 messages are printed per node each 10 seconds; in this example the number of nodes is 71, so this means 284 log entries every 10 seconds, that is, 1704 log entries per minute per keepalived pod
$ oc logs keepalived-master-example-0 -c keepalived-monitor | grep master-example-0 | grep 2024-02-15T08:20:21 | wc -l
$ oc logs keepalived-master-example-0 -c keepalived-monitor | grep worker-example-0 | grep 2024-02-15T08:20:21
2024-02-15T08:20:21.671390814Z time="2024-02-15T08:20:21Z" level=info msg="Searching for Node IP of worker-example-0. Using 'x.x.x.x/24' as machine network. Filtering out VIPs '[x.x.x.x x.x.x.x]'."
2024-02-15T08:20:21.671390814Z time="2024-02-15T08:20:21Z" level=info msg="For node worker-example-0 selected peer address x.x.x.x using NodeInternalIP"
2024-02-15T08:20:21.733399279Z time="2024-02-15T08:20:21Z" level=info msg="Searching for Node IP of worker-example-0. Using 'x.x.x.x' as machine network. Filtering out VIPs '[x.x.x.x x.x.x.x]'."
2024-02-15T08:20:21.733421398Z time="2024-02-15T08:20:21Z" level=info msg="For node worker-example-0 selected peer address x.x.x.x using NodeInternalIP"

/// For haproxy, 2 logs are observed per 6 seconds for each master; this means 6 messages in the same second, 60 messages/minute per pod
$ oc logs haproxy-master-0-example -c haproxy-monitor
...
2024-02-15T08:20:00.517159455Z time="2024-02-15T08:20:00Z" level=info msg="Searching for Node IP of master-example-0. Using 'x.x.x.x/24' as machine network. Filtering out VIPs '[x.x.x.x]'."
2024-02-15T08:20:00.517159455Z time="2024-02-15T08:20:00Z" level=info msg="For node master-example-0 selected peer address x.x.x.x using NodeInternalIP"
```
Version-Release number of selected component (if applicable):
OpenShift 4.14 VSphere IPI installation
How reproducible:
Always
Steps to Reproduce:
1. Install OpenShift 4.14 Vsphere IPI environment 2. Review the logs of the haproxy pods and keealived pods running in the namespace `openshift-vsphere-infra`
Actual results:
The haproxy-* and keepalived-* pods are very verbose, printing as INFO messages that should be DEBUG. Some of the messages are available in the Description of problem of the present bug.
Expected results:
Only relevant messages are printed as INFO, helping to reduce the verbosity of the pods running in the namespace `openshift-vsphere-infra`.
Additional info:
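A minimal sketch of the requested change, assuming the monitors log via logrus (the function and variable names are illustrative, not the shipped code):
```go
import log "github.com/sirupsen/logrus"

func reportPeer(node, addr string) {
	// Was log.Infof: emitted every few seconds per node, flooding CRI-O,
	// the collector and the log store. Debug keeps it available on demand.
	log.Debugf("For node %s selected peer address %s using NodeInternalIP", node, addr)
}
```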
This is a clone of issue OCPBUGS-35829. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-35316. The following is the description of the original issue:
—
Description of problem:
Live migration gets stuck when the ConfigMap mtu is absent. The ConfigMap mtu should be created by the mtu-prober job at installation time since 4.11, but if the cluster was upgraded from a very early release, such as 4.4.4, the ConfigMap mtu may be absent.
Version-Release number of selected component (if applicable):
4.16.rc2
How reproducible:
Steps to Reproduce:
1. build a 4.16 cluster with OpenShiftSDN 2. remove the configmap mtu from the namespace cluster-network-operator. 3. start live migration.
Actual results:
Live migration gets stuck with error NetworkTypeMigrationFailed Failed to process SDN live migration (configmaps "mtu" not found)
Expected results:
Live migration finished successfully.
Additional info:
A workaround is to create the configmap mtu manually before starting live migration.
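A sketch of that workaround as a single command; the namespace and data key here are assumptions based on what the mtu-prober job creates on 4.11+ clusters, so verify both (and the probed MTU value) against a healthy cluster before applying:
```
oc -n openshift-network-operator create configmap mtu --from-literal=mtu=1500
```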
Please review the following PR: https://github.com/openshift/network-tools/pull/87
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/cluster-api-provider-aws/pull/485
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-23199. The following is the description of the original issue:
—
Description of problem:
During a pod deletion, the whereabouts reconciler correctly detects the pod deletion but it errors out claiming that the IPPool is not found. However, when checking the audit logs, we can see no deletion, no re-creation, and we can even see successful "patch" and "get" requests to the same IPPool. This means that the IPPool was never deleted and was properly accessible at the time of the issue, so the error in the reconciler looks like it made some mistake while retrieving the IPPool.
Version-Release number of selected component (if applicable):
4.12.22
How reproducible:
Sometimes
Steps to Reproduce:
1.Delete pod 2. 3.
Actual results:
Error in the whereabouts reconciler. New pods using additional networks with the whereabouts IPAM plugin cannot have IPs allocated due to the wrong cleanup.
Expected results:
Additional info:
Description of problem:
The default catalog source pod never gets updates; the users have to manually recreate it to get it updated. Here is the must-gather log for your debugging: https://drive.google.com/file/d/16_tFq5QuJyc_n8xkDFyK83TdTkrsVFQe/view?usp=drive_link
I went through the code and found the `updateStrategy` depends on the `ImageID`, see
// imageID returns the ImageID of the primary catalog source container or an empty string if the image ID isn't available yet.
// Note: the pod must be running and the container in a ready status to return a valid ImageID.
func imageID(pod *corev1.Pod) string {
	if len(pod.Status.ContainerStatuses) < 1 {
		logrus.WithField("CatalogSource", pod.GetName()).Warn("pod status unknown")
		return ""
	}
	return pod.Status.ContainerStatuses[0].ImageID
}
But, for those default catalog source pods, their `pod.Status.ContainerStatuses[0].ImageID` will never change since it's the `opm` image, not index image.
jiazha-mac:~ jiazha$ oc get pods redhat-operators-mpvzm -o=jsonpath={.status.containerStatuses} | jq
[
  {
    "containerID": "cri-o://115bd207312c7c8c36b63bfd251c085a701c58df2a48a1232711e15d7595675d",
    "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:965fe452763fd402ca8d8b4a3fdb13587673c8037f215c0ffcd76b6c4c24635e",
    "imageID": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:965fe452763fd402ca8d8b4a3fdb13587673c8037f215c0ffcd76b6c4c24635e",
    "lastState": {},
    "name": "registry-server",
    "ready": true,
    "restartCount": 1,
    "started": true,
    "state": {
      "running": {
        "startedAt": "2024-03-26T04:21:41Z"
      }
    }
  }
]
The imageID() func should return the index image ID for those default catalog sources.
jiazha-mac:~ jiazha$ oc get pods redhat-operators-mpvzm -o=jsonpath={.status.initContainerStatuses[1]} | jq
{
  "containerID": "cri-o://4cd6e1f45e23aadc27b8152126eb2761a37da61c4845017a06bb6f2203659f5c",
  "image": "registry.redhat.io/redhat/redhat-operator-index:v4.15",
  "imageID": "registry.redhat.io/redhat/redhat-operator-index@sha256:19010760d38e1a898867262698e22674d99687139ab47173e2b4665e588635e1",
  "lastState": {},
  "name": "extract-content",
  "ready": true,
  "restartCount": 1,
  "started": false,
  "state": {
    "terminated": {
      "containerID": "cri-o://4cd6e1f45e23aadc27b8152126eb2761a37da61c4845017a06bb6f2203659f5c",
      "exitCode": 0,
      "finishedAt": "2024-03-26T04:21:39Z",
      "reason": "Completed",
      "startedAt": "2024-03-26T04:21:27Z"
    }
  }
}
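A sketch of that direction (not the merged OLM code), preferring the extract-content init container's ImageID shown above so registry polling can notice a new index digest:
```go
import corev1 "k8s.io/api/core/v1"

// imageID prefers the index-image init container; the registry-server
// container runs the opm image, whose ID never changes across index updates.
func imageID(pod *corev1.Pod) string {
	for _, s := range pod.Status.InitContainerStatuses {
		if s.Name == "extract-content" && s.ImageID != "" {
			return s.ImageID
		}
	}
	if len(pod.Status.ContainerStatuses) < 1 {
		return ""
	}
	return pod.Status.ContainerStatuses[0].ImageID
}
```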
Version-Release number of selected component (if applicable):
4.15.2
How reproducible:
always
Steps to Reproduce:
1. Install OCP 4.16.0 2. Wait for the redhat-operator catalog source to update 3.
Actual results:
The redhat-operator catalog source never gets updates.
Expected results:
These default catalog sources should get updates depending on the `updateStrategy`.
jiazha-mac:~ jiazha$ oc get catalogsource redhat-operators -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  annotations:
    operatorframework.io/managed-by: marketplace-operator
    target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
  creationTimestamp: "2024-03-20T15:48:59Z"
  generation: 1
  name: redhat-operators
  namespace: openshift-marketplace
  resourceVersion: "12217605"
  uid: cc0fc420-c9d8-4c7d-997e-f0893b4c497f
spec:
  displayName: Red Hat Operators
  grpcPodConfig:
    extractContent:
      cacheDir: /tmp/cache
      catalogDir: /configs
    memoryTarget: 30Mi
    nodeSelector:
      kubernetes.io/os: linux
      node-role.kubernetes.io/master: ""
    priorityClassName: system-cluster-critical
    securityContextConfig: restricted
    tolerations:
    - effect: NoSchedule
      key: node-role.kubernetes.io/master
      operator: Exists
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 120
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 120
  icon:
    base64data: ""
    mediatype: ""
  image: registry.redhat.io/redhat/redhat-operator-index:v4.15
  priority: -100
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 10m
status:
  connectionState:
    address: redhat-operators.openshift-marketplace.svc:50051
    lastConnect: "2024-03-27T06:35:36Z"
    lastObservedState: READY
  latestImageRegistryPoll: "2024-03-27T10:23:16Z"
  registryService:
    createdAt: "2024-03-20T15:56:03Z"
    port: "50051"
    protocol: grpc
    serviceName: redhat-operators
    serviceNamespace: openshift-marketplace
Additional info:
I also checked currentPodsWithCorrectImageAndSpec, but no hash changed because the pod.spec is always the same.
time="2024-03-26T03:22:01Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace time="2024-03-26T03:27:01Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" catalogsource.name=redhat-operators catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace id=xW0cW time="2024-03-26T03:27:01Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" catalogsource.name=redhat-operators catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace id=xW0cW time="2024-03-26T03:27:02Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" catalogsource.name=redhat-operators catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace id=vq5VA time="2024-03-26T03:27:03Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" catalogsource.name=redhat-operators catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace id=vq5VA
This is a clone of issue OCPBUGS-33199. The following is the description of the original issue:
—
Description of problem:
When creating an application based on a devfile via "Import from Git" in the Developer console using a GitLab repo, the following error blocks creation. It only happens when using GitLab, not GitHub, and the CLI operation based on "oc new-app" works well. In other words, the issue is only in the Dev console. Could not fetch kubernetes resource "/deploy.yaml" for component "kubernetes-deploy" from Git repository https://{gitlaburl}.
Version-Release number of selected component (if applicable):
4.15.z
How reproducible:
Always
Steps to Reproduce:
You can always reproduce it according to the following procedure.
a. Switch to "Developer" mode at your web console.
b. Move to "+Add", then click "Import from Git" in the "Git Repository" section of the page.
c. Input "https://<GITLAB HOSTNAME>/XXXX/devfile-sample-go-basic.git" in the "Git Repo URL" text box.
d. Select "GitLab" in the "Git type" drop box.
e. You can see the below error messages.
Actual results:
The "/deploy.yaml" file path evaluated as invalid one with 400 response status during the process as follows. Look at the URL, "/%2Fdeploy.yaml" shows us leading slash was duplicated there. Request URL: https://<GITLAB HOSTNAME>/api/v4/projects/yyyy/repository/files/%2Fdeploy.yaml/raw?ref=main Response: {"error":"file_path should be a valid file path"}
Expected results:
The request URL for handling the "deploy.yaml" file should have the duplicated leading slash removed and provide the correct file path.
Request URL: https://<GITLAB HOSTNAME>/api/v4/projects/yyyy/repository/files/deploy.yaml/raw?ref=main
Response: "deploy.yaml" contents.
Additional info:
I submitted a pull request to fix this here: https://github.com/openshift/console/pull/13812
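The shape of the fix (sketched here in Go for brevity; the console itself is TypeScript) is to strip the leading slash before percent-encoding the path for the GitLab files API:
```go
import (
	"fmt"
	"net/url"
	"strings"
)

// rawFileURL builds the GitLab v4 raw-file URL. PathEscape encodes any
// remaining slashes as %2F, which is what the files API expects.
func rawFileURL(host, projectID, filePath, ref string) string {
	p := strings.TrimLeft(filePath, "/") // "/deploy.yaml" -> "deploy.yaml"
	return fmt.Sprintf("https://%s/api/v4/projects/%s/repository/files/%s/raw?ref=%s",
		host, projectID, url.PathEscape(p), ref)
}
```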
Description of problem:
YAML tab will crash on some specific browser versions when MCE is installed
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2024-02-21-153906
How reproducible:
Always
Steps to Reproduce:
1. Install multicluster engine for Kubernetes, create a MultiClusterEngine instance and wait until the mce plugin is successfully enabled
2. Check resources' YAML tabs, for example:
Deployment creation page /k8s/ns/yapei/deployments/~new/form
DeploymentConfig creation page /k8s/ns/yapei/deploymentconfigs/~new/form
ConfigMap creation YAML view page /k8s/ns/yapei/configmaps/~new/form
Route creation YAML view page /k8s/ns/yapei/routes/~new/form
BuildConfig creation YAML view page /k8s/ns/yapei/buildconfigs/~new/form
Actual results:
2. We can see an error page returned when visiting these pages: TypeError Description: e is undefined
Expected results:
2. no error and page correctly loaded
Additional info:
Cherry pick removal of probes to 4.15. Child of https://issues.redhat.com/browse/HOSTEDCP-1570
Description of problem:
Unable to edit Shipwright Builds with the upcoming builds for Red Hat OpenShift release (based on Shipwright v0.12.0) in the developer and admin consoles. Workaround is to use `oc edit build.shipwright.io ...`
Version-Release number of selected component (if applicable):
OCP 4.14 builds for OpenShift v1.0.0
How reproducible:
Always
Steps to Reproduce:
1. Deploy the builds for Red Hat OpenShift release candidate operator 2. Create a Build using the shp command line: `shp build create ...` 3. Open the Dev or Admin console for Shipwright Builds 4. Attempt to edit the Build object
Actual results:
Page appears to "freeze", does not let you edit.
Expected results:
Shipwright Build objects can be edited.
Additional info:
Can be reproduced by deploying the following "test catalog" - quay.io/adambkaplan/shipwright-io/operator-catalog:v0.13.0-rc7, then creating a subscription for the Shipwright operator. Will likely be easier to reproduce once we have the downstream operator in the Red Hat OperatorHub catalog.
Please review the following PR: https://github.com/openshift/csi-external-attacher/pull/65
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/106
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-30958. The following is the description of the original issue:
—
Description of problem:
Support for apiVersion v1alpha1 has been removed, so it is better to upgrade the apiVersion to v1beta1.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of the problem:
Fix DNS wildcard domain validation.
DNS wildcard domain validation starts with validateNoWildcardDNS. The domain may have an optional trailing dot.
Currently the assumption is that the trailing dot is mandatory for the domain name.
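A minimal sketch of the intended normalization, assuming a plain string domain (the real validateNoWildcardDNS lives in assisted-service and does more than this):
```go
import "strings"

// normalizeDomain treats "example.com." and "example.com" the same by
// trimming the optional trailing dot before wildcard validation runs.
func normalizeDomain(domain string) string {
	return strings.TrimSuffix(domain, ".")
}
```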
How reproducible:
Steps to reproduce:
1.
2.
3.
Actual results:
Expected results:
This is a clone of issue OCPBUGS-29584. The following is the description of the original issue:
—
Description of problem:
When the IPI installer creates a service instance for the user, PowerVS will now have the type as composite_instance rather than service_instance. Fix up cluster deletion to account for this change.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create cluster 2. Destroy cluster 3.
Actual results:
The newly created service instance is not deleted.
Expected results:
Additional info:
Description of problem:
In a 4.14 cluster, I'm seeing CVO hotloops on ClusterRoleBinding cluster-baremetal-operator and ConfigMap openshift-machine-config-operator/kube-rbac-proxy with empty ManagedFields.
# oc logs cluster-version-operator-7cf78c4f65-hfh7f -n openshift-cluster-version | grep -o 'Updating .*due to diff'| sort | uniq -c
     93 Updating ClusterRoleBinding cluster-baremetal-operator due to diff
     93 Updating ClusterRole machine-api-operator-ext-remediation due to diff
     93 Updating ConfigMap openshift-machine-config-operator/kube-rbac-proxy due to diff
CVO logs the diff as below:
I0919 10:19:24.658975 1 rbac.go:38] Updating ClusterRoleBinding cluster-baremetal-operator due to diff:
&v1.ClusterRoleBinding{
  TypeMeta: v1.TypeMeta{
-   Kind: "",
+   Kind: "ClusterRoleBinding",
-   APIVersion: "",
+   APIVersion: "rbac.authorization.k8s.io/v1",
  },
  ObjectMeta: v1.ObjectMeta{
    ... // 2 identical fields
    Namespace: "openshift-machine-api",
    SelfLink: "",
-   UID: "cb8a7ffe-9966-4224-b1b6-3e7db6da7009",
+   UID: "",
-   ResourceVersion: "2571",
+   ResourceVersion: "",
    Generation: 0,
-   CreationTimestamp: v1.Time{Time: s"2023-09-19 03:02:31 +0000 UTC"},
+   CreationTimestamp: v1.Time{},
    DeletionTimestamp: nil,
    DeletionGracePeriodSeconds: nil,
    ... // 2 identical fields
    OwnerReferences: {{APIVersion: "config.openshift.io/v1", Kind: "ClusterVersion", Name: "version", UID: "fb1c6e8c-01bc-415f-8b55-c55a4601bd10", ...}},
    Finalizers: nil,
-   ManagedFields: []v1.ManagedFieldsEntry{
-     {
-       Manager: "cluster-version-operator",
-       Operation: "Update",
-       APIVersion: "rbac.authorization.k8s.io/v1",
-       Time: s"2023-09-19 03:02:31 +0000 UTC",
-       FieldsType: "FieldsV1",
-       FieldsV1: s`{"f:metadata":{"f:annotations":{".":{},"f:capability.openshift.i`...,
-     },
-   },
+   ManagedFields: nil,
  },
  Subjects: {{Kind: "ServiceAccount", Name: "cluster-baremetal-operator", Namespace: "openshift-machine-api"}},
  RoleRef: {APIGroup: "rbac.authorization.k8s.io", Kind: "ClusterRole", Name: "cluster-baremetal-operator"},
}
...
I0919 10:14:55.572553 1 core.go:138] Updating ConfigMap openshift-machine-config-operator/kube-rbac-proxy due to diff:
&v1.ConfigMap{
  TypeMeta: v1.TypeMeta{
-   Kind: "",
+   Kind: "ConfigMap",
-   APIVersion: "",
+   APIVersion: "v1",
  },
  ObjectMeta: v1.ObjectMeta{
    ... // 2 identical fields
    Namespace: "openshift-machine-config-operator",
    SelfLink: "",
-   UID: "9c6c667f-8e10-4fca-8c1d-c8c0fc158ee5",
+   UID: "",
-   ResourceVersion: "164024",
+   ResourceVersion: "",
    Generation: 0,
-   CreationTimestamp: v1.Time{Time: s"2023-09-19 03:01:42 +0000 UTC"},
+   CreationTimestamp: v1.Time{},
    DeletionTimestamp: nil,
    DeletionGracePeriodSeconds: nil,
    ... // 2 identical fields
    OwnerReferences: {{APIVersion: "config.openshift.io/v1", Kind: "ClusterVersion", Name: "version", UID: "fb1c6e8c-01bc-415f-8b55-c55a4601bd10", ...}},
    Finalizers: nil,
-   ManagedFields: []v1.ManagedFieldsEntry{
-     {
-       Manager: "cluster-version-operator",
-       Operation: "Update",
-       APIVersion: "v1",
-       Time: s"2023-09-19 10:10:23 +0000 UTC",
-       FieldsType: "FieldsV1",
-       FieldsV1: s`{"f:data":{},"f:metadata":{"f:annotations":{".":{},"f:include.re`...,
-     },
-     {
-       Manager: "machine-config-operator",
-       Operation: "Update",
-       APIVersion: "v1",
-       Time: s"2023-09-19 10:10:25 +0000 UTC",
-       FieldsType: "FieldsV1",
-       FieldsV1: s`{"f:data":{"f:config-file.yaml":{}}}`,
-     },
-   },
+   ManagedFields: nil,
  },
  Immutable: nil,
  Data: {"config-file.yaml": "authorization:\n resourceAttributes:\n apiVersion: v1\n reso"...},
  BinaryData: nil,
}
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-09-15-233408
How reproducible:
1/1
Steps to Reproduce:
1. Install a 4.14 cluster 2. 3.
Actual results:
CVO hotloops on ClusterRoleBinding cluster-baremetal-operator and ConfigMap openshift-machine-config-operator/kube-rbac-proxy
Expected results:
CVO doesn't hotloop on resources with empty ManagedFields
Additional info:
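A sketch of the usual fix for this class of hotloop (not necessarily the merged change): zero the server-populated metadata on the live object before diffing it against the rendered manifest, so UID, ResourceVersion, CreationTimestamp and ManagedFields never show up as differences.
```go
import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// pruneForDiff clears fields the API server populates, which the manifest
// can never match and which therefore cause endless "due to diff" updates.
func pruneForDiff(meta *metav1.ObjectMeta) {
	meta.UID = ""
	meta.ResourceVersion = ""
	meta.CreationTimestamp = metav1.Time{}
	meta.ManagedFields = nil
}
```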
Description of problem:
AWS KMS on HyperShift makes use of two UNIX sockets via which the KMS plugins are run. Each unix socket should connect to an independent KMS instance, i.e. with its own AWS ARN. However, as of today both the active KMS socket and the backup KMS socket seem to be using the same ARN, which essentially means that the backup KMS instance never gets used.
Version-Release number of selected component (if applicable):
HyperShift - main branch (PR #423) GitHub indicates all the following hypershift versions would be affected. v0.1.15, v0.1.14, v0.1.13, v0.1.12, v0.1.11, v0.1.10, v0.1.9, v0.1.8, v0.1.7, v0.1.6, v0.1.5, v0.1.4, v0.1.3, v0.1.2, v0.1.1, v0.1.0, 2.0.0-20220406093220, 2.0.0-20220323110745, 2.0.0-20220319120001, 2.0.0-20220317155435
How reproducible:
Always
Steps to Reproduce:
1. Create a HyperShift cluster 2. Check if the backup KMS instance was ever used
Actual results:
Active KMS instance's ARN is used even by the backup KMS socket
Expected results:
Backup KMS socket should use its own backupKey.ARN
Additional info:
should use backupKey.ARN instead of activeKey.ARN in the func call
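A sketch of the one-line fix implied above; the surrounding helper and variable names are hypothetical, only the ARN swap is the point:
```go
// Hypothetical wiring: each socket must be paired with its own key ARN.
activePlugin := newKMSPlugin(activeSocketPath, kms.ActiveKey.ARN)
backupPlugin := newKMSPlugin(backupSocketPath, kms.BackupKey.ARN) // was kms.ActiveKey.ARN
```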
Please review the following PR: https://github.com/openshift/kube-state-metrics/pull/105
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Configuring oauth identity providers in the HostedCluster fails when accessTokenInactivityTimeout is not set.
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. Create a cluster 2. Configure htpasswd without the accessTokenInactivityTimeout field in the HostedCluster CR 3. It fails to apply
Actual results:
jiezhao-mac:hypershift jiezhao$ oc get hostedcluster -n clusters -o yaml > cluster.yaml
spec:
  configuration:
    oauth:
      identityProviders:
      - htpasswd:
          fileData:
            name: htpass-secret
        mappingMethod: claim
        name: my_htpasswd_provider
        type: HTPasswd
    secretRefs:
    - name: htpass-secret
jiezhao-mac:hypershift jiezhao$ oc apply -f cluster.yaml
The HostedCluster "jie-test" is invalid: spec.configuration.oauth: Invalid value: "object": no such key: tokenConfig evaluating rule: spec.configuration.oauth.tokenConfig.accessTokenInactivityTimeout minimum acceptable token timeout value is 300 seconds
Expected results:
htpasswd should be configured successfully without accessTokenInactivityTimeout field
Additional info:
When accessTokenInactivityTimeout is set to 300s, htpasswd is configured in the HostedCluster successfully.
jiezhao-mac:hypershift jiezhao$ oc get hostedcluster -n clusters -o yaml > cluster.yaml
spec:
  configuration:
    oauth:
      identityProviders:
      - htpasswd:
          fileData:
            name: htpass-secret
        mappingMethod: claim
        name: my_htpasswd_provider
        type: HTPasswd
      tokenConfig:
        accessTokenInactivityTimeout: 300s
    secretRefs:
    - name: htpass-secret
jiezhao-mac:hypershift jiezhao$ oc apply -f cluster.yaml
hostedcluster.hypershift.openshift.io/jie-test configured
jiezhao-mac:hypershift jiezhao$ oc get hostedcluster/jie-test -n clusters -ojsonpath='{.spec.configuration}' | jq
{
  "oauth": {
    "identityProviders": [
      {
        "htpasswd": {
          "fileData": {
            "name": "htpass-secret"
          }
        },
        "mappingMethod": "claim",
        "name": "my_htpasswd_provider",
        "type": "HTPasswd"
      }
    ],
    "tokenConfig": {
      "accessTokenInactivityTimeout": "300s"
    }
  }
}
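The validation error above suggests the CEL rule dereferences tokenConfig unconditionally. A sketch of a guard that tolerates the absent field, in kubebuilder marker form; the actual rule in HyperShift may differ:
```go
// +kubebuilder:validation:XValidation:rule="!has(self.tokenConfig) || !has(self.tokenConfig.accessTokenInactivityTimeout) || duration(self.tokenConfig.accessTokenInactivityTimeout) >= duration('300s')",message="minimum acceptable token timeout value is 300 seconds"
```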
This is a clone of issue OCPBUGS-44234. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-43921. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-43898. The following is the description of the original issue:
—
Description of problem:
OCP 4.17 requires permissions to tag network interfaces (ENIs) on instance creation in support of the Egress IP feature. ROSA HCP uses managed IAM policies, which are reviewed and gated by AWS. The current policy AWS has applied does not allow us to tag ENIs out of band, only ones that have 'red-hat-managed: true`, which are going to be tagged during instance creation. However, in order to support backwards compatibility for existing clusters, we need to roll out a CAPA patch that allows us to call `RunInstances` with or without the ability to tag ENIs. Once we backport this to the Z streams, upgrade clusters and rollout the updated policy with AWS, we can then go back and revert the backport. For more information see https://issues.redhat.com/browse/SDE-4496
Version-Release number of selected component (if applicable):
4.17
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-31727. The following is the description of the original issue:
—
Description of problem:
Install fails when using the OpenShift Assisted Installer with a pull-secret password containing the `:` colon character.
Version-Release number of selected component (if applicable):
OpenShift 4.15
How reproducible:
Everytime
Steps to Reproduce:
1. Attempt to install using the Agent-based installer with a pull-secret which includes a colon character. The following snippet of code appears to be hit when there is a colon within the user/password section of the pull-secret: https://github.com/openshift/assisted-service/blob/d3dd2897d1f6fe108353c9241234a724b30262c2/internal/cluster/validations/validations.go#L132-L135
Actual results:
Install fails
Expected results:
Install succeeds
Additional info:
This is a clone of issue OCPBUGS-44657. The following is the description of the original issue:
—
Corresponding Jira ticket for PR https://github.com/openshift/console/pull/14076
SB and NB containers have this command to expose their DB via SSL and set the inactivity probe interval. With OVN-IC we don't use SSL for the DBs anymore, so we can remove that bit.
if ! retry 60 "inactivity-probe" "ovn-sbctl --no-leader-only -t 5 set-connection pssl:.OVN_SB_PORT.LISTEN_DUAL_STACK – set connection . inactivity_probe=.OVN_CONTROLLER_INACTIVITY_PROBE"; then
should become:
if ! retry 60 "inactivity-probe" "ovn-sbctl --no-leader-only -t 5 set connection . inactivity_probe=.OVN_CONTROLLER_INACTIVITY_PROBE"; then
Also we can clean up the comment at the end where it polls the IPsec status, which is just a way of making sure the DB is ready and answering queries. We don't need to wait for the cluster to converge (since there's no RAFT) but could change it to:
"Kill some time while DB becomes ready by checking IPsec status"
Description of problem:
Flake's gonna flake
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Run e2e test 2. ... 3. Profit
Actual results:
Red
Expected results:
Green
Additional info:
Description of problem:
When the replica count for a nodepool is set to 0, the message for the nodepool is "NotFound". This message should not be displayed if the desired replica count is 0.
Version-Release number of selected component (if applicable):
How reproducible:
Create a nodepool and set the replica to 0
Steps to Reproduce:
1. Create a hosted cluster 2. Set the replica for the nodepool to 0 3.
Actual results:
NodePool message is "NotFound"
Expected results:
NodePool message to be empty
Additional info:
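A minimal sketch of the expected behavior, with field names assumed from the NodePool API rather than taken from the shipped controller:
```go
import "k8s.io/utils/ptr"

// When zero replicas are desired, "no machines found" is the expected
// state, so surface an empty message instead of "NotFound".
func nodePoolMessage(desiredReplicas *int32, machinesFound int) string {
	if ptr.Deref(desiredReplicas, 0) == 0 {
		return ""
	}
	if machinesFound == 0 {
		return "NotFound"
	}
	return ""
}
```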
Please review the following PR: https://github.com/openshift/cluster-api-provider-azure/pull/282
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
The issue was found when verifying OCPBUGS-21637: the prometheus-operator-admission-webhook logs are very verbose.
$ oc -n openshift-monitoring get pod -l app.kubernetes.io/name=prometheus-operator-admission-webhook
NAME                                                      READY   STATUS    RESTARTS   AGE
prometheus-operator-admission-webhook-5d96cbcbfc-6lx4m    1/1     Running   0          56m
prometheus-operator-admission-webhook-5d96cbcbfc-jj66x    1/1     Running   0          53m
$ oc -n openshift-monitoring logs prometheus-operator-admission-webhook-5d96cbcbfc-6lx4m
level=info ts=2023-11-06T01:50:33.617049649Z caller=main.go:140 address=[::]:8443 msg="Starting TLS enabled server" http2=false
ts=2023-11-06T01:50:34.601774794Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:50:40.439015896Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:50:40.43925044Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:50:50.437745065Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:50:50.448362455Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:00.428162615Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:00.428571968Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:10.426317894Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:10.426769416Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:20.426701853Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:20.427289877Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:30.429156675Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:30.429229042Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:40.426522527Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:40.427038656Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:50.428974832Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:51:50.429036156Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:00.428747039Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:00.42880275Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:10.426871896Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:10.428574666Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:20.428211529Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:20.428638108Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:30.427148775Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:30.427631515Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:40.427167231Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:40.427658789Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:50.427851476Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:52:50.428319729Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:00.428583783Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:00.429083642Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:10.426258718Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:10.426788637Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:20.430876533Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:20.431510269Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:30.427527316Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:30.428046481Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:40.428449342Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:40.428886681Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:50.426513473Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:53:50.427038956Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:00.426639171Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:00.427164997Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:10.426804033Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:10.427276217Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:20.427705297Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:20.428214309Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:30.428041006Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:30.428525809Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:40.426257489Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:40.42674803Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:50.42708913Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:54:50.427155482Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:55:00.428431788Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:55:00.428881681Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:55:10.429549989Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:55:10.429618004Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:55:20.427741192Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:55:20.428196221Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:55:30.4269946Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:55:30.427451901Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)"
ts=2023-11-06T01:55:40.426994787Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1
(main.go:173)" ts=2023-11-06T01:55:40.427502475Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:55:50.426456346Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:55:50.426610051Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:56:00.426520596Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:56:00.426676076Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:56:10.435077603Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:56:10.435135319Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:56:20.427693249Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:56:20.428171589Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:56:30.428760772Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:56:30.428828762Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:56:40.428545666Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:56:40.429005303Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:56:50.426103842Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:56:50.426578009Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:57:00.427041793Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:57:00.427482797Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:57:10.427963834Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:57:10.428440451Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:57:20.428877932Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:57:20.428945521Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:57:30.426157935Z caller=stdlib.go:105 caller=server.go:3215 msg="http: 
superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:57:30.426639545Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:57:40.42875961Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:57:40.42884264Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:57:50.426450177Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:57:50.426939532Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:58:00.428456873Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:58:00.428904131Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:58:10.428931448Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:58:10.428987646Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:58:20.429377819Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:58:20.4294396Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:58:30.428108184Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:58:30.428580595Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:58:40.426962512Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:58:40.427429076Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:58:50.429177401Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:58:50.429637834Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:59:00.428197981Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:59:00.428655487Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:59:10.426418388Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:59:10.426908577Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:59:20.426705875Z 
caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:59:20.427197531Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:59:30.427909675Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:59:30.428395421Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:59:40.429100447Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:59:40.429871853Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:59:50.4268663Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T01:59:50.427329161Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:00:00.429149297Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:00:00.429205811Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:00:10.426857098Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:00:10.427290243Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:00:20.42638474Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:00:20.426901703Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:00:30.428885162Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:00:30.429373666Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:00:40.427093878Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:00:40.427622056Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:00:50.428691098Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:00:50.428743261Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:01:00.426355861Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:01:00.42685464Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 
(main.go:173)" ts=2023-11-06T02:01:10.426208743Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:01:10.426710363Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:01:20.426872491Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:01:20.42731801Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:01:30.426612427Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:01:30.427084214Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:01:40.428796629Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:01:40.429400491Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:01:50.427001992Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:01:50.42827597Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:02:00.428013056Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:02:00.428469744Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:02:10.426711057Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:02:10.427247058Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:02:20.429136255Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:02:20.429208369Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:02:30.427158806Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:02:30.427593326Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:02:40.426389918Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:02:40.426875768Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:02:50.429551365Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:02:50.429619241Z caller=stdlib.go:105 caller=server.go:3215 msg="http: 
superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:03:00.426621326Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:03:00.427126079Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:03:10.426301507Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:03:10.426803336Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:03:13.952615577Z caller=stdlib.go:105 caller=server.go:3215 msg="http: TLS handshake error from 10.130.0.1:52552: EOF" ts=2023-11-06T02:03:20.426371089Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:03:20.426852234Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:03:30.428789504Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:03:30.428874536Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:03:40.427028458Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:03:40.427463333Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:03:50.429615112Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:03:50.429679407Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:04:00.4285878Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:04:00.429074488Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:04:10.4279579Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:04:10.428403727Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:04:20.426433063Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:04:20.426940057Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:04:30.428317498Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:04:30.428730147Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:04:40.42911069Z caller=stdlib.go:105 
caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:04:40.429194383Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:04:50.42820753Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:04:50.428643464Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:05:00.427890872Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from main.newSrv.func1 (main.go:173)" ts=2023-11-06T02:05:00.428356508Z caller=stdlib.go:105 caller=server.go:3215 msg="http: superfluous response.WriteHeader call from ...
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-11-04-120954
How reproducible:
always
Steps to Reproduce:
1. check prometheus-operator-admission-webhook logs
Actual results:
verbose prometheus-operator-admission-webhook logs
This is a clone of issue OCPBUGS-36834. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-36140. The following is the description of the original issue:
—
Description of problem:
GCP private cluster with CCO Passthrough mode failed to install due to CCO degraded.
status:
  conditions:
  - lastTransitionTime: "2024-06-24T06:04:39Z"
    message: 1 of 7 credentials requests are failing to sync.
    reason: CredentialsFailing
    status: "True"
    type: Degraded
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2024-06-21-203120
How reproducible:
Always
Steps to Reproduce:
1.Create GCP private cluster with CCO Passthrough mode, flexy template is private-templates/functionality-testing/aos-4_13/ipi-on-gcp/versioned-installer-xpn-private 2.Wait for cluster installation
Actual results:
jianpingshu@jshu-mac ~ % oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       False         23m     Error while reconciling 4.13.0-0.nightly-2024-06-21-203120: the cluster operator cloud-credential is degraded
status:
  conditions:
  - lastTransitionTime: "2024-06-24T06:04:39Z"
    message: 1 of 7 credentials requests are failing to sync.
    reason: CredentialsFailing
    status: "True"
    type: Degraded
jianpingshu@jshu-mac ~ % oc -n openshift-cloud-credential-operator get -o json credentialsrequests | jq -r '.items[] | select(tostring | contains("InfrastructureMismatch") | not) | .metadata.name as $n | .status.conditions // [{type: "NoConditions"}] | .[] | .type + "=" + .status + " " + $n + " " + .reason + ": " + .message' | sort
CredentialsProvisionFailure=True cloud-credential-operator-gcp-ro-creds CredentialsProvisionFailure: failed to grant creds: error while validating permissions: error testing permissions: googleapi: Error 400: Permission commerceoffercatalog.agreements.list is not valid for this resource., badRequest
NoConditions= openshift-cloud-network-config-controller-gcp :
NoConditions= openshift-gcp-ccm :
NoConditions= openshift-gcp-pd-csi-driver-operator :
NoConditions= openshift-image-registry-gcs :
NoConditions= openshift-ingress-gcp :
NoConditions= openshift-machine-api-gcp :
Expected results:
Cluster installed successfully without degrade
Additional info:
Some problem PROW CI tests: https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.14-multi-nightly-gcp-ipi-user-labels-tags-filestore-csi-tp-arm-f14/1805064266043101184 https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-4.14-upgrade-from-stable-4.13-gcp-ipi-xpn-fips-f28/1804676149503070208
This is a clone of issue OCPBUGS-37937. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-37786. The following is the description of the original issue:
—
Description of problem:
In the use case when worker nodes require a proxy for outside access and the control plane is external (and only accessible via the internet), ovnkube-node pods never become available because the ovnkube-controller container cannot reach the Kube APIServer.
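For context, the cluster-wide proxy for such a hosted cluster is declared on the HostedCluster itself, and this is the configuration ovnkube-controller would need to honor when reaching the external API server. A minimal sketch, assuming the ProxySpec fields mirrored from the cluster-wide proxy config (names and endpoints are hypothetical):

apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: example          # hypothetical
  namespace: clusters
spec:
  configuration:
    proxy:
      httpProxy: http://proxy.example.com:3128     # hypothetical proxy endpoint
      httpsProxy: http://proxy.example.com:3128
      noProxy: .cluster.local,.svc,localhost,127.0.0.1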
Version-Release number of selected component (if applicable):
How reproducible: Always
Steps to Reproduce:
1. Create an AWS hosted cluster with Public access and requires a proxy to access the internet.
2. Wait for nodes to become active
Actual results:
Nodes join cluster, but never become active
Expected results:
Nodes join cluster and become active
Work with Kyl on getting RHTAP setup with release-4.15
Description of problem:
OLM is supposed to verify that an update to a CRD does not introduce validation that is more restrictive than what is currently in effect. The logic for this only works if a CRD uses a single spec.validation entry, but this is unlikely to ever be the case. Instead, most CRDs use per-version validation schemas.
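A minimal sketch of the shape involved (group, kind, and field names here are hypothetical): with apiextensions.k8s.io/v1, each entry in spec.versions carries its own schema, so a check that only inspects a single validation block never sees a newly added, stricter per-version schema:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  names:
    kind: Widget
    plural: widgets
  scope: Namespaced
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              size:
                type: integer
  - name: v1beta1          # added on upgrade
    served: true
    storage: false
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              size:
                type: integer
                minimum: 1   # stricter than v1alpha1; an existing CR with size: 0 now violates this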
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Create an operator that has a CRD with an entry in spec.versions along with spec.versions[].schema populated with some validation schema. 2. Create a CR 3. Attempt to upgrade to a newer version of the operator, where the CRD is updated to add a new version whose schema validation is more restrictive and will fail against the CR that was previously created
Actual results:
Upgrade succeeds
Expected results:
Upgrade fails
Additional info:
Please review the following PR: https://github.com/openshift/vmware-vsphere-csi-driver/pull/99
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-34882. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-34870. The following is the description of the original issue:
—
In OCPBUGS-30951, we modified a check used in the Cinder CSI Driver Operator to relax the requirements for enabling topology support. Unfortunately in doing this we introduced a bug: we now attempt to access the volume AZ for each compute AZ, which isn't valid if there are more compute AZs than volume AZS. This needs to be addressed.
This affects 4.14 through to master (unreleased 4.17).
Always.
1. Deploy OCP-on-OSP on a cluster with fewer storage AZs than compute AZs
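To confirm the AZ imbalance before deploying, the compute and volume zones can be compared from the OpenStack side; a hedged sketch using the openstack CLI:

$ openstack availability zone list --compute -f value -c "Zone Name" | sort -u
$ openstack availability zone list --volume -f value -c "Zone Name" | sort -u
# The bug triggers when the first list is longer than the second.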
Operator fails due to out-of-range error.
Operator should not fail.
None.
Description of problem:
Nutanix machine without enough memory stuck in Provisioning and machineset scale/delete cannot work
Version-Release number of selected component (if applicable):
Server Version: 4.12.0 4.13.0-0.nightly-2023-01-17-152326
How reproducible:
Always
Steps to Reproduce:
1. Install Nutanix Cluster Template https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/tree/master/functionality-testing/aos-4_12/ipi-on-nutanix//versioned-installer
   master_num_memory: 32768
   worker_num_memory: 16384
   networkType: "OVNKubernetes"
   installer_payload_image: quay.io/openshift-release-dev/ocp-release:4.12.0-x86_64
2.
3. Scale up the cluster worker machineset from 2 replicas to 40 replicas
4. Install an Infra machineset with 3 replicas, and a Workload machineset with 1 replica.
   Refer to this doc https://docs.openshift.com/container-platform/4.11/machine_management/creating-infrastructure-machinesets.html#machineset-yaml-nutanix_creating-infrastructure-machinesets and configure the following resources:
   VCPU=16
   MEMORYMB=65536
   MEMORYSIZE=64Gi
Actual results:
1. The new infra machines stuck in 'Provisioning' status for about 3 hours.
% oc get machines -A | grep Prov
openshift-machine-api   qili-nut-big-jh468-infra-48mdt   Provisioning   175m
openshift-machine-api   qili-nut-big-jh468-infra-jnznv   Provisioning   175m
openshift-machine-api   qili-nut-big-jh468-infra-xp7xb   Provisioning   175m
2. Checking the Nutanix web console, I found infra machine 'qili-nut-big-jh468-infra-jnznv' had the following msg:
"No host has enough available memory for VM qili-nut-big-jh468-infra-48mdt (8d7eb6d6-a71e-4943-943a-397596f30db2) that uses 4 vCPUs and 65536MB of memory. You could try downsizing the VM, increasing host memory, power off some VMs, or moving the VM to a different host. Maximum allowable VM size is approximately 17921 MB"
infra machine 'qili-nut-big-jh468-infra-jnznv' is not round
infra machine 'qili-nut-big-jh468-infra-xp7xb' is in green without warning. But in must gather I found some error:
03:23:49 openshift-machine-api nutanixcontroller qili-nut-big-jh468-infra-xp7xb FailedCreate qili-nut-big-jh468-infra-xp7xb: reconciler failed to Create machine: failed to update machine with vm state: qili-nut-big-jh468-infra-xp7xb: failed to get node qili-nut-big-jh468-infra-xp7xb: Node "qili-nut-big-jh468-infra-xp7xb" not found
3. Scale down the worker machineset from 40 replicas to 30 replicas can not work. Still have 40 Running worker machines and 40 Ready nodes after about 3 hours.
% oc get machinesets -A
NAMESPACE               NAME                          DESIRED   CURRENT   READY   AVAILABLE   AGE
openshift-machine-api   qili-nut-big-jh468-infra      3         3                             176m
openshift-machine-api   qili-nut-big-jh468-worker     30        30        30      30          5h1m
openshift-machine-api   qili-nut-big-jh468-workload   1         1                             176m
% oc get machines -A | grep worker | grep Running -c
40
% oc get nodes | grep worker | grep Ready -c
40
4. I delete the infra machineset, but the machines still in Provisioning status and won't get deleted.
% oc delete machineset -n openshift-machine-api qili-nut-big-jh468-infra
machineset.machine.openshift.io "qili-nut-big-jh468-infra" deleted
% oc get machinesets -A
NAMESPACE               NAME                          DESIRED   CURRENT   READY   AVAILABLE   AGE
openshift-machine-api   qili-nut-big-jh468-worker     30        30        30      30          5h26m
openshift-machine-api   qili-nut-big-jh468-workload   1         1                             3h21m
% oc get machines -A | grep -v Running
NAMESPACE               NAME                                PHASE          TYPE   REGION   ZONE   AGE
openshift-machine-api   qili-nut-big-jh468-infra-48mdt      Provisioning                         3h22m
openshift-machine-api   qili-nut-big-jh468-infra-jnznv      Provisioning                         3h22m
openshift-machine-api   qili-nut-big-jh468-infra-xp7xb      Provisioning                         3h22m
openshift-machine-api   qili-nut-big-jh468-workload-qdkvd                                        3h22m
Expected results:
The new infra machines should be either Running or Failed. Cluster worker machinest scaleup and down should not be impacted.
Additional info:
must-gather download url will be added to the comment.
This is a clone of issue OCPBUGS-33329. The following is the description of the original issue:
—
Description of problem:
The DaemonSet code allows any taints to be ignored; therefore the Operator executes on IBM Cloud Bare Metal nodes.
Version-Release number of selected component (if applicable):
IBM Cloud Infrastructure Services (formerly known as VPC Infrastructure Environment), using IBM Cloud Bare Metal profiles with either Gen2 (Intel Cascade Lake) or Gen3 (Intel Sapphire Rapids) hardware. Special note - this refers to IBM Cloud Bare Metal, and NOT applicable to IBM Cloud Bare Metal (Classic) in the legacy Classic Infrastructure environment (AKA. Softlayer).
How reproducible:
Reproducible
Steps to Reproduce:
IBM LAB team found a bug that is causing errors on the bare metal worker nodes, and is requesting a patch to ibm-vpc-block-csi-driver.
The proposed solution: enforce the Namespace to select nodes where instance-type does NOT contain the substring 'metal'. This will stop the Namespace's DaemonSet from scheduling the Operator on IBM Cloud Bare Metals:
https://github.com/openshift/ibm-vpc-block-csi-driver-operator/blob/master/manifests/01_namespace.yaml
kind: Namespace
apiVersion: v1
metadata:
  annotations:
    openshift.io/node-selector: 'node.openshift.io/instance-type notin (metal)'
Actual results:
Expected results:
enforce Namespace to select nodes where instance-type NOT CONTAINS substring 'metal'. This will stop the Namespace's DaemonSet from scheduling the Operator on IBM Cloud Bare Metals: https://github.com/openshift/ibm-vpc-block-csi-driver-operator/blob/master/manifests/01_namespace.yaml
Additional info:
03802506
Description of problem:
ipsec container kills pluto even if that was started by systemd
Version-Release number of selected component (if applicable):
on any 4.14 nightly
How reproducible:
every time
Steps to Reproduce:
1. enable N-S ipsec 2. enable E-W IPsec 3. kill/stop/delete one of the ipsec-host pods
Actual results:
pluto is killed on that host
Expected results:
pluto keeps running
Additional info:
https://github.com/yuvalk/cluster-network-operator/blob/37d1cc72f4f6cd999046bd487a705e6da31301a5/bindata/network/ovn-kubernetes/common/ipsec-host.yaml#L235 this should be removed
Description of problem:
Our external load balancer (ELB) is 10.1.235.128; however, the machine host default network is in another subnet (192.168.x.x). Installation then breaks with:
"platform.baremetal.apiVIPs: Invalid value: "10.1.235.128": IP expected to be in one of the machine networks: 192.168.90.0/24, platform.baremetal.ingressVIPs: Invalid value: "10.1.235.128": IP expected to be in one of the machine networks: 192.168.90.0/24"
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Set up a cluster with a load balancer of the "usermanaged" type and the apiVIP/ingressVIP in a different subnet than the machine network CIDR (see the install-config sketch below).
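A minimal install-config fragment matching this setup (a sketch; the loadBalancer stanza assumes the user-managed load balancer feature, and the addresses are the reporter's):

platform:
  baremetal:
    apiVIPs:
    - 10.1.235.128
    ingressVIPs:
    - 10.1.235.128
    loadBalancer:
      type: UserManaged
networking:
  machineNetwork:
  - cidr: 192.168.90.0/24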
Actual results:
platform.baremetal.apiVIPs: Invalid value: "10.1.235.128": IP expected to be in one of the machine networks: 192.168.90.0/24, platform.baremetal.ingressVIPs: Invalid value: "10.1.235.128": IP expected to be in one of the machine networks: 192.168.90.0/24
Expected results:
For an external load balancer, the apiVIP/ingressVIP may be in a different subnet than the machine network CIDR.
Additional info:
Please review the following PR: https://github.com/openshift/azure-file-csi-driver/pull/44
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
RPS configuration test failed with the following error:
[FAILED] Failure recorded during attempt 1:
a host device rps mask is different from the reserved CPUs; have "0" want ""
Expected
    <bool>: false
to be true
In [It] at: /tmp/cnf-ZdGbI/cnf-features-deploy/vendor/github.com/onsi/gomega/internal/assertion.go:62 @ 09/06/23 03:47:44.144
< Exit [It] [test_id:55012] Should have the correct RPS configuration - /tmp/cnf-ZdGbI/cnf-features-deploy/vendor/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/1_performance/performance.go:337 @ 09/06/23 03:47:44.144 (39.949s)
Full report:
How reproducible:
Very often
Steps to Reproduce:
1. Reproduce automatically by the cnf-tests nightly job
Actual results:
Some of the virtual devices are not configured with the correct RPS mask
Expected results:
All virtual network devices are expected to have the correct RPS mask
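To inspect the masks manually, the per-queue sysfs files can be dumped on the node; a hedged sketch (substitute the node name):

$ oc debug node/<node-name> -- chroot /host sh -c \
  'for f in /sys/class/net/*/queues/rx-*/rps_cpus; do echo "$f $(cat $f)"; done'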
Description of problem:
The name for ImageDigestMirrorSet created by oc-mirror is not valid
Version-Release number of selected component (if applicable):
4.15
How reproducible:
always
Steps to Reproduce:
1. Use the idms yaml file created by oc-mirror; it will hit an error
Actual results:
cat out/working-dir/cluster-resources/idms_2023-11-16T04\:04\:49Z.yaml
apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
creationTimestamp: null
name: idms_2023-11-16T04:04:49Z
spec:
imageDigestMirrors:
- mirrors:
- ec2-3-143-247-94.us-east-2.compute.amazonaws.com:5000/ocp/openshift-release-dev
source: quay.io/openshift-release-dev
- mirrors:
- ec2-3-143-247-94.us-east-2.compute.amazonaws.com:5000/ocp/openshift
source: localhost:5005/openshift
status: {}
oc create -f out/working-dir/cluster-resources/idms_2023-11-16T04\:04\:49Z.yaml
The ImageDigestMirrorSet "idms_2023-11-16T04:04:49Z" is invalid: metadata.name: Invalid value: "idms_2023-11-16T04:04:49Z": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
Expected results:
name valid and no error.
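A compliant name would use only lowercase alphanumerics, '-' and '.'; a minimal sketch of corrected metadata (the exact name oc-mirror should generate is up to the fix; this one is hypothetical):

apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
  name: idms-2023-11-16-04-04-49   # no underscores or colons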
Additional info:
Issues 45 and 55 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624
When an operator-backed service is created with a service binding, the application group visual doesn't show up.
Note: Is this really PF5-related, or does this issue exist already on 4.14?
Screenshot: https://drive.google.com/drive/u/1/folders/1OKeJ8PPGZi-1QyqQ184xQznmqii37NNB
Please review the following PR: https://github.com/openshift/cluster-kube-scheduler-operator/pull/513
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Standalone OpenShift allows customizing templates for OAuth via the oauth.config.openshift.io/cluster resource. In HyperShift, this is done via the HostedCluster.spec.configuration.oauth field. However, setting a reference to secrets in these fields does not take effect on a HyperShift cluster.
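For reference, a minimal sketch of the field being exercised (resource names are hypothetical; each name is expected to reference a secret carrying the corresponding template HTML):

apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: example
  namespace: clusters
spec:
  configuration:
    oauth:
      templates:
        login:
          name: custom-login-template        # secret with a login.html key
        providerSelection:
          name: custom-providers-template    # secret with a providers.html key
        error:
          name: custom-error-template        # secret with an errors.html key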
Version-Release number of selected component (if applicable):
4.14
How reproducible:
always
Steps to Reproduce:
1.Create a HostedCluster and specify alternate templates for oauth via the HostedCluster.spec.configuration.oauth field. 2. View the oauth UI by attempting to log in to the OpenShift console. 3.
Actual results:
Different oauth templates do not take effect
Expected results:
Templates affect the look of the oauth login page
Additional info:
This is a clone of issue OCPBUGS-38697. The following is the description of the original issue:
—
Description of problem: After migrating cluster from SDN to OVN, seeing intermittent failures while accessing service.
Wed Jul 31 05:28:11 UTC 2024 Wed Jul 31 01:28:11 EDT 2024 Hello OpenShift!
Wed Jul 31 05:28:42 UTC 2024 Wed Jul 31 01:28:42 EDT 2024 curl: (7) Failed to connect to 34.92.142.227 port 27018 after 75006 ms: Couldn't connect to server
Wed Jul 31 05:30:27 UTC 2024 Wed Jul 31 01:30:27 EDT 2024 Hello OpenShift!
Wed Jul 31 05:31:59 UTC 2024 Wed Jul 31 01:31:59 EDT 2024 Hello OpenShift!
Wed Jul 31 05:33:31 UTC 2024 Wed Jul 31 01:33:31 EDT 2024 Hello OpenShift!
Wed Jul 31 05:34:01 UTC 2024 Wed Jul 31 01:34:01 EDT 2024 curl: (52) Empty reply from server
Wed Jul 31 05:38:51 UTC 2024 Wed Jul 31 01:38:51 EDT 2024 Hello OpenShift!
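The output above appears to come from a simple probe loop along these lines (a reconstruction, not the reporter's exact script; the timeout flag is assumed from the 75006 ms failure):

while true; do
  # print UTC time, local time, then the service response on one line
  echo "$(date -u) $(date) $(curl -sS -m 75 http://34.92.142.227:27018/)"
  sleep 30
done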
Version-Release number of selected component (if applicable):
$ oc version
Client Version: 4.15.14
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.15.0-0.nightly-2024-07-29-053620
Kubernetes Version: v1.28.11+add48d0
How reproducible:
Steps to Reproduce:
1. Create a 4.14 SDN OSD on GCP cluster
2. Upgrade to 4.15
3. Scale cluster to 24 nodes
4. Add cluster-density-v2 workload
5. Run migration and let if finish
6. Start seeing errors
Actual results: Intermittent failures accessing service.
Expected results: Live migration should not cause disruption to service.
Additional info:
This needs to wait until 4.12 branches, which should be June 24 per https://lists.corp.redhat.com/archives/aos-hive/2022-April/000006.html
Please review the following PR: https://github.com/openshift/cloud-provider-kubevirt/pull/25
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Upgrade from 4.15 to 4.16 is failing because kubelet reports this error:
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: I0315 17:03:31.411346 7755 kubelet.go:308] "Adding static pod path" path="/etc/kubernetes/manifests"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: I0315 17:03:31.411380 7755 file.go:69] "Watching path" path="/etc/kubernetes/manifests"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: I0315 17:03:31.411406 7755 kubelet.go:319] "Adding apiserver pod source"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: I0315 17:03:31.411426 7755 apiserver.go:42] "Waiting for node sync before watching apiserver pods"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: I0315 17:03:31.414274 7755 kuberuntime_manager.go:257] "Container runtime initialized" containerRuntime="cri-o" version="1.28.4-4.rhaos4.15.git92d1839.el8" apiVersion="v1"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: E0315 17:03:31.414963 7755 kuberuntime_manager.go:273] "Failed to register CRI auth plugins" err="plugin binary executable /usr/libexec/kubelet-image-credential-provider-plugins/acr-credential-provider did not exist"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 systemd[1]: Failed to start Kubernetes Kubelet.
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 systemd[1]: kubelet.service: Consumed 155ms CPU time
We have seen this issue in prow job periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.15-azure-ipi-workers-rhel8-f28 (a cluster with rhel workers) and in manual upgrades in IPI on GCP clusters (a cluster with coreos workers).
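A quick way to confirm the missing plugin binary on an affected node (a hedged sketch; substitute the node name, and the directory comes straight from the kubelet error above):

$ oc debug node/<node-name> -- chroot /host ls -l /usr/libexec/kubelet-image-credential-provider-plugins/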
Version-Release number of selected component (if applicable):
Upgrade from 4.15.3 to 4.16.0-0.nightly-2024-03-13-061822 oc get clusterversion -o yaml ... history: - acceptedRisks: |- Target release version="" image="registry.build04.ci.openshift.org/ci-op-wb5fkm5k/release@sha256:da22f0582a13f19aae1792c6de2e3cc348c3ed1af67c1fbb5a9960833931341b" cannot be verified, but continuing anyway because the update was forced: unable to verify sha256:da22f0582a13f19aae1792c6de2e3cc348c3ed1af67c1fbb5a9960833931341b against keyrings: verifier-public-key-redhat [2024-03-15T15:33:11Z: prefix sha256-da22f0582a13f19aae1792c6de2e3cc348c3ed1af67c1fbb5a9960833931341b in config map signatures-managed: no more signatures to check, 2024-03-15T15:33:11Z: unable to retrieve signature from https://storage.googleapis.com/openshift-release/official/signatures/openshift/release/sha256=da22f0582a13f19aae1792c6de2e3cc348c3ed1af67c1fbb5a9960833931341b/signature-1: no more signatures to check, 2024-03-15T15:33:11Z: unable to retrieve signature from https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=da22f0582a13f19aae1792c6de2e3cc348c3ed1af67c1fbb5a9960833931341b/signature-1: no more signatures to check, 2024-03-15T15:33:11Z: parallel signature store wrapping containers/image signature store under https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, containers/image signature store under https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release: no more signatures to check, 2024-03-15T15:33:11Z: serial signature store wrapping ClusterVersion signatureStores unset, falling back to default stores, parallel signature store wrapping containers/image signature store under https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, containers/image signature store under https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release: no more signatures to check, 2024-03-15T15:33:11Z: serial signature store wrapping config maps in openshift-config-managed with label "release.openshift.io/verification-signatures", serial signature store wrapping ClusterVersion signatureStores unset, falling back to default stores, parallel signature store wrapping containers/image signature store under https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, containers/image signature store under https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release: no more signatures to check] Precondition "ClusterVersionRecommendedUpdate" failed because of "UnknownUpdate": RetrievedUpdates=True (), so the update from 4.15.3 to 4.16.0-0.nightly-2024-03-13-061822 is probably neither recommended nor supported. completionTime: null image: registry.build04.ci.openshift.org/ci-op-wb5fkm5k/release@sha256:da22f0582a13f19aae1792c6de2e3cc348c3ed1af67c1fbb5a9960833931341b startedTime: "2024-03-15T15:33:28Z" state: Partial verified: false version: 4.16.0-0.nightly-2024-03-13-061822 - completionTime: "2024-03-15T13:33:08Z" image: registry.build04.ci.openshift.org/ci-op-wb5fkm5k/release@sha256:8e8c6c2645553e6df8eb7985d8cb322f333a4152453e2aa85fff24ac5e0755b0 startedTime: "2024-03-15T13:02:04Z" state: Completed verified: false version: 4.15.3
How reproducible:
Always
Steps to Reproduce:
1. Upgrade from 4.15 to 4.16 using prow job periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.15-azure-ipi-workers-rhel8-f28 or an IPI on GCP cluster.
Actual results:
Worker nodes do not join the cluster when they are rebooted:
sh-4.4$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-b566c3af4e215e2a77e6f9d9e5a988de   True      False      False      3              3                   3                     0                      3h59m
worker   rendered-worker-21862c92d0f14a4842f6093f65571bd1   False     True       False      3              0                   0                     0                      3h59m
sh-4.4$ oc get nodes
NAME                                  STATUS                        ROLES                  AGE     VERSION
ci-op-wb5fkm5k-e450c-s6m96-master-0   Ready                         control-plane,master   4h5m    v1.29.2+a0beecc
ci-op-wb5fkm5k-e450c-s6m96-master-1   Ready                         control-plane,master   4h6m    v1.29.2+a0beecc
ci-op-wb5fkm5k-e450c-s6m96-master-2   Ready                         control-plane,master   4h6m    v1.29.2+a0beecc
ci-op-wb5fkm5k-e450c-s6m96-rhel-1     NotReady,SchedulingDisabled   worker                 3h17m   v1.28.7+6e2789b
ci-op-wb5fkm5k-e450c-s6m96-rhel-2     Ready                         worker                 3h17m   v1.28.7+6e2789b
ci-op-wb5fkm5k-e450c-s6m96-rhel-3     Ready                         worker                 3h17m   v1.28.7+6e2789b
In the NotReady node we can see this error in kubelet:
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: I0315 17:03:31.411346 7755 kubelet.go:308] "Adding static pod path" path="/etc/kubernetes/manifests"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: I0315 17:03:31.411380 7755 file.go:69] "Watching path" path="/etc/kubernetes/manifests"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: I0315 17:03:31.411406 7755 kubelet.go:319] "Adding apiserver pod source"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: I0315 17:03:31.411426 7755 apiserver.go:42] "Waiting for node sync before watching apiserver pods"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: I0315 17:03:31.414274 7755 kuberuntime_manager.go:257] "Container runtime initialized" containerRuntime="cri-o" version="1.28.4-4.rhaos4.15.git92d1839.el8" apiVersion="v1"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: E0315 17:03:31.414963 7755 kuberuntime_manager.go:273] "Failed to register CRI auth plugins" err="plugin binary executable /usr/libexec/kubelet-image-credential-provider-plugins/acr-credential-provider did not exist"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 systemd[1]: Failed to start Kubernetes Kubelet.
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 systemd[1]: kubelet.service: Consumed 155ms CPU time
Expected results:
The upgrade should be executed without failures
Additional info:
In the first comment you can find the must-gather file and the journal.logs
This is a clone of issue OCPBUGS-34046. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-31572. The following is the description of the original issue:
—
Description of problem:
nodeip-configuration.service is disabled to start during system startup on RHCOS on OCP cluster nodes on AWS. Because of this kubelet is picking a wrong node-ip when the OCP node on AWS has two IP addresses assigned to two interfaces.
$ grep nodeip-configuration.service sos_commands/systemd/systemctl_list-unit-files
nodeip-configuration.service disabled disabled
Version-Release number of selected component (if applicable):
4.14
How reproducible:
- Install a 4.14 cluster on AWS
- Login to any node and check that nodeip-configuration.service will be found disabled for system startup.
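A quick check from the cluster side (a hedged sketch; substitute the node name):

$ oc debug node/<node-name> -- chroot /host systemctl is-enabled nodeip-configuration.service
disabled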
This is a clone of issue OCPBUGS-41996. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-34533. The following is the description of the original issue:
—
Description of problem:
A customer reported that they are not able to edit the "Until" option from the Developer perspective.
Version-Release number of selected component (if applicable):
OCP v4.15.11
Screenshot
https://redhat-internal.slack.com/archives/C04BSV48DJS/p1716889816419439
Description of problem:
Working with a customer on 4.15.9: every time they try to create a NetworkAttachmentDefinition from the GUI, it is always created in the Default namespace, and the current project is switched to Default as well.
Version-Release number of selected component (if applicable):
4.15.9
How reproducible:
very
Steps to Reproduce:
1. In the GUI create a new project or change to a project
2. Go to NetworkAttachmentDefinitions
3. Create a new NetworkAttachmentDefinition
4. Save
5. The NetworkAttachmentDefinition is now in the Default namespace and you have been switched to it as well.
Actual results:
NetworkAttachmentDefinitions is in Default namespace and user is switched to default
Expected results:
NetworkAttachmentDefinitions is in whatever namespace you were in when you clicked create
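As a workaround until the console is fixed, the NAD can be created from YAML with an explicit namespace; a minimal sketch (the name, namespace, and CNI config here are hypothetical), applied with oc apply -f:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: example-nad            # hypothetical
  namespace: my-project        # explicit namespace avoids the Default fallback
spec:
  config: '{ "cniVersion": "0.3.1", "type": "macvlan", "master": "eth1", "mode": "bridge", "ipam": {} }'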
Additional info:
Please review the following PR: https://github.com/openshift/csi-livenessprobe/pull/47
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
When setting up transient mounts, which are used for exposing CA certificates and RPM package repositories to a build, a recent change we made in the builder attempted to replace simple bind mounts with overlay mounts. While this might have made things easier for unprivileged builds, we overlooked that overlay mounts can't be made to files, only directories, so we need to revert the change.
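For illustration, the distinction at the mount level (a minimal sketch run as root; paths are hypothetical):

# Bind-mounting a single file works:
mount --bind /tmp/redhat.repo /run/secrets/redhat.repo

# An overlay mount with a file as lowerdir fails with EINVAL,
# because overlayfs lowerdir/upperdir/workdir must be directories:
mount -t overlay overlay \
  -o lowerdir=/tmp/redhat.repo,upperdir=/tmp/upper,workdir=/tmp/work \
  /run/secrets/redhat.repo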
Version-Release number of selected component (if applicable):
4.14.0
How reproducible:
Always
Steps to Reproduce:
Per https://redhat-internal.slack.com/archives/C014MHHKUSF/p1696882408656359?thread_ts=1696882334.352129&cid=C014MHHKUSF, 1. oc new-app -l app=pvg-nodejs --name pvg-nodejs pvg-nodejs https://github.com/openshift/nodejs-ex.git
Actual results:
mount /var/lib/containers/storage/overlay-containers/9c3877f3062cc18b01f30db310e0e2bd0a1cd4527d74f41c313399e48fa81d23/userdata/overlay/145259665/merge:/run/secrets/redhat.repo (via /proc/self/fd/6), data: lowerdir=/tmp/redhat.repo-copy2014834134/redhat.repo,upperdir=/var/lib/containers/storage/overlay-containers/9c3877f3062cc18b01f30db310e0e2bd0a1cd4527d74f41c313399e48fa81d23/userdata/overlay/145259665/upper,workdir=/var/lib/containers/storage/overlay-containers/9c3877f3062cc18b01f30db310e0e2bd0a1cd4527d74f41c313399e48fa81d23/userdata/overlay/145259665/work: invalid argument"
Expected results:
Successful setup for a transient mount to the redhat.repo file for a RUN instruction.
Additional info:
Bug introduced in https://github.com/openshift/builder/pull/349, should be fixed in https://github.com/openshift/builder/pull/359.
Description of problem:
Version-Release number of selected component (if applicable):
4.14.0 and 4.15.0
How reproducible:
Every time.
Steps to Reproduce:
1. git clone https://github.com/openshift/installer.git
2. export TAGS=aro
3. hack/build.sh
4. export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE="${RELEASE_IMAGE}"
5. export OPENSHIFT_INSTALL_INVOKER="ARO"
6. Run ccoctl to generate ID resources
7. ./openshift-install create manifests
8. ./openshift-install create cluster --log-level=debug
Actual results:
azure-cloud-provider gets generated with aadClientId = service principal clientID used by the installer.
Expected results:
This step should be skipped and kube-controller-manager should rely on file assets.
Additional info:
Open pull request: https://github.com/openshift/installer/pull/7608
The cluster-version operator is very chatty, and this can cause problems in clusters where logs are shipped off to external storage. We worked on this in rhbz#2034493, which taught 4.10 and later to move to level 2 logging, mostly to drop the client-side throttling messages. And we have been pushing OTA-923 to make logging tunable, to avoid the need to make "will we want to hear about this?" decisions in one place for all clusters at all times. But there is interest in reducing the amount of logging in older releases in ways that do not require a tunable knob, and this bug tracks another step in that direction: the Running sync / Done syncing messages.
Version-Release number of selected component (if applicable):
All 4.y releases log these lines at high volume, but 4.10 and earlier are end-of-life, and 4.11 and 4.12 are in maintenance mode.
How reproducible:
Every time.
Steps to Reproduce:
1. Install a cluster.
2. Wait at least 30m since install or the most recent update completes, because we want the CVO to be chatty during those exciting times, and this bug is about steady-state log volume.
3. Collect CVO logs for the past 30m: oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --since=40m --tail=-1 >cvo.log.
$ oc adm upgrade
Cluster version is 4.13.21
...
$ oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --since=40m --tail=-1 > cvo.log
$ grep -o 'apply.*in state.*' cvo.log | uniq -c
     10 apply: 4.13.21 on generation 77 in state Reconciling at attempt 0
$ wc cvo.log
  20043  242930 3071956 cvo.log
$ sed -n 's/^.* \([^ ]*[.]go:[0-9]*\).*/\1/p' cvo.log | sort | uniq -c | sort -n | tail -n5
    194 sync_worker.go:490
    314 sync_worker.go:978
    807 task_graph.go:477
   7971 sync_worker.go:1007
   7973 sync_worker.go:987
$ grep 'sync_worker.go:987' cvo.log | tail -n2
I1116 22:10:08.739999       1 sync_worker.go:987] Running sync for serviceaccount "openshift-cloud-credential-operator/cloud-credential-operator" (271 of 842)
I1116 22:10:08.785081       1 sync_worker.go:987] Running sync for flowschema "openshift-apiserver" (457 of 842)
$ grep 'sync_worker.go:1007' cvo.log | tail -n2
I1116 22:10:08.739967       1 sync_worker.go:1007] Done syncing for configmap "openshift-cloud-credential-operator/cco-trusted-ca" (270 of 842)
I1116 22:10:08.785043       1 sync_worker.go:1007] Done syncing for flowschema "openshift-apiserver-sar" (456 of 842)
So that's 3071956 bytes / 30 minutes * 60 minutes / 1 hour ~= 6 MB / hour, the bulk of which is Running sync and Done syncing logs.
$ grep -v 'sync_worker.go:\(987\|1007\)]' cvo.log | wc 4099 51602 861709
So something closer to 861709 bytes / 30 minutes * 60 minutes / 1 hour ~= 2 MB / hour would be acceptable.
The CVO has a randomized sleep to cool off between sync cycles, and per-sync-cycle log volume will depend on (among other things) what that CVO container happened to choose for that sleep.
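For illustration, a hedged Go sketch of the usual klog technique for this kind of fix: demoting the per-manifest messages to a higher verbosity level so they disappear at the level the CVO normally runs at, while rarer summary lines stay visible. The function name, message text, and the specific levels are illustrative, not the CVO's actual code.

package main

import (
	"flag"

	"k8s.io/klog/v2"
)

func syncManifest(name string, index, total int) {
	// At V(4) these lines are dropped unless someone raises verbosity.
	klog.V(4).Infof("Running sync for %s (%d of %d)", name, index, total)
	// ... apply the manifest ...
	klog.V(4).Infof("Done syncing for %s (%d of %d)", name, index, total)
}

func main() {
	klog.InitFlags(nil)
	flag.Set("v", "2") // the level 4.10+ CVOs run at per the description above
	flag.Parse()

	syncManifest(`flowschema "openshift-apiserver"`, 457, 842) // emits nothing at -v=2
	klog.V(2).Info("sync cycle complete")                      // still visible at -v=2
	klog.Flush()
}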
This is a clone of issue OCPBUGS-37707. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-29664. The following is the description of the original issue:
—
Description of problem:
Created a net-attach-def with 2 IPs in range. After that, created a deployment with 2 replicas using that net-attach-def. The whereabouts daemonset is created, and the reconciler cronjob is enabled, reconciling every one minute. When I power off the node on which one of the pods is deployed, gracefully (poweroff) or ungracefully (poweroff --force), a new pod gets created on a healthy node and is stuck in ContainerCreating state.
Version-Release number of selected component (if applicable):
4.14.11
How reproducible:
- Create the whereabouts reconciler daemonset with help of the documentation: https://docs.openshift.com/container-platform/4.14/networking/multiple_networks/configuring-additional-network.html#nw-multus-creating-whereabouts-reconciler-daemon-set_configuring-additional-network
- Update the reconciler_cron_expression to: "*/1 * * * *"
- Create a net-attach-def with 2 IPs in range
- Create a deployment with 2 replicas
- Power off the node on which one of the pods is running
- The new pod spawned on a healthy node is stuck with ContainerCreating status.
Steps to Reproduce:
1. On fresh cluster with version 4.14.11
2. Create whereabouts daemonset with help of documentation
3. Update the reconciler_cron_expression to: "*/1 * * * *"
$ oc create configmap whereabouts-config -n openshift-multus --from-literal=reconciler_cron_expression="*/1 * * * *"
4. Create new project
$ oc new-project nadtesting
5. Apply below nad.yaml
$ cat nad.yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-net-attach1
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "macvlan",
    "master": "br-ex",
    "mode": "bridge",
    "ipam": {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "range": "172.17.20.0/24",
      "range_start": "172.17.20.11",
      "range_end": "172.17.20.12"
    }
  }'
6. Create deployment using net-attach-def with two replicas
$ cat naddeployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment1
  labels:
    app: macvlan1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: macvlan1
  template:
    metadata:
      annotations:
        k8s.v1.cni.cncf.io/networks: macvlan-net-attach1
      labels:
        app: macvlan1
    spec:
      containers:
      - name: google
        image: gcr.io/google-samples/kubernetes-bootcamp:v1
        ports:
        - containerPort: 8080
7. Two pods will be created
$ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment1-fbfdf5cbc-d6sgr 1/1 Running 0 15m 10.129.2.9 ci-ln-xvfy762-c1627-h7xzk-worker-0-qvzq2 <none> <none>
deployment1-fbfdf5cbc-njkpz 1/1 Running 0 15m 10.128.2.16 ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh <none> <none>
8. Power off the node using debug
$ oc debug node/ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh
# chroot /host
# shutdown
9. Wait for some time; the new pod created on the healthy node is stuck in ContainerCreating
$ oc get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment1-fbfdf5cbc-6cb8d 0/1 ContainerCreating 0 9m53s <none> ci-ln-xvfy762-c1627-h7xzk-worker-0-blzlk <none> <none>
deployment1-fbfdf5cbc-d6sgr 1/1 Running 0 28m 10.129.2.9 ci-ln-xvfy762-c1627-h7xzk-worker-0-qvzq2 <none> <none>
deployment1-fbfdf5cbc-njkpz 1/1 Terminating 0 28m 10.128.2.16 ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh <none> <none>
10. Node status just for reference
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ci-ln-xvfy762-c1627-h7xzk-master-0 Ready control-plane,master 59m v1.27.10+28ed2d7
ci-ln-xvfy762-c1627-h7xzk-master-1 Ready control-plane,master 59m v1.27.10+28ed2d7
ci-ln-xvfy762-c1627-h7xzk-master-2 Ready control-plane,master 58m v1.27.10+28ed2d7
ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh NotReady worker 43m v1.27.10+28ed2d7
ci-ln-xvfy762-c1627-h7xzk-worker-0-blzlk Ready worker 43m v1.27.10+28ed2d7
ci-ln-xvfy762-c1627-h7xzk-worker-0-qvzq2 Ready worker 43m v1.27.10+28ed2d
Actual results:
The shut-down node's pod is stuck in Terminating state and does not release its IP. The new pod is stuck in ContainerCreating status.
Expected results:
The new pod should start smoothly on the new node.
Additional info:
Just for information: if I follow the manual approach, the issue is resolved. For that I need to follow these steps:
1. Remove the terminating pod's IP from the overlapping range reservations: $ oc delete overlappingrangeipreservations.whereabouts.cni.cncf.io <IP>
2. Remove that IP from ippools.whereabouts.cni.cncf.io: $ oc edit ippools.whereabouts.cni.cncf.io <IP Pool> and remove the stale IP from the list
Also, the whereabouts-reconciler logs on the Terminating pod's node report:
2024-02-19T10:48:00Z [debug] Added IP 172.17.20.12 for pod nadtesting/deployment1-fbfdf5cbc-njkpz
2024-02-19T10:48:00Z [debug] the IP reservation: IP: 172.17.20.12 is reserved for pod: nadtesting/deployment1-fbfdf5cbc-njkpz
2024-02-19T10:48:00Z [debug] pod reference nadtesting/deployment1-fbfdf5cbc-njkpz matches allocation; Allocation IP: 172.17.20.12; PodIPs: map[172.17.20.12:{}]
2024-02-19T10:48:00Z [debug] no IP addresses to cleanup
2024-02-19T10:48:00Z [verbose] reconciler success
i.e. it fails to recognize the need to remove the allocation.
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-34801. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-30986. The following is the description of the original issue:
—
Description of problem:
After we applied the old tlsSecurityProfile to the Hypershift hosted cluster, the apiserver ran into a CrashLoopBackOff failure; this blocked our test.
Version-Release number of selected component (if applicable):
$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.16.0-0.nightly-2024-03-13-061822 True False 129m Cluster version is 4.16.0-0.nightly-2024-03-13-061822
How reproducible:
always
Steps to Reproduce:
1. Specify KUBECONFIG with the kubeconfig of the Hypershift management cluster
2. hostedcluster=$( oc get -n clusters hostedclusters -o json | jq -r .items[].metadata.name)
3. oc patch hostedcluster $hostedcluster -n clusters --type=merge -p '{"spec": {"configuration": {"apiServer": {"tlsSecurityProfile":{"old":{},"type":"Old"}}}}}'
hostedcluster.hypershift.openshift.io/hypershift-ci-270930 patched
4. Check the tlsSecurityProfile:
$ oc get HostedCluster $hostedcluster -n clusters -ojson | jq .spec.configuration.apiServer
{
  "audit": {
    "profile": "Default"
  },
  "tlsSecurityProfile": {
    "old": {},
    "type": "Old"
  }
}
Actual results:
One of the kube-apiservers of the hosted cluster ran into CrashLoopBackOff and is stuck in this status, unable to complete the old tlsSecurityProfile configuration.
$ oc get pods -l app=kube-apiserver -n clusters-${hostedcluster}
NAME READY STATUS RESTARTS AGE
kube-apiserver-5b6fc94b64-c575p 5/5 Running 0 70m
kube-apiserver-5b6fc94b64-tvwtl 5/5 Running 0 70m
kube-apiserver-84c7c8dd9d-pnvvk 4/5 CrashLoopBackOff 6 (20s ago) 7m38s
Expected results:
Applying the old tlsSecurityProfile should be successful.
Additional info:
This can also be reproduced on 4.14 and 4.15. We have the last passed logs of the test case as below:
passed API_Server 2024-02-19 13:34:25(UTC) aws 4.14.0-0.nightly-2024-02-18-123855 hypershift
passed API_Server 2024-02-08 02:24:15(UTC) aws 4.15.0-0.nightly-2024-02-07-062935 hypershift
passed API_Server 2024-02-17 08:33:37(UTC) aws 4.16.0-0.nightly-2024-02-08-073857 hypershift
From the history of the test, it seems that some code changes were introduced in February that caused the bug.
The goal is to collect metrics about RHACS installations to capture billing and overall usage metrics for the product. We would also like to request a backport of the telemeter config to existing OpenShift cluster versions such that telemetry metrics become available sooner, as they provide critical information to our product management.
Central is the main backend component of RHACS ("hub"). The metric shows installation info about Central, as well as usage data via three gauges (secured clusters, secured nodes, secured vCPU). This is a recording rule where unnecessary labels like instance and job have already been removed.
Labels
Sensor is a component installed on clusters managed by RHACS. The metric shows installation info about Sensor, as well as usage data via two gauges (secured nodes, secured vCPU). The cardinality of the metric series is 1. This is a recording rule where unnecessary labels like instance and job have already been removed.
The cardinality of the metrics per cluster is 1.
Description of problem:
Installation with Kuryr is failing because multiple components are attempting to connect to the API and fail with the following error:
failed checking apiserver connectivity: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-service-ca/leases/service-ca-controller-lock": tls: failed to verify certificate: x509: cannot validate certificate for 172.30.0.1 because it doesn't contain any IP SANs
$ oc get po -A -o wide |grep -v Running |grep -v Pending |grep -v Completed
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
openshift-apiserver-operator openshift-apiserver-operator-559d855c56-c2rdr 0/1 CrashLoopBackOff 42 (2m28s ago) 3h44m 10.128.16.86 kuryr-5sxhw-master-2 <none> <none>
openshift-apiserver apiserver-6b9f5d48c4-bj6s6 0/2 CrashLoopBackOff 92 (4m25s ago) 3h36m 10.128.70.10 kuryr-5sxhw-master-2 <none> <none>
openshift-cluster-csi-drivers manila-csi-driver-operator-75b64d8797-fckf5 0/1 CrashLoopBackOff 42 (119s ago) 3h41m 10.128.56.21 kuryr-5sxhw-master-0 <none> <none>
openshift-cluster-csi-drivers openstack-cinder-csi-driver-operator-84dfd8d89f-kgtr8 0/1 CrashLoopBackOff 42 (82s ago) 3h41m 10.128.56.9 kuryr-5sxhw-master-0 <none> <none>
openshift-cluster-node-tuning-operator cluster-node-tuning-operator-7fbb66545c-kh6th 0/1 CrashLoopBackOff 46 (3m5s ago) 3h44m 10.128.6.40 kuryr-5sxhw-master-2 <none> <none>
openshift-cluster-storage-operator cluster-storage-operator-5545dfcf6d-n497j 0/1 CrashLoopBackOff 42 (2m23s ago) 3h44m 10.128.21.175 kuryr-5sxhw-master-2 <none> <none>
openshift-cluster-storage-operator csi-snapshot-controller-ddb9469f9-bc4bb 0/1 CrashLoopBackOff 45 (2m17s ago) 3h41m 10.128.20.106 kuryr-5sxhw-master-1 <none> <none>
openshift-cluster-storage-operator csi-snapshot-controller-operator-6d7b66dbdd-xdwcs 0/1 CrashLoopBackOff 42 (92s ago) 3h44m 10.128.21.220 kuryr-5sxhw-master-2 <none> <none>
openshift-config-operator openshift-config-operator-c5d5d964-2w2bv 0/1 CrashLoopBackOff 80 (3m39s ago) 3h44m 10.128.43.39 kuryr-5sxhw-master-2 <none> <none>
openshift-controller-manager-operator openshift-controller-manager-operator-754d748cf7-rzq6f 0/1 CrashLoopBackOff 42 (3m6s ago) 3h44m 10.128.25.166 kuryr-5sxhw-master-2 <none> <none>
openshift-etcd-operator etcd-operator-76ddc94887-zqkn7 0/1 CrashLoopBackOff 49 (30s ago) 3h44m 10.128.32.146 kuryr-5sxhw-master-2 <none> <none>
openshift-ingress-operator ingress-operator-9f76cf75b-cjx9t 1/2 CrashLoopBackOff 39 (3m24s ago) 3h44m 10.128.9.108 kuryr-5sxhw-master-2 <none> <none>
openshift-insights insights-operator-776cd7cfb4-8gzz7 0/1 CrashLoopBackOff 46 (4m21s ago) 3h44m 10.128.15.102 kuryr-5sxhw-master-2 <none> <none>
openshift-kube-apiserver-operator kube-apiserver-operator-64f4db777f-7n9jv 0/1 CrashLoopBackOff 42 (113s ago) 3h44m 10.128.18.199 kuryr-5sxhw-master-2 <none> <none>
openshift-kube-apiserver installer-5-kuryr-5sxhw-master-1 0/1 Error 0 3h35m 10.128.68.176 kuryr-5sxhw-master-1 <none> <none>
openshift-kube-controller-manager-operator kube-controller-manager-operator-746497b-dfbh5 0/1 CrashLoopBackOff 42 (2m23s ago) 3h44m 10.128.13.162 kuryr-5sxhw-master-2 <none> <none>
openshift-kube-controller-manager installer-4-kuryr-5sxhw-master-0 0/1 Error 0 3h35m 10.128.65.186 kuryr-5sxhw-master-0 <none> <none>
openshift-kube-scheduler-operator openshift-kube-scheduler-operator-695fb4449f-j9wqx 0/1 CrashLoopBackOff 42 (63s ago) 3h44m 10.128.44.194 kuryr-5sxhw-master-2 <none> <none>
openshift-kube-scheduler installer-5-kuryr-5sxhw-master-0 0/1 Error 0 3h35m 10.128.60.44 kuryr-5sxhw-master-0 <none> <none>
openshift-kube-storage-version-migrator-operator kube-storage-version-migrator-operator-6c5cd46578-qpk5z 0/1 CrashLoopBackOff 42 (2m18s ago) 3h44m 10.128.4.120 kuryr-5sxhw-master-2 <none> <none>
openshift-machine-api cluster-autoscaler-operator-7b667675db-tmlcb 1/2 CrashLoopBackOff 46 (2m53s ago) 3h45m 10.128.28.146 kuryr-5sxhw-master-2 <none> <none>
openshift-machine-api machine-api-controllers-fdb99649c-ldb7t 3/7 CrashLoopBackOff 184 (2m55s ago) 3h40m 10.128.29.90 kuryr-5sxhw-master-0 <none> <none>
openshift-route-controller-manager route-controller-manager-d8f458684-7dgjm 0/1 CrashLoopBackOff 43 (100s ago) 3h36m 10.128.55.11 kuryr-5sxhw-master-2 <none> <none>
openshift-service-ca-operator service-ca-operator-654f68c77f-g4w55 0/1 CrashLoopBackOff 42 (2m2s ago) 3h45m 10.128.22.30 kuryr-5sxhw-master-2 <none> <none>
openshift-service-ca service-ca-5f584b7d75-mxllm 0/1 CrashLoopBackOff 42 (45s ago) 3h42m 10.128.49.250 kuryr-5sxhw-master-0 <none> <none>
$ oc get svc -A |grep 172.30.0.1
default kubernetes ClusterIP 172.30.0.1 <none> 443/TCP 3h50m
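For illustration, a hedged Go sketch of what the x509 error implies: the serving certificate presented at 172.30.0.1 carries no IP SANs, so whatever generates it must include the service ClusterIP in the certificate's IPAddresses. This template is illustrative only, not OpenShift's actual certificate-generation code.

package main

import (
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

func main() {
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "kubernetes"},
		DNSNames:     []string{"kubernetes", "kubernetes.default.svc"},
		// Without this entry, clients dialing https://172.30.0.1 fail with
		// "cannot validate certificate ... doesn't contain any IP SANs".
		IPAddresses: []net.IP{net.ParseIP("172.30.0.1")},
		NotBefore:   time.Now(),
		NotAfter:    time.Now().Add(365 * 24 * time.Hour),
		KeyUsage:    x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
		ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	fmt.Println("IP SANs:", tmpl.IPAddresses)
}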
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/multus-admission-controller/pull/69
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Reduce shared informer memory usage by stripping object fields we don't care about.
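A hedged Go sketch of this technique: client-go lets you register a TransformFunc that strips fields before objects enter the shared informer cache, cutting steady-state memory. Stripping metadata.managedFields shown here is a common choice; the exact fields removed by the actual change may differ.

package main

import (
	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/client-go/tools/cache"
)

// stripManagedFields returns a TransformFunc suitable for wiring into a
// shared informer via SetTransform before the informer is started.
func stripManagedFields() cache.TransformFunc {
	return func(obj interface{}) (interface{}, error) {
		if accessor, err := meta.Accessor(obj); err == nil {
			// managedFields is often the largest per-object metadata field
			// and is rarely needed by controllers.
			accessor.SetManagedFields(nil)
		}
		return obj, nil
	}
}

func main() {
	_ = stripManagedFields() // in real code: informer.SetTransform(stripManagedFields())
}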
Please review the following PR: https://github.com/openshift/machine-api-provider-aws/pull/93
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-27760. The following is the description of the original issue:
—
I noticed this today when looking at component readiness. A ~5% drop in the pass rate may seem minor, but these can certainly add up. This test passed 713 times in a row on 4.14. You can see today's failure here.
Details below:
-------
Component Readiness has found a potential regression in [sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers.
Probability of significant regression: 99.96%
Sample (being evaluated) Release: 4.15
Start Time: 2024-01-17T00:00:00Z
End Time: 2024-01-23T23:59:59Z
Success Rate: 94.83%
Successes: 55
Failures: 3
Flakes: 0
Base (historical) Release: 4.14
Start Time: 2023-10-04T00:00:00Z
End Time: 2023-10-31T23:59:59Z
Success Rate: 100.00%
Successes: 713
Failures: 0
Flakes: 4
This is a clone of issue OCPBUGS-35527. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-34037. The following is the description of the original issue:
—
Open Github Security Advisory for: containers/image
https://github.com/advisories/GHSA-6wvf-f2vw-3425
The ARO SRE team became aware of this advisory against our installer fork. Upstream installer is also pinning a vulnerable version of containerd.
The advisory recommends updating to version 5.30.1.
Description of problem:
We have a customer on OCP 4.10.47, using OVN-K8S in local gateway mode, who requires either updating or adding an additional default route. The question we have is whether there is a way to do this using the interface hints such that the new default route would have a higher/better priority than the day-0 default route, and such that node reboot and/or cluster upgrade does not affect OVN (based on the interface hints, OVN can keep using the original default route even though it would have a lower priority).
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
The CannotRetrieveUpdates alert currently provides a link to the web-console so the responding admin can find the RetrievedUpdates=False message. But some admins lack convenient console access (e.g. they're SSHing in to a restricted network, or the cluster does not have the Console capability enabled). Those admins would benefit from oc ... command-line advice.
The alert is new in 4.6:
$ for Y in $(seq 5 12); do git --no-pager grep CannotRetrieveUpdates "origin/release-4.${Y}"; done | head -n1
origin/release-4.6:docs/user/status.md:When CVO is unable to retrieve recommended updates the CannotRetrieveUpdates alert will fire containing the reason. This alert will not fire when the reason updates cannot be retrieved is NoChannel.
and has never provided command-line advice.
How reproducible:
Consistently.
Steps to Reproduce:
1. Install a cluster.
2. Set an impossible channel, such as oc adm upgrade channel testing.
3. Wait an hour.
4. Check firing alerts in /monitoring/alerts.
5. Click through to CannotRetrieveUpdates.
Actual results:
The "Failure to retrieve updates means that cluster administrators..." description does not provide oc ... advice.
Expected results:
The "Failure to retrieve updates means that cluster administrators..." description does provide oc ... advice.
A change to the installConfig in 4.12 means a user can now specify both an IPv4 and IPv6 address for the API and/or Ingress VIPs when running dual-stack on the baremetal or vsphere platforms. (Previously, only an IPv4 VIP could be used on dual-stack clusters.)
Once the assisted-service and ZTP support this, we'll want to allow passing that information through.
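For illustration, a minimal Go sketch of the 4.12 install-config rule described above: a dual-stack cluster may list two VIPs per role, one IPv4 and one IPv6. This validation logic is illustrative, not the installer's actual code.

package main

import (
	"fmt"
	"net/netip"
)

// validateVIPs accepts a single VIP, or an IPv4+IPv6 pair for dual-stack.
func validateVIPs(vips []string) error {
	if len(vips) == 0 || len(vips) > 2 {
		return fmt.Errorf("expected 1 or 2 VIPs, got %d", len(vips))
	}
	seen4, seen6 := false, false
	for _, v := range vips {
		addr, err := netip.ParseAddr(v)
		if err != nil {
			return err
		}
		if addr.Is4() {
			seen4 = true
		} else {
			seen6 = true
		}
	}
	if len(vips) == 2 && !(seen4 && seen6) {
		return fmt.Errorf("two VIPs must be one IPv4 and one IPv6")
	}
	return nil
}

func main() {
	// Addresses are hypothetical examples.
	fmt.Println(validateVIPs([]string{"192.168.111.5", "fd2e:6f44:5dd8:c956::5"})) // <nil>
	fmt.Println(validateVIPs([]string{"192.168.111.5", "192.168.111.6"}))          // error
}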
This is a clone of issue OCPBUGS-24537. The following is the description of the original issue:
—
Description of problem:
4.15 nightly payloads have been affected by this test multiple times:
: [sig-arch] events should not repeat pathologically for ns/openshift-kube-scheduler
{ 1 events happened too frequently
event happened 21 times, something is wrong: namespace/openshift-kube-scheduler node/ci-op-2gywzc86-aa265-5skmk-master-1 pod/openshift-kube-scheduler-guard-ci-op-2gywzc86-aa265-5skmk-master-1 hmsg/2652c73da5 - reason/ProbeError Readiness probe error: Get "https://10.0.0.7:10259/healthz": dial tcp 10.0.0.7:10259: connect: connection refused result=reject body:
From: 08:41:08Z To: 08:41:09Z}
In each of the 10 jobs aggregated, 2 to 3 jobs failed with this test. Historically this test passed 100%. But with the past two days' test data, the passing rate has dropped to 97%, and the aggregator started allowing this in the latest payload:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/aggregated-azure-ovn-upgrade-4.15-micro-release-openshift-release-analysis-aggregator/1732295947339173888
The first payload this started appearing in is https://amd64.ocp.releases.ci.openshift.org/releasestream/4.15.0-0.nightly/release/4.15.0-0.nightly-2023-12-05-071627.
All the events happened during cluster-operator/kube-scheduler progressing. For comparison, here is a passed job:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade/1731936539870498816
Here is a failed one:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade/1731936538192777216
They both have the same set of probe error events. For the passing jobs, the frequency is lower than 20, while for the failed job, one of those events repeated more than 20 times and therefore results in the test failure.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-24356. The following is the description of the original issue:
—
After updating the cluster to 4.12.42 (from 4.12.15), the customer noticed issues with scheduled pods starting on the node.
The initial thought was a multus issue, and then we realised that the script /usr/local/bin/configure-ovs.sh was modified and reverting the modification fixed the issue.
Modification:
> if nmcli connection show "$vlan_parent" &> /dev/null; then
>   # if the VLAN connection is configured with a connection UUID as parent, we need to find the underlying device
>   # and create the bridge against it, as the parent connection can be replaced by another bridge.
>   vlan_parent=$(nmcli --get-values GENERAL.DEVICES conn show ${vlan_parent})
> fi
Reference:
4.12.42
Should be reproducible by setting up inactive nmcli connections with the same names as the active ones
Not tested, but this should be something like
1. Create inactive nmcli connections with the same names as the active ones
2. Run the script
Script failing
Script should manage the connection using the UUID instead of using the Name.
Or maybe it's an underlying issue in how nmcli manages the relationship between objects.
The issue may be related to the way that nmcli works, as it should use the UUID to match the `vlan.parent` the same way it does with the `connection.master`.
This is a clone of issue OCPBUGS-25771. The following is the description of the original issue:
—
Description of problem:
Check on the OperatorHub page: a long catalogsource display name will overflow the operator item tile.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-12-19-033450
How reproducible:
Always
Steps to Reproduce:
1. Create a catalogsource with a long display name. 2. Check operator items supplied by the created catalogsource on OperatorHub page 3.
Actual results:
2. The catalogsource display name overflows from the item tile
Expected results:
2. The catalogsource display name should be shown in the item tile dynamically, without overflow.
Additional info:
screenshot: https://drive.google.com/file/d/1GOHJOxoBmtZX3QWDsIvc2RT5a2inkpzM/view?usp=sharing
This is a clone of issue OCPBUGS-30580. The following is the description of the original issue:
—
Description of problem:
KAS labels on projects created should be consistent with OCP - enforce: privileged
Version-Release number of selected component (if applicable):
4.16.0
How reproducible:
See https://issues.redhat.com/browse/OCPBUGS-20526.
Steps to Reproduce:
See https://issues.redhat.com/browse/OCPBUGS-20526.
Actual results:
See https://issues.redhat.com/browse/OCPBUGS-20526.
Expected results:
See https://issues.redhat.com/browse/OCPBUGS-20526.
Additional info:
See https://issues.redhat.com/browse/OCPBUGS-20526.
Some IBM jobs using openshift-test run failed due to the recent monitor refactor. They request command options to disable monitor tests in openshift-test run; this is already implemented in openshift-test run-monitor.
IBM-Roks needs this, will link to slack thread
Description of problem:
Private HC provision failed on AWS.
How reproducible:
Always.
Steps to Reproduce:
Create a private HC on AWS following the steps in https://hypershift-docs.netlify.app/how-to/aws/deploy-aws-private-clusters/:
RELEASE_IMAGE=registry.ci.openshift.org/ocp/release:4.17.0-0.nightly-2024-06-20-005211
HO_IMAGE=quay.io/hypershift/hypershift-operator:latest
BUCKET_NAME=fxie-hcp-bucket
REGION=us-east-2
AWS_CREDS="$HOME/.aws/credentials"
CLUSTER_NAME=fxie-hcp-1
BASE_DOMAIN=qe.devcluster.openshift.com
EXT_DNS_DOMAIN=hypershift-ext.qe.devcluster.openshift.com
PULL_SECRET="/Users/fxie/Projects/hypershift/.dockerconfigjson"
hypershift install --oidc-storage-provider-s3-bucket-name $BUCKET_NAME --oidc-storage-provider-s3-credentials $AWS_CREDS --oidc-storage-provider-s3-region $REGION --private-platform AWS --aws-private-creds $AWS_CREDS --aws-private-region=$REGION --wait-until-available --hypershift-image $HO_IMAGE
hypershift create cluster aws --pull-secret=$PULL_SECRET --aws-creds=$AWS_CREDS --name=$CLUSTER_NAME --base-domain=$BASE_DOMAIN --node-pool-replicas=2 --region=$REGION --endpoint-access=Private --release-image=$RELEASE_IMAGE --generate-ssh
Additional info:
From the MC:
$ for k in $(oc get secret -n clusters-fxie-hcp-1 | grep -i kubeconfig | awk '{print $1}'); do echo $k; oc extract secret/$k -n clusters-fxie-hcp-1 --to - 2>/dev/null | grep -i 'server:'; done
admin-kubeconfig
server: https://a621f63c3c65f4e459f2044b9521b5e9-082a734ef867f25a.elb.us-east-2.amazonaws.com:6443
aws-pod-identity-webhook-kubeconfig
server: https://kube-apiserver:6443
bootstrap-kubeconfig
server: https://api.fxie-hcp-1.hypershift.local:443
cloud-credential-operator-kubeconfig
server: https://kube-apiserver:6443
dns-operator-kubeconfig
server: https://kube-apiserver:6443
fxie-hcp-1-2bsct-kubeconfig
server: https://kube-apiserver:6443
ingress-operator-kubeconfig
server: https://kube-apiserver:6443
kube-controller-manager-kubeconfig
server: https://kube-apiserver:6443
kube-scheduler-kubeconfig
server: https://kube-apiserver:6443
localhost-kubeconfig
server: https://localhost:6443
service-network-admin-kubeconfig
server: https://kube-apiserver:6443
The bootstrap-kubeconfig uses an incorrect KAS port (should be 6443 since the KAS is exposed through LB), causing kubelet on each HC node to use the same incorrect port. As a result AWS VMs are provisioned but cannot join the HC as nodes.
From a bastion:
[ec2-user@ip-10-0-5-182 ~]$ nc -zv api.fxie-hcp-1.hypershift.local 443
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection timed out.
[ec2-user@ip-10-0-5-182 ~]$ nc -zv api.fxie-hcp-1.hypershift.local 6443
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 10.0.143.91:6443.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
Besides, the CNO also passes the wrong KAS port to Network components on the HC.
Same for HA proxy configuration on the VMs:
frontend local_apiserver
bind 172.20.0.1:6443
log global
mode tcp
option tcplog
default_backend remote_apiserver
backend remote_apiserver
mode tcp
log global
option httpchk GET /version
option log-health-checks
default-server inter 10s fall 3 rise 3
server controlplane api.fxie-hcp-1.hypershift.local:443
Update the tekton files per the migration instructions for 4.14, 4.15, & 4.16 branches.
Description of problem:
The RHDP-Developer/DXP team wants to deep-link some catalog pages with a filter on the Developer Sandbox cluster. When the user wasn't logged in, the target page was shown without any query parameters applied.
Version-Release number of selected component (if applicable):
At least 4.13 (Dev Sandbox clusters run 4.13.13 currently.)
How reproducible:
Always when not logged in
Steps to Reproduce:
1. While not logged in, open /catalog/ns/cjerolim-dev?catalogType=BuilderImage&keyword=.NET
Actual results:
The Developer Catalog is opened, but the catalog type "Build Images" and keyword filter ".NET" are not applied.
All Developer Catalog items are shown.
Expected results:
The Developer Catalog should open with the catalog type "Build Images" and the keyword filter ".NET" applied.
Exactly one catalog item should be shown.
Additional info:
This is a clone of issue OCPBUGS-23550. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-32203. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info:
Description of problem:
Geneve port has not been created for a set of nodes.
~~~
[arghosh@supportshell-1 03826869]$ omg get nodes |grep -v NAME|wc -l
83
~~~
~~~
# crictl exec -ti `crictl ps --name nbdb -q` ovn-nbctl show transit_switch | grep tstor-prd-fc-shop09a | wc -l
73
# crictl exec -ti `crictl ps --name nbdb -q` ovn-sbctl list chassis | grep -c ^hostname
41
# ovs-appctl ofproto/list-tunnels | wc -l
40
~~~
Version-Release number of selected component (if applicable):
4.14.17
How reproducible:
Not Sure
Steps to Reproduce:
1. 2. 3.
Actual results:
Pod-to-pod connectivity fails when pods are hosted on different nodes
Expected results:
Pod-to-pod connectivity should work fine
Additional info:
As per the customer, https://github.com/openshift/ovn-kubernetes/pull/2179 resolves the issue.
Azure techpreview has been permafailing for about a week:
There's a pod stuck in image pull backoff
NAME READY STATUS RESTARTS AGE
azureserviceoperator-controller-manager-6b8fc86684-qgrvc 0/2 ImagePullBackOff 0 6h54m
capi-controller-manager-6f96987c5c-zmkpc 1/1 Running 0 6h54m
capi-operator-controller-manager-578b9bd48f-gkgzv 2/2 Running 1 (6h55m ago) 7h2m
capz-controller-manager-5c6cb77b99-sh98n 1/1 Running 0 6h54m
cluster-capi-operator-5974b7684b-4qjwn 1/1 Running 0 7h2m
containerStatuses:
- image: registry.ci.openshift.org/openshift:kube-rbac-proxy
  imageID: ""
  lastState: {}
  name: kube-rbac-proxy
  ready: false
  restartCount: 0
  started: false
  state:
    waiting:
      message: Back-off pulling image "registry.ci.openshift.org/openshift:kube-rbac-proxy"
      reason: ImagePullBackOff
- image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8cc3384be7d81e745ce671c668465ceef75f65652354ce305d7bee3ae21a5976
  imageID: ""
  lastState: {}
  name: manager
  ready: false
  restartCount: 0
  started: false
  state:
    waiting:
      message: secret "aso-controller-settings" not found
      reason: CreateContainerConfigError
This is a clone of issue OCPBUGS-30873. The following is the description of the original issue:
—
Description of problem:
From a test run in [1] we can't be sure whether the call to etcd was really deadlocked or just waiting for a result. Currently the CheckingSyncWrapper only defines "alive" as a sync func that has not returned an error. This can be wrong in scenarios where a member is down and perpetually not reachable. Instead, we wanted to detect deadlock situations where the sync loop is just stuck for a prolonged period of time.
[1] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-upgrade-from-stable-4.15-e2e-metal-ipi-upgrade-ovn-ipv6/1762965898773139456/
Version-Release number of selected component (if applicable):
>4.14
How reproducible:
Always
Steps to Reproduce:
1. Create a healthy cluster
2. Make sure one etcd member never responds, but the node is still there (i.e. kubelet shutdown, blocking the etcd ports on a firewall)
3. Wait for the CEO to restart the pod on a failing health probe and dump its stack
Actual results:
CEO controllers are returning errors, but might not deadlock, which currently results in a restart
Expected results:
CEO should mark the member as unhealthy and continue its service without getting deadlocked and should not restart its pod by failing the health probe
Additional info:
highly related to OCPBUGS-30169
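For illustration, a hedged Go sketch of the staleness-based liveness described above: instead of treating any sync error as "not alive", record when each sync call returns and treat the loop as deadlocked only if no sync has returned at all (success or failure) within a timeout. Names are illustrative and this is not the CEO's actual CheckingSyncWrapper.

package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

type checkingSyncWrapper struct {
	lastReturn atomic.Int64 // unix nanos of the most recent sync return
}

func (c *checkingSyncWrapper) Sync(sync func() error) error {
	err := sync()
	// A returned error still counts as progress; only a call that never
	// returns leaves the timestamp stale.
	c.lastReturn.Store(time.Now().UnixNano())
	return err
}

// Alive is what a health probe would consult instead of "last error == nil".
func (c *checkingSyncWrapper) Alive(timeout time.Duration) bool {
	last := time.Unix(0, c.lastReturn.Load())
	return time.Since(last) < timeout
}

func main() {
	w := &checkingSyncWrapper{}
	_ = w.Sync(func() error { return fmt.Errorf("member unreachable") })
	// Still alive: the controller is erroring, not deadlocked.
	fmt.Println("alive despite sync error:", w.Alive(5*time.Minute))
}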
Description of problem:
We should document how to preserve the kebab menu in the TableData component when building a list page for a dynamic plugin. Currently {className: "pf-c-table__action", id: ""} needs to be set on the component in order for the column to be preserved, which is definitely not obvious for plugin creators. There is also an upstream issue which should address this, either by making the setting more obvious or at least better documented. Either way, we should document the current state in our docs/code/examples.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Seen in 4.15 update CI:
: [bz-Monitoring] clusteroperator/monitoring should not change condition/Available
Run #0: Failed (1h16m1s)
{ 1 unexpected clusteroperator state transitions during e2e test run
Nov 21 04:20:56.837 - 19s E clusteroperator/monitoring condition/Available reason/UpdatingPrometheusK8SFailed status/False reconciling Prometheus Federate Route failed: retrieving Route object failed: etcdserver: leader changed}
While the Kube API server is supposed to buffer clients from etcd leader transitions, an issue that only persists for 19s is not long enough to warrant immediate admin intervention. Teaching the monitoring operator to stay Available=True for this kind of brief hiccup, while still going Available=False for issues where at least part of the component is non-functional and immediate administrator intervention is required, would make it easier for admins and SREs operating clusters to identify when intervention was required.
A bunch of 4.15 jobs are impacted, almost all update jobs:
$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=48h&type=junit&search=clusteroperator/monitoring+should+not+change+condition/Available' | grep '^periodic-.*4[.]15.*failures match' | sort
periodic-ci-openshift-cluster-etcd-operator-release-4.15-periodics-e2e-aws-etcd-recovery (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-hypershift-release-4.15-periodics-e2e-kubevirt-conformance (all) - 2 runs, 50% failed, 200% of failures match = 100% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-ovn-remote-libvirt-s390x (all) - 6 runs, 100% failed, 33% of failures match = 33% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-aws-ovn-heterogeneous-upgrade (all) - 5 runs, 40% failed, 50% of failures match = 20% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-upgrade-azure-ovn-heterogeneous (all) - 5 runs, 20% failed, 100% of failures match = 20% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-aws-upgrade-ovn-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade (all) - 50 runs, 56% failed, 4% of failures match = 2% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-azure-sdn-upgrade (all) - 5 runs, 20% failed, 100% of failures match = 20% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-ovn-upgrade (all) - 80 runs, 44% failed, 9% of failures match = 4% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-ovn-upgrade (all) - 80 runs, 30% failed, 17% of failures match = 5% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-ovn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade (all) - 80 runs, 43% failed, 38% of failures match = 16% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-rt-upgrade (all) - 52 runs, 15% failed, 175% of failures match = 27% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-upgrade (all) - 5 runs, 20% failed, 100% of failures match = 20% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-sdn-upgrade (all) - 5 runs, 60% failed, 33% of failures match = 20% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-single-node-serial (all) - 5 runs, 100% failed, 40% of failures match = 40% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-ibmcloud-csi (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-aws-sdn-upgrade (all) - 5 runs, 20% failed, 100% of failures match = 20% impact
periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-aws-upgrade-ovn-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-metal-ipi-sdn-bm-upgrade (all) - 5 runs, 100% failed, 20% of failures match = 20% impact
Hit rates are low enough there that I haven't checked older 4.y. I'm not sure if all of those hits are UpdatingPrometheusK8SFailed or not, it seems likely that Kube API hiccups could impact a number of control loops. And there may be other triggers going on besides Kube API hiccups.
16% impact in periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade looks like the current largest impact percentage among the jobs with double-digit run counts.
Run periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade or another job with a combination of high-ish impact percentage and high run counts, watching the monitoring ClusterOperator's Available condition.
Blips of Available=False that resolve more quickly than a responding admin could be expected to show up.
Only going Available=False when it seems reasonable to summon an emergency admin response.
I have no problem if folks decide to push for Kube API server / etcd perfection, but that seems like a hard goal to reach reliably in the mess of the real world, so even if you do push those folks for improvements, I think it makes sense to relax your response to those kinds of issues to only complain when things like Route object retrieval failures go on for long enough for the operator to be seriously degraded.
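For illustration, a hedged Go sketch of the requested behavior: only report Available=False once a failure has persisted past a grace period, so 19s etcd/KAS blips never summon an admin. This is a generic debounce pattern, not the monitoring operator's actual code, and the grace period is an arbitrary example.

package main

import (
	"fmt"
	"time"
)

type availabilityDebouncer struct {
	failingSince time.Time
	grace        time.Duration
}

// Observe returns the Available status to report for the current sync outcome.
func (d *availabilityDebouncer) Observe(syncErr error, now time.Time) string {
	if syncErr == nil {
		d.failingSince = time.Time{} // healthy again: reset the clock
		return "True"
	}
	if d.failingSince.IsZero() {
		d.failingSince = now
	}
	if now.Sub(d.failingSince) < d.grace {
		return "True" // brief hiccup: stay Available=True (perhaps set Degraded)
	}
	return "False" // persistent failure: worth immediate intervention
}

func main() {
	d := &availabilityDebouncer{grace: 2 * time.Minute}
	t0 := time.Now()
	err := fmt.Errorf("etcdserver: leader changed")
	fmt.Println(d.Observe(err, t0))                     // True
	fmt.Println(d.Observe(err, t0.Add(19*time.Second))) // True: the 19s blip above
	fmt.Println(d.Observe(err, t0.Add(3*time.Minute)))  // False: persisted past grace
}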
This is a clone of issue OCPBUGS-22399. The following is the description of the original issue:
—
Users are encountering an issue when attempting to create a hosted cluster on BM+disconnected+ipv6 through MCE. This issue is related to the default setting of `--enable-uwm-telemetry-remote-write` being true, which means that in the default disconnected case the telemetry endpoint configured in the UWM configmap (e.g. minBackoff: 1s, url: https://infogw.api.openshift.com/metrics/v1/receive) is not reachable. So we should look into reporting the issue and remediating, vs. fataling on it, for disconnected scenarios.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
In MCE 2.4, we currently document to disable `--enable-uwm-telemetry-remote-write` if the hosted control plane feature is used in a disconnected environment:
https://github.com/stolostron/rhacm-docs/blob/lahinson-acm-7739-disconnected-bare-[…]s/hosted_control_planes/monitor_user_workload_disconnected.adoc
Once this Jira is fixed, that documentation needs to be removed; users will no longer need to disable `--enable-uwm-telemetry-remote-write`. The HO is expected to fail gracefully on `--enable-uwm-telemetry-remote-write` and continue to be operational.
This is a clone of issue OCPBUGS-31498. The following is the description of the original issue:
—
Description of problem:
Separate oidc certificate authority and cluster certificate authority.
Version-Release number of selected component (if applicable):
oc 4.16 / 4.15
How reproducible:
Always
Steps to Reproduce:
1. Launch an HCP external OIDC cluster. The external OIDC uses keycloak. The keycloak server is created outside of the cluster and its serving certificate is not trusted; its CA is separate from any of the cluster's CAs.
2. Test oc login:
$ curl -sSI --cacert $ISSUER_CA_FILE $ISSUER_URL/.well-known/openid-configuration | head -n 1
HTTP/1.1 200 OK
$ oc login --exec-plugin=oc-oidc --issuer-url=$ISSUER_URL --client-id=$CLI_CLIENT_ID --extra-scopes=email,profile --callback-port=8080 --certificate-authority $ISSUER_CA_FILE
The server uses a certificate signed by an unknown authority. You can bypass the certificate check, but any data you send to the server could be intercepted by others.
Use insecure connections? (y/n): n
error: The server uses a certificate signed by unknown authority. You may need to use the --certificate-authority flag to provide the path to a certificate file for the certificate authority, or --insecure-skip-tls-verify to bypass the certificate check and use insecure connections.
Actual results:
2. oc login with --certificate-authority pointing to $ISSUER_CA_FILE fails. The reason is that oc login not only communicates with the OIDC server, but also with the test cluster's kube-apiserver, which is also self-signed. More action is needed for the --certificate-authority flag, i.e. the test cluster's kube-apiserver CA and $ISSUER_CA_FILE need to be combined:
$ grep certificate-authority-data $KUBECONFIG | grep -Eo "[^ ]+$" | base64 -d > hostedcluster_kubeconfig_ca.crt
$ cat $ISSUER_CA_FILE hostedcluster_kubeconfig_ca.crt > combined-ca.crt
$ oc login --exec-plugin=oc-oidc --issuer-url=$ISSUER_URL --client-id=$CLI_CLIENT_ID --extra-scopes=email,profile --callback-port=8080 --certificate-authority combined-ca.crt
Please visit the following URL in your browser: http://localhost:8080
Expected results:
For step 2, per https://redhat-internal.slack.com/archives/C060D1W96LB/p1711624413149659?thread_ts=1710836566.326359&cid=C060D1W96LB discussion, separate trust like:
$ oc login api-server --oidc-certificate-auhority=$ISSUER_CA_FILE [--certificate-authority=hostedcluster_kubeconfig_ca.crt]
The [--certificate-authority=hostedcluster_kubeconfig_ca.crt] should be optional if it is included in $KUBECONFIG's certificate-authority-data already.
Description of problem:
alertmanager-trusted-ca-bundle, prometheus-trusted-ca-bundle, telemeter-trusted-ca-bundle, thanos-querier-trusted-ca-bundle are empty on the hosted cluster. This results in CMO not creating the prometheus CR, resulting in no prometheus pods. This issue prevents us from monitoring the hosted cluster.
Version-Release number of selected component (if applicable):
4.13.z
How reproducible:
Rare: found only one occurrence so far.
Steps to Reproduce:
1. 2. 3.
Actual results:
Certs are not created, prometheus doesn't create prometheus pods
Expected results:
Certs are created and CMO can create prometheus pods
Additional info:
Linked Must Gather of the MC, inspect of the openshift-monitoring DP namespace
Please review the following PR: https://github.com/openshift/cluster-kube-scheduler-operator/pull/493
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-35730. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-30949. The following is the description of the original issue:
—
Description of problem: After changing the value of enable_topology in the openshift-config/cloud-provider-config config map, the CSI controller pods should restart to pick up the new value. This is not happening.
It seems like our understanding in https://github.com/openshift/openstack-cinder-csi-driver-operator/pull/127#issuecomment-1780967488 was wrong.
Please review the following PR: https://github.com/openshift/service-ca-operator/pull/221
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-45939. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-45938. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-45937. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-41727. The following is the description of the original issue:
—
Original bug title:
cert-manager [v1.15 Regression] Failed to issue certs with ACME Route53 dns01 solver in AWS STS env
Description of problem:
When using Route53 as the dns01 solver to create certificates, it fails in both automated and manual tests. For the full log, please refer to the "Actual results" section.
Version-Release number of selected component (if applicable):
cert-manager operator v1.15.0 staging build
How reproducible:
Always
Steps to Reproduce: also documented in gist
1. Install the cert-manager operator 1.15.0
2. Follow the doc to auth the operator with AWS STS using ccoctl: https://docs.openshift.com/container-platform/4.16/security/cert_manager_operator/cert-manager-authenticate.html#cert-manager-configure-cloud-credentials-aws-sts_cert-manager-authenticate
3. Create an ACME issuer with the Route53 dns01 solver
4. Create a cert using the created issuer
OR:
Refer by running `/pj-rehearse pull-ci-openshift-cert-manager-operator-master-e2e-operator-aws-sts` on https://github.com/openshift/release/pull/59568
Actual results:
1. The certificate is not Ready.
2. The challenge of the cert is stuck in the pending status:
PresentError: Error presenting challenge: failed to change Route 53 record set: operation error Route 53: ChangeResourceRecordSets, get identity: get credentials: failed to refresh cached credentials, failed to retrieve credentials, operation error STS: AssumeRoleWithWebIdentity, failed to resolve service endpoint, endpoint rule error, Invalid Configuration: Missing Region
Expected results:
The certificate should be Ready. The challenge should succeed.
Additional info:
The only way to get it working again seems to be injecting the "AWS_REGION" environment variable into the controller pod. See upstream discussion/change:
I couldn't find a way to inject the env var into our operator-managed operands, so I only verified this workaround using the upstream build v1.15.3. After applying the patch with the following command, the challenge succeeded and the certificate became Ready.
oc patch deployment cert-manager -n cert-manager \
  --patch '{"spec": {"template": {"spec": {"containers": [{"name": "cert-manager-controller", "env": [{"name": "AWS_REGION", "value": "aws-global"}]}]}}}}'
This is a clone of issue OCPBUGS-45998. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42414. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42388. The following is the description of the original issue:
—
Description of problem:
The customer's MachineSets are configured with deletePolicy: Oldest, and they annotated a specific machine with machine.openshift.io/cluster-api-delete-machine="true" to target it for deletion during a scale-down.
However, instead of deleting the annotated machine, another machine (based on the Oldest policy) was selected for deletion.
Interestingly, when they removed the deletePolicy: Oldest from the MachineSet (which defaulted to deletePolicy: Random), the annotation was correctly honored, and the specific machine targeted was deleted during the scale-down operation.
Based on the documentation [1] , the annotation should override the deletePolicy, but this doesn’t seem to be happening when deletePolicy: Oldest is in place. The issue was resolved only when they removed the Oldest policy and allowed it to default to Random.
[1]. https://docs.openshift.com/container-platform/4.14/machine_management/manually-scaling-machineset.html#machineset-delete-policy_manually-scaling-machineset
Version-Release number of selected component (if applicable):
OCP 4.14.33.
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
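For illustration, a hedged Go sketch of the precedence the documentation describes: machines annotated with machine.openshift.io/cluster-api-delete-machine are selected for scale-down before the deletePolicy (Oldest here) is consulted. This is illustrative ordering logic, not the machine-api controller's actual code.

package main

import (
	"fmt"
	"sort"
	"time"
)

const deleteAnnotation = "machine.openshift.io/cluster-api-delete-machine"

type machine struct {
	name        string
	created     time.Time
	annotations map[string]string
}

// pickForDeletion returns the n machines to delete on scale-down.
func pickForDeletion(machines []machine, n int) []machine {
	sort.SliceStable(machines, func(i, j int) bool {
		ai := machines[i].annotations[deleteAnnotation] == "true"
		aj := machines[j].annotations[deleteAnnotation] == "true"
		if ai != aj {
			return ai // annotated machines come first, overriding the policy
		}
		return machines[i].created.Before(machines[j].created) // then Oldest
	})
	return machines[:n]
}

func main() {
	now := time.Now()
	ms := []machine{
		{name: "m-old", created: now.Add(-48 * time.Hour)},
		{name: "m-new", created: now, annotations: map[string]string{deleteAnnotation: "true"}},
	}
	// Prints "m-new": the annotation wins despite deletePolicy Oldest.
	fmt.Println(pickForDeletion(ms, 1)[0].name)
}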
Description of problem:
There is a problem with IBM ROKS (managed service) running 4.14+: cluster-storage-operator never sets the upgradeable=True condition, so it shows up as Unknown:
- lastTransitionTime: "2023-11-08T19:07:01Z"
  reason: NoData
  status: Unknown
  type: Upgradeable
This is a regression from 4.13. In 4.13, pkg/operator/snapshotcrd/controller.go was the one that set `upgradeable: True`:
upgradeable := operatorapi.OperatorCondition{
    Type:   conditionsPrefix + operatorapi.OperatorStatusTypeUpgradeable,
    Status: operatorapi.ConditionTrue,
}
In the 4.13 bundle from IBM ROKS, these two conditions are set in cluster-scoped-resources/operator.openshift.io/storages/cluster.yaml:
- lastTransitionTime: "2023-11-08T14:22:21Z"
  status: "True"
  type: SnapshotCRDControllerUpgradeable
- lastTransitionTime: "2023-11-08T14:22:21Z"
  reason: AsExpected
  status: "False"
  type: SnapshotCRDControllerDegraded
So the SnapshotCRDController is running and sets `upgradeable: True` on 4.13. But in the 4.14 bundle, SnapshotCRDController no longer exists:
https://github.com/openshift/cluster-storage-operator/pull/385/commits/fa9af3aad65b9d0e9c618453825e4defeaad59ac
So in 4.14+ it's pkg/operator/defaultstorageclass/controller.go that should set the condition:
https://github.com/openshift/cluster-storage-operator/blob/dbb1514dbf9923c56a4a198374cc59e45f9bc0cc/pkg/operator/defaultstorageclass/controller.go#L97-L100
But that only happens if `syncErr == unsupportedPlatformError`, and not if `syncErr == supportedByCSIError`, as is the case with the IBM VPC driver:
- lastTransitionTime: "2023-11-08T14:22:23Z"
  message: 'DefaultStorageClassControllerAvailable: StorageClass provided by supplied CSI Driver instead of the cluster-storage-operator'
  reason: AsExpected
  status: "True"
  type: Available
So what controller will set `upgradeable: True` for IBM VPC? IBM VPC uses this StatusFilter function for ROKS:
https://github.com/openshift/cluster-storage-operator/blob/dbb1514dbf9923c56a4a198374cc59e45f9bc0cc/pkg/operator/csidriveroperator/csioperatorclient/ibm-vpc-block.go#L17-L27
ROKS and AzureStack are the only deployments using a StatusFilter function. So shouldRunController returns false here because the platform is ROKS:
https://github.com/openshift/cluster-storage-operator/blob/dbb1514dbf9923c56a4a198374cc59e45f9bc0cc/pkg/operator/csidriveroperator/driver_starter.go#L347-L349
Which means there is no controller to set `upgradeable: True`.
Version-Release number of selected component (if applicable):
4.14.0+
How reproducible:
Always
Steps to Reproduce:
1. Install 4.14 via IBM ROKS 2. Check status conditions in cluster-scoped-resources/config.openshift.io/clusteroperators/storage.yaml
Actual results:
upgradeable=Unknown
Expected results:
upgradeable=True
Additional info:
4.13 IBM ROKS must-gather: https://github.com/Joseph-Goergen/ibm-roks-toolkit/releases/download/test/must-gather-4.13.tar.gz
4.14 IBM ROKS must-gather: https://github.com/Joseph-Goergen/ibm-roks-toolkit/releases/download/test/must-gather.tar.gz
This is a clone of issue OCPBUGS-30102. The following is the description of the original issue:
—
Description of problem:
For high scalability, we need an option to disable unused machine management control plane components.
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Create HostedCluster/HostedControlPlane 2. 3.
Actual results:
Machine management components (cluster-api, machine-approver, auto-scaler, etc) are deployed
Expected results:
Should have option to disable as some use cases they provide no utility.
Additional info:
This is a clone of issue OCPBUGS-34393. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-34200. The following is the description of the original issue:
—
Description of problem:
The value box in the ConfigMap Form view is no longer resizable. It is resizable as expected in OCP version 4.14.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
OCP Console -> Administrator -> Workloads -> ConfigMaps -> Create ConfigMap -> Form view -> value
Actual results:
The value box is no longer resizable in 4.15 OpenShift clusters.
Expected results:
The value window box should be resizable.
Additional info:
This is a clone of issue OCPBUGS-34656. The following is the description of the original issue:
—
knative-ci.feature test is failing with:
Logging in as kubeadmin
Installing operator: "Red Hat OpenShift Serverless"
Operator Red Hat OpenShift Serverless was not yet installed.
Performing Serverless post installation steps
User has selected namespace knative-serving
1) "before all" hook for "Create knative workload using Container image with extrenal registry on Add page: KN-05-TC05 (example #1)"
0 passing (3m)
1 failing
1) Perform actions on knative service and revision "before all" hook for "Create knative workload using Container image with extrenal registry on Add page: KN-05-TC05 (example #1)":
AssertionError: Timed out retrying after 40000ms: Expected to find element: `[title="knativeservings.operator.knative.dev"]`, but never found it.
Because this error occurred during a `before all` hook we are skipping all of the remaining tests.
Although you have test retries enabled, we do not retry tests when `before all` or `after all` hooks fail
at createKnativeServing (webpack:////go/src/github.com/openshift/console/frontend/packages/dev-console/integration-tests/support/pages/functions/knativeSubscriptions.ts:15:5)
at performPostInstallationSteps (webpack:////go/src/github.com/openshift/console/frontend/packages/dev-console/integration-tests/support/pages/functions/installOperatorOnCluster.ts:176:26)
at verifyAndInstallOperator (webpack:////go/src/github.com/openshift/console/frontend/packages/dev-console/integration-tests/support/pages/functions/installOperatorOnCluster.ts:221:2)
at verifyAndInstallKnativeOperator (webpack:////go/src/github.com/openshift/console/frontend/packages/dev-console/integration-tests/support/pages/functions/installOperatorOnCluster.ts:231:27)
at Context.eval (webpack:///./support/commands/hooks.ts:7:33)
[mochawesome] Report JSON saved to /go/src/github.com/openshift/console/frontend/gui_test_screenshots/cypress_report_knative.json
(Results)
Tests: 16
Passing: 0
Failing: 1
Pending: 0
Skipped: 15
Screenshots: 1
Video: true
Duration: 3 minutes, 8 seconds
Spec Ran: knative-ci.feature
(Screenshots)
- /go/src/github.com/openshift/console/frontend/gui_test_screenshots/cypress/screenshots/knative-ci.feature/Create knative workload using Container image with extrenal registry on Add page KN-05-TC05 (example #1) -- before all hook (failed).png (1280x720)
Description of problem:
[Multi-NIC] EgressIP was not correctly reassigned when labeling/unlabeling an egress node
Version-Release number of selected component (if applicable):
Tested PR openshift/cluster-network-operator#1969,openshift/ovn-kubernetes#1832 together
How reproducible:
Steps to Reproduce:
1. Label worker-0 node as egress node, and create one egressip object # oc get egressip NAME EGRESSIPS ASSIGNED NODE ASSIGNED EGRESSIPS egressip-1 172.22.0.100 worker-0 172.22.0.100 2. Create another egressIP object, the egressIP located on worker-0 as well. # oc get egressip NAME EGRESSIPS ASSIGNED NODE ASSIGNED EGRESSIPS egressip-1 172.22.0.100 worker-0 172.22.0.100 egressip-2 172.22.0.101 worker-0 172.22.0.101 3. Check the secondary NIC on the egress node; the two IPs were correctly added 2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000 link/ether 00:da:86:9b:3e:ac brd ff:ff:ff:ff:ff:ff inet 172.22.0.86/24 brd 172.22.0.255 scope global dynamic noprefixroute enp1s0 valid_lft 96sec preferred_lft 96sec inet 172.22.0.100/32 scope global enp1s0ovn valid_lft forever preferred_lft forever inet 172.22.0.101/32 scope global enp1s0ovn valid_lft forever preferred_lft forever inet6 fe80::2da:86ff:fe9b:3eac/64 scope link noprefixroute valid_lft forever preferred_lft forever 4. Label another node worker-1 as egress node 5. Delete egressip-2 and recreate it; egressip-2 is now on worker-1 # oc get egressip NAME EGRESSIPS ASSIGNED NODE ASSIGNED EGRESSIPS egressip-1 172.22.0.100 worker-0 172.22.0.100 egressip-2 172.22.0.101 worker-1 172.22.0.101 6. Unlabel egress from worker-1; 172.22.0.101 was reassigned to worker-0 # oc get egressip NAME EGRESSIPS ASSIGNED NODE ASSIGNED EGRESSIPS egressip-1 172.22.0.100 worker-0 172.22.0.100 egressip-2 172.22.0.101 worker-0 172.22.0.101 7. Check worker-0's and worker-1's secondary NICs
Actual results:
EgressIP was not removed from worker-1 # oc debug node/worker-1 Starting pod/worker-1-debug-pw7xk ... To use host binaries, run `chroot /host` Pod IP: 192.168.111.24 If you don't see a command prompt, try pressing enter. sh-4.4# ip a show enp1s0 2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000 link/ether 00:da:86:9b:3e:b0 brd ff:ff:ff:ff:ff:ff inet 172.22.0.90/24 brd 172.22.0.255 scope global dynamic noprefixroute enp1s0 valid_lft 115sec preferred_lft 115sec inet 172.22.0.101/32 scope global enp1s0ovn valid_lft forever preferred_lft forever inet6 fe80::2da:86ff:fe9b:3eb0/64 scope link noprefixroute valid_lft forever preferred_lft forever 172.22.0.100 was missed from worker-0 # oc debug node/worker-0 Starting pod/worker-0-debug-8nz5f ... To use host binaries, run `chroot /host` Pod IP: 192.168.111.23 If you don't see a command prompt, try pressing enter. sh-4.4# ip a show enp1s0 2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000 link/ether 00:da:86:9b:3e:ac brd ff:ff:ff:ff:ff:ff inet 172.22.0.86/24 brd 172.22.0.255 scope global dynamic noprefixroute enp1s0 valid_lft 68sec preferred_lft 68sec inet 172.22.0.101/32 scope global enp1s0ovn valid_lft forever preferred_lft forever inet6 fe80::2da:86ff:fe9b:3eac/64 scope link noprefixroute valid_lft forever preferred_lft forever
Expected results:
The egressIP should be correctly reassigned to the correct egress node
Additional info:
The origin test suite does not test CPMS, so, it should never have a CPMS rollout occur during a run.
We should add a test that checks that, early in the suite, the control plane machines are all named <cluster-name>-master-<index>. If for any reason we see a control plane machine matching <cluster-name>-master-<random>-<index>, we know that the CPMS has rolled out and the test should be aborted until we work out why the CPMS rolled out.
The hope here is that it becomes very obvious when there are issues with CPMS, even when these issues are introduced by other repositories.
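A minimal Go sketch of such a check, assuming the machine names have already been listed from the cluster; the function name and error wording are illustrative, not the actual origin test code:
```
package main

import (
	"fmt"
	"regexp"
)

// checkControlPlaneMachineNames returns an error if any machine name does not
// match the installer-provisioned pattern <cluster-name>-master-<index>,
// which would indicate a CPMS rollout replaced the original machines.
func checkControlPlaneMachineNames(clusterName string, machineNames []string) error {
	pattern := regexp.MustCompile("^" + regexp.QuoteMeta(clusterName) + `-master-[0-9]+$`)
	for _, name := range machineNames {
		if !pattern.MatchString(name) {
			return fmt.Errorf("machine %q does not match installer naming; CPMS may have rolled out", name)
		}
	}
	return nil
}

func main() {
	// CPMS-created machines carry an extra random infix, so they fail the check.
	err := checkControlPlaneMachineNames("mycluster", []string{
		"mycluster-master-0",
		"mycluster-master-abcde-1", // CPMS-style replacement name
	})
	fmt.Println(err)
}
```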
This is a clone of issue OCPBUGS-36620. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-30841. The following is the description of the original issue:
—
Description of problem:
PAC provides a log link in git for viewing the log of the PLR. This is broken on 4.15 after https://github.com/openshift/console/pull/13470, which changed the log URL following the react-router package upgrade.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/node_exporter/pull/131
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/cluster-version-operator/pull/1000
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Using packages from k8s.io/kubernetes is not supported: https://github.com/kubernetes/kubernetes/issues/79384#issuecomment-505627280
This came about in this slack thread: https://redhat-internal.slack.com/archives/C02CZNQHGN8/p1694210392218409?thread_ts=1694207119.447459&cid=C02CZNQHGN8
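The issue does not say which package was being imported, so the import below is illustrative; the usual fix is to depend on the published staging repos (k8s.io/api, k8s.io/apimachinery, k8s.io/client-go) rather than the k8s.io/kubernetes monorepo:
```
package main

// Before (unsupported): importing from the k8s.io/kubernetes monorepo, whose
// go.mod is not meant to be consumed as a library:
//
//	import "k8s.io/kubernetes/pkg/apis/core"
//
// After (supported): the published staging repos, which are versioned for reuse.

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	pod := corev1.Pod{}
	pod.Name = "example"
	fmt.Println("using staging-repo types:", pod.Name)
}
```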
When moving the controller, the existing one wasn't removed.
Please review the following PR: https://github.com/openshift/cloud-provider-aws/pull/58
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-30297. The following is the description of the original issue:
—
Description of problem:
If there are TaskRuns with the same name in 2 different namespaces, the TaskRuns list page for all namespaces shows only one record because the rows share the same name.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. Create TaskRun using https://gist.github.com/karthikjeeyar/eb1bbdf9157431f5c875eb55ce47580c in 2 different namespace 2. Go to TaskRun list page 3. Select All Projects
Actual results:
Only one entry is shown
Expected results:
Both entries should be visible
Additional info:
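A minimal Go sketch of the underlying keying bug (the console itself is TypeScript; this just illustrates the logic), assuming list rows are deduplicated through a map:
```
package main

import "fmt"

type TaskRun struct {
	Namespace string
	Name      string
}

func main() {
	runs := []TaskRun{
		{Namespace: "ns-a", Name: "my-taskrun"},
		{Namespace: "ns-b", Name: "my-taskrun"},
	}

	// Keying by name alone collapses same-named TaskRuns from different
	// namespaces into one row, which is the symptom in this bug.
	byName := map[string]TaskRun{}
	for _, r := range runs {
		byName[r.Name] = r
	}
	fmt.Println("rows keyed by name:", len(byName)) // 1

	// Keying by namespace/name keeps both entries visible.
	byNsName := map[string]TaskRun{}
	for _, r := range runs {
		byNsName[r.Namespace+"/"+r.Name] = r
	}
	fmt.Println("rows keyed by namespace/name:", len(byNsName)) // 2
}
```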
This is a clone of issue OCPBUGS-34799. The following is the description of the original issue:
—
Backport of AUTH-482
Backport to 4.15 of OCPBUGS-35007 specifically for the cluster-config-operator.
All workloads of the following namespaces need SCC pinning:
Description of problem:
Build timing test is failing due to faster run times on Bare Metal
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. run [sig-builds][Feature:Builds][timing] capture build stages and durations should record build stages and durations for docker 2. 3.
Actual results:
{ fail [github.com/openshift/origin/test/extended/builds/build_timing.go:101]: Stage PushImage ran for 95, expected greater than 100ms Expected <bool>: true to be false Ginkgo exit error 1: exit with code 1}
Expected results:
Test should pass
Additional info:
Please review the following PR: https://github.com/openshift/cloud-provider-ibm/pull/60
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-29932. The following is the description of the original issue:
—
Description of problem:
Sample job: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-qe-ocp-qe-perfscale-ci-main-azure-4.15-nightly-x86-data-path-9nodes/1760228008968327168
Version-Release number of selected component (if applicable):
How reproducible:
Anytime there is an error from the move-blobs command
Steps to Reproduce:
1. 2. 3.
Actual results:
A panic is shown followed by the error message
Expected results:
An error message is shown
Additional info:
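A minimal sketch of the desired behavior, with moveBlobs as a hypothetical stand-in for the real command:
```
package main

import (
	"errors"
	"fmt"
)

// moveBlobs is a stand-in for the real command; the name is hypothetical.
func moveBlobs() error {
	return errors.New("copy failed: container not found")
}

func main() {
	// Panicking on a command error prints a stack trace before the message;
	// returning and printing the error keeps the output readable.
	if err := moveBlobs(); err != nil {
		fmt.Println("error:", err) // preferred
		// panic(err)              // what the bug reports: a panic precedes the message
	}
}
```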
Because of the pin in the packages list, the ART pipeline is rebuilding packages all the time.
Unfortunately we need to remove the strong pins and move back to relaxed ones.
Once that's done, we need to merge https://github.com/openshift-eng/ocp-build-data/pull/4097
Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/123
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-34141. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-33090. The following is the description of the original issue:
—
Description of problem:
When application grouping is unchecked in the display filters under the expand section, the topology display is distorted and the application name is also missing.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Have some deployments 2. In topology unselect the application grouping in the display filter 3.
Actual results:
Topology shows distorted UI and Application name is missing.
Expected results:
The UI should render correctly and the application name should be present.
Additional info:
Screenshot:
https://drive.google.com/file/d/1z80qLrr5v-K8ZFDa3P-n7SoDMaFtuxI7/view?usp=sharing
This is a clone of issue OCPBUGS-41557. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-41498. The following is the description of the original issue:
—
Description of problem:
The e2e test "upgrade CRD with deprecated version" in the test/e2e/installplan_e2e_test.go suite is flaking
Version-Release number of selected component (if applicable):
How reproducible:
Hard to reproduce, could be related to other tests running at the same time, or any number of things.
Steps to Reproduce:
It might be worthwhile trying to re-run the test multiple times against a ClusterBot or OpenShift Local cluster
Actual results:
Expected results:
Additional info:
Description of problem:
When the installer generates a CPMS, it should only add the `failureDomains` field when there is more than one failure domain. When there is only one failure domain, the fields from the failure domain, e.g. the zone, should be injected directly into the provider spec and the failure domain should be omitted. By doing this, we avoid having to care about failure domain injection logic for single-zone clusters, potentially avoiding bugs (such as some we have seen recently). IIRC we already did this for OpenStack, but AWS, Azure and GCP may not have this behavior yet.
Version-Release number of selected component (if applicable):
How reproducible:
Can be demonstrated on Azure in the westus region, which has no AZs available. Currently the installer creates the following, which we can omit entirely: ``` failureDomains: platform: Azure azure: - zone: "" ```
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
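A minimal Go sketch of the collapsing rule described above, with the CPMS types reduced to just the zone field; this is an illustration, not the installer's actual code:
```
package main

import "fmt"

// AzureFailureDomain is a simplified stand-in for the CPMS API type.
type AzureFailureDomain struct {
	Zone string
}

// collapseFailureDomains implements the rule described above: with exactly one
// failure domain, fold its fields into the provider spec and omit the
// failureDomains stanza; only emit it when there is real variation.
func collapseFailureDomains(domains []AzureFailureDomain, specZone *string) []AzureFailureDomain {
	if len(domains) == 1 {
		*specZone = domains[0].Zone
		return nil // omit failureDomains entirely
	}
	return domains
}

func main() {
	var zone string
	out := collapseFailureDomains([]AzureFailureDomain{{Zone: ""}}, &zone)
	fmt.Printf("failureDomains: %v, provider spec zone: %q\n", out, zone)
}
```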
With IC, ovnkube-node requires namespaces/status permissions.
After talking to Tim Rozet it seems that this is not necessary. We previously used that approach because ovnkube-node only listened for local pods but needed to know this information/event from a remote gateway pod. Now that ovnkube-node is watching all pods, it can just listen for the remote pod and then sync conntrack.
Please review the following PR: https://github.com/openshift/installer/pull/7494
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Upgrade to OCP v4.16 is blocked because root certificate has weak SHA-1 signature algorithm
Actual results:
Upgrade is blocked
Expected results:
Upgrade should be possible because serving certificate has sha256WithRSAEncryption algorithm
Additional info:
In OpenShift v4.15, clusterversion shows that the cluster cannot upgrade because the default certificate contains the weak SHA-1 algorithm: ~~~ - lastTransitionTime: "2024-08-08T06:03:44Z" message: 'Cluster operator ingress should not be upgraded between minor versions: Some ingresscontrollers are not upgradeable: ingresscontroller "default" is not upgradeable: OperandsNotUpgradeable: One or more managed resources are not upgradeable: certificate in secret openshift-ingress/custom-certs-default has weak SHA1 signature algorithm: SHA1-RSA (see https://docs.openshift.com/container-platform/4.16/release_notes/ocp-4-16-release-notes.html#ocp-4-16-sha-haproxy-support-removed_release-notes for more details)' reason: IngressControllersNotUpgradeable status: "False" type: UpgradeableClusterOperators ~~~ While checking the secret, there are 3 certificates present in the cert chain and only 1 cert has SHA-1 as its signature algorithm, and that is the root certificate. The serving cert of the secret is using sha256WithRSAEncryption.
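The distinction the report draws (only the root is SHA-1 while the serving cert is SHA-256) suggests the check should ignore self-signed trust anchors, whose own signature is never verified. A minimal Go sketch of that rule, assuming the chain is already parsed; this illustrates the principle, not the ingress operator's actual code:
```
package main

import (
	"bytes"
	"crypto/x509"
	"fmt"
)

// hasWeakServingSignature reports whether any certificate in the chain other
// than a self-signed root uses a SHA-1-based signature. Self-signed roots are
// trust anchors: their own signature is never validated, so a SHA-1 root
// should not block an upgrade.
func hasWeakServingSignature(chain []*x509.Certificate) bool {
	for _, cert := range chain {
		selfSigned := bytes.Equal(cert.RawSubject, cert.RawIssuer)
		weak := cert.SignatureAlgorithm == x509.SHA1WithRSA ||
			cert.SignatureAlgorithm == x509.ECDSAWithSHA1
		if weak && !selfSigned {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasWeakServingSignature(nil)) // false for an empty chain
}
```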
This is a clone of issue OCPBUGS-26767. The following is the description of the original issue:
—
Description of problem:
[inner hamzy@li-3d08e84c-2e1c-11b2-a85c-e2db7bb078fc hamzy-release]$ oc get co/image-registry NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE image-registry False True True 50m Available: The deployment does not exist... [inner hamzy@li-3d08e84c-2e1c-11b2-a85c-e2db7bb078fc hamzy-release]$ oc describe co/image-registry ... Message: Progressing: Unable to apply resources: unable to sync storage configuration: cos region corresponding to a powervs region wdc not found ...
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-ppc64le-2024-01-10-083055
How reproducible:
Always
Steps to Reproduce:
1. Deploy a PowerVS cluster in wdc06 zone
Actual results:
See above error message
Expected results:
Cluster deploys
This is a clone of issue OCPBUGS-18577. The following is the description of the original issue:
—
Description of problem:
Must-gather link
Long snippet from the e2e log:
external internet 09/01/23 07:26:09.624 Sep 1 07:26:09.624: INFO: Running 'oc --namespace=e2e-test-egress-firewall-e2e-2vvzx --kubeconfig=/tmp/configfile1049873803 exec egressfirewall -- curl -q -s -I -m1 http://www.google.com:80' STEP: creating an egressfirewall object 09/01/23 07:26:09.903 STEP: calling oc create -f /tmp/fixture-testdata-dir978363556/test/extended/testdata/egress-firewall/ovnk-egressfirewall-test.yaml 09/01/23 07:26:09.903 Sep 1 07:26:09.904: INFO: Running 'oc --namespace=e2e-test-egress-firewall-e2e-2vvzx --kubeconfig=/root/.kube/config create -f /tmp/fixture-testdata-dir978363556/test/extended/testdata/egress-firewall/ovnk-egressfirewall-test.yaml' egressfirewall.k8s.ovn.org/default createdSTEP: sending traffic to control plane nodes should work 09/01/23 07:26:22.122 Sep 1 07:26:22.130: INFO: Running 'oc --namespace=e2e-test-egress-firewall-e2e-2vvzx --kubeconfig=/tmp/configfile1049873803 exec egressfirewall -- curl -q -s -I -m1 -k https://193.168.200.248:6443' Sep 1 07:26:23.358: INFO: Error running /usr/local/bin/oc --namespace=e2e-test-egress-firewall-e2e-2vvzx --kubeconfig=/tmp/configfile1049873803 exec egressfirewall -- curl -q -s -I -m1 -k https://193.168.200.248:6443: StdOut> command terminated with exit code 28 StdErr> command terminated with exit code 28[AfterEach] [sig-network][Feature:EgressFirewall] github.com/openshift/origin/test/extended/util/client.go:180 STEP: Collecting events from namespace "e2e-test-egress-firewall-e2e-2vvzx". 09/01/23 07:26:23.358 STEP: Found 4 events. 09/01/23 07:26:23.361 Sep 1 07:26:23.361: INFO: At 2023-09-01 07:26:08 -0400 EDT - event for egressfirewall: {multus } AddedInterface: Add eth0 [10.131.0.89/23] from ovn-kubernetes Sep 1 07:26:23.361: INFO: At 2023-09-01 07:26:08 -0400 EDT - event for egressfirewall: {kubelet lon06-worker-0.rdr-qe-ocp-upi-7250.redhat.com} Pulled: Container image "quay.io/openshift/community-e2e-images:e2e-quay-io-redhat-developer-nfs-server-1-1-dlXGfzrk5aNo8EjC" already present on machine Sep 1 07:26:23.361: INFO: At 2023-09-01 07:26:08 -0400 EDT - event for egressfirewall: {kubelet lon06-worker-0.rdr-qe-ocp-upi-7250.redhat.com} Created: Created container egressfirewall-container Sep 1 07:26:23.361: INFO: At 2023-09-01 07:26:08 -0400 EDT - event for egressfirewall: {kubelet lon06-worker-0.rdr-qe-ocp-upi-7250.redhat.com} Started: Started container egressfirewall-container Sep 1 07:26:23.363: INFO: POD NODE PHASE GRACE CONDITIONS Sep 1 07:26:23.363: INFO: egressfirewall lon06-worker-0.rdr-qe-ocp-upi-7250.redhat.com Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2023-09-01 07:26:07 -0400 EDT } {Ready True 0001-01-01 00:00:00 +0000 UTC 2023-09-01 07:26:09 -0400 EDT } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2023-09-01 07:26:09 -0400 EDT } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2023-09-01 07:26:07 -0400 EDT }] Sep 1 07:26:23.363: INFO: Sep 1 07:26:23.367: INFO: skipping dumping cluster info - cluster too large Sep 1 07:26:23.383: INFO: Deleted {user.openshift.io/v1, Resource=users e2e-test-egress-firewall-e2e-2vvzx-user}, err: <nil> Sep 1 07:26:23.398: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthclients e2e-client-e2e-test-egress-firewall-e2e-2vvzx}, err: <nil> Sep 1 07:26:23.414: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens sha256~X_2HPGEj3O9hpd-3XKTckrp9bO23s_7zlJ3Tkn7ncBE}, err: <nil> [AfterEach] [sig-network][Feature:EgressFirewall] github.com/openshift/origin/test/extended/util/client.go:180 STEP: Collecting events from namespace 
"e2e-test-no-egress-firewall-e2e-84f48". 09/01/23 07:26:23.414 STEP: Found 0 events. 09/01/23 07:26:23.416 Sep 1 07:26:23.417: INFO: POD NODE PHASE GRACE CONDITIONS Sep 1 07:26:23.417: INFO: Sep 1 07:26:23.421: INFO: skipping dumping cluster info - cluster too large Sep 1 07:26:23.446: INFO: Deleted {user.openshift.io/v1, Resource=users e2e-test-no-egress-firewall-e2e-84f48-user}, err: <nil> Sep 1 07:26:23.451: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthclients e2e-client-e2e-test-no-egress-firewall-e2e-84f48}, err: <nil> Sep 1 07:26:23.457: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens sha256~2Lk8-jWfwpdyo59E9YF7kQFKH2LBUSvnbJdKj7rOzn4}, err: <nil> [DeferCleanup (Each)] [sig-network][Feature:EgressFirewall] dump namespaces | framework.go:196 STEP: dump namespace information after failure 09/01/23 07:26:23.457 [DeferCleanup (Each)] [sig-network][Feature:EgressFirewall] tear down framework | framework.go:193 STEP: Destroying namespace "e2e-test-no-egress-firewall-e2e-84f48" for this suite. 09/01/23 07:26:23.457 [DeferCleanup (Each)] [sig-network][Feature:EgressFirewall] dump namespaces | framework.go:196 STEP: dump namespace information after failure 09/01/23 07:26:23.462 [DeferCleanup (Each)] [sig-network][Feature:EgressFirewall] tear down framework | framework.go:193 STEP: Destroying namespace "e2e-test-egress-firewall-e2e-2vvzx" for this suite. 09/01/23 07:26:23.463 fail [github.com/openshift/origin/test/extended/networking/egress_firewall.go:155]: Unexpected error: <*fmt.wrapError | 0xc001dd50a0>: { msg: "Error running /usr/local/bin/oc --namespace=e2e-test-egress-firewall-e2e-2vvzx --kubeconfig=/tmp/configfile1049873803 exec egressfirewall -- curl -q -s -I -m1 -k https://193.168.200.248:6443:\nStdOut>\ncommand terminated with exit code 28\nStdErr>\ncommand terminated with exit code 28\nexit status 28\n", err: <*exec.ExitError | 0xc001dd5080>{ ProcessState: { pid: 140483, status: 7168, rusage: { Utime: {Sec: 0, Usec: 149480}, Stime: {Sec: 0, Usec: 19930}, Maxrss: 222592, Ixrss: 0, Idrss: 0, Isrss: 0, Minflt: 1536, Majflt: 0, Nswap: 0, Inblock: 0, Oublock: 0, Msgsnd: 0, Msgrcv: 0, Nsignals: 0, Nvcsw: 596, Nivcsw: 173, }, }, Stderr: nil, }, } Error running /usr/local/bin/oc --namespace=e2e-test-egress-firewall-e2e-2vvzx --kubeconfig=/tmp/configfile1049873803 exec egressfirewall -- curl -q -s -I -m1 -k https://193.168.200.248:6443: StdOut> command terminated with exit code 28 StdErr> command terminated with exit code 28 exit status 28 occurred Ginkgo exit error 1: exit with code 1failed: (18.7s) 2023-09-01T11:26:23 "[sig-network][Feature:EgressFirewall] when using openshift ovn-kubernetes should ensure egressfirewall is created [Suite:openshift/conformance/parallel]"
Version-Release number of selected component (if applicable):
4.13.11
How reproducible:
This e2e failure is not consistently reproduceable.
Steps to Reproduce:
1.Start a Z stream Job via Jenkins 2.monitor e2e
Actual results:
e2e is getting failed
Expected results:
e2e should pass
Additional info:
This is a clone of issue OCPBUGS-31415. The following is the description of the original issue:
—
Description of problem:
After setting an invalid release image on a HostedCluster, it is not possible to fix it by editing the HostedCluster and setting a valid release image.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create a HostedCluster with an invalid release image 2. Edit HostedCluster and specify a valid release image 3.
Actual results:
HostedCluster does not start using the new valid release image
Expected results:
HostedCluster starts using the valid release image.
Additional info:
Please review the following PR: https://github.com/openshift/oauth-apiserver/pull/93
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/oc/pull/1542
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Unit test failure rates are high: https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-oc-master-unit TestNewAppRunAll/emptyDir_volumes is failing: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_oc/1557/pull-ci-openshift-oc-master-unit/1710206848667226112
Version-Release number of selected component (if applicable):
How reproducible:
Run locally or in CI and see that the unit test job is failing
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/alibaba-cloud-csi-driver/pull/42
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-26765. The following is the description of the original issue:
—
Description of problem:
The SAST scans keep coming up with bogus positive results from test and vendor files. This bug is just a placeholder to allow us to backport the change to ignore those files.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-27264. The following is the description of the original issue:
—
Description of problem:
The e2e-aws-ovn-shared-to-local-gateway-mode-migration and e2e-aws-ovn-local-to-shared-gateway-mode-migration jobs fail about 50% of the time with + oc patch Network.operator.openshift.io cluster --type=merge --patch '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"routingViaHost":false}}}}}' network.operator.openshift.io/cluster patched + oc wait co network --for=condition=PROGRESSING=True --timeout=60s error: timed out waiting for the condition on clusteroperators/network
This is a clone of issue OCPBUGS-23167. The following is the description of the original issue:
—
Description of problem:
On SNO with the DU profile (RT kernel), the tuned profile is always degraded because the net.core.busy_read, net.core.busy_poll and kernel.numa_balancing sysctls do not exist in the RT kernel.
Version-Release number of selected component (if applicable):
4.14.1
How reproducible:
100%
Steps to Reproduce:
1. Deploy SNO with DU profile(RT kernel) 2. Check tuned profile
Actual results:
oc -n openshift-cluster-node-tuning-operator get profile -o yaml apiVersion: v1 items: - apiVersion: tuned.openshift.io/v1 kind: Profile metadata: creationTimestamp: "2023-11-09T18:26:34Z" generation: 2 name: sno.kni-qe-1.lab.eng.rdu2.redhat.com namespace: openshift-cluster-node-tuning-operator ownerReferences: - apiVersion: tuned.openshift.io/v1 blockOwnerDeletion: true controller: true kind: Tuned name: default uid: 4e7c05a2-537e-4212-9009-e2724938dec9 resourceVersion: "287891" uid: 5f4d5819-8f84-4b3b-9340-3d38c41501ff spec: config: debug: false tunedConfig: {} tunedProfile: performance-patch status: conditions: - lastTransitionTime: "2023-11-09T18:26:39Z" message: TuneD profile applied. reason: AsExpected status: "True" type: Applied - lastTransitionTime: "2023-11-09T18:26:39Z" message: 'TuneD daemon issued one or more error message(s) during profile application. TuneD stderr: net.core.rps_default_mask' reason: TunedError status: "True" type: Degraded tunedProfile: performance-patch kind: List metadata: resourceVersion: ""
Expected results:
Not degraded
Additional info:
Looking at the tuned log the following errors show up which are probably causing the profile to get into degraded state: 2023-11-09 18:30:49,287 ERROR tuned.plugins.plugin_sysctl: Failed to read sysctl parameter 'net.core.busy_read', the parameter does not exist 2023-11-09 18:30:49,287 ERROR tuned.plugins.plugin_sysctl: sysctl option net.core.busy_read will not be set, failed to read the original value. 2023-11-09 18:30:49,287 ERROR tuned.plugins.plugin_sysctl: Failed to read sysctl parameter 'net.core.busy_poll', the parameter does not exist 2023-11-09 18:30:49,287 ERROR tuned.plugins.plugin_sysctl: sysctl option net.core.busy_poll will not be set, failed to read the original value. 2023-11-09 18:30:49,287 ERROR tuned.plugins.plugin_sysctl: Failed to read sysctl parameter 'kernel.numa_balancing', the parameter does not exist 2023-11-09 18:30:49,287 ERROR tuned.plugins.plugin_sysctl: sysctl option kernel.numa_balancing will not be set, failed to read the original value. These sysctl parameters seem not to be available with RT kernel.
Please review the following PR: https://github.com/openshift/cluster-openshift-apiserver-operator/pull/560
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-29388. The following is the description of the original issue:
—
Description of problem:
See the worker CloudFormation template in "Installing a cluster on AWS using CloudFormation templates": https://docs.openshift.com/container-platform/4.13/installing/installing_aws/installing-aws-user-infra.html#installation-cloudformation-worker_installing-aws-user-infra
In the OpenShift documentation under manual AWS CloudFormation templates, within the CloudFormation template for worker nodes, the descriptions for Subnet and WorkerSecurityGroupId refer to the master nodes. Based on the variable names, the descriptions should refer to worker nodes instead.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-42015. The following is the description of the original issue:
—
Description of problem:
Topology screen crashes and reports "Oh no! something went wrong" when a pod in completed state is selected.
Version-Release number of selected component (if applicable):
RHOCP 4.15.18
How reproducible:
100%
Steps to Reproduce:
1. Switch to developer mode 2. Select Topology 3. Select a project that has completed cron jobs like openshift-image-registry 4. Click the green CronJob Object 5. Observe Crash
Actual results:
The Topology screen crashes with error "Oh no! Something went wrong."
Expected results:
After clicking the completed pod / workload, the screen should display the information related to it.
Additional info:
The CVO-managed manifests that CMO ships lack the capability annotations defined in https://github.com/openshift/enhancements/blob/master/enhancements/installer/component-selection.md#manifest-annotations.
The dashboards should be tied to the console capability so that when CMO deploys on a cluster without the Console capability, CVO doesn't deploy the dashboards configmap.
This is a clone of issue OCPBUGS-30103. The following is the description of the original issue:
—
Description of problem:
The cluster-network-operator in hypershift, when templating in-cluster resources, does not use the node-local address of the client-side haproxy load balancer that runs on all nodes. This bypasses a level of health checks for the backend redundant apiserver addresses that is performed by the local kube-apiserver-proxy pods that run on every node in a hypershift environment. In environments where the backend API servers are not fronted by an additional cloud load balancer, this leads to a percentage of request failures from the in-cluster components occurring when a control plane endpoint goes down, even if other endpoints are available.
Version-Release number of selected component (if applicable):
4.16 4.15 4.14
How reproducible:
100%
Steps to Reproduce:
1. Setup a hypershift cluster in a baremetal/non cloud environment where there are redundant API servers behind a DNS that point directly to the node IPs. 2. Power down one of the control plane nodes 3. Schedule workload into cluster that depends on kube-proxy and/or multus to setup networking configuration 4. You will see errors like the following ``` add): Multus: [openshiftai/moe-8b-cmisale-master-0/9c1fd369-94f5-481c-a0de-ba81a3ee3583]: error getting pod: Get "https://[p9d81ad32fcdb92dbb598-6b64a6ccc9c596bf59a86625d8fa2202-c000.us-east.satellite.appdomain.cloud]:30026/api/v1/namespaces/openshiftai/pods/moe-8b-cmisale-master-0?timeout=1m0s": dial tcp 192.168.98.203:30026: connect: timeout ```
Actual results:
When a control plane node fails, intermittent timeouts occur when kube-proxy/multus resolves the DNS and a failed control plane node IP is returned
Expected results:
No requests fail (which will be the case if all traffic is routed through the node-local load balancer instance).
Additional info:
Additionally, control plane components in the management cluster that live next to the apiserver are adding unneeded dependencies by using an external DNS entry to talk to the kube-apiserver, when they could use the local kube-apiserver address and have all traffic go over cluster-local networking.
This is a clone of issue OCPBUGS-35373. The following is the description of the original issue:
—
Description of problem:
InstallPlan fails with "updated validation is too restrictive" when:
* Previous CRs and CRDs exist, and
* Multiple CRD versions are served (ex. v1alpha1 and v1alpha2)
Version-Release number of selected component (if applicable):
This is reproducible on an OpenShift 4.15.3 ROSA cluster, and not reproducible on 4.14.15 or 4.13.
How reproducible:
Always
Steps to Reproduce:
1.Create the following catalogsource and subscription apiVersion: operators.coreos.com/v1alpha1 kind: CatalogSource metadata: name: devworkspace-operator-catalog namespace: openshift-marketplace spec: sourceType: grpc image: quay.io/devfile/devworkspace-operator-index:release publisher: Red Hat displayName: DevWorkspace Operator Catalog updateStrategy: registryPoll: interval: 5m --- apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: namespace: openshift-operators name: devworkspace-operator spec: channel: fast installPlanApproval: Manual name: devworkspace-operator source: devworkspace-operator-catalog sourceNamespace: openshift-marketplace 2. Approve the installplan 3. Create a CR instance (DevWorkspace CR): $ curl https://raw.githubusercontent.com/devfile/devworkspace-operator/main/samples/empty.yaml | kubectl apply -f - 4. Delete the subscription and csv $ oc project openshift-operators $ oc delete sub devworkspace-operator $ oc get csv $ oc delete csv devworkspace-operator.v0.26.0 5. Create the subscription from step 1 again, and approve the installplan 6. View the "updated validation is too restrictive" error in the installplan's status.conditions: --- error validating existing CRs against new CRD's schema for "devworkspaces.workspace.devfile.io": error validating workspace.devfile.io/v1alpha1, Kind=DevWorkspace "openshift-operators/empty-devworkspace": updated validation is too restrictive: [].status.workspaceId: Required value ---
Actual results:
InstallPlan fails and the operator is not installed
Expected results:
InstallPlan succeeds
Additional info:
For this specific scenario, a workaround is to temporarily un-serve the v1alpha1 version before approving the installplan: $ oc patch crd devworkspacetemplates.workspace.devfile.io --type='json' -p='[{"op": "replace", "path": "/spec/versions/0/served", "value": false}]' $ oc patch crd devworkspaces.workspace.devfile.io --type='json' -p='[{"op": "replace", "path": "/spec/versions/0/served", "value": false}]' Another workaround is to delete the existing CR before approving the new installplan.
Description of problem:
I'm seeing Prometheus disruption failures in upgrade tests
Version-Release number of selected component (if applicable):
How reproducible:
Sporadically
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/cluster-olm-operator/pull/31
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.11.20 True False 43h Cluster version is 4.11.20 $ oc get clusterrolebinding system:openshift:controller:service-serving-cert-controller -o yaml apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: annotations: rbac.authorization.kubernetes.io/autoupdate: "true" creationTimestamp: "2023-01-11T13:19:24Z" name: system:openshift:controller:service-serving-cert-controller resourceVersion: "11410" uid: 8b3e8c56-9f25-4f89-9159-5300585cc129 roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: system:openshift:controller:service-serving-cert-controller subjects: - kind: ServiceAccount name: service-serving-cert-controller namespace: openshift-infra $ oc get sa service-serving-cert-controller -n openshift-infra Error from server (NotFound): serviceaccounts "service-serving-cert-controller" not found The serviceAccount service-serving-cert-controller does not exist. Neither in openshift-infra nor in any other namespace. It's therefore not clear what this ClusterRoleBinding does, what use-case it does fulfill and why it references non existing serviceAccount. From Security point of view, it's recommended to remove non serviceAccounts from ClusterRoleBindings as a potential attacker could abuse the current state by creating the necessary serviceAccount and gain undesired permissions.
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4 (all version from what we have found)
How reproducible:
Always
Steps to Reproduce:
1. Install OpenShift Container Platform 4 2. Run oc get clusterrolebinding system:openshift:controller:service-serving-cert-controller -o yaml
Actual results:
$ oc get clusterrolebinding system:openshift:controller:service-serving-cert-controller -o yaml apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: annotations: rbac.authorization.kubernetes.io/autoupdate: "true" creationTimestamp: "2023-01-11T13:19:24Z" name: system:openshift:controller:service-serving-cert-controller resourceVersion: "11410" uid: 8b3e8c56-9f25-4f89-9159-5300585cc129 roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: system:openshift:controller:service-serving-cert-controller subjects: - kind: ServiceAccount name: service-serving-cert-controller namespace: openshift-infra $ oc get sa service-serving-cert-controller -n openshift-infra Error from server (NotFound): serviceaccounts "service-serving-cert-controller" not found
Expected results:
The serviceAccount called service-serving-cert-controller to exist or otherwise the ClusterRoleBinding to be removed.
Additional info:
Finding related to a Security review done on the OpenShift Container Platform 4 - Platform
Description of problem:
Code calls secrets instead of configmaps
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-25462. The following is the description of the original issue:
—
The number of control plane replicas defined in install-config.yaml (or agent-cluster-install.yaml) should be validated to check it's set to 3, or 1 in the case of SNO. If set to another value, the "create image" command should fail.
We recently had a case where the number of replicas was set to 2 and the installation failed. It would be good to catch this misconfiguration prior to the install.
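A minimal Go sketch of the proposed validation; the function name and error wording are illustrative:
```
package main

import "fmt"

// validateControlPlaneReplicas mirrors the check proposed above: the agent
// installer should accept exactly 3 control plane replicas, or 1 for SNO.
func validateControlPlaneReplicas(replicas int) error {
	if replicas != 3 && replicas != 1 {
		return fmt.Errorf("controlPlane.replicas is %d; expected 3, or 1 for single-node OpenShift", replicas)
	}
	return nil
}

func main() {
	fmt.Println(validateControlPlaneReplicas(2)) // the misconfiguration from the case
	fmt.Println(validateControlPlaneReplicas(3)) // <nil>
}
```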
This is a clone of issue OCPBUGS-34142. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-32476. The following is the description of the original issue:
—
Description of problem:
After installing the Pipelines Operator on a local cluster (OpenShift Local), the Pipelines features were shown in the Console.
But when selecting the Build option "Pipelines" a warning was shown:
The pipeline template for Dockerfiles is not available at this time.
Anyway, it was possible to push the Create button and create a Deployment. But because no build process was created, it couldn't start successfully.
About 20 minutes after the Pipelines operator said it was successfully installed, the Pipeline templates appeared in the openshift-pipelines namespace, and I could create a valid Deployment.
Version-Release number of selected component (if applicable):
How reproducible:
Sometimes, maybe depending on the internet connection speed.
Steps to Reproduce:
Actual results:
Expected results:
Additional info:
Description of the problem:
I am able to create a custom manifest with the name ".yaml".
I believe the API should block this.
How reproducible:
Using test infra, I create a manifest with the filename ".yaml".
Steps to reproduce:
1. Using v2_create_cluster_manifest, I am able to create a manifest with the ".yaml " filename
2.
3.
Actual results:
The manifest is created, no error is thrown, and I am able to list the manifest and see it is applied to the cluster.
Expected results:
Should throw a 422 exception.
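A minimal Go sketch of a server-side validation that would reject such names; the accepted extensions and error wording are assumptions:
```
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// validateManifestFileName rejects names whose base is empty once the
// extension is stripped, such as ".yaml" or ".yml", which the API currently
// accepts according to the report.
func validateManifestFileName(name string) error {
	name = strings.TrimSpace(name) // the report shows a trailing space after ".yaml"
	ext := filepath.Ext(name)
	base := strings.TrimSuffix(name, ext)
	if ext != ".yaml" && ext != ".yml" && ext != ".json" {
		return fmt.Errorf("unsupported manifest extension %q", ext)
	}
	if base == "" {
		return fmt.Errorf("manifest file name %q has no base name", name)
	}
	return nil
}

func main() {
	fmt.Println(validateManifestFileName(".yaml"))      // error: no base name
	fmt.Println(validateManifestFileName("extra.yaml")) // <nil>
}
```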
Please review the following PR: https://github.com/openshift/csi-node-driver-registrar/pull/49
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-24226. The following is the description of the original issue:
—
Maxim Patlasov pointed this out in STOR-1453 but still somehow we missed it. I tested this on 4.15.0-0.ci-2023-11-29-021749.
It is possible to set a custom TLSSecurityProfile without minTLSVersion:
$ oc edit apiserver cluster
...
spec:
tlsSecurityProfile:
type: Custom
custom:
ciphers:
- ECDHE-ECDSA-CHACHA20-POLY1305
- ECDHE-ECDSA-AES128-GCM-SHA256
This causes the controller to crash loop:
$ oc get pods -n openshift-cluster-csi-drivers
NAME READY STATUS RESTARTS AGE
aws-ebs-csi-driver-controller-589c44468b-gjrs2 6/11 CrashLoopBackOff 10 (18s ago) 37s
...
because the `${TLS_MIN_VERSION}` placeholder is never replaced:
- --tls-min-version=${TLS_MIN_VERSION}
- --tls-min-version=${TLS_MIN_VERSION}
- --tls-min-version=${TLS_MIN_VERSION}
- --tls-min-version=${TLS_MIN_VERSION}
- --tls-min-version=${TLS_MIN_VERSION}
The observed config in the ClusterCSIDriver shows an empty string:
$ oc get clustercsidriver ebs.csi.aws.com -o json | jq .spec.observedConfig
{
"targetcsiconfig": {
"servingInfo":
}
}
which means minTLSVersion is empty when we get to this line, and the string replacement is not done:
So it seems we have a couple of options:
1) completely omit the --tls-min-version arg if minTLSVersion is empty, or
2) set --tls-min-version to the same default value we would use if TLSSecurityProfile is not present in the apiserver object
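A minimal Go sketch of option 1, assuming the operator assembles the container args as a string slice; this is an illustration, not the actual operator code:
```
package main

import "fmt"

// buildTLSMinVersionArg follows option 1 above: emit the flag only when a
// minimum version was actually observed, so the ${TLS_MIN_VERSION}
// placeholder never leaks into the container args.
func buildTLSMinVersionArg(minTLSVersion string) []string {
	if minTLSVersion == "" {
		return nil
	}
	return []string{fmt.Sprintf("--tls-min-version=%s", minTLSVersion)}
}

func main() {
	fmt.Println(buildTLSMinVersionArg(""))             // []
	fmt.Println(buildTLSMinVersionArg("VersionTLS12")) // [--tls-min-version=VersionTLS12]
}
```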
Please review the following PR: https://github.com/openshift/cluster-kube-controller-manager-operator/pull/747
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
IPI installation using the service account attached to a GCP VM always fail with error "unable to parse credentials"
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-09-15-233408
How reproducible:
Always
Steps to Reproduce:
1. "create install-config" 2. edit install-config.yaml to insert "credentialsMode: Manual" 3. "create manifests" 4. manually create the required credentials and copy the manifests to installation-dir/manifests directory 5. launch the bastion host along with binding to the pre-configured service account ipi-on-bastion-sa@openshift-qe.iam.gserviceaccount.com and scopes being "cloud-platform" 6. copy the installation-dir and openshift-install to the bastion host 7. try "create cluster" on the bastion host
Actual results:
The installation failed on "Creating infrastructure resources"
Expected results:
The installation should succeed.
Additional info:
(1) FYI the 4.12 epic: https://issues.redhat.com/browse/CORS-2260 (2) 4.12.34 doesn't have the issue (Flexy-install/234112/). (3) 4.13.13 doesn’t have the issue (Flexy-install/234126/). (4) The 4.14 errors (Flexy-install/234113/): 09-19 16:13:44.919 level=info msg=Consuming Master Ignition Config from target directory 09-19 16:13:44.919 level=info msg=Consuming Bootstrap Ignition Config from target directory 09-19 16:13:44.919 level=info msg=Consuming Worker Ignition Config from target directory 09-19 16:13:44.919 level=info msg=Credentials loaded from gcloud CLI defaults 09-19 16:13:49.071 level=info msg=Creating infrastructure resources... 09-19 16:13:50.950 level=error 09-19 16:13:50.950 level=error msg=Error: unable to parse credentials 09-19 16:13:50.950 level=error 09-19 16:13:50.950 level=error msg= with provider["openshift/local/google"], 09-19 16:13:50.950 level=error msg= on main.tf line 10, in provider "google": 09-19 16:13:50.950 level=error msg= 10: provider "google" { 09-19 16:13:50.950 level=error 09-19 16:13:50.950 level=error msg=unexpected end of JSON input 09-19 16:13:50.950 level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failure applying terraform for "cluster" stage: failed to create cluster: failed to apply Terraform: exit status 1 09-19 16:13:50.950 level=error 09-19 16:13:50.950 level=error msg=Error: unable to parse credentials 09-19 16:13:50.950 level=error 09-19 16:13:50.950 level=error msg= with provider["openshift/local/google"], 09-19 16:13:50.950 level=error msg= on main.tf line 10, in provider "google": 09-19 16:13:50.950 level=error msg= 10: provider "google" { 09-19 16:13:50.950 level=error 09-19 16:13:50.950 level=error msg=unexpected end of JSON input 09-19 16:13:50.950 level=error
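The "unexpected end of JSON input" is consistent with an empty credentials JSON being handed to the embedded terraform provider. A hedged sketch of how VM-attached service-account credentials appear through golang.org/x/oauth2/google; the installer's actual wiring is not shown here:
```
package main

import (
	"context"
	"fmt"

	"golang.org/x/oauth2/google"
)

func main() {
	ctx := context.Background()
	creds, err := google.FindDefaultCredentials(ctx, "https://www.googleapis.com/auth/cloud-platform")
	if err != nil {
		fmt.Println("no default credentials:", err)
		return
	}
	// On a GCE VM with an attached service account, credentials come from the
	// metadata server and creds.JSON is empty. Handing that empty payload to a
	// consumer that expects a key file yields "unexpected end of JSON input".
	if len(creds.JSON) == 0 {
		fmt.Println("metadata-server credentials: no JSON key to parse")
	} else {
		fmt.Printf("loaded %d bytes of JSON credentials\n", len(creds.JSON))
	}
}
```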
As a developer trying to release the GitOps dynamic plugin, I want a flag to toggle the static plugin so that it is possible to back-port to the old static plugin.
The reason for this ticket is that OCP will have a release where the static plugin is left as a fallback.
Slack thread: https://redhat-internal.slack.com/archives/C011BL0FEKZ/p1698853635030619
Related to GITOPS-2369: [DynamicPlugin] Remove static plugin from Console
<Defines what is not included in this story>
Set up a flag initialized by the dynamic plugin and disable the static plugin when the flag is set.
<Describes what this story depends on. Dependent Stories and EPICs should be linked to the story.>
Only one of the static and dynamic plugins will be displayed in the console.
Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated
Unknown
Verified
Unsatisfied
Please review the following PR: https://github.com/openshift/csi-external-provisioner/pull/69
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/monitoring-plugin/pull/75
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/azure-workload-identity/pull/6
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-31878. The following is the description of the original issue:
—
The multus-admission-controller does not retain its container resource requests/limits if manually set. The cluster-network-operator overwrites any modifications on the next reconciliation. This resource preservation support has already been added to all other components in https://github.com/openshift/hypershift/pull/1082 and https://github.com/openshift/hypershift/pull/3120. Similar changes should be made for the multus-admission-controller so all hosted control plane components demonstrate the same resource preservation behavior.
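A minimal sketch of the preservation behavior described above, assuming containers are matched by name; the real hypershift helper may differ:
```
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// preserveExistingResources copies container resource requests/limits from
// the deployed object into the desired one before update, the behavior the
// other hosted-control-plane components already have.
func preserveExistingResources(desired, existing []corev1.Container) {
	byName := map[string]corev1.ResourceRequirements{}
	for _, c := range existing {
		byName[c.Name] = c.Resources
	}
	for i := range desired {
		if res, ok := byName[desired[i].Name]; ok {
			desired[i].Resources = res
		}
	}
}

func main() {
	desired := []corev1.Container{{Name: "multus-admission-controller"}}
	existing := []corev1.Container{{Name: "multus-admission-controller"}}
	preserveExistingResources(desired, existing)
	fmt.Println("containers reconciled:", len(desired))
}
```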
This is a clone of issue OCPBUGS-25942. The following is the description of the original issue:
—
Recently a user was attempting to change the Virtual Machine Folder for a cluster installed on vSphere. The user used the configuration panel "vSphere Connection Configuration" to complete this process. Upon updating the path and clicking "Save Configuration" cluster wide issues emerged including nodes not coming back online after a reboot.
OpenShift nodes eventually crashed with an error resulting from an incorrectly parsed folder, due to the string literal " " characters missing.
While this was exhibited on OCP 4.13, other versions may be affected.
This is a clone of issue OCPBUGS-18534. The following is the description of the original issue:
—
Description of problem:
The following tests fail consistently in 4.14 PowerVS runs.
Issue 1 analysis:
Error description:
{ failed during setup error waiting for replicaset: failed waiting for pods to be running: timeout waiting for 2 pods to be ready}
Some observations:
While creating a TCP service service-test with type=LoadBalancer for starting SimultaneousPodIPController, it fails to get load balancer details from the cloud, which results in the error before data collection for the e2e test starts and leads to the failure of the test case "[Jira:"NetworkEdge"] monitor test service-type-load-balancer-availability setup".
This should happen after we add the IPv6 CI jobs.
Description of problem:
revert "force cert rotation every couple days for development" in 4.15 Below is the steps to verify this bug: # oc adm release info --commits registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-06-25-081133|grep -i cluster-kube-apiserver-operator cluster-kube-apiserver-operator https://github.com/openshift/cluster-kube-apiserver-operator 7764681777edfa3126981a0a1d390a6060a840a3 # git log --date local --pretty="%h %an %cd - %s" 776468 |grep -i "#1307" 08973b820 openshift-ci[bot] Thu Jun 23 22:40:08 2022 - Merge pull request #1307 from tkashem/revert-cert-rotation # oc get clusterversions.config.openshift.io NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.11.0-0.nightly-2022-06-25-081133 True False 64m Cluster version is 4.11.0-0.nightly-2022-06-25-081133 $ cat scripts/check_secret_expiry.sh FILE="$1" if [ ! -f "$1" ]; then echo "must provide \$1" && exit 0 fi export IFS=$'\n' for i in `cat "$FILE"` do if `echo "$i" | grep "^#" > /dev/null`; then continue fi NS=`echo $i | cut -d ' ' -f 1` SECRET=`echo $i | cut -d ' ' -f 2` rm -f tls.crt; oc extract secret/$SECRET -n $NS --confirm > /dev/null echo "Check cert dates of $SECRET in project $NS:" openssl x509 -noout --dates -in tls.crt; echo done $ cat certs.txt openshift-kube-controller-manager-operator csr-signer-signer openshift-kube-controller-manager-operator csr-signer openshift-kube-controller-manager kube-controller-manager-client-cert-key openshift-kube-apiserver-operator aggregator-client-signer openshift-kube-apiserver aggregator-client openshift-kube-apiserver external-loadbalancer-serving-certkey openshift-kube-apiserver internal-loadbalancer-serving-certkey openshift-kube-apiserver service-network-serving-certkey openshift-config-managed kube-controller-manager-client-cert-key openshift-config-managed kube-scheduler-client-cert-key openshift-kube-scheduler kube-scheduler-client-cert-key Checking the Certs, they are with one day expiry times, this is as expected. 
# ./check_secret_expiry.sh certs.txt Check cert dates of csr-signer-signer in project openshift-kube-controller-manager-operator: notBefore=Jun 27 04:41:38 2022 GMT notAfter=Jun 28 04:41:38 2022 GMT Check cert dates of csr-signer in project openshift-kube-controller-manager-operator: notBefore=Jun 27 04:52:21 2022 GMT notAfter=Jun 28 04:41:38 2022 GMT Check cert dates of kube-controller-manager-client-cert-key in project openshift-kube-controller-manager: notBefore=Jun 27 04:52:26 2022 GMT notAfter=Jul 27 04:52:27 2022 GMT Check cert dates of aggregator-client-signer in project openshift-kube-apiserver-operator: notBefore=Jun 27 04:41:37 2022 GMT notAfter=Jun 28 04:41:37 2022 GMT Check cert dates of aggregator-client in project openshift-kube-apiserver: notBefore=Jun 27 04:52:26 2022 GMT notAfter=Jun 28 04:41:37 2022 GMT Check cert dates of external-loadbalancer-serving-certkey in project openshift-kube-apiserver: notBefore=Jun 27 04:52:26 2022 GMT notAfter=Jul 27 04:52:27 2022 GMT Check cert dates of internal-loadbalancer-serving-certkey in project openshift-kube-apiserver: notBefore=Jun 27 04:52:49 2022 GMT notAfter=Jul 27 04:52:50 2022 GMT Check cert dates of service-network-serving-certkey in project openshift-kube-apiserver: notBefore=Jun 27 04:52:28 2022 GMT notAfter=Jul 27 04:52:29 2022 GMT Check cert dates of kube-controller-manager-client-cert-key in project openshift-config-managed: notBefore=Jun 27 04:52:26 2022 GMT notAfter=Jul 27 04:52:27 2022 GMT Check cert dates of kube-scheduler-client-cert-key in project openshift-config-managed: notBefore=Jun 27 04:52:47 2022 GMT notAfter=Jul 27 04:52:48 2022 GMT Check cert dates of kube-scheduler-client-cert-key in project openshift-kube-scheduler: notBefore=Jun 27 04:52:47 2022 GMT notAfter=Jul 27 04:52:48 2022 GMT # # cat check_secret_expiry_within.sh #!/usr/bin/env bash # usage: ./check_secret_expiry_within.sh 1day # or 15min, 2days, 2day, 2month, 1year WITHIN=${1:-24hours} echo "Checking validity within $WITHIN ..." oc get secret --insecure-skip-tls-verify -A -o json | jq -r '.items[] | select(.metadata.annotations."auth.openshift.io/certificate-not-after" | . != null and fromdateiso8601<='$( date --date="+$WITHIN" +%s )') | "\(.metadata.annotations."auth.openshift.io/certificate-not-before") \(.metadata.annotations."auth.openshift.io/certificate-not-after") \(.metadata.namespace)\t\(.metadata.name)"' # ./check_secret_expiry_within.sh 1day Checking validity within 1day ... 2022-06-27T04:41:37Z 2022-06-28T04:41:37Z openshift-kube-apiserver-operator aggregator-client-signer 2022-06-27T04:52:26Z 2022-06-28T04:41:37Z openshift-kube-apiserver aggregator-client 2022-06-27T04:52:21Z 2022-06-28T04:41:38Z openshift-kube-controller-manager-operator csr-signer 2022-06-27T04:41:38Z 2022-06-28T04:41:38Z openshift-kube-controller-manager-operator csr-signer-signer
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/sdn/pull/574
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
In the 4.14 z-stream rollback job, I'm seeing the test case "[sig-network] pods should successfully create sandboxes by adding pod to network" fail. The job link is here: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-ovn-upgrade-rollback-oldest-supported/1719037590788640768

The error is: 56 failures to create the sandbox

ns/openshift-monitoring pod/prometheus-k8s-1 node/ip-10-0-48-75.us-east-2.compute.internal - 3314.57 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-k8s-1_openshift-monitoring_95d1a457-3e1b-4ae3-8b57-8023eec5937d_0(5b36bc12b2964e85bcdbe60b275d6a12ea68cb18b81f16622a6cb686270c4eb3): error adding pod openshift-monitoring_prometheus-k8s-1 to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): failed to send CNI request: Post "http://dummy/cni": EOF
ns/openshift-monitoring pod/prometheus-k8s-1 node/ip-10-0-48-75.us-east-2.compute.internal - 3321.57 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-k8s-1_openshift-monitoring_95d1a457-3e1b-4ae3-8b57-8023eec5937d_0(3cc0afc5bec362566e4c3bdaf822209377102c2e39aaa8ef5d99b0f4ba795aaf): error adding pod openshift-monitoring_prometheus-k8s-1 to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): failed to send CNI request: Post "http://dummy/cni": dial unix /run/multus/socket/multus.sock: connect: connection refused
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-10-30-170011
How reproducible:
Flaky
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
The rollback test is testing by installing 4.14.0, then upgrade to the latest 4.14.nightly, at some random point, rolling back to 4.14.0
Description of problem:
Currently the console frontend and backend use the OpenShift-centric UserKind type. For the console to work without the OAuth server, i.e. with an external OIDC provider, it needs to use the Kubernetes UserInfo type, which is retrieved by querying the SelfSubjectReview API.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Console is not working with external OIDC provider
Expected results:
Console will be working with external OIDC provider
Additional info:
This is mainly an API change.
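For reference, a minimal sketch of the API involved; this is the standard Kubernetes SelfSubjectReview resource (authentication.k8s.io/v1 on recent clusters, v1beta1 on older ones), not the console's actual code path:

$ cat <<'EOF' | oc create -f - -o yaml
apiVersion: authentication.k8s.io/v1
kind: SelfSubjectReview
EOF
# The server fills in .status.userInfo (username, uid, groups) for the
# authenticated identity; this is the k8s-native replacement for the
# OpenShift-specific User object.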
Description of problem:
An operator InstallPlan has duplicate values in installPlan.spec.clusterServiceVersionNames, which are displayed in multiple pages of the management console.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-31-181848
How reproducible:
Always
Expected results:
In the screenshots linked below, the clusterServiceVersionNames value should only display one item, but because there are duplicate values it is listed twice.
Additional info:
This bug causes duplicate values to be shown in several pages of the management console. Screenshots:
https://drive.google.com/file/d/1OwiLXU8iETNusCf6N2AhB5y-ykXwgyBU/view?usp=drive_link
https://drive.google.com/file/d/1qfMso1x-s--samU7OmDKU-3NVfxqsxWD/view?usp=drive_link
https://drive.google.com/file/d/1Z9mGRllp4ZLN2OlSNKZY2QTIDx8QpyVS/view?usp=drive_link
https://drive.google.com/file/d/1CYWMpKy_KmUV_KfIxCjS1FAWHYbYA6rw/view?usp=drive_link
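For what it's worth, the duplication can also be confirmed from the CLI; a sketch with placeholder names:

$ oc get installplan <installplan-name> -n <namespace> -o jsonpath='{.spec.clusterServiceVersionNames}'
# A de-duplicated view of the same list, for comparison:
$ oc get installplan <installplan-name> -n <namespace> -o json | jq '.spec.clusterServiceVersionNames | unique'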
Please review the following PR: https://github.com/openshift/gcp-pd-csi-driver/pull/51
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/operator-framework-olm/pull/563
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-34143. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-32405. The following is the description of the original issue:
—
Description of problem:
When creating a serverless function in create serverless form, BuildConfig is not created
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. Install the Serverless operator
2. Add https://github.com/openshift-dev-console/kn-func-node-cloudevents in the create serverless form
3. Create the function and check the BuildConfig page
Actual results:
BuildConfig is not created
Expected results:
Should create BuildConfig
Additional info:
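A sketch of how the failure can be confirmed from the CLI (the project name is a placeholder):

$ oc get buildconfig -n <project>
# Expected: a BuildConfig for the newly created function; in the failing case the list is empty.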
Please review the following PR: https://github.com/openshift/kube-state-metrics/pull/97
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Issue 58 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624
Quickstarts catalog item count isn't vertically aligned anymore
Screenshot: https://drive.google.com/file/d/1hxh5VI2S7jLKRdNlDQsdlAXL_G7TxtME/view?usp=sharing
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Upon installing 4.14.0-rc.6 into an existing vnet with private load balancer publishing, Service type LoadBalancers lack the permissions necessary to sync.
Version-Release number of selected component (if applicable):
4.14.0-rc.6
How reproducible:
Seemingly 100%
Steps to Reproduce:
1. Install w/ Azure Managed Identity into an existing vnet with private LB publishing
2.
3.
Actual results:
One or more other status conditions indicate a degraded state: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 403, RawError: {"error":{"code":"AuthorizationFailed","message":"The client '194d5669-cb47-4199-a673-4b32a4a110be' with object id '194d5669-cb47-4199-a673-4b32a4a110be' does not have authorization to perform action 'Microsoft.Network/virtualNetworks/subnets/read' over scope '/subscriptions/14b86a40-8d8f-4e69-abaf-42cbb0b8a331/resourceGroups/net/providers/Microsoft.Network/virtualNetworks/rnd-we-net/subnets/paas1' or the scope is invalid. If access was recently granted, please refresh your credentials."}}

Operators dependent on Ingress are failing as well.

authentication   4.14.0-rc.6   False   False   True    149m   OAuthServerRouteEndpointAccessibleControllerAvailable: Get https://oauth-openshift.apps.cnb10161.rnd.westeurope.example.com/healthz: dial tcp: lookup oauth-openshift.apps.cnb10161.rnd.westeurope.example.com on 10.224.0.10:53: no such host (this is likely result of malfunctioning DNS server)
console          4.14.0-rc.6   False   True    False   142m   DeploymentAvailable: 0 replicas available for console deployment...
Expected results:
Successful install
Additional info:
The client ID in the error corresponds to "openshift-cloud-controller-manager-azure-cloud-credentials", which indeed, when checking its Azure managed identity, only has access to the cluster RG and not the network RG. Additionally, they note that this permission is granted to the MAPI roles but not the CCM roles.
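As a workaround sketch only (not a confirmed remediation), the missing permission could be granted to the CCM identity on the network resource group; the role choice below is an assumption, while the IDs come from the error message above:

$ az role assignment create \
    --assignee 194d5669-cb47-4199-a673-4b32a4a110be \
    --role "Network Contributor" \
    --scope /subscriptions/14b86a40-8d8f-4e69-abaf-42cbb0b8a331/resourceGroups/net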
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-27844. The following is the description of the original issue:
—
Description of problem:
The network resource provisioning playbook for 4.15 dualstack UPI contains a task for adding an IPv6 subnet to the existing external router [1]. This task fails with:
- ansible-2.9.27-1.el8ae.noarch & ansible-collections-openstack-1.8.0-2.20220513065417.5bb8312.el8ost.noarch in an OSP 16 env (RHEL 8.5), or
- openstack-ansible-core-2.14.2-4.1.el9ost.x86_64 & ansible-collections-openstack-1.9.1-17.1.20230621074746.0e9a6f2.el9ost.noarch in an OSP 17 env (RHEL 9.2)
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2024-01-22-160236
How reproducible:
Always
Steps to Reproduce:
1. Set os_subnet6 in the inventory file for setting up dual-stack
2. Run the 4.15 network.yaml playbook
Actual results:
Playbook fails:

TASK [Add IPv6 subnet to the external router] **********************************
fatal: [localhost]: FAILED! => {"changed": false, "extra_data": {"data": null, "details": "Invalid input for external_gateway_info. Reason: Validation of dictionary's keys failed. Expected keys: {'network_id'} Provided keys: {'external_fixed_ips'}.", "response": "{\"NeutronError\": {\"type\": \"HTTPBadRequest\", \"message\": \"Invalid input for external_gateway_info. Reason: Validation of dictionary's keys failed. Expected keys: {'network_id'} Provided keys: {'external_fixed_ips'}.\", \"detail\": \"\"}}"}, "msg": "Error updating router 8352c9c0-dc39-46ed-94ed-c038f6987cad: Client Error for url: https://10.46.43.81:13696/v2.0/routers/8352c9c0-dc39-46ed-94ed-c038f6987cad, Invalid input for external_gateway_info. Reason: Validation of dictionary's keys failed. Expected keys: {'network_id'} Provided keys: {'external_fixed_ips'}."}
Expected results:
Successful playbook execution
Additional info:
The router can instead be created in two different tasks; the playbook [2] worked for me (see the CLI sketch after the links).
[1] https://github.com/openshift/installer/blob/1349161e2bb8606574696bf1e3bc20ae054e60f8/upi/openstack/network.yaml#L43
[2] https://file.rdu.redhat.com/juriarte/upi/network.yaml
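For illustration, the equivalent gateway change done with the OpenStack CLI rather than the module's external_gateway_info dictionary; router, network, and subnet names are placeholders, and flag support may vary between python-openstackclient versions:

$ openstack router set <external-router> \
    --external-gateway <external-network> \
    --fixed-ip subnet=<ipv4-subnet> \
    --fixed-ip subnet=<ipv6-subnet>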
Description of problem:
Pods assigned a Multus whereabouts IP get stuck in the ContainerCreating state after upgrading OCP from 4.12.15 to 4.12.22. It is not clear whether the upgrade itself or the node reboot directly causes the issue. The error message is (combined from similar events):

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox mypod-0-0-1-0_testproject_8c8500e1-1643-4716-8fd7-e032292c62ab_0(2baa045a1b19291769ed56bab288b60802179ff3138ffe0d16a14e78f9cb5e4f): error adding pod testproject_mypod-0-0-1-0 to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [testproject/mypod-0-0-1-0/8c8500e1-1643-4716-8fd7-e032292c62ab:testproject-net-svc-kernel-bond]: error adding container to network "testproject-net-svc-kernel-bond": error at storage engine: k8s get error: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
Version-Release number of selected component (if applicable):
How reproducible:
Not sure if it is reproducible
Steps to Reproduce:
1. 2. 3.
Actual results:
Pods stuck in ContainerCreating state
Expected results:
Pods creates normally
Additional info:
Customer responded that deleting the statefulset and recreating it didn't work. The pods can be created normally after deleting the corresponding ippools.whereabouts.cni.cncf.io manually:

$ oc delete ippools.whereabouts.cni.cncf.io 172.21.24.0-22 -n openshift-multus
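A sketch for inspecting the whereabouts state before deleting a pool (the CRD names below are the upstream whereabouts ones):

$ oc get ippools.whereabouts.cni.cncf.io -A
$ oc get overlappingrangeipreservations.whereabouts.cni.cncf.io -A
# Stale entries referencing already-deleted pods are the candidates for cleanup.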
When a user selects a supported-but-not-recommended update target, it's currently rendered as a DropdownWithSwitch that is collapsed by default. That forces the user to perform an extra click to see the message explaining the risk they are considering accepting. We should remove the toggle and always expand that message, because understanding the risk is a critical part of deciding whether you accept it.
Since console landed support for conditional update risks. Not a big enough deal to backport that whole way.
Every time.
OTA-520 explains how to create dummy data for testing the conditional update UX pre-merge and/or on nightly builds that are not part of the usual channels yet.
but without the down-v, because the text should not be collapsible.
This is a clone of issue OCPBUGS-29003. The following is the description of the original issue:
—
Description of problem:
After upgrading from 4.13.x to 4.14.10, the workload images that the customer stored inside the internal registry are lost, resulting in the application pods erroring with "Back-off pulling image". Even when pulling manually with podman, it fails with "manifest unknown" because the image cannot be found in the registry anymore.
- This behavior was found and reproduced 100% on ARO clusters, where the internal registry is by default backed by the Storage Account created by the ARO RP service principal, which is the Containers blob service.
- I do not know if the same behavior is found in non-managed Azure clusters or any other architecture.
Version-Release number of selected component (if applicable):
4.14.10
How reproducible:
100% with an ARO cluster (Managed cluster)
Steps to Reproduce: Attached.
The workaround found so far is to rebuild the apps or re-import the images, but those tasks are lengthy and costly, especially if it is a production cluster.
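A sketch of the re-import step (imagestream, tag, and source registry are placeholders):

$ oc import-image <imagestream>:<tag> \
    --from=<external-registry>/<repository>:<tag> \
    --confirm -n <project>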
Description of problem:
OCP 4.11 console UI is not consistent in showing which namespaces are managed. Below are the results; I have also attached the respective images.
1. Viewing installed operators for the cp4i namespace shows the multi-namespace operators as managing All namespaces (but really these operators are restricted to 2 namespaces) ------>> image multins-cp4i.png
2. Viewing installed operators for the ibm-common-services namespace shows the multi-namespace operators as managing 2 namespaces ------>> image multins-ibm-cs.png
3. Viewing installed operators for All Projects shows the multi-namespace operators as managing 2 namespaces ---->> image multins-all.p
Slack Thread: https://coreos.slack.com/archives/C6A3NV5J9/p1668535310411939
How reproducible:
1. Install an operator into the "cp4i" namespace (operator group is OwnNamespace with just "cp4i")
2. Install operator(s) into the "ibm-common-services" namespace (operator group is OwnNamespace with just "ibm-common-services")
3. Edit the OperatorGroup in the "ibm-common-services" namespace and add the "cp4i" namespace - now the operators in "ibm-common-services" are included in both "ibm-common-services" and "cp4i" namespaces
4. Review the installed operators in the OCP 4.11 console for "cp4i", "ibm-common-services", and "All Projects"
Actual results:
Installed operators in cp4i project incorrectly shows Managed Namespaces as "All Namespaces". More can be seen in image----> multins-cp4i.png
Expected results:
Installed operators in cp4i project correctly shows Managed Namespaces
Additional info:
Slack Thread: https://coreos.slack.com/archives/C6A3NV5J9/p1668535310411939
Description of problem:
There is currently no way to interrupt a stuck HostedCluster upgrade because we don't allow another upgrade until the current upgrade is finished. At the very least we should allow overriding the upgrade with the ForceUpgradeTo annotation.
The function name doesn't honour the behaviour (see the controller code linked under Additional info).
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Install a hosted cluster
2. Start an upgrade to a bad release that will not complete
3. Attempt to override the current upgrade with a different release via annotation
Actual results:
The override upgrade is not applied because the initial upgrade is not completed.
Expected results:
The override upgrade starts and completes successfully.
Additional info:
https://github.com/openshift/hypershift/blob/572a75655f0d86d6e2139f27e14eb1b168a5842b/hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go#L4123-L4135
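For illustration, applying such an override might look like the following; this assumes the annotation key hypershift.openshift.io/force-upgrade-to from the linked controller code, and the names and release pullspec are placeholders:

$ oc annotate hostedcluster <name> -n <namespace> \
    hypershift.openshift.io/force-upgrade-to=<release-image-pullspec> --overwrite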
This is a clone of issue OCPBUGS-26236. The following is the description of the original issue:
—
Description of problem:
VolumeSnapshots data is not displayed in PVC > VolumeSnapshots tab
Version-Release number of selected component (if applicable):
4.16.0-0.ci-2024-01-05-050911
How reproducible:
Steps to Reproduce:
1. Create a PVC, e.g. "my-pvc"
2. Create a Pod and bind it to "my-pvc"
3. Create a VolumeSnapshot and associate it with "my-pvc"
4. Go to the PVC detail > VolumeSnapshots tab
Actual results:
VolumeSnapshots data is not displayed in PVC > VolumeSnapshots tab
Expected results:
VolumeSnapshots data should be displayed in PVC > VolumeSnapshots tab
Additional info:
Description of problem:
On February 27th, the endpoints that were being queried for account details were turned off. The check is not vital, so we are fine with removing it; however, it is currently blocking all Power VS installs.
Version-Release number of selected component (if applicable):
4.13.0 - 4.16.0
How reproducible:
Easily
Steps to Reproduce:
1. Try to deploy with Power VS
2. Fail at the platform credentials check
Actual results:
Check fails
Expected results:
Check should succeed
Additional info:
The final iteration (of 3) of the fix for OCPBUGS-4248 - https://github.com/openshift/cluster-baremetal-operator/pull/341 - uses the (IPv6) API VIP as the IP address for IPv6 BMCs to contact Apache to download the image to mount via virtualmedia.
When the provisioning network is active, this should use the (IPv6) Provisioning VIP unless the virtualMediaViaExternalNetwork flag is true.
This is a clone of issue OCPBUGS-28718. The following is the description of the original issue:
—
Description of problem:
In the service details page, under the Revision and Route tabs, the user sees a "No resource found" message although a Revision and Route have been created for that service.
Version-Release number of selected component (if applicable):
4.15.z
How reproducible:
Always
Steps to Reproduce:
1. Install the serverless operator
2. Create a serving instance
3. Create a knative service/function
4. Go to the details page
Actual results:
User is not able to see Revision and Route created for the service
Expected results:
User should be able to see Revision and Route created for the service
Additional info:
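A sketch of how to confirm that the underlying resources exist even though the tabs show nothing (service and project names are placeholders):

$ oc get revisions.serving.knative.dev,routes.serving.knative.dev -n <project> \
    -l serving.knative.dev/service=<service-name>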
This is a clone of issue OCPBUGS-26492. The following is the description of the original issue:
—
Description of problem:
Operation cannot be fulfilled on networks.operator.openshift.io during OVN live migration
Version-Release number of selected component (if applicable):
How reproducible:
Not always
Steps to Reproduce:
1. Enable features of egressfirewall, externalIP, multicast, multus, network-policy, service-idle.
2. Start migrating the cluster from SDN to OVN.
Actual results:
[weliang@weliang ~]$ oc delete validatingwebhookconfigurations.admissionregistration.k8s.io/sre-techpreviewnoupgrade-validation
validatingwebhookconfiguration.admissionregistration.k8s.io "sre-techpreviewnoupgrade-validation" deleted
[weliang@weliang ~]$ oc edit featuregate cluster
featuregate.config.openshift.io/cluster edited
[weliang@weliang ~]$ oc get node
NAME                          STATUS   ROLES                  AGE   VERSION
ip-10-0-20-154.ec2.internal   Ready    control-plane,master   86m   v1.28.5+9605db4
ip-10-0-45-93.ec2.internal    Ready    worker                 80m   v1.28.5+9605db4
ip-10-0-49-245.ec2.internal   Ready    worker                 74m   v1.28.5+9605db4
ip-10-0-57-37.ec2.internal    Ready    infra,worker           60m   v1.28.5+9605db4
ip-10-0-60-0.ec2.internal     Ready    infra,worker           60m   v1.28.5+9605db4
ip-10-0-62-121.ec2.internal   Ready    control-plane,master   86m   v1.28.5+9605db4
ip-10-0-62-56.ec2.internal    Ready    control-plane,master   86m   v1.28.5+9605db4
[weliang@weliang ~]$ for f in $(oc get nodes -o jsonpath='{.items[*].metadata.name}') ; do oc debug node/"${f}" -- chroot /host cat /etc/kubernetes/kubelet.conf | grep NetworkLiveMigration ; done
Starting pod/ip-10-0-20-154ec2internal-debug-9wvd8 ...
To use host binaries, run `chroot /host`
Removing debug pod ...
"NetworkLiveMigration": true,
Starting pod/ip-10-0-45-93ec2internal-debug-rwvls ...
To use host binaries, run `chroot /host`
"NetworkLiveMigration": true,
Removing debug pod ...
Starting pod/ip-10-0-49-245ec2internal-debug-rp9dt ...
To use host binaries, run `chroot /host`
Removing debug pod ...
"NetworkLiveMigration": true,
Starting pod/ip-10-0-57-37ec2internal-debug-q5thk ...
To use host binaries, run `chroot /host`
Removing debug pod ...
"NetworkLiveMigration": true,
Starting pod/ip-10-0-60-0ec2internal-debug-zp78h ...
To use host binaries, run `chroot /host`
Removing debug pod ...
"NetworkLiveMigration": true,
Starting pod/ip-10-0-62-121ec2internal-debug-42k2g ...
To use host binaries, run `chroot /host`
Removing debug pod ...
"NetworkLiveMigration": true,
Starting pod/ip-10-0-62-56ec2internal-debug-s99ls ...
To use host binaries, run `chroot /host`
Removing debug pod ...
"NetworkLiveMigration": true,
[weliang@weliang ~]$ oc patch Network.config.openshift.io cluster --type='merge' --patch '{"metadata":{"annotations":{"network.openshift.io/live-migration":""}},"spec":{"networkType":"OVNKubernetes"}}'
network.config.openshift.io/cluster patched
[weliang@weliang ~]$ oc get co network
NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
network   4.15.0-0.nightly-2024-01-06-062415   True        False         True       4h1m    Internal error while updating operator configuration: could not apply (/, Kind=) /cluster, err: failed to apply / update (operator.openshift.io/v1, Kind=Network) /cluster: Operation cannot be fulfilled on networks.operator.openshift.io "cluster": the object has been modified; please apply your changes to the latest version and try again
[weliang@weliang ~]$ oc get node
NAME                          STATUS   ROLES                  AGE     VERSION
ip-10-0-2-52.ec2.internal     Ready    worker                 3h54m   v1.28.5+9605db4
ip-10-0-26-16.ec2.internal    Ready    control-plane,master   4h2m    v1.28.5+9605db4
ip-10-0-32-116.ec2.internal   Ready    worker                 3h54m   v1.28.5+9605db4
ip-10-0-32-67.ec2.internal    Ready    infra,worker           3h38m   v1.28.5+9605db4
ip-10-0-35-11.ec2.internal    Ready    infra,worker           3h39m   v1.28.5+9605db4
ip-10-0-39-125.ec2.internal   Ready    control-plane,master   4h2m    v1.28.5+9605db4
ip-10-0-6-117.ec2.internal    Ready    control-plane,master   4h2m    v1.28.5+9605db4
[weliang@weliang ~]$ oc get Network.operator.openshift.io/cluster -o json
{
  "apiVersion": "operator.openshift.io/v1",
  "kind": "Network",
  "metadata": {
    "creationTimestamp": "2024-01-08T13:28:07Z",
    "generation": 417,
    "name": "cluster",
    "resourceVersion": "236888",
    "uid": "37fb36f0-c13c-476d-aea1-6ebc1c87abe8"
  },
  "spec": {
    "clusterNetwork": [
      {
        "cidr": "10.128.0.0/14",
        "hostPrefix": 23
      }
    ],
    "defaultNetwork": {
      "openshiftSDNConfig": {
        "enableUnidling": true,
        "mode": "NetworkPolicy",
        "mtu": 8951,
        "vxlanPort": 4789
      },
      "ovnKubernetesConfig": {
        "egressIPConfig": {},
        "gatewayConfig": {
          "ipv4": {},
          "ipv6": {},
          "routingViaHost": false
        },
        "genevePort": 6081,
        "mtu": 8901,
        "policyAuditConfig": {
          "destination": "null",
          "maxFileSize": 50,
          "maxLogFiles": 5,
          "rateLimit": 20,
          "syslogFacility": "local0"
        }
      },
      "type": "OVNKubernetes"
    },
    "deployKubeProxy": false,
    "disableMultiNetwork": false,
    "disableNetworkDiagnostics": false,
    "kubeProxyConfig": {
      "bindAddress": "0.0.0.0"
    },
    "logLevel": "Normal",
    "managementState": "Managed",
    "migration": {
      "mode": "Live",
      "networkType": "OVNKubernetes"
    },
    "observedConfig": null,
    "operatorLogLevel": "Normal",
    "serviceNetwork": [
      "172.30.0.0/16"
    ],
    "unsupportedConfigOverrides": null,
    "useMultiNetworkPolicy": false
  },
  "status": {
    "conditions": [
      {
        "lastTransitionTime": "2024-01-08T13:28:07Z",
        "status": "False",
        "type": "ManagementStateDegraded"
      },
      {
        "lastTransitionTime": "2024-01-08T17:29:52Z",
        "status": "False",
        "type": "Degraded"
      },
      {
        "lastTransitionTime": "2024-01-08T13:28:07Z",
        "status": "True",
        "type": "Upgradeable"
      },
      {
        "lastTransitionTime": "2024-01-08T17:26:38Z",
        "status": "False",
        "type": "Progressing"
      },
      {
        "lastTransitionTime": "2024-01-08T13:28:20Z",
        "status": "True",
        "type": "Available"
      }
    ],
    "readyReplicas": 0,
    "version": "4.15.0-0.nightly-2024-01-06-062415"
  }
}
[weliang@weliang ~]$
Expected results:
OVN live migration pass
Additional info:
must-gather: https://people.redhat.com/~weliang/must-gather1.tar.gz
Some operators failed to install
Multicluster engine (MCE) failed to install. Due to this, the cluster will be degraded, but you can try to install the operator from the Operator Hub. Please check the installation log for more information.
OpenShift version 4.14.0-rc.4
It installed successfully on OpenShift version 4.13.13.
Steps to reproduce:
1. Create cluster on AI SaaS version OCP 4.14.0-rc.4
2. Select MCE operator
3. Continue settings and start installation
Actual results:
Cluster installed but
Operators
Multicluster engine failed
Expected results:
Operators
Multicluster engine installed
Please review the following PR: https://github.com/openshift/gcp-pd-csi-driver-operator/pull/85
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-24408. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-45329. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-45328. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-45327. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-45324. The following is the description of the original issue:
—
golang.org/x/net is a CVE-prone dependency, and even if we are not actually exposed to some issues, carrying an old dep exposes us to version-based vulnerability scanners.
Description of problem:
https://github.com/openshift/installer/pull/6770 reverted part of https://github.com/openshift/installer/pull/5788, which had set guestinfo.domain for the bootstrap machine. This breaks some OKD installations, which require that setting.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-29305. The following is the description of the original issue:
—
Description of problem:
There's a typo in the openssl commands within the ovn-ipsec-containerized/ovn-ipsec-host daemonsets. The correct parameter is "-checkend", not "-checkedn".
Version-Release number of selected component (if applicable):
# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.10   True        False         7s      Cluster version is 4.14.10
How reproducible:
Steps to Reproduce:
1. Enable IPsec encryption
# oc patch networks.operator.openshift.io cluster --type=merge -p '{"spec": {"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":{ }}}}}'
Actual results:
Examining the initContainer (ovn-keys) logs
# oc logs ovn-ipsec-containerized-7bcd2 -c ovn-keys
...
+ openssl x509 -noout -dates -checkedn 15770000 -in /etc/openvswitch/keys/ipsec-cert.pem
x509: Use -help for summary.
# oc get ds
NAME                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
ovn-ipsec-containerized   1         1         0       1            0           beta.kubernetes.io/os=linux   159m
ovn-ipsec-host            1         1         1       1            1           beta.kubernetes.io/os=linux   159m
ovnkube-node              1         1         1       1            1           beta.kubernetes.io/os=linux   3h44m
# oc get ds ovn-ipsec-containerized -o yaml | grep edn
    if ! openssl x509 -noout -dates -checkedn 15770000 -in $cert_pem; then
# oc get ds ovn-ipsec-host -o yaml | grep edn
    if ! openssl x509 -noout -dates -checkedn 15770000 -in $cert_pem; then
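For reference, the corrected invocation: openssl's -checkend takes a window in seconds and sets a non-zero exit code if the certificate expires within it, which is what the surrounding `if !` guard relies on:

$ openssl x509 -noout -dates -checkend 15770000 -in /etc/openvswitch/keys/ipsec-cert.pem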
The OKD build image job in ironic-agent-image is failing with the error message:
Complete!
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100    14  100    14    0     0     73      0 --:--:-- --:--:-- --:--:--    73
  File "<stdin>", line 1
    404: Not Found
       ^
SyntaxError: illegal target for annotation
INFO[2024-02-29T08:06:27Z] Ran for 4m3s
ERRO[2024-02-29T08:06:27Z] Some steps failed:
ERRO[2024-02-29T08:06:27Z] * could not run steps: step ironic-agent failed: error occurred handling build ironic-agent-amd64: the build ironic-agent-amd64 failed after 1m57s with reason DockerBuildFailed: Dockerfile build strategy has failed.
INFO[2024-02-29T08:06:27Z] Reporting job state 'failed' with reason 'executing_graph:step_failed:building_project_image'
Description of problem:
After a control plane release upgrade, the 'tuned' pod in the guest cluster uses the control plane release image.
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. Create a cluster in 4.14.0-0.ci-2023-09-06-180503
2. Upgrade the control plane release to 4.14-2023-09-07-180503
3. In the guest cluster, check the container image in the tuned pod
Actual results:
pod tuned uses control plane release image 4.14-2023-09-07-180503
Expected results:
pod tuned uses release image 4.14.0-0.ci-2023-09-06-180503
Additional info:
After the control plane release upgrade, in the control plane namespace, cluster-node-tuning-operator uses the control plane release image:

jiezhao-mac:hypershift jiezhao$ oc get pods cluster-node-tuning-operator-6dc549ffdf-jhj2k -n clusters-jie-test -ojsonpath='{.spec.containers[].name}{"\n"}'
cluster-node-tuning-operator
jiezhao-mac:hypershift jiezhao$ oc get pods cluster-node-tuning-operator-6dc549ffdf-jhj2k -n clusters-jie-test -ojsonpath='{.spec.containers[].image}{"\n"}'
registry.ci.openshift.org/ocp/4.14-2023-09-07-180503@sha256:60bd6e2e8db761fb4b3b9d68c1da16bf0371343e3df8e72e12a2502640173990
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/cluster-api-provider-baremetal/pull/205
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-25483. The following is the description of the original issue:
—
Description of problem:
A regression was identified creating LoadBalancer services in ARO in new 4.14 clusters (handled for new installations in OCPBUGS-24191). The same regression has also been confirmed in ARO clusters upgraded to 4.14.
Version-Release number of selected component (if applicable):
4.14.z
How reproducible:
On any ARO cluster upgraded to 4.14.z
Steps to Reproduce:
1. Install an ARO cluster
2. Upgrade to 4.14 from the fast channel
3. oc create svc loadbalancer test-lb -n default --tcp 80:8080
Actual results:
# External-IP stuck in Pending
$ oc get svc test-lb -n default
NAME      TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
test-lb   LoadBalancer   172.30.104.200   <pending>     80:30062/TCP   15m

# Errors in cloud-controller-manager being unable to map VM to nodes
$ oc logs -l infrastructure.openshift.io/cloud-controller-manager=Azure -n openshift-cloud-controller-manager
I1215 19:34:51.843715 1 azure_loadbalancer.go:1533] reconcileLoadBalancer for service(default/test-lb) - wantLb(true): started
I1215 19:34:51.844474 1 event.go:307] "Event occurred" object="default/test-lb" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I1215 19:34:52.253569 1 azure_loadbalancer_repo.go:73] LoadBalancerClient.List(aro-r5iks3dh) success
I1215 19:34:52.253632 1 azure_loadbalancer.go:1557] reconcileLoadBalancer for service(default/test-lb): lb(aro-r5iks3dh/mabad-test-74km6) wantLb(true) resolved load balancer name
I1215 19:34:52.528579 1 azure_vmssflex_cache.go:162] Could not find node () in the existing cache. Forcely freshing the cache to check again...
E1215 19:34:52.714678 1 azure_vmssflex.go:379] fs.GetNodeNameByIPConfigurationID(/subscriptions/fe16a035-e540-4ab7-80d9-373fa9a3d6ae/resourceGroups/aro-r5iks3dh/providers/Microsoft.Network/networkInterfaces/mabad-test-74km6-master0-nic/ipConfigurations/pipConfig) failed. Error: failed to map VM Name to NodeName: VM Name mabad-test-74km6-master-0
E1215 19:34:52.714888 1 azure_loadbalancer.go:126] reconcileLoadBalancer(default/test-lb) failed: failed to map VM Name to NodeName: VM Name mabad-test-74km6-master-0
I1215 19:34:52.714956 1 azure_metrics.go:115] "Observed Request Latency" latency_seconds=0.871261893 request="services_ensure_loadbalancer" resource_group="aro-r5iks3dh" subscription_id="fe16a035-e540-4ab7-80d9-373fa9a3d6ae" source="default/test-lb" result_code="failed_ensure_loadbalancer"
E1215 19:34:52.715005 1 controller.go:291] error processing service default/test-lb (will retry): failed to ensure load balancer: failed to map VM Name to NodeName: VM Name mabad-test-74km6-master-0
Expected results:
# The LoadBalancer gets an External-IP assigned
$ oc get svc test-lb -n default
NAME      TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)        AGE
test-lb   LoadBalancer   172.30.193.159   20.242.180.199   80:31475/TCP   14s
Additional info:
In the cloud-provider-config ConfigMap in the openshift-config namespace, vmType="". When vmType is changed to "standard" explicitly, the provisioning of the LoadBalancer completes and an External-IP gets assigned without errors.
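A sketch for inspecting and applying the workaround; this assumes the embedded config is JSON with a top-level vmType key, as observed above, and the sed-based edit should be reviewed before applying:

$ oc get cm cloud-provider-config -n openshift-config -o jsonpath='{.data.config}' | jq .vmType
$ oc get cm cloud-provider-config -n openshift-config -o yaml \
    | sed 's/"vmType": ""/"vmType": "standard"/' \
    | oc apply -f -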
Please review the following PR: https://github.com/openshift/ironic-rhcos-downloader/pull/93
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/prometheus-operator/pull/243
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/cluster-api-operator/pull/31
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-23744. The following is the description of the original issue:
—
Seen in 4.14 to 4.15 update CI:
: [bz-OLM] clusteroperator/operator-lifecycle-manager-packageserver should not change condition/Available
Run #0: Failed 1h34m55s
{ 1 unexpected clusteroperator state transitions during e2e test run
Nov 22 21:48:41.624 - 56ms E clusteroperator/operator-lifecycle-manager-packageserver condition/Available reason/ClusterServiceVersionNotSucceeded status/False ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: APIServiceInstallFailed, message: APIService install failed: forbidden: User "system:anonymous" cannot get path "/apis/packages.operators.coreos.com/v1"}
While a brief auth failure isn't fantastic, an issue that only persists for 56ms is not long enough to warrant immediate admin intervention. Teaching the operator to stay Available=True for this kind of brief hiccup, while still going Available=False for issues where at least part of the component is non-functional and the condition requires immediate administrator intervention, would make it easier for admins and SREs operating clusters to identify when intervention is required. It's also possible that this is an incoming-RBAC vs. outgoing-RBAC race of some sort, and that shifting manifest filenames around could avoid the hiccup entirely.
$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=48h&type=junit&search=clusteroperator/operator-lifecycle-manager-packageserver+should+not+change+condition/Available' | grep '^periodic-.*4[.]15.*failures match' | sort
periodic-ci-openshift-cluster-etcd-operator-release-4.15-periodics-e2e-aws-etcd-recovery (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-aws-ovn-heterogeneous-upgrade (all) - 5 runs, 40% failed, 50% of failures match = 20% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-upgrade-aws-ovn-arm64 (all) - 8 runs, 38% failed, 33% of failures match = 13% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-upgrade-azure-ovn-arm64 (all) - 5 runs, 20% failed, 400% of failures match = 80% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-e2e-upgrade-azure-ovn-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-ovn-remote-libvirt-ppc64le (all) - 6 runs, 67% failed, 75% of failures match = 50% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-ovn-remote-libvirt-s390x (all) - 6 runs, 100% failed, 33% of failures match = 33% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-aws-ovn-heterogeneous-upgrade (all) - 5 runs, 40% failed, 50% of failures match = 20% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-aws-sdn-arm64 (all) - 5 runs, 20% failed, 300% of failures match = 60% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-upgrade-azure-ovn-arm64 (all) - 5 runs, 40% failed, 100% of failures match = 40% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-aws-ovn-upgrade (all) - 5 runs, 20% failed, 100% of failures match = 20% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-aws-upgrade-ovn-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade (all) - 43 runs, 51% failed, 36% of failures match = 19% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-azure-sdn-upgrade (all) - 5 runs, 20% failed, 300% of failures match = 60% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-ovn-upgrade (all) - 80 runs, 44% failed, 17% of failures match = 8% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-ovn-upgrade (all) - 80 runs, 30% failed, 63% of failures match = 19% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-ovn-uwm (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-sdn-upgrade (all) - 8 runs, 25% failed, 200% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-ovn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade (all) - 80 runs, 43% failed, 50% of failures match = 21% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-rt-upgrade (all) - 50 runs, 16% failed, 50% of failures match = 8% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-vsphere-ovn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-from-stable-4.13-e2e-aws-sdn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-single-node-serial (all) - 5 runs, 100% failed, 80% of failures match = 80% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-upgrade-rollback-oldest-supported (all) - 4 runs, 25% failed, 100% of failures match = 25% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-sdn-upgrade (all) - 50 runs, 18% failed, 178% of failures match = 32% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-metal-ipi-sdn-bm-upgrade (all) - 6 runs, 83% failed, 20% of failures match = 17% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-metal-ipi-upgrade-ovn-ipv6 (all) - 6 runs, 83% failed, 60% of failures match = 50% impact
periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.13-e2e-aws-ovn-upgrade-paused (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-aws-sdn-upgrade (all) - 6 runs, 17% failed, 100% of failures match = 17% impact
periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-metal-ipi-sdn-bm-upgrade (all) - 5 runs, 100% failed, 40% of failures match = 40% impact
periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-metal-ipi-upgrade-ovn-ipv6 (all) - 6 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-okd-4.15-e2e-aws-ovn-upgrade (all) - 19 runs, 63% failed, 33% of failures match = 21% impact
periodic-ci-openshift-release-master-okd-scos-4.15-e2e-aws-ovn-upgrade (all) - 15 runs, 47% failed, 57% of failures match = 27% impact
I'm not sure if all of those are from this system:anonymous issue, or if some of them are other mechanisms. Ideally we fix all of the Available=False noise, while, again, still going Available=False when it is worth summoning an admin immediately. Checking for different reason and message strings in recent 4.15-touching update runs:
$ curl -s 'https://search.ci.openshift.org/search?maxAge=48h&type=junit&name=4.15.*upgrade&context=0&search=clusteroperator/operator-lifecycle-manager-packageserver.*condition/Available.*status/False' | jq -r 'to_entries[].value | to_entries[].value[].context[]' | sed 's|.*clusteroperator/\([^ ]*\) condition/Available reason/\([^ ]*\) status/False.*message: \(.*\)|\1 \2 \3|' | sort | uniq -c | sort -n
      3 operator-lifecycle-manager-packageserver ClusterServiceVersionNotSucceeded APIService install failed: Unauthorized
      3 operator-lifecycle-manager-packageserver ClusterServiceVersionNotSucceeded install timeout
      4 operator-lifecycle-manager-packageserver ClusterServiceVersionNotSucceeded install strategy failed: Operation cannot be fulfilled on apiservices.apiregistration.k8s.io "v1.packages.operators.coreos.com": the object has been modified; please apply your changes to the latest version and try again
      9 operator-lifecycle-manager-packageserver ClusterServiceVersionNotSucceeded apiServices not installed
     23 operator-lifecycle-manager-packageserver ClusterServiceVersionNotSucceeded install strategy failed: could not create service packageserver-service: services "packageserver-service" already exists
     82 operator-lifecycle-manager-packageserver ClusterServiceVersionNotSucceeded APIService install failed: forbidden: User "system:anonymous" cannot get path "/apis/packages.operators.coreos.com/v1"
Lots of hits in the above CI search. Running one of the 100% impact flavors has a good chance at reproducing.
1. Install 4.14
2. Update to 4.15
3. Keep an eye on operator-lifecycle-manager-packageserver's ClusterOperator Available.
Available=False blips.
Available=True the whole time, or any Available=False looks like a serious issue where summoning an admin would have been appropriate.
This also causes these test cases to fail (mentioning them here for Sippy to link here on relevant component readiness failures).
This is a clone of issue OCPBUGS-28723. The following is the description of the original issue:
—
Description of problem:
"create serverless function" functionality in the Openshift UI. When you add a (random) repository it shows a warning saying "func.yaml is not present and builder strategy is not s2i" but without any further link or information. That's not a very good UX imo. Could we add a link to explain to the user what that entails?
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
https://redhat-internal.slack.com/archives/CJYKV1YAH/p1706639383940559
This is a clone of issue OCPBUGS-41840. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-39209. The following is the description of the original issue:
—
Description of problem:
Attempting to migrate from OpenShiftSDN to OVNKubernetes, but experiencing the below error once the Limited Live Migration is started.
+ exec /usr/bin/hybrid-overlay-node --node ip-10-241-1-192.us-east-2.compute.internal --config-file=/run/ovnkube-config/ovnkube.conf --bootstrap-kubeconfig=/var/lib/kubelet/kubeconfig --cert-dir=/etc/ovn/ovnkube-node-certs --cert-duration=24h
I0829 14:06:20.313928 82345 config.go:2192] Parsed config file /run/ovnkube-config/ovnkube.conf
I0829 14:06:20.314202 82345 config.go:2193] Parsed config: {Default:{MTU:8901 RoutableMTU:0 ConntrackZone:64000 HostMasqConntrackZone:0 OVNMasqConntrackZone:0 HostNodePortConntrackZone:0 ReassemblyConntrackZone:0 EncapType:geneve EncapIP: EncapPort:6081 InactivityProbe:100000 OpenFlowProbe:180 OfctrlWaitBeforeClear:0 MonitorAll:true OVSDBTxnTimeout:1m40s LFlowCacheEnable:true LFlowCacheLimit:0 LFlowCacheLimitKb:1048576 RawClusterSubnets:100.64.0.0/15/23 ClusterSubnets:[] EnableUDPAggregation:true Zone:global} Logging:{File: CNIFile: LibovsdbFile:/var/log/ovnkube/libovsdb.log Level:4 LogFileMaxSize:100 LogFileMaxBackups:5 LogFileMaxAge:0 ACLLoggingRateLimit:20} Monitoring:{RawNetFlowTargets: RawSFlowTargets: RawIPFIXTargets: NetFlowTargets:[] SFlowTargets:[] IPFIXTargets:[]} IPFIX:{Sampling:400 CacheActiveTimeout:60 CacheMaxFlows:0} CNI:{ConfDir:/etc/cni/net.d Plugin:ovn-k8s-cni-overlay} OVNKubernetesFeature:{EnableAdminNetworkPolicy:true EnableEgressIP:true EgressIPReachabiltyTotalTimeout:1 EnableEgressFirewall:true EnableEgressQoS:true EnableEgressService:true EgressIPNodeHealthCheckPort:9107 EnableMultiNetwork:true EnableMultiNetworkPolicy:false EnableStatelessNetPol:false EnableInterconnect:false EnableMultiExternalGateway:true EnablePersistentIPs:false EnableDNSNameResolver:false EnableServiceTemplateSupport:false} Kubernetes:{BootstrapKubeconfig: CertDir: CertDuration:10m0s Kubeconfig: CACert: CAData:[] APIServer:https://api-int.nonamenetwork.sandbox1730.opentlc.com:6443 Token: TokenFile: CompatServiceCIDR: RawServiceCIDRs:198.18.0.0/16 ServiceCIDRs:[] OVNConfigNamespace:openshift-ovn-kubernetes OVNEmptyLbEvents:false PodIP: RawNoHostSubnetNodes:migration.network.openshift.io/plugin= NoHostSubnetNodes:<nil> HostNetworkNamespace:openshift-host-network PlatformType:AWS HealthzBindAddress:0.0.0.0:10256 CompatMetricsBindAddress: CompatOVNMetricsBindAddress: CompatMetricsEnablePprof:false DNSServiceNamespace:openshift-dns DNSServiceName:dns-default} Metrics:{BindAddress: OVNMetricsBindAddress: ExportOVSMetrics:false EnablePprof:false NodeServerPrivKey: NodeServerCert: EnableConfigDuration:false EnableScaleMetrics:false} OvnNorth:{Address: PrivKey: Cert: CACert: CertCommonName: Scheme: ElectionTimer:0 northbound:false exec:<nil>} OvnSouth:{Address: PrivKey: Cert: CACert: CertCommonName: Scheme: ElectionTimer:0 northbound:false exec:<nil>} Gateway:{Mode:shared Interface: EgressGWInterface: NextHop: VLANID:0 NodeportEnable:true DisableSNATMultipleGWs:false V4JoinSubnet:100.64.0.0/16 V6JoinSubnet:fd98::/64 V4MasqueradeSubnet:169.254.169.0/29 V6MasqueradeSubnet:fd69::/125 MasqueradeIPs:{V4OVNMasqueradeIP:169.254.169.1 V6OVNMasqueradeIP:fd69::1 V4HostMasqueradeIP:169.254.169.2 V6HostMasqueradeIP:fd69::2 V4HostETPLocalMasqueradeIP:169.254.169.3 V6HostETPLocalMasqueradeIP:fd69::3 V4DummyNextHopMasqueradeIP:169.254.169.4 V6DummyNextHopMasqueradeIP:fd69::4 V4OVNServiceHairpinMasqueradeIP:169.254.169.5 V6OVNServiceHairpinMasqueradeIP:fd69::5} DisablePacketMTUCheck:false RouterSubnet: SingleNode:false DisableForwarding:false AllowNoUplink:false} MasterHA:{ElectionLeaseDuration:137 ElectionRenewDeadline:107 ElectionRetryPeriod:26} ClusterMgrHA:{ElectionLeaseDuration:137 ElectionRenewDeadline:107 ElectionRetryPeriod:26} HybridOverlay:{Enabled:true RawClusterSubnets: ClusterSubnets:[] VXLANPort:4789} OvnKubeNode:{Mode:full DPResourceDeviceIdsMap:map[] MgmtPortNetdev: MgmtPortDPResourceName:} ClusterManager:{V4TransitSwitchSubnet:100.88.0.0/16 V6TransitSwitchSubnet:fd97::/64}}
F0829 14:06:20.315468 82345 hybrid-overlay-node.go:54] illegal network configuration: built-in join subnet "100.64.0.0/16" overlaps cluster subnet "100.64.0.0/15"
The OpenShift Container Platform 4 cluster has been installed with the below configuration, and therefore the clusterNetwork conflicts with the OVNKubernetes join subnet.
$ oc get cm -n kube-system cluster-config-v1 -o yaml
apiVersion: v1
data:
install-config: |
additionalTrustBundlePolicy: Proxyonly
apiVersion: v1
baseDomain: sandbox1730.opentlc.com
compute:
- architecture: amd64
hyperthreading: Enabled
name: worker
platform: {}
replicas: 3
controlPlane:
architecture: amd64
hyperthreading: Enabled
name: master
platform: {}
replicas: 3
metadata:
creationTimestamp: null
name: nonamenetwork
networking:
clusterNetwork:
- cidr: 100.64.0.0/15
hostPrefix: 23
machineNetwork:
- cidr: 10.241.0.0/16
networkType: OpenShiftSDN
serviceNetwork:
- 198.18.0.0/16
platform:
aws:
region: us-east-2
publish: External
pullSecret: ""
So following the procedure, the below steps were executed but still the problem is being reported.
oc patch network.operator.openshift.io cluster --type='merge' -p='{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipv4":{"internalJoinSubnet": "100.68.0.0/16"}}}}}'
Checking whether the change was applied, one can see it configured:
$ oc get network.operator cluster -o yaml
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  creationTimestamp: "2024-08-29T10:05:36Z"
  generation: 376
  name: cluster
  resourceVersion: "135345"
  uid: 37f08c71-98fa-430c-b30f-58f82142788c
spec:
  clusterNetwork:
  - cidr: 100.64.0.0/15
    hostPrefix: 23
  defaultNetwork:
    openshiftSDNConfig:
      enableUnidling: true
      mode: NetworkPolicy
      mtu: 8951
      vxlanPort: 4789
    ovnKubernetesConfig:
      egressIPConfig: {}
      gatewayConfig:
        ipv4: {}
        ipv6: {}
        routingViaHost: false
      genevePort: 6081
      ipsecConfig:
        mode: Disabled
      ipv4:
        internalJoinSubnet: 100.68.0.0/16
      mtu: 8901
      policyAuditConfig:
        destination: "null"
        maxFileSize: 50
        maxLogFiles: 5
        rateLimit: 20
        syslogFacility: local0
    type: OpenShiftSDN
  deployKubeProxy: false
  disableMultiNetwork: false
  disableNetworkDiagnostics: false
  kubeProxyConfig:
    bindAddress: 0.0.0.0
  logLevel: Normal
  managementState: Managed
  migration:
    mode: Live
    networkType: OVNKubernetes
  observedConfig: null
  operatorLogLevel: Normal
  serviceNetwork:
  - 198.18.0.0/16
  unsupportedConfigOverrides: null
  useMultiNetworkPolicy: false
Following the above, the Limited Live Migration is triggered, which then suddenly stops because of the error shown.
oc patch Network.config.openshift.io cluster --type='merge' --patch '{"metadata":{"annotations":{"network.openshift.io/network-type-migration":""}},"spec":{"networkType":"OVNKubernetes"}}'
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.16.9
How reproducible:
Always
Steps to Reproduce:
1. Install OpenShift Container Platform 4 with OpenShiftSDN, the configuration shown above and then update to OpenShift Container Platform 4.16
2. Change internalJoinSubnet to prevent a conflict with the join subnet of OVNKubernetes (oc patch network.operator.openshift.io cluster --type='merge' -p='{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipv4":{"internalJoinSubnet": "100.68.0.0/16"}}}}}')
3. Initiate the Limited Live Migration running oc patch Network.config.openshift.io cluster --type='merge' --patch '{"metadata":{"annotations":{"network.openshift.io/network-type-migration":""}},"spec":{"networkType":"OVNKubernetes"}}'
4. Check the logs of ovnkube-node using oc logs ovnkube-node-XXXXX -c ovnkube-controller
Actual results:
+ exec /usr/bin/hybrid-overlay-node --node ip-10-241-1-192.us-east-2.compute.internal --config-file=/run/ovnkube-config/ovnkube.conf --bootstrap-kubeconfig=/var/lib/kubelet/kubeconfig --cert-dir=/etc/ovn/ovnkube-node-certs --cert-duration=24h I0829 14:06:20.313928 82345 config.go:2192] Parsed config file /run/ovnkube-config/ovnkube.conf I0829 14:06:20.314202 82345 config.go:2193] Parsed config: {Default:{MTU:8901 RoutableMTU:0 ConntrackZone:64000 HostMasqConntrackZone:0 OVNMasqConntrackZone:0 HostNodePortConntrackZone:0 ReassemblyConntrackZone:0 EncapType:geneve EncapIP: EncapPort:6081 InactivityProbe:100000 OpenFlowProbe:180 OfctrlWaitBeforeClear:0 MonitorAll:true OVSDBTxnTimeout:1m40s LFlowCacheEnable:true LFlowCacheLimit:0 LFlowCacheLimitKb:1048576 RawClusterSubnets:100.64.0.0/15/23 ClusterSubnets:[] EnableUDPAggregation:true Zone:global} Logging:{File: CNIFile: LibovsdbFile:/var/log/ovnkube/libovsdb.log Level:4 LogFileMaxSize:100 LogFileMaxBackups:5 LogFileMaxAge:0 ACLLoggingRateLimit:20} Monitoring:{RawNetFlowTargets: RawSFlowTargets: RawIPFIXTargets: NetFlowTargets:[] SFlowTargets:[] IPFIXTargets:[]} IPFIX:{Sampling:400 CacheActiveTimeout:60 CacheMaxFlows:0} CNI:{ConfDir:/etc/cni/net.d Plugin:ovn-k8s-cni-overlay} OVNKubernetesFeature:{EnableAdminNetworkPolicy:true EnableEgressIP:true EgressIPReachabiltyTotalTimeout:1 EnableEgressFirewall:true EnableEgressQoS:true EnableEgressService:true EgressIPNodeHealthCheckPort:9107 EnableMultiNetwork:true EnableMultiNetworkPolicy:false EnableStatelessNetPol:false EnableInterconnect:false EnableMultiExternalGateway:true EnablePersistentIPs:false EnableDNSNameResolver:false EnableServiceTemplateSupport:false} Kubernetes:{BootstrapKubeconfig: CertDir: CertDuration:10m0s Kubeconfig: CACert: CAData:[] APIServer:https://api-int.nonamenetwork.sandbox1730.opentlc.com:6443 Token: TokenFile: CompatServiceCIDR: RawServiceCIDRs:198.18.0.0/16 ServiceCIDRs:[] OVNConfigNamespace:openshift-ovn-kubernetes OVNEmptyLbEvents:false PodIP: RawNoHostSubnetNodes:migration.network.openshift.io/plugin= NoHostSubnetNodes:<nil> HostNetworkNamespace:openshift-host-network PlatformType:AWS HealthzBindAddress:0.0.0.0:10256 CompatMetricsBindAddress: CompatOVNMetricsBindAddress: CompatMetricsEnablePprof:false DNSServiceNamespace:openshift-dns DNSServiceName:dns-default} Metrics:{BindAddress: OVNMetricsBindAddress: ExportOVSMetrics:false EnablePprof:false NodeServerPrivKey: NodeServerCert: EnableConfigDuration:false EnableScaleMetrics:false} OvnNorth:{Address: PrivKey: Cert: CACert: CertCommonName: Scheme: ElectionTimer:0 northbound:false exec:<nil>} OvnSouth:{Address: PrivKey: Cert: CACert: CertCommonName: Scheme: ElectionTimer:0 northbound:false exec:<nil>} Gateway:{Mode:shared Interface: EgressGWInterface: NextHop: VLANID:0 NodeportEnable:true DisableSNATMultipleGWs:false V4JoinSubnet:100.64.0.0/16 V6JoinSubnet:fd98::/64 V4MasqueradeSubnet:169.254.169.0/29 V6MasqueradeSubnet:fd69::/125 MasqueradeIPs:{V4OVNMasqueradeIP:169.254.169.1 V6OVNMasqueradeIP:fd69::1 V4HostMasqueradeIP:169.254.169.2 V6HostMasqueradeIP:fd69::2 V4HostETPLocalMasqueradeIP:169.254.169.3 V6HostETPLocalMasqueradeIP:fd69::3 V4DummyNextHopMasqueradeIP:169.254.169.4 V6DummyNextHopMasqueradeIP:fd69::4 V4OVNServiceHairpinMasqueradeIP:169.254.169.5 V6OVNServiceHairpinMasqueradeIP:fd69::5} DisablePacketMTUCheck:false RouterSubnet: SingleNode:false DisableForwarding:false AllowNoUplink:false} MasterHA:{ElectionLeaseDuration:137 ElectionRenewDeadline:107 ElectionRetryPeriod:26} ClusterMgrHA:{ElectionLeaseDuration:137 
ElectionRenewDeadline:107 ElectionRetryPeriod:26} HybridOverlay:{Enabled:true RawClusterSubnets: ClusterSubnets:[] VXLANPort:4789} OvnKubeNode:{Mode:full DPResourceDeviceIdsMap:map[] MgmtPortNetdev: MgmtPortDPResourceName:} ClusterManager:{V4TransitSwitchSubnet:100.88.0.0/16 V6TransitSwitchSubnet:fd97::/64}} F0829 14:06:20.315468 82345 hybrid-overlay-node.go:54] illegal network configuration: built-in join subnet "100.64.0.0/16" overlaps cluster subnet "100.64.0.0/15"
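For context, the overlap that hybrid-overlay rejects above is straightforward to confirm with the Go standard library (a standalone sketch, not ovn-kubernetes code):

package main

import (
	"fmt"
	"net/netip"
)

func main() {
	join := netip.MustParsePrefix("100.64.0.0/16")    // ovn-k's built-in join subnet
	cluster := netip.MustParsePrefix("100.64.0.0/15") // RawClusterSubnets from the config above
	fmt.Println(join.Overlaps(cluster))               // true: the /15 contains the whole /16
}

Per the expected results below, the problem is that the migration tooling validates against the built-in join subnet default instead of honoring the user-supplied internalJoinSubnet.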
Expected results:
OVNKubernetes Limited Live Migration should recognize the change applied to internalJoinSubnet and not report any CIDR/subnet overlap during the migration.
Additional info:
N/A
Affected Platforms:
OpenShift Container Platform 4.16 on AWS
Description of problem:
When deploying a dual-stack HostedCluster the user can define networks like this:

networking:
  clusterNetwork:
  - cidr: fd01::/48
    hostPrefix: 64
  - cidr: 10.132.0.0/14
    hostPrefix: 23
  networkType: OVNKubernetes
  serviceNetwork:
  - cidr: fd02::/112
  - cidr: 172.31.0.0/16

This will lead to misconfiguration on the hosted cluster, where services will have their ClusterIP set to the IPv6 family (the pod network will still default to IPv4 no matter what the order was). When deploying a dual-stack cluster with the openshift-install binary, there is a validation in place that prevents users from configuring default IPv6 networks on dual-stack clusters:

ERROR failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: invalid "install-config.yaml" file: [networking.serviceNetwork: Invalid value: "fd02::/112, 172.30.0.0/16": IPv4 addresses must be listed before IPv6 addresses, networking.clusterNetwork: Invalid value: "fd01::/48, 10.132.0.0/14": IPv4 addresses must be listed before IPv6 addresses]
ERROR failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: invalid "install-config.yaml" file: networking.clusterNetwork: Invalid value: "fd01::/48, 10.132.0.0/14": IPv4 addresses must be listed before IPv6 addresses

HyperShift should detect this and either block the cluster creation or swap the order so the cluster gets created with default IPv4 networks.
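For reference, the ordering rule the installer enforces can be expressed in a few lines of Go (a minimal sketch assuming plain CIDR strings and a hypothetical validateIPv4First helper; the real HostedCluster API uses structured types):

import (
	"fmt"
	"net/netip"
	"strings"
)

// validateIPv4First rejects a dual-stack CIDR list whose first entry is not
// IPv4, mirroring the installer's error text.
func validateIPv4First(fldPath string, cidrs []string) error {
	if len(cidrs) < 2 {
		return nil // single-stack; nothing to order
	}
	first, err := netip.ParsePrefix(cidrs[0])
	if err != nil {
		return err
	}
	if !first.Addr().Is4() {
		return fmt.Errorf("%s: Invalid value: %q: IPv4 addresses must be listed before IPv6 addresses",
			fldPath, strings.Join(cidrs, ", "))
	}
	return nil
}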
Version-Release number of selected component (if applicable):
latest
How reproducible:
Always
Steps to Reproduce:
1. Deploy a HostedCluster with the networking settings specified above, using the image with the dual-stack patches included: quay.io/jparrill/hypershift:OCPBUGS-15331-mix-413v12
Actual results:
Cluster gets deployed with default IPv6 family for services network.
Expected results:
Cluster creation gets blocked OR cluster gets deployed with default IPv4 family for services network.
Additional info:
dnsmasq isn't starting on okd-scos in the bootstrap VM; the logs show it failing with "Operation not permitted".
Please review the following PR: https://github.com/openshift/containernetworking-plugins/pull/122
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-25673. The following is the description of the original issue:
—
Description of problem:
CNV upgrades from v4.14.1 to v4.15.0 (unreleased) are not starting due to an out-of-sync OperatorCondition.
We see:
$ oc get csv
NAME                                       DISPLAY                    VERSION   REPLACES                                   PHASE
kubevirt-hyperconverged-operator.v4.14.1   OpenShift Virtualization   4.14.1    kubevirt-hyperconverged-operator.v4.14.0   Replacing
kubevirt-hyperconverged-operator.v4.15.0   OpenShift Virtualization   4.15.0    kubevirt-hyperconverged-operator.v4.14.1   Pending
And on the v4.15.0 CSV:
$ oc get csv kubevirt-hyperconverged-operator.v4.15.0 -o yaml
....
status:
  cleanup: {}
  conditions:
  - lastTransitionTime: "2023-12-19T01:50:48Z"
    lastUpdateTime: "2023-12-19T01:50:48Z"
    message: requirements not yet checked
    phase: Pending
    reason: RequirementsUnknown
  - lastTransitionTime: "2023-12-19T01:50:48Z"
    lastUpdateTime: "2023-12-19T01:50:48Z"
    message: 'operator is not upgradeable: the operatorcondition status "Upgradeable"="True" is outdated'
    phase: Pending
    reason: OperatorConditionNotUpgradeable
  lastTransitionTime: "2023-12-19T01:50:48Z"
  lastUpdateTime: "2023-12-19T01:50:48Z"
  message: 'operator is not upgradeable: the operatorcondition status "Upgradeable"="True" is outdated'
  phase: Pending
  reason: OperatorConditionNotUpgradeable
and if we check the pending operator condition (v4.14.1) we see:
$ oc get operatorcondition kubevirt-hyperconverged-operator.v4.14.1 -o yaml
apiVersion: operators.coreos.com/v2
kind: OperatorCondition
metadata:
  creationTimestamp: "2023-12-16T17:10:17Z"
  generation: 18
  labels:
    operators.coreos.com/kubevirt-hyperconverged.openshift-cnv: ""
  name: kubevirt-hyperconverged-operator.v4.14.1
  namespace: openshift-cnv
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: true
    kind: ClusterServiceVersion
    name: kubevirt-hyperconverged-operator.v4.14.1
    uid: 7db79d4b-e69e-4af8-9335-6269cf004440
  resourceVersion: "4116127"
  uid: 347306c9-865a-42b8-b2c9-69192b0e350a
spec:
  conditions:
  - lastTransitionTime: "2023-12-18T18:47:23Z"
    message: ""
    reason: Upgradeable
    status: "True"
    type: Upgradeable
  deployments:
  - hco-operator
  - hco-webhook
  - hyperconverged-cluster-cli-download
  - cluster-network-addons-operator
  - virt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
  serviceAccounts:
  - hyperconverged-cluster-operator
  - cluster-network-addons-operator
  - kubevirt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
  - cluster-network-addons-operator
  - kubevirt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
status:
  conditions:
  - lastTransitionTime: "2023-12-18T09:41:06Z"
    message: ""
    observedGeneration: 11
    reason: Upgradeable
    status: "True"
    type: Upgradeable
where metadata.generation (18) is not in sync with status.conditions[*].observedGeneration (11).
Even manually editing spec.conditions[*].lastTransitionTime causes a change in metadata.generation (as expected), but this doesn't trigger any reconciliation in OLM, so status.conditions[*].observedGeneration remains at 11.
$ oc get operatorcondition kubevirt-hyperconverged-operator.v4.14.1 -o yaml
apiVersion: operators.coreos.com/v2
kind: OperatorCondition
metadata:
  creationTimestamp: "2023-12-16T17:10:17Z"
  generation: 19
  labels:
    operators.coreos.com/kubevirt-hyperconverged.openshift-cnv: ""
  name: kubevirt-hyperconverged-operator.v4.14.1
  namespace: openshift-cnv
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: true
    kind: ClusterServiceVersion
    name: kubevirt-hyperconverged-operator.v4.14.1
    uid: 7db79d4b-e69e-4af8-9335-6269cf004440
  resourceVersion: "4147472"
  uid: 347306c9-865a-42b8-b2c9-69192b0e350a
spec:
  conditions:
  - lastTransitionTime: "2023-12-18T18:47:25Z"
    message: ""
    reason: Upgradeable
    status: "True"
    type: Upgradeable
  deployments:
  - hco-operator
  - hco-webhook
  - hyperconverged-cluster-cli-download
  - cluster-network-addons-operator
  - virt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
  serviceAccounts:
  - hyperconverged-cluster-operator
  - cluster-network-addons-operator
  - kubevirt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
  - cluster-network-addons-operator
  - kubevirt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
status:
  conditions:
  - lastTransitionTime: "2023-12-18T09:41:06Z"
    message: ""
    observedGeneration: 11
    reason: Upgradeable
    status: "True"
    type: Upgradeable
since its observedGeneration is out of sync, this check:
https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/operators/olm/operatorconditions.go#L44C1-L48
fails and the upgrade never starts.
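For reference, the failing check amounts to comparing the condition's observedGeneration with the object's metadata.generation (a simplified paraphrase of the linked OLM code, assuming metav1.Condition from k8s.io/apimachinery; see the URL above for the real implementation):

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// upgradeableIsOutdated reports whether the Upgradeable condition has not
// yet been reconciled against the latest spec; OLM then refuses the upgrade
// with "the operatorcondition status ... is outdated".
func upgradeableIsOutdated(generation int64, conds []metav1.Condition) bool {
	for _, c := range conds {
		if c.Type == "Upgradeable" {
			return c.ObservedGeneration != generation
		}
	}
	return false
}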
I suspect (I'm only guessing) that it could be a regression introduced with the memory optimization for https://issues.redhat.com/browse/OCPBUGS-17157 .
Version-Release number of selected component (if applicable):
OCP 4.15.0-ec.3
How reproducible:
- Not reproducible (with the same CNV bundles) on OCP v4.14.z.
- Pretty high (but not 100%) on OCP 4.15.0-ec.3
Steps to Reproduce:
1. Try triggering a CNV v4.14.1 -> v4.15.0 upgrade on OCP 4.15.0-ec.3
Actual results:
OLM is not reacting to changes to spec.conditions on the pending OperatorCondition, so metadata.generation is constantly out of sync with status.conditions[*].observedGeneration, and the CSV is reported as:
message: 'operator is not upgradeable: the operatorcondition status "Upgradeable"="True" is outdated'
phase: Pending
reason: OperatorConditionNotUpgradeable
Expected results:
OLM correctly reconciles the OperatorCondition and the upgrade starts.
Additional info:
Not reproducible with exactly the same bundle (origin and target) on OCP v4.14.z
This fix contains the following changes, coming from an updated version of Kubernetes up to v1.28.8:
Changelog:
v1.28.8: https://github.com/kubernetes/kubernetes/blob/release-1.28/CHANGELOG/CHANGELOG-1.28.md#changelog-since-v1287
This is a clone of issue OCPBUGS-27092. The following is the description of the original issue:
—
Description of problem:
When bootstrap logs are collected (e.g. as part of a CI run when bootstrapping fails), they no longer contain most of the Ironic services. The services used to run in standalone pods, but after a recent refactoring they are systemd services.
Description of problem:
Snyk is failing on some deps
Version-Release number of selected component (if applicable):
At least master/4.17 and 4.16
How reproducible:
100%
Steps to Reproduce:
Open a PR against the master or release-4.16 branch; Snyk will fail. Recent history shows that the test is simply being overridden. We should stop overriding the test and either fix the deps or justify excluding them from Snyk.
Actual results:
Version: 4.11.0-0.nightly-2022-06-22-015220
$ openshift-install version
openshift-install 4.11.0-0.nightly-2022-06-22-015220
built from commit f912534f12491721e3874e2bf64f7fa8d44aa7f5
release image registry.ci.openshift.org/ocp/release@sha256:9c2e9cafaaf48464a0d27652088d8fb3b2336008a615868aadf8223202bdc082
release architecture amd64
Platform: OSP 16.1.8 with manila service
Please specify:
What happened?
In a fresh 4.11 cluster (with Kuryr, but that shouldn't be related to the issue), there are no endpoints
for manila metrics:
> $ oc -n openshift-manila-csi-driver get endpoints
NAME ENDPOINTS AGE
manila-csi-driver-controller-metrics <none> 3h7m
> $ oc -n openshift-manila-csi-driver describe endpoints
Name: manila-csi-driver-controller-metrics
Namespace: openshift-manila-csi-driver
Labels: app=manila-csi-driver-controller-metrics
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2022-06-22T10:30:06Z
Subsets:
Events: <none>
> $ oc -n openshift-manila-csi-driver get all
NAME READY STATUS RESTARTS AGE
pod/csi-nodeplugin-nfsplugin-4mqgx 1/1 Running 0 3h7m
pod/csi-nodeplugin-nfsplugin-555ns 1/1 Running 0 3h2m
pod/csi-nodeplugin-nfsplugin-bn26j 1/1 Running 0 3h7m
pod/csi-nodeplugin-nfsplugin-lfsm7 1/1 Running 0 3h1m
pod/csi-nodeplugin-nfsplugin-xwxnz 1/1 Running 0 3h1m
pod/csi-nodeplugin-nfsplugin-zqnkt 1/1 Running 0 3h7m
pod/openstack-manila-csi-controllerplugin-7fc4b4f56d-ddn25 6/6 Running 2 (158m ago) 3h7m
pod/openstack-manila-csi-controllerplugin-7fc4b4f56d-p9jss 6/6 Running 0 3h6m
pod/openstack-manila-csi-nodeplugin-6w426 2/2 Running 0 3h2m
pod/openstack-manila-csi-nodeplugin-fvsjr 2/2 Running 0 3h7m
pod/openstack-manila-csi-nodeplugin-g9x4t 2/2 Running 0 3h1m
pod/openstack-manila-csi-nodeplugin-gp76x 2/2 Running 0 3h7m
pod/openstack-manila-csi-nodeplugin-n9v9t 2/2 Running 0 3h7m
pod/openstack-manila-csi-nodeplugin-s6srv 2/2 Running 0 3h1m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/manila-csi-driver-controller-metrics ClusterIP 172.30.118.232 <none> 443/TCP,444/TCP 3h7m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/csi-nodeplugin-nfsplugin 6 6 6 6 6 <none> 3h7m
daemonset.apps/openstack-manila-csi-nodeplugin 6 6 6 6 6 <none> 3h7m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/openstack-manila-csi-controllerplugin 2/2 2 2 3h7m
NAME DESIRED CURRENT READY AGE
replicaset.apps/openstack-manila-csi-controllerplugin-5697ccfcbf 0 0 0 3h7m
replicaset.apps/openstack-manila-csi-controllerplugin-7fc4b4f56d 2 2 2 3h7m
This can lead to not being able to retrieve manila metrics.
openshift_install.log: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/j2pg/DFG-osasinfra-shiftstack_periodic_subjob-ocp_install-4.11-kuryr-ipi/15/undercloud-0/home/stack/ostest/.openshift_install.log.gz
cinder-csi for example is configured with such endpoints:
> $ oc -n openshift-cluster-csi-drivers get endpoints
NAME ENDPOINTS AGE
openstack-cinder-csi-driver-controller-metrics 10.196.1.100:9203,10.196.2.82:9203,10.196.1.100:9205 + 5 more... 3h15m
> $ oc -n openshift-cluster-csi-drivers describe endpoints
Name: openstack-cinder-csi-driver-controller-metrics
Namespace: openshift-cluster-csi-drivers
Labels: app=openstack-cinder-csi-driver-controller-metrics
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2022-06-22T10:58:57Z
Subsets:
Addresses: 10.196.1.100,10.196.2.82
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
attacher-m 9203 TCP
snapshotter-m 9205 TCP
provisioner-m 9202 TCP
resizer-m 9204 TCP
Events: <none>
We need to fix and bump library-go for http2 vulnerability CVE-2023-44487. This effectively turns off HTTP/2 in library-go http endpoints, i.e. metrics and health.
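For background, HTTP/2 is turned off on a Go http.Server by giving it a non-nil, empty TLSNextProto map; library-go's endpoints wrap this differently, but the underlying stdlib mechanism is (a minimal sketch):

package main

import (
	"crypto/tls"
	"net/http"
)

// newHTTP1OnlyServer returns a server that will not negotiate HTTP/2:
// a non-nil, empty TLSNextProto map suppresses the stdlib's automatic
// h2 upgrade, so TLS clients are served over HTTP/1.1 only.
func newHTTP1OnlyServer(addr string, handler http.Handler) *http.Server {
	return &http.Server{
		Addr:         addr,
		Handler:      handler,
		TLSNextProto: map[string]func(*http.Server, *tls.Conn, http.Handler){},
	}
}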
While developing a demo for a Java application that first builds using the source-to-image strategy and then uses the resulting image to copy artifacts from the s2i-builder+compiled-sources image to a slimmer runtime image via an inline Dockerfile build strategy on OpenShift, the deployment fails because the inline Dockerfile hook doesn't preserve the modification time of the file that gets copied. This is different from how 'docker' itself behaves with a multi-stage build.
Version-Release number of selected component (if applicable):
4.12.14
How reproducible:
Always
Steps to Reproduce:
1. git clone https://github.com/jerboaa/quarkus-quickstarts
2. cd quarkus-quickstarts && git checkout ocp-bug-inline-docker
3. oc new-project quarkus-appcds-nok
4. oc process -f rest-json-quickstart/openshift/quarkus_runtime_appcds_template.yaml | oc create -f -
Actual results:
$ oc logs quarkus-rest-json-appcds-4-xc47z
INFO exec -a "java" java -XX:MaxRAMPercentage=80.0 -XX:+UseParallelGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:+ExitOnOutOfMemoryError -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -Xshare:on -XX:SharedArchiveFile=/deployments/app-cds.jsa -Dquarkus.http.host=0.0.0.0 -cp "." -jar /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar
INFO running in /deployments
Error occurred during initialization of VM
Unable to use shared archive.
An error has occurred while processing the shared archive file.
A jar file is not the one used while building the shared archive file: rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar
Expected results:
Starting the Java application using /opt/jboss/container/java/run/run-java.sh ...
INFO exec -a "java" java -XX:MaxRAMPercentage=80.0 -XX:+UseParallelGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:+ExitOnOutOfMemoryError -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -Xshare:on -XX:SharedArchiveFile=/deployments/app-cds.jsa -Dquarkus.http.host=0.0.0.0 -cp "." -jar /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar
INFO running in /deployments
__  ____  __  _____   ___  __ ____  ______
 --/ __ \/ / / / _ | / _ \/ //_/ / / / __/
 -/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \
--\___\_\____/_/ |_/_/|_/_/|_|\____/___/
2023-10-27 18:13:01,866 INFO [io.quarkus] (main) rest-json-quickstart 1.0.0-SNAPSHOT on JVM (powered by Quarkus 3.4.3) started in 0.966s. Listening on: http://0.0.0.0:8080
2023-10-27 18:13:01,867 INFO [io.quarkus] (main) Profile prod activated.
2023-10-27 18:13:01,867 INFO [io.quarkus] (main) Installed features: [cdi, resteasy-reactive, resteasy-reactive-jackson, smallrye-context-propagation, vertx]
Additional info:
When deploying with AppCDS turned on, we can get the pods to start; when we then look at the modify time of the offending file, we notice that it differs between the original s2i-merge image (A) and the runtime image (B):

(A) $ oc rsh quarkus-rest-json-appcds-s2i-1-x5hct stat /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar
  File: /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar
  Size: 16057039   Blocks: 31368   IO Block: 4096   regular file
Device: 200001h/2097153d   Inode: 60146490   Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 185/ default)   Gid: ( 0/ root)
Access: 2023-10-27 18:11:22.000000000 +0000
Modify: 2023-10-27 18:11:22.000000000 +0000
Change: 2023-10-27 18:11:41.555586774 +0000
 Birth: 2023-10-27 18:11:41.491586774 +0000

(B) $ oc rsh quarkus-rest-json-appcds-1-l7xw2 stat /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar
  File: /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar
  Size: 16057039   Blocks: 31368   IO Block: 4096   regular file
Device: 2000a3h/2097315d   Inode: 71601163   Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 0/ root)   Gid: ( 0/ root)
Access: 2023-10-27 18:11:44.000000000 +0000
Modify: 2023-10-27 18:11:44.000000000 +0000
Change: 2023-10-27 18:12:12.169087346 +0000
 Birth: 2023-10-27 18:12:12.114087346 +0000

Both should have 'Modify: 2023-10-27 18:11:22.000000000 +0000'.
When I perform a local s2i build of the same application sources and then use this multi-stage Dockerfile, the modify time of the files remain the same.
FROM quarkus-app-uberjar:ubi9 as s2iimg

FROM registry.access.redhat.com/ubi9/openjdk-17-runtime as final
COPY --from=s2iimg /deployments/* /deployments/
ENV JAVA_OPTS_APPEND="-XX:+UseCompressedClassPointers -XX:+UseCompressedOops -Xshare:on -XX:SharedArchiveFile=app-cds.jsa"
as shown here:
$ sudo docker run --rm -ti --entrypoint /bin/bash quarkus-app-uberjar:ubi9 -c 'stat /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar'
  File: /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar
  Size: 16057020   Blocks: 31368   IO Block: 4096   regular file
Device: 6fh/111d   Inode: 276781319   Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 185/ default)   Gid: ( 0/ root)
Access: 2023-10-27 15:52:28.000000000 +0000
Modify: 2023-10-27 15:52:28.000000000 +0000
Change: 2023-10-27 15:52:37.352926632 +0000
 Birth: 2023-10-27 15:52:37.288926109 +0000

$ sudo docker run --rm -ti --entrypoint /bin/bash quarkus-cds-app -c 'stat /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar'
  File: /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar
  Size: 16057020   Blocks: 31368   IO Block: 4096   regular file
Device: 6fh/111d   Inode: 14916403   Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 185/ default)   Gid: ( 0/ root)
Access: 2023-10-27 15:52:28.000000000 +0000
Modify: 2023-10-27 15:52:28.000000000 +0000
Change: 2023-10-27 15:53:04.408147760 +0000
 Birth: 2023-10-27 15:53:04.346147253 +0000
Both have a modified file time of 2023-10-27 15:52:28.000000000 +0000
This is a clone of issue OCPBUGS-38486. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-23922. The following is the description of the original issue:
—
Description of problem:
With the https://issues.redhat.com//browse/STOR-1453 TLSSecurityProfile feature, storage clustercsidriver.spec.observedConfig gets its value from APIServer.spec.tlsSecurityProfile to set cipherSuites and minTLSVersion in all corresponding CSI drivers. However, this doesn't work well in a hypershift cluster: when a different value is set only in hostedclusters.spec.configuration.apiServer.tlsSecurityProfile in the management cluster, APIServer.spec in the hosted cluster is not synced, and the CSI driver doesn't get the updated value either.
Version-Release number of selected component (if applicable):
Pre-merge test with openshift/csi-operator#69,openshift/csi-operator#71
How reproducible:
Always
Steps to Reproduce:
1. Have a hypershift cluster; the clustercsidriver gets the default value, e.g. "minTLSVersion": "VersionTLS12":
$ oc get clustercsidriver ebs.csi.aws.com -ojson | jq .spec.observedConfig.targetcsiconfig.servingInfo
{
  "cipherSuites": [
    "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256",
    "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256"
  ],
  "minTLSVersion": "VersionTLS12"
}
2. Set the tlsSecurityProfile in hostedclusters.spec.configuration.apiServer in the management cluster, e.g. "minTLSVersion": "VersionTLS11":
$ oc -n clusters get hostedclusters hypershift-ci-14206 -o json | jq .spec.configuration
{
  "apiServer": {
    "audit": {
      "profile": "Default"
    },
    "tlsSecurityProfile": {
      "custom": {
        "ciphers": [
          "ECDHE-ECDSA-CHACHA20-POLY1305",
          "ECDHE-RSA-CHACHA20-POLY1305",
          "ECDHE-RSA-AES128-GCM-SHA256",
          "ECDHE-ECDSA-AES128-GCM-SHA256"
        ],
        "minTLSVersion": "VersionTLS11"
      },
      "type": "Custom"
    }
  }
}
3. This is not passed to the apiserver in the hosted cluster:
$ oc get apiserver cluster -ojson | jq .spec
{
  "audit": {
    "profile": "Default"
  }
}
4. The CSI driver still uses the default value, which differs from the management cluster's hostedclusters.spec.configuration.apiServer:
$ oc get clustercsidriver ebs.csi.aws.com -ojson | jq .spec.observedConfig.targetcsiconfig.servingInfo
{
  "cipherSuites": [
    "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256",
    "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256"
  ],
  "minTLSVersion": "VersionTLS12"
}
Actual results:
The tlsSecurityProfile doesn't get synced
Expected results:
The tlsSecurityProfile should get synced
Additional info:
Description of problem:
Webpack-DevServer Hot-Reload Not Working due to recent update to nodejsv18
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-43308. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-43051. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42783. The following is the description of the original issue:
—
Context
Some ROSA HCP users host their own container registries (e.g., self-hosted Quay servers) that are only accessible from inside of their VPCs. This is often achieved through the use of private DNS zones that resolve non-public domains like quay.mycompany.intranet to non-public IP addresses. The private registries at those addresses then present self-signed SSL certificates to the client that can be validated against the HCP's additional CA trust bundle.
Problem Description
A user of a ROSA HCP cluster with a configuration like the one described above is encountering errors when attempting to import a container image from their private registry into their HCP's internal registry via oc import-image. Originally, these errors showed up in openshift-apiserver logs as DNS resolution errors, i.e., OCPBUGS-36944. After the user upgraded their cluster to 4.14.37 (which fixes OCPBUGS-36944), openshift-apiserver was able to properly resolve the domain name but now complains of HTTP 502 Bad Gateway errors. We suspect these 502 Bad Gateway errors come from the Konnectivity agent while it proxies traffic between the control and data planes.
We've confirmed that the private registry is accessible from the HCP data plane (worker nodes) and that the certificate presented by the registry can be validated against the cluster's additional trust bundle. IOW, curl-ing the private registry from a worker node returns a HTTP 200 OK, but doing the same from a control plane node returns a HTTP 502. Notably, this cluster is not configured with a cluster-wide proxy, nor does the user's VPC feature a transparent proxy.
Version-Release number of selected component
OCP v4.14.37
How reproducible
Can be reliably reproduced, although the network config (see Context above) is quite specific
Steps to Reproduce
oc import-image imagegroup/imagename:v1.2.3 --from=quay.mycompany.intranet/imagegroup/imagename:v1.2.3 --confirm
Actual Results
error: tag v1.2.3 failed: Internal error occurred: quay.mycompany.intranet/imagegroup/imagename:v1.2.3: Get "https://quay.mycompany.intranet/v2/": Bad Gateway
imagestream.image.openshift.io/imagename imported with errors

Name:             imagename
Namespace:        mynamespace
Created:          Less than a second ago
Labels:           <none>
Annotations:      openshift.io/image.dockerRepositoryCheck=2024-10-01T12:46:02Z
Image Repository: default-route-openshift-image-registry.apps.rosa.clustername.abcd.p1.openshiftapps.com/mynamespace/imagename
Image Lookup:     local=false
Unique Images:    0
Tags:             1

v1.2.3
  tagged from quay.mycompany.intranet/imagegroup/imagename:v1.2.3
  ! error: Import failed (InternalError): Internal error occurred: quay.mycompany.intranet/imagegroup/imagename:v1.2.3: Get "https://quay.mycompany.intranet/v2/": Bad Gateway
      Less than a second ago

error: imported completed with errors
Expected Results
Desired container image is imported from private external image registry into cluster's internal image registry without error
Pre-requisites:
The following AlertmanagerConfig object will trigger a panic of the UWM prometheus operator:
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: alertmanager-config
  labels:
    resource: prometheus
spec:
  route:
    groupBy: ["..."]
    groupWait: 1m
    groupInterval: 1m
    repeatInterval: 12h
    receiver: "default_channel"
    routes:
    - matchers:
      - matchType: =
        name: severity
        value: warning
      receiver: teams
  receivers:
  - name: "default_channel"
  - name: teams
    msteamsConfigs:
    - webhookUrl:
        name: alertmanager-teams
        key: webhook
See https://github.com/prometheus-operator/prometheus-operator/issues/6082
This is a clone of issue OCPBUGS-29645. The following is the description of the original issue:
—
Description of problem:
When a customer certificate and an SRE certificate are both configured and approved, revoking the customer certificate causes access to the cluster using a kubeconfig with the SRE cert to be denied.
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. Create a cluster
2. Configure a customer cert and an SRE cert; they are approved
3. Revoke the customer cert; access to the cluster using a kubeconfig with the SRE cert gets denied
Actual results:
Revoke a customer cert, access to the cluster using kubeconfig with sre cert gets denied
Expected results:
Revoke a customer cert, access to the cluster using kubeconfig with sre cert succeeds
Additional info:
Description of problem:
Currently the CAPI Cluster object always stays in the `Provisioning` state, because nothing sets the ControlPlaneEndpoint field on the object.
Version-Release number of selected component (if applicable):
all
How reproducible:
Always
Steps to Reproduce:
1. Run the E2Es
2. See that the Cluster always stays in the Provisioning state
Actual results:
Cluster always stays in Provisioning state
Expected results:
Cluster should go into Provisioned state
Additional info:
As such, we need to update the E2E tests and the object creation scripts so that they set the ControlPlaneEndpoint before Cluster object creation, to make the Cluster go into the Provisioned state (see the sketch below). This is a temporary workaround, as we expect the Cluster & InfrastructureCluster object creation and the population of the ControlPlaneEndpoint to happen in a dedicated controller within the operator.
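A rough illustration of that workaround (a minimal sketch using the upstream cluster-api v1beta1 types; the helper name and endpoint values are placeholders, not the actual test code):

import (
	"context"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

func createClusterWithEndpoint(ctx context.Context) error {
	cluster := newTestCluster() // hypothetical helper building the Cluster object
	// Set the endpoint up front so the Cluster can reach Provisioned
	// without waiting on a controller to populate it.
	cluster.Spec.ControlPlaneEndpoint = clusterv1.APIEndpoint{
		Host: "api.example.internal", // placeholder host
		Port: 6443,
	}
	return k8sClient.Create(ctx, cluster) // k8sClient: the suite's controller-runtime client
}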
Description of problem:
The console does not enable customizing the abbreviation that appears on the resource icon badge. This causes an issue for the FAR operator with the CRD FenceAgentRemediationTemplate: the badge icon shows FART. The CRD includes a custom short name, but the console ignores it.
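For reference, the short name the console ignores is declared in the CRD's names stanza; in the Go apiextensions-apiserver types that looks roughly like this (a sketch; the plural/singular values are assumptions based on the kind):

import apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"

names := apiextensionsv1.CustomResourceDefinitionNames{
	Kind:       "FenceAgentRemediationTemplate",
	ListKind:   "FenceAgentRemediationTemplateList", // assumed
	Plural:     "fenceagentremediationtemplates",    // assumed
	Singular:   "fenceagentremediationtemplate",     // assumed
	ShortNames: []string{"fartemplate"},             // what the badge should show
}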
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Create the CRD (included link to github)
2. Navigate to Home -> Search
3. Enter far into the Resources filter
Actual results:
The badge FART shows in the dropdown
Expected results:
The badge should show fartemplate - the content of the short name
Additional info:
Description of problem:
level=error
level=error msg=Error: waiting for EC2 Instance (i-054a010f3e99f7a2c) create: timeout while waiting for state to become 'running' (last state: 'pending', timeout: 10m0s)
level=error
level=error msg= with module.masters.aws_instance.master[2],
level=error msg= on master/main.tf line 136, in resource "aws_instance" "master":
level=error msg= 136: resource "aws_instance" "master" {
level=error
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1936dcc]

goroutine 1 [running]:
github.com/openshift/installer/pkg/asset.PersistToFile({0x22860140?, 0x277372f0?}, {0x7ffc102e22db, 0xe})
	/go/src/github.com/openshift/installer/pkg/asset/asset.go:57 +0xac
github.com/openshift/installer/pkg/asset.(*fileWriterAdapter).PersistToFile(0x227fa3e0?, {0x7ffc102e22db?, 0x277372f0?})
	/go/src/github.com/openshift/installer/pkg/asset/filewriter.go:19 +0x31
main.runTargetCmd.func1({0x7ffc102e22db, 0xe})
	/go/src/github.com/openshift/installer/cmd/openshift-install/create.go:277 +0x24a
main.runTargetCmd.func2(0x275d0340?, {0xc0007a6d00?, 0x1?, 0x1?})
	/go/src/github.com/openshift/installer/cmd/openshift-install/create.go:302 +0xe7
github.com/spf13/cobra.(*Command).execute(0x275d0340, {0xc0007a6cc0, 0x1, 0x1})
	/go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:920 +0x847
github.com/spf13/cobra.(*Command).ExecuteC(0xc000956000)
	/go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:1040 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
	/go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:968
main.installerMain()
	/go/src/github.com/openshift/installer/cmd/openshift-install/main.go:56 +0x2b0
main.main()
	/go/src/github.com/openshift/installer/cmd/openshift-install/main.go:33 +0xff
Installer exit with code 2
Version-Release number of selected component (if applicable):
4.15
How reproducible:
I noticed it on a presubmit
Steps to Reproduce:
1. Run the pull-ci-openshift-origin-master-e2e-aws-ovn-fips job on an openshift/origin repo presubmit
Actual results:
Expected results:
Additional info:
Example where it occurred: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/28372/pull-ci-openshift-origin-master-e2e-aws-ovn-fips/1719449092209250304 This shows it happed on several jobs: https://search.ci.openshift.org/?search=asset.PersistToFile&maxAge=48h&context=1&type=build-log&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
This is a clone of issue OCPBUGS-39324. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-34413. The following is the description of the original issue:
—
Description of problem:
Cluster-ingress-operator logs an update when one didn't happen.
% grep -e 'successfully updated Infra CR with Ingress Load Balancer IPs' -m 1 -- ingress-operator.log
2024-05-17T14:46:01.434Z  INFO  operator.ingress_controller  ingress/controller.go:326  successfully updated Infra CR with Ingress Load Balancer IPs
% grep -e 'successfully updated Infra CR with Ingress Load Balancer IPs' -c -- ingress-operator.log
142
https://github.com/openshift/cluster-ingress-operator/pull/1016 has a logic error, which causes the operator to log this message even when it didn't do an update:
// If the lbService exists for the "default" IngressController, then update Infra CR's PlatformStatus with the Ingress LB IPs.
if haveLB && ci.Name == manifests.DefaultIngressControllerName {
	if updated, err := computeUpdatedInfraFromService(lbService, infraConfig); err != nil {
		errs = append(errs, fmt.Errorf("failed to update Infrastructure PlatformStatus: %w", err))
	} else if updated {
		if err := r.client.Status().Update(context.TODO(), infraConfig); err != nil {
			errs = append(errs, fmt.Errorf("failed to update Infrastructure CR after updating Ingress LB IPs: %w", err))
		}
	}
	log.Info("successfully updated Infra CR with Ingress Load Balancer IPs")
}
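The fix is mechanical: the log call belongs inside the updated branch, after the status update succeeds. A sketch of the corrected flow (not the merged patch itself):

if haveLB && ci.Name == manifests.DefaultIngressControllerName {
	if updated, err := computeUpdatedInfraFromService(lbService, infraConfig); err != nil {
		errs = append(errs, fmt.Errorf("failed to update Infrastructure PlatformStatus: %w", err))
	} else if updated {
		if err := r.client.Status().Update(context.TODO(), infraConfig); err != nil {
			errs = append(errs, fmt.Errorf("failed to update Infrastructure CR after updating Ingress LB IPs: %w", err))
		} else {
			// Only log when an update actually happened and succeeded.
			log.Info("successfully updated Infra CR with Ingress Load Balancer IPs")
		}
	}
}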
Version-Release number of selected component (if applicable):
4.17
How reproducible:
100%
Steps to Reproduce:
1. Create a LB service for the default Ingress Operator 2. Watch ingress operator logs for the search strings mentioned above
Actual results:
Lots of these log entries will be seen even though no further updates are made to the default ingress operator:
2024-05-17T14:46:01.434Z  INFO  operator.ingress_controller  ingress/controller.go:326  successfully updated Infra CR with Ingress Load Balancer IPs
Expected results:
Only see this log entry when an update to the Infra CR is actually made; perhaps just once, the first time a LB service is added to the default ingress operator.
Additional info:
https://github.com/openshift/cluster-ingress-operator/pull/1016 was backported to 4.15, so it would be nice to fix it and backport the fix to 4.15. It is rather noisy, and it's trivial to fix.
Bump Golang to v1.20 in Containerfile.operator for RHTAP
Description of problem:
With OCPBUGS-18274 we had to update the etcdctl binary. Unfortunately the script does not attempt to update the binary if it's already found in the path:
https://github.com/openshift/cluster-etcd-operator/blob/master/bindata/etcd/etcd-common-tools#L16-L24
This causes confusion, as the binary might not be the latest that we're shipping with etcd. Pulling the binary shouldn't be a big deal: etcd is running locally anyway and the local image should already be cached. We should always replace the binary.
Version-Release number of selected component (if applicable):
any currently supported release
How reproducible:
always
Steps to Reproduce:
1. run cluster-backup.sh to download the binary 2. update the etcd image (take a different version or so) 3. run cluster-backup.sh again
Actual results:
cluster-backup.sh will simply print "etcdctl is already installed"
Expected results:
etcdctl should always be pulled
Additional info:
This is a clone of issue OCPBUGS-46481. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-46390. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-46068. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-45222. The following is the description of the original issue:
—
Description of problem:
When setting up the "webhookTokenAuthenticator" the oauth configure "type" is set to "None". Then controller sets the console configmap with "authType=disabled". Which will cause that the console pod goes in the crash loop back due to the not allowed type: Error: validate.go:76] invalid flag: user-auth, error: value must be one of [oidc openshift], not disabled. This worked before on 4.14, stopped working on 4.15.
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.15
How reproducible:
Steps to Reproduce:
Actual results:
The console can't start, seems like it is not allowed to change the console.
Expected results:
Additional info:
Description of the problem:
The InfraEnv resource will accept both arm64 and aarch64 as valid cpuArchitectures. Both result in an ISO URL with arm64 in the path. However, supplying the InfraEnv with cpuArchitecture arm64 will result in the converged flow becoming stuck, because the metal3 PreprovisioningImage resource only accepts aarch64 as an architecture:
- lastTransitionTime: "2023-10-26T14:46:14Z" message: PreprovisioningImage CPU architecture (aarch64) does not match InfraEnv CPU architecture (arm64) observedGeneration: 2 reason: InfraEnvArchMismatch status: "False" type: Ready - lastTransitionTime: "2023-10-26T14:46:14Z" message: PreprovisioningImage CPU architecture (aarch64) does not match InfraEnv CPU architecture (arm64) observedGeneration: 2 reason: InfraEnvArchMismatch status: "True" type: Error networkData: {}
How reproducible:
100%
Steps to reproduce:
1. Create an infraenv with cpuArchitecture: arm64
2. Create BMH resources with the converged flow enabled
Actual results:
PreprovisioningImages have InfraEnvArchMismatch because they only support the aarch64 architecture
Expected results:
InfraEnv should either only accept aarch64 as the cpuArchitecture, or correctly translate arm64 to aarch64.
Workaround
The workaround is just to create the InfraEnv resource with cpuArchitecture: aarch64 instead of arm64
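Conceptually, the translation option from the expected results is a small normalization applied before the architectures are compared (a sketch with a hypothetical helper name, not the actual assisted-service code):

// normalizeArch maps Go/Kubernetes-style architecture names to the
// kernel-style names that metal3's PreprovisioningImage accepts.
func normalizeArch(arch string) string {
	switch arch {
	case "arm64":
		return "aarch64"
	case "amd64":
		return "x86_64"
	default:
		return arch
	}
}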
Description of problem:
In install-config.yaml, set the control plane type to a size in VM family standardEIBDSv5Family or standardEIBSv5Family, and the installer reports the error below when creating the cluster:

09-07 17:55:57.613 level=error msg=Error: creating Linux Virtual Machine: (Name "jima-test-wlgrr-bootstrap" / Resource Group "jima-test-wlgrr-rg"): compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidParameter" Message="The VM size 'Standard_E112ibs_v5' cannot boot with OS image or disk. Please check that disk controller types supported by the OS image or disk is one of the supported disk controller types for the VM size 'Standard_E112ibs_v5'. Please query sku api at https://aka.ms/azure-compute-skus to determine supported disk controller types for the VM size." Target="vmSize"

Checked that both VM families only support diskControllerTypes NVMe:
{
  "name": "DiskControllerTypes",
  "value": "NVMe"
},

From https://github.com/hashicorp/terraform-provider-azurerm/issues/22058, it seems the terraform provider does not support setting disk controller types. Suggest adding validation for those families, as was done in https://github.com/openshift/installer/pull/6733
Version-Release number of selected component (if applicable):
4.14 nightly build
How reproducible:
always
Steps to Reproduce:
1. Prepare the install-config, setting the control plane VM size to one in family standardEIBDSv5Family or standardEIBSv5Family
2. Create the cluster
Actual results:
Installer failed with error
Expected results:
Installer should have pre-check for those unsupported instance types and exit with error message
Additional info:
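A pre-flight check along the lines of the linked installer PR could look like this (a minimal sketch with hypothetical names; the installer's real validation plumbing is structured differently):

import "fmt"

// Families that, per the SKU data above, only support NVMe disk
// controllers and therefore cannot boot the default OS disk.
var nvmeOnlyFamilies = map[string]bool{
	"standardEIBDSv5Family": true,
	"standardEIBSv5Family":  true,
}

func validateControlPlaneInstanceType(family, vmSize string) error {
	if nvmeOnlyFamilies[family] {
		return fmt.Errorf("instance type %s (family %s) only supports NVMe disk controllers and is not supported", vmSize, family)
	}
	return nil
}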
This is a clone of issue OCPBUGS-30991. The following is the description of the original issue:
—
Description of problem:
When issuerCertificateAuthority is set, the kube-apiserver pod goes into CrashLoopBackOff. RCA debugging found the cause: the path /etc/kubernetes/certs/oidc-ca/ca.crt is incorrect; the expected path is /etc/kubernetes/certs/oidc-ca/ca-bundle.crt.
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-03-13-061822
How reproducible:
Always
Steps to Reproduce:
1. Create a fresh HCP cluster.
2. Create keycloak as the OIDC server, exposed as a Route which uses the cluster's default ingress certificate as the serving certificate.
3. Configure clients as necessary in the keycloak admin UI.
4. Configure external OIDC:
$ oc create configmap keycloak-oidc-ca --from-file=ca-bundle.crt=router-ca/ca.crt --kubeconfig $MGMT_KUBECONFIG -n clusters
$ oc patch hc $HC_NAME -n clusters --kubeconfig $MGMT_KUBECONFIG --type=merge -p="
spec:
  configuration:
    authentication:
      oidcProviders:
      - claimMappings:
          groups:
            claim: groups
            prefix: 'oidc-groups-test:'
          username:
            claim: email
            prefixPolicy: Prefix
            prefix:
              prefixString: 'oidc-user-test:'
        issuer:
          audiences:
          - $AUDIENCE_1
          - $AUDIENCE_2
          issuerCertificateAuthority:
            name: keycloak-oidc-ca
          issuerURL: $ISSUER_URL
        name: keycloak-oidc-server
        oidcClients:
        - clientID: $CONSOLE_CLIENT_ID
          clientSecret:
            name: $CONSOLE_CLIENT_SECRET_NAME
          componentName: console
          componentNamespace: openshift-console
      type: OIDC
"
5. Pods should be renewed, but the new pod is CrashLoopBackOff:
$ oc get po -n clusters-$HC_NAME --kubeconfig $MGMT_KUBECONFIG --sort-by metadata.creationTimestamp | tail -n 4
openshift-apiserver-65f8c5f545-x2vdf           3/3   Running            0               5h8m
community-operators-catalog-57dd5886f7-jq25f   1/1   Running            0               4h1m
kube-apiserver-5d75b5b848-c9c8r                4/5   CrashLoopBackOff   25 (3m9s ago)   107m
$ oc logs --timestamps -n clusters-$HC_NAME --kubeconfig $MGMT_KUBECONFIG -c kube-apiserver kube-apiserver-5d75b5b848-gk2t8
...
2024-03-18T09:11:14.836540684Z I0318 09:11:14.836495 1 dynamic_cafile_content.go:119] "Loaded a new CA Bundle and Verifier" name="client-ca-bundle::/etc/kubernetes/certs/client-ca/ca.crt"
2024-03-18T09:11:14.837725839Z E0318 09:11:14.837695 1 run.go:74] "command failed" err="jwt[0].issuer.certificateAuthority: Invalid value: \"<omitted>\": data does not contain any valid RSA or ECDSA certificates"
Actual results:
5. New kube-apiserver pod is CrashLoopBackOff. `oc explain` for issuerCertificateAuthority says the configmap data should use ca-bundle.crt. But I also tried to use ca.crt in configmap's data, got same result.
Expected results:
6. No CrashLoopBackOff.
Additional info:
Below is my RCA for the CrashLoopBackOff kube-apiserver pod:
Check whether it is a valid RSA certificate; it is:

$ openssl x509 -noout -text -in router-ca/ca.crt | grep -i rsa
Signature Algorithm: sha256WithRSAEncryption
Public Key Algorithm: rsaEncryption
Signature Algorithm: sha256WithRSAEncryption
So, the CA certificate has no issue.
Above pod logs show "/etc/kubernetes/certs/oidc-ca/ca.crt" is used. Double checked the configmap:
$ oc get cm auth-config -n clusters-$HC_NAME --kubeconfig $MGMT_KUBECONFIG -o jsonpath='{.data.auth\.json}' | jq | ~/auto/json2yaml.sh
---
kind: AuthenticationConfiguration
apiVersion: apiserver.config.k8s.io/v1alpha1
jwt:
- issuer:
    url: https://keycloak-keycloak.apps..../realms/master
    certificateAuthority: "/etc/kubernetes/certs/oidc-ca/ca.crt"
...
Then debug the CrashLoopBackOff pod:
The used path /etc/kubernetes/certs/oidc-ca/ca.crt does not exist! The correct path should be /etc/kubernetes/certs/oidc-ca/ca-bundle.crt:
$ oc debug -n clusters-$HC_NAME --kubeconfig $MGMT_KUBECONFIG -c kube-apiserver kube-apiserver-5d75b5b848-gk2t8
Starting pod/kube-apiserver-5d75b5b848-gk2t8-debug-kpmlf, command was: hyperkube kube-apiserver --openshift-config=/etc/kubernetes/config/config.json -v2 --encryption-provider-config=/etc/kubernetes/secret-encryption/config.yaml
sh-5.1$ cat /etc/kubernetes/certs/oidc-ca/ca.crt
cat: /etc/kubernetes/certs/oidc-ca/ca.crt: No such file or directory
sh-5.1$ ls /etc/kubernetes/certs/oidc-ca/
ca-bundle.crt
sh-5.1$ cat /etc/kubernetes/certs/oidc-ca/ca-bundle.crt
-----BEGIN CERTIFICATE-----
MIIDPDCCAiSgAwIBAgIIM3E0ckpP750wDQYJKoZIhvcNAQELBQAwJjESMBAGA1UE
...
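The mismatch suggests the code rendering the AuthenticationConfiguration joins the mount path with the wrong file name; conceptually the fix is a single constant (a sketch with hypothetical names, not the actual HyperShift source):

import "path/filepath"

const oidcCAMountPath = "/etc/kubernetes/certs/oidc-ca"

// The ConfigMap is mounted with its data keys as file names, and the
// documented key is ca-bundle.crt, so the rendered path must match it.
var oidcCAFile = filepath.Join(oidcCAMountPath, "ca-bundle.crt") // not "ca.crt"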
This is a clone of issue OCPBUGS-31843. The following is the description of the original issue:
—
Description of problem:
The 'kubeadmin' user is unable to log out when logged in with the 'kube:admin' IDP; clicking 'Log out' does nothing.
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-04-06-020637
How reproducible:
Always
Steps to Reproduce:
1. Login to the console with the 'kube:admin' IDP, typing username 'kubeadmin' and its password
2. Try to log out from the console
Actual results:
2. unable to log out successfully
Expected results:
2. any user should be able to log out successfully
Additional info:
This is a clone of issue OCPBUGS-24190. The following is the description of the original issue:
—
When baselineCapabilitySet is set to None, we still see an SA named `deployer-controller` in the cluster.
steps to Reproduce:
=================
1. Install 4.15 cluster with baselineCapabilitySet to None
2. Run command `oc get sa -A | grep deployer`
Actual Results:
================
[knarra@knarra openshift-tests-private]$ oc get sa -A | grep deployer
openshift-infra deployer-controller 0 63m
Expected Results:
==================
No SA related to deployer should be returned
This is a clone of issue OCPBUGS-23228. The following is the description of the original issue:
—
Release controller > 4.14.2 > HyperShift conformance run > gathered assets:
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn-conformance/1722648207965556736/artifacts/e2e-aws-ovn-conformance/dump/artifacts/namespaces/clusters-e8d2a8003773eacb6a8b/core/pods/logs/kube-apiserver-5f47c7b667-42h2f-audit-logs.log | grep -v '/var/log/kube-apiserver/audit.log. has' | jq -r 'select(.user.username == "system:admin" and .verb == "create" and .requestURI == "/apis/operator.openshift.io/v1/storages") | .userAgent' | sort | uniq -c
     65 hosted-cluster-config-operator-manager
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn-conformance/1722648207965556736/artifacts/e2e-aws-ovn-conformance/dump/artifacts/namespaces/clusters-e8d2a8003773eacb6a8b/core/pods/logs/kube-apiserver-5f47c7b667-42h2f-audit-logs.log | grep -v '/var/log/kube-apiserver/audit.log. has' | jq -r 'select(.user.username == "system:admin" and .verb == "create" and .requestURI == "/apis/operator.openshift.io/v1/storages") | .requestReceivedTimestamp + " " + (.responseStatus | (.code | tostring) + " " + .reason)' | head -n5
2023-11-09T17:17:15.130454Z 409 AlreadyExists
2023-11-09T17:17:15.163256Z 409 AlreadyExists
2023-11-09T17:17:15.198908Z 409 AlreadyExists
2023-11-09T17:17:15.230532Z 409 AlreadyExists
2023-11-09T17:17:22.899579Z 409 AlreadyExists
That's banging away pretty hard with creation attempts that keep getting 409ed, presumably because an earlier creation attempt succeeded. If the controller needs very quick latency in re-creation, perhaps an informing watch? If the controller can handle some re-creation latency, perhaps a quieter poll?
4.14.2. I haven't checked other releases.
Likely 100%. I saw similar behavior in an unrelated dump, and confirmed the busy 409s in the first CI run I checked.
1. Dump a hosted cluster.
2. Inspect its audit logs for hosted-cluster-config-operator-manager create activity.
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn-conformance/1722648207965556736/artifacts/e2e-aws-ovn-conformance/dump/artifacts/namespaces/clusters-e8d2a8003773eacb6a8b/core/pods/logs/kube-apiserver-5f47c7b667-42h2f-audit-logs.log | grep -v '/var/log/kube-apiserver/audit.log. has' | jq -r 'select(.userAgent == "hosted-cluster-config-operator-manager" and .verb == "create") | .verb + " " + (.responseStatus.code | tostring)' | sort | uniq -c
    130 create 409
Zero or rare 409 creation requests from this user-agent.
The user agent seems to be defined here, so likely the fix will involve changes to that manager.
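One common way to quiet this pattern is to tolerate AlreadyExists on create instead of retrying blindly (a generic controller-runtime sketch, not the hosted-cluster-config-operator code):

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ensureCreated creates obj and treats "already exists" as success, so the
// reconciler stops hammering the apiserver with creates that will always 409.
func ensureCreated(ctx context.Context, c client.Client, obj client.Object) error {
	err := c.Create(ctx, obj)
	if apierrors.IsAlreadyExists(err) {
		return nil
	}
	return err
}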
RHEL 9.3 broke at least Ironic when it rebased python-dns to 2.3.0:
dnspython 2.3.0 raises AttributeError: module 'dns.rdtypes' has no attribute 'ANY' (https://github.com/eventlet/eventlet/issues/781)
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
Actual results:
Expected results:
Additional info:
This is a duplicate of https://issues.redhat.com/browse/ART-8361; since on ART bugs we are not able to set `target`, the issue is created here instead.
This is a clone of issue OCPBUGS-29690. The following is the description of the original issue:
—
Description of problem:
Router are restarting due to memory issues
Version-Release number of selected component (if applicable):
OCP 4.12.45
How reproducible:
not easy
Routers restart due to memory issues:
~~~
3h40m Warning ProbeError pod/router-default-56c9f67f66-j8xwn Readiness probe error: Get "http://localhost:1936/healthz/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)...
3h40m Warning Unhealthy pod/router-default-56c9f67f66-j8xwn Readiness probe failed: Get "http://localhost:1936/healthz/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
3h40m Warning ProbeError pod/router-default-56c9f67f66-j8xwn Liveness probe error: Get "http://localhost:1936/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)...
3h40m Warning Unhealthy pod/router-default-56c9f67f66-j8xwn Liveness probe failed: Get "http://localhost:1936/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
3h40m Normal Killing pod/router-default-56c9f67f66-j8xwn Container router failed liveness probe, will be restarted
3h40m Warning ProbeError pod/router-default-56c9f67f66-j8xwn Readiness probe error: HTTP probe failed with statuscode: 500...
3h40m Warning Unhealthy pod/router-default-56c9f67f66-j8xwn Readiness probe failed: HTTP probe failed with statuscode: 500
~~~
The node only hosts the router replica, and Prometheus confirms that the routers consume all the memory in a short period of time (~20G within an hour). At some point the number of haproxy processes increases and ends up consuming all memory resources, leading to a service disruption in a production environment. As console is one of the services with the highest activity per the router stats, so far the customer is deleting the console pod, which brings the process count down from 45 to 12. The customer would like guidance on how to identify the process that is consuming the memory; haproxy monitoring is enabled but no dashboard is available. Router stats from when the router had 8G/6G/3G of memory available have been requested.
Additional info:
The customer claims this happens only in OCP 4.12.45; another active cluster is still on 4.10.39 and is not affected. The upgrade is blocked because of this. Requested actions:
* hard-stop-after might be an option, but the customer expects information about the side effects of this configuration.
* How to reset console connections from haproxy?
* Is there any documentation about haproxy Prometheus queries?
Please review the following PR: https://github.com/openshift/cluster-bootstrap/pull/100
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
On the prerelease doc Configure a secondary external gateway, on stop 3. we state the output of said command should confirm the admin policy has been created:
#oc describe apbexternalroute <name> | tail -n 6
First of all, this is a typo: there is no "apbexternalroute"; the correct term is "adminpolicybasedexternalroutes". Even if we use the correct term, the resulting output says almost nothing about the status of said policy; it just reports on the policy itself plus minor details like timestamps.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-10-04-143709
How reproducible:
Every time
Steps to Reproduce:
1. Deploy a cluster
2. Boot up a pod under a namespace
3. $ cat 4.create.abp_static_bar1.yaml (later apply said policy)
apiVersion: k8s.ovn.org/v1
kind: AdminPolicyBasedExternalRoute
metadata:
  name: first-policy
spec:
  ## gateway example
  from:
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: bar
  nextHops:
    static:
    - ip: "173.20.0.8"
    - ip: "173.20.0.9"
4. Confirm the policy is in place:
$ oc get adminpolicybasedexternalroutes.k8s.ovn.org
NAME           LAST UPDATE   STATUS
first-policy
5. But how do we test the policy's status? The doc's guide doesn't help much:
$ oc describe adminpolicybasedexternalroutes.k8s.ovn.org <name> | tail -n 6
$ oc describe adminpolicybasedexternalroutes.k8s.ovn.org first-policy
Name:         first-policy
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  k8s.ovn.org/v1
Kind:         AdminPolicyBasedExternalRoute
Metadata:
  Creation Timestamp:  2023-10-30T20:09:20Z
  Generation:          1
  Resource Version:    10904672
  UID:                 3c4a60da-a618-45b1-94a8-2085dcdc5631
Spec:
  From:
    Namespace Selector:
      Match Labels:
        kubernetes.io/metadata.name: bar
  Next Hops:
    Static:
      Bfd Enabled:  false
      Ip:           173.20.0.8
      Bfd Enabled:  false
      Ip:           173.20.0.9
Events: <none>
Nothing regarding policy status shows up, if that is even supported at all. Other than fixing the doc, if there is a way to view the status it should be documented. One more thing: if there is indeed a policy status, shouldn't it also populate the STATUS column here?
$ oc get adminpolicybasedexternalroutes.k8s.ovn.org
NAME           LAST UPDATE   STATUS
first-policy
^ Asking because on another bug, https://issues.redhat.com/browse/OCPBUGS-22706, I recreated a situation where the status should have reported an error, yet it never did, nor did it update the above table. Come to think of it, the LAST UPDATE column has never exposed any data either, in which case why do we even have these two columns to begin with?
Actual results:
Expected results:
Additional info:
It has been shown that running the conformance test suite on hypershift hosted clusters with the kubevirt provider is far more stable than on their metal counterparts. In order to get conformance passing on Azure, we need to skip a single test that sends ping (ICMP) to the Internet, as Azure blocks ICMP.
Description of problem:
While installing 3618 SNOs via ZTP using ACM 2.9, 15 clusters failed to complete install and have failed on the cluster-autoscaler operator. This represents the bulk of all cluster install failures in this testbed for OCP 4.14.0-rc.0.

# cat aci.InstallationFailed.autoscaler | xargs -I % sh -c "echo -n '% '; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get clusterversion --no-headers "
vm00527 version   False   True   20h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm00717 version   False   True   14h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm00881 version   False   True   19h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm00998 version   False   True   18h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm01006 version   False   True   17h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm01059 version   False   True   15h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm01155 version   False   True   14h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm01930 version   False   True   17h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm02407 version   False   True   16h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm02651 version   False   True   18h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm03073 version   False   True   19h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm03258 version   False   True   20h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm03295 version   False   True   14h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm03303 version   False   True   15h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm03517 version   False   True   18h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
Version-Release number of selected component (if applicable):
Hub 4.13.11 Deployed SNOs 4.14.0-rc.0 ACM 2.9 - 2.9.0-DOWNSTREAM-2023-09-07-04-47-52
How reproducible:
15 out of 20 failures (75% of the failures)
15 out of 3618 total attempted SNO installs (~0.4% of all installs)
Steps to Reproduce:
Actual results:
Expected results:
Additional info:
It appears that some show in the logs of the cluster-autoscaler-operator an error, Example: I0912 19:54:39.962897 1 main.go:15] Go Version: go1.20.5 X:strictfipsruntime I0912 19:54:39.962977 1 main.go:16] Go OS/Arch: linux/amd64 I0912 19:54:39.962982 1 main.go:17] Version: cluster-autoscaler-operator v4.14.0-202308301903.p0.gb57f5a9.assembly.stream-dirty I0912 19:54:39.963137 1 leaderelection.go:122] The leader election gives 4 retries and allows for 30s of clock skew. The kube-apiserver downtime tolerance is 78s. Worst non-graceful lease acquisition is 2m43s. Worst graceful lease acquisition is {26s}. I0912 19:54:39.975478 1 listener.go:44] controller-runtime/metrics "msg"="Metrics server is starting to listen" "addr"="127.0.0.1:9191" I0912 19:54:39.976939 1 server.go:187] controller-runtime/webhook "msg"="Registering webhook" "path"="/validate-clusterautoscalers" I0912 19:54:39.976984 1 server.go:187] controller-runtime/webhook "msg"="Registering webhook" "path"="/validate-machineautoscalers" I0912 19:54:39.977082 1 main.go:41] Starting cluster-autoscaler-operator I0912 19:54:39.977216 1 server.go:216] controller-runtime/webhook/webhooks "msg"="Starting webhook server" I0912 19:54:39.977693 1 certwatcher.go:161] controller-runtime/certwatcher "msg"="Updated current TLS certificate" I0912 19:54:39.977813 1 server.go:273] controller-runtime/webhook "msg"="Serving webhook server" "host"="" "port"=8443 I0912 19:54:39.977938 1 certwatcher.go:115] controller-runtime/certwatcher "msg"="Starting certificate watcher" I0912 19:54:39.978008 1 server.go:50] "msg"="starting server" "addr"={"IP":"127.0.0.1","Port":9191,"Zone":""} "kind"="metrics" "path"="/metrics" I0912 19:54:39.978052 1 leaderelection.go:245] attempting to acquire leader lease openshift-machine-api/cluster-autoscaler-operator-leader... 
I0912 19:54:39.982052 1 leaderelection.go:255] successfully acquired lease openshift-machine-api/cluster-autoscaler-operator-leader
I0912 19:54:39.983412 1 controller.go:177] "msg"="Starting EventSource" "controller"="cluster_autoscaler_controller" "source"="kind source: *v1.ClusterAutoscaler"
I0912 19:54:39.983462 1 controller.go:177] "msg"="Starting EventSource" "controller"="cluster_autoscaler_controller" "source"="kind source: *v1.Deployment"
I0912 19:54:39.983483 1 controller.go:177] "msg"="Starting EventSource" "controller"="cluster_autoscaler_controller" "source"="kind source: *v1.Service"
I0912 19:54:39.983501 1 controller.go:177] "msg"="Starting EventSource" "controller"="cluster_autoscaler_controller" "source"="kind source: *v1.ServiceMonitor"
I0912 19:54:39.983520 1 controller.go:177] "msg"="Starting EventSource" "controller"="cluster_autoscaler_controller" "source"="kind source: *v1.PrometheusRule"
I0912 19:54:39.983532 1 controller.go:185] "msg"="Starting Controller" "controller"="cluster_autoscaler_controller"
I0912 19:54:39.986041 1 controller.go:177] "msg"="Starting EventSource" "controller"="machine_autoscaler_controller" "source"="kind source: *v1beta1.MachineAutoscaler"
I0912 19:54:39.986065 1 controller.go:177] "msg"="Starting EventSource" "controller"="machine_autoscaler_controller" "source"="kind source: *unstructured.Unstructured"
I0912 19:54:39.986072 1 controller.go:185] "msg"="Starting Controller" "controller"="machine_autoscaler_controller"
I0912 19:54:40.095808 1 webhookconfig.go:72] Webhook configuration status: created
I0912 19:54:40.101613 1 controller.go:219] "msg"="Starting workers" "controller"="cluster_autoscaler_controller" "worker count"=1
I0912 19:54:40.102857 1 controller.go:219] "msg"="Starting workers" "controller"="machine_autoscaler_controller" "worker count"=1
E0912 19:58:48.113290 1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://[fd02::1]:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": net/http: TLS handshake timeout - error from a previous attempt: unexpected EOF
E0912 20:02:48.135610 1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://[fd02::1]:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": dial tcp [fd02::1]:443: connect: connection refused
E0913 13:49:02.118757 1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://[fd02::1]:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": dial tcp [fd02::1]:443: connect: connection refused
Description of problem:
There is no instance type validation check under defaultMachinePlatform. For example, set platform.azure.defaultMachinePlatform.type to Standard_D11_v2, which does not support PremiumIO, then create manifests:
# az vm list-skus --location southcentralus --size Standard_D11_v2 --query "[].capabilities[?name=='PremiumIO'].value" -otsv
False
install-config.yaml:
-------------------
platform:
  azure:
    defaultMachinePlatform:
      type: Standard_D11_v2
    baseDomainResourceGroupName: os4-common
    cloudName: AzurePublicCloud
    outboundType: Loadbalancer
    region: southcentralus
Creating manifests succeeds:
$ ./openshift-install create manifests --dir ipi
INFO Credentials loaded from file "/home/fedora/.azure/osServicePrincipal.json"
INFO Consuming Install Config from target directory
INFO Manifests created in: ipi/manifests and ipi/openshift
while the expected error is reported when the type is set under compute:
$ ./openshift-install create manifests --dir ipi
INFO Credentials loaded from file "/home/fedora/.azure/osServicePrincipal.json"
ERROR failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[0].platform.azure.osDisk.diskType: Invalid value: "Premium_LRS": PremiumIO not supported for instance type Standard_D11_v2
The same applies to the vmNetworkingType field under defaultMachinePlatform; instance type Standard_B4ms does not support Accelerated networking:
# az vm list-skus --location southcentralus --size Standard_B4ms --query "[].capabilities[?name=='AcceleratedNetworkingEnabled'].value" -otsv
False
install-config.yaml
----------------
platform:
  azure:
    defaultMachinePlatform:
      type: Standard_B4ms
      vmNetworkingType: "Accelerated"
The installer still succeeds in creating the manifests file; it should exit with an error, as it does when type and vmNetworkingType are set under compute:
ERROR failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[0].platform.azure.vmNetworkingType: Invalid value: "Accelerated": vm networking type is not supported for instance type Standard_B4ms
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-10-08-220853
How reproducible:
Always, on all supported versions
Steps to Reproduce:
1. Configure an invalid instance type (e.g. one without PremiumIO support) under defaultMachinePlatform in install-config.yaml
2. Create manifests
Actual results:
installer creates manifests successfully.
Expected results:
The installer should exit with an error, matching the behavior when an invalid instance type is configured under compute or controlPlane.
Additional info:
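A minimal sketch of the missing check, assuming hypothetical helper and field names rather than the installer's real internals. The point is that the capability validation already applied per compute pool could run against platform.azure.defaultMachinePlatform as well:

package validation

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/validation/field"
)

// skuCapabilities mirrors the answers `az vm list-skus` gives for one size.
type skuCapabilities struct {
	PremiumIO             bool
	AcceleratedNetworking bool
}

// validateDefaultMachinePlatform applies the same checks the installer
// already performs for compute[*] and controlPlane to the defaults section.
func validateDefaultMachinePlatform(caps skuCapabilities, instanceType, diskType, vmNetworkingType string) field.ErrorList {
	allErrs := field.ErrorList{}
	fldPath := field.NewPath("platform", "azure", "defaultMachinePlatform")

	// Premium_LRS disks require an instance type with the PremiumIO capability.
	if diskType == "Premium_LRS" && !caps.PremiumIO {
		allErrs = append(allErrs, field.Invalid(fldPath.Child("osDisk", "diskType"), diskType,
			fmt.Sprintf("PremiumIO not supported for instance type %s", instanceType)))
	}
	// Accelerated networking requires the corresponding SKU capability.
	if vmNetworkingType == "Accelerated" && !caps.AcceleratedNetworking {
		allErrs = append(allErrs, field.Invalid(fldPath.Child("vmNetworkingType"), vmNetworkingType,
			fmt.Sprintf("vm networking type is not supported for instance type %s", instanceType)))
	}
	return allErrs
}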
Description of problem:
The GCP Mint mode sync is failing when attempting to add permissions to a previously deleted custom role.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. Create a gcp cluster in mint mode (with a CCO credentialRequests that has permissions defined)
2. Delete the openshift-hive-dev-cloud-credential-operator-gcp-ro-creds custom role from GCP
3. oc -n openshift-cloud-credential-operator delete secret cloud-credential-operator-gcp-ro-creds
Actual results:
Receive the following error when attempting to add permissions to the deleted custom role: "cloud-credential-operator cannot add new grants to deleted gcp role"
Expected results:
The new permissions should be added to the role without issue.
Additional info:
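One plausible fix, sketched under assumptions (the CCO actuator wiring is omitted and ensureRoleUsable is a made-up name): detect the soft-deleted role via the iam/v1 API and undelete it before modifying its grants.

package gcpactuator

import (
	"context"
	"fmt"

	iam "google.golang.org/api/iam/v1"
)

// ensureRoleUsable undeletes a soft-deleted custom role before any attempt
// to modify its grants. roleName is the full resource name, e.g.
// "projects/<project>/roles/<roleID>".
func ensureRoleUsable(ctx context.Context, svc *iam.Service, roleName string) error {
	role, err := svc.Projects.Roles.Get(roleName).Context(ctx).Do()
	if err != nil {
		return fmt.Errorf("failed to fetch role %s: %w", roleName, err)
	}
	// A custom role deleted out-of-band is only soft-deleted for a grace
	// period; its grants cannot be changed until it is undeleted.
	if role.Deleted {
		if _, err := svc.Projects.Roles.Undelete(roleName, &iam.UndeleteRoleRequest{}).Context(ctx).Do(); err != nil {
			return fmt.Errorf("failed to undelete role %s: %w", roleName, err)
		}
	}
	return nil
}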
This is a clone of issue OCPBUGS-24041. The following is the description of the original issue:
—
Seen in 4.15-related update CI:
$ curl -s 'https://search.ci.openshift.org/search?maxAge=48h&type=junit&name=4.15.*upgrade&context=0&search=clusteroperator/console.*condition/Available.*status/False' | jq -r 'to_entries[].value | to_entries[].value[].context[]' | sed 's|.*clusteroperator/\([^ ]*\) condition/Available reason/\([^ ]*\) status/False[^:]*: \(.*\)|\1 \2 \3|' | sed 's|[.]apps[.][^ /]*|.apps...|g' | sort | uniq -c | sort -n
      1 console RouteHealth_FailedGet failed to GET route (https://console-openshift-console.apps... Get "https://console-openshift-console.apps... dial tcp 52.158.160.194:443: connect: connection refused
      1 console RouteHealth_StatusError route not yet available, https://console-openshift-console.apps... returns '503 Service Unavailable'
      2 console RouteHealth_FailedGet failed to GET route (https://console-openshift-console.apps... Get "https://console-openshift-console.apps... dial tcp: lookup console-openshift-console.apps... on 172.30.0.10:53: no such host
      2 console RouteHealth_FailedGet failed to GET route (https://console-openshift-console.apps... Get "https://console-openshift-console.apps... EOF
      8 console RouteHealth_RouteNotAdmitted console route is not admitted
     16 console RouteHealth_FailedGet failed to GET route (https://console-openshift-console.apps... Get "https://console-openshift-console.apps... context deadline exceeded (Client.Timeout exceeded while awaiting headers)
For example this 4.14 to 4.15 run had:
: [bz-Management Console] clusteroperator/console should not change condition/Available
Run #0: Failed 1h25m23s
{  1 unexpected clusteroperator state transitions during e2e test run
Nov 28 03:42:41.207 - 1s E clusteroperator/console condition/Available reason/RouteHealth_FailedGet status/False RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.ci-op-d2qsp1gp-2a31d.aws-2.ci.openshift.org): Get "https://console-openshift-console.apps.ci-op-d2qsp1gp-2a31d.aws-2.ci.openshift.org": context deadline exceeded (Client.Timeout exceeded while awaiting headers)}
While a timeout for the console Route isn't fantastic, an issue that only persists for 1s is not long enough to warrant immediate admin intervention. Teaching the console operator to stay Available=True for this kind of brief hiccup, while still going Available=False for issues where at least part of the component is non-functional and the condition requires immediate administrator intervention, would make it easier for admins and SREs operating clusters to identify when intervention was required.
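A minimal sketch of that behavior, with assumed names and an assumed one-minute threshold: debounce the Available=False transition so one-second blips are absorbed while sustained outages still surface.

package console

import "time"

// availableFalseGracePeriod is an assumed threshold; the real value would
// be tuned by the console team.
const availableFalseGracePeriod = time.Minute

type routeHealth struct {
	failingSince time.Time // zero while the last probe succeeded
}

// observe returns the Available status to report after one probe of the
// console route.
func (r *routeHealth) observe(probeErr error, now time.Time) bool {
	if probeErr == nil {
		r.failingSince = time.Time{} // reset on any success
		return true
	}
	if r.failingSince.IsZero() {
		r.failingSince = now
	}
	// Stay Available=True through brief hiccups; only go Available=False
	// once the failure has persisted long enough to need an admin.
	return now.Sub(r.failingSince) < availableFalseGracePeriod
}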
At least 4.15. Possibly other versions; I haven't checked.
How reproducible:
$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=48h&type=junit&name=4.15.*upgrade&context=0&search=clusteroperator/console.*condition/Available.*status/False' | grep 'periodic.*failures match' | sort
periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-upgrade-azure-ovn-heterogeneous (all) - 12 runs, 17% failed, 50% of failures match = 8% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-ovn-remote-libvirt-ppc64le (all) - 5 runs, 20% failed, 100% of failures match = 20% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-ovn-remote-libvirt-s390x (all) - 4 runs, 100% failed, 25% of failures match = 25% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-aws-ovn-heterogeneous-upgrade (all) - 12 runs, 17% failed, 100% of failures match = 17% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-upgrade-azure-ovn-arm64 (all) - 7 runs, 29% failed, 50% of failures match = 14% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-upgrade-azure-ovn-heterogeneous (all) - 12 runs, 25% failed, 33% of failures match = 8% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-ovn-upgrade (all) - 80 runs, 23% failed, 28% of failures match = 6% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade (all) - 80 runs, 28% failed, 23% of failures match = 6% impact
periodic-ci-openshift-release-master-ci-4.16-upgrade-from-stable-4.15-e2e-aws-ovn-upgrade (all) - 63 runs, 38% failed, 8% of failures match = 3% impact
periodic-ci-openshift-release-master-ci-4.16-upgrade-from-stable-4.15-e2e-azure-sdn-upgrade (all) - 60 runs, 73% failed, 11% of failures match = 8% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-sdn-upgrade (all) - 70 runs, 7% failed, 20% of failures match = 1% impact
Seems like it's primarily minor-version updates that trip this, and in jobs with high run counts, the impact percentage is single-digits.
There may be a way to reliably trigger these hiccups, but as a reproducer floor, running days of CI and checking whether impact percentages decrease would be a good way to test fixes post-merge.
Lots of console ClusterOperator going Available=False blips in 4.15 update CI.
Console goes Available=False if and only if immediate admin intervention is appropriate.
Running the command coreos-installer iso kargs show no longer works with the 4.13 Agent ISO. Instead we get this error:
$ coreos-installer iso kargs show agent.x86_64.iso
Writing manifest to image destination
Storing signatures
Error: No karg embed areas found; old or corrupted CoreOS ISO image.
This is almost certainly due to the way we repack the ISO as part of embedding the agent-tui binary in it.
It worked fine in 4.12. I have tested both the 4.12 and 4.13 ISOs with every version of coreos-installer from 0.14 to 0.17.
Please review the following PR: https://github.com/openshift/cluster-machine-approver/pull/217
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-42420. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42362. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42106. The following is the description of the original issue:
—
Description of problem:
Test Platform has detected a large increase in the amount of time spent waiting for pull secrets to be initialized. Monitoring the audit log, we can see nearly continuous updates to the SA pull secrets in the cluster (~2 per minute for every SA pull secret in the cluster). The controller manager is filled with entries like:
- "Internal registry pull secret auth data does not contain the correct number of entries" ns="ci-op-tpd3xnbx" name="deployer-dockercfg-p9j54" expected=5 actual=4
- "Observed image registry urls" urls=["172.30.228.83:5000","image-registry.openshift-image-registry.svc.cluster.local:5000","image-registry.openshift-image-registry.svc:5000","registry.build01.ci.openshift.org","registry.build01.ci.openshift.org"]
In this "Observed image registry urls" log line, notice the duplicate entries for "registry.build01.ci.openshift.org". We are not sure what is causing this, but it leads to a duplicate entry; when actualized in a pull secret map, the double entry is reduced to one, so the controller-manager finds the cardinality mismatch on the next check. The duplication is evident in OpenShiftControllerManager/cluster:
dockerPullSecret:
  internalRegistryHostname: image-registry.openshift-image-registry.svc:5000
  registryURLs:
  - registry.build01.ci.openshift.org
  - registry.build01.ci.openshift.org
But there is only one hostname in config.imageregistry.operator.openshift.io/cluster:
routes:
- hostname: registry.build01.ci.openshift.org
  name: public-routes
  secretName: public-route-tls
Version-Release number of selected component (if applicable):
4.17.0-rc.3
How reproducible:
Constant on build01 but not on other build farms
Steps to Reproduce:
1. Something ends up creating duplicate entries in the observed configuration of the openshift-controller-manager. 2. 3.
Actual results:
- Approximately 400K secret patches an hour on build01 vs ~40K on other build farms. Initialization times have increased by two orders of magnitude in new ci-operator namespaces.
- The openshift-controller-manager is hot looping and experiencing client throttling.
Expected results:
1. Initialization of pull secrets in a namespace should take < 1 second. On build01, it can take over 1.5 minutes.
2. openshift-controller-manager should not possess duplicate entries.
3. If duplicate entries are a configuration error, openshift-controller-manager should de-dupe the entries.
4. There should be alerting when the openshift-controller-manager experiences client-side throttling / pathological behavior.
Additional info:
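A sketch of expected-results item 3, with illustrative names rather than the openshift-controller-manager's real ones: de-duplicate the observed registry URLs before comparing entry counts against the actualized pull secret.

package observe

// dedupeRegistryURLs removes duplicate hostnames while preserving order, so
// the expected entry count matches what a pull secret map can actually hold.
func dedupeRegistryURLs(urls []string) []string {
	seen := make(map[string]struct{}, len(urls))
	out := make([]string, 0, len(urls))
	for _, u := range urls {
		if _, ok := seen[u]; ok {
			continue // e.g. the doubled registry.build01.ci.openshift.org
		}
		seen[u] = struct{}{}
		out = append(out, u)
	}
	return out
}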
Description of problem:
Network operator is not compliant with CIS benchmark rule "Ensure Usage of Unique Service Accounts" [1], part of the "ocp4-cis" profile used in the compliance operator [2]. Observed that the network operator is using the default service account; the default SA comes into play when no other service account is specified. OpenShift core operators should be compliant with the CIS benchmark, i.e. the operators should run with their own serviceaccount rather than the "default" one. A similar bug was raised for the machine-config operator.
[1] https://static.open-scap.org/ssg-guides/ssg-ocp4-guide-cis.html#xccdf_org.ssgproject.content_group_accounts
[2] https://docs.openshift.com/container-platform/4.11/security/compliance_operator/compliance-operator-supported-profiles.html
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Network operator using default SA
Expected results:
Additional info:
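An illustrative manifest fragment (selector and labels elided, names assumed) showing the shape of the fix: a dedicated ServiceAccount referenced from the operator Deployment instead of the implicit "default".

apiVersion: v1
kind: ServiceAccount
metadata:
  name: network-operator
  namespace: openshift-network-operator
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: network-operator
  namespace: openshift-network-operator
spec:
  template:
    spec:
      serviceAccountName: network-operator  # instead of the implicit "default" SA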
Please review the following PR: https://github.com/openshift/cluster-ingress-operator/pull/977
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-25699. The following is the description of the original issue:
—
Description of problem:
If GloballyDisableIrqLoadBalancing is disabled in the performance profile, then IRQs should be balanced across all CPUs minus the CPUs that are explicitly removed by crio via the pod annotation irq-load-balancing.crio.io: "disable". We have found a number of issues with this:
1) The script clear-irqbalance-banned-cpus.sh is setting an empty value for IRQBALANCE_BANNED_CPUS in /etc/sysconfig/irqbalance. If no value is provided, irqbalance will calculate a default. The default will exclude all isolated and nohz_full CPUs from the mask, resulting in the IRQs being balanced over the reserved CPUs only, breaking the user intent. If a guaranteed pod with the irq-load-balancing.crio.io: "disable" annotation gets launched then irqbalance will heal the system, but if one never does then all IRQs will be affined to the reserved cores. This script needs to set the banned mask to zeros on startup (see the sketch under Additional info).
2) The more serious issue: the scheduler plugin in tuned will attempt to affine all IRQs to the non-isolated cores. Isolated here means non-reserved, not truly isolated, cores. This is directly at odds with the user intent. So now we have tuned fighting with crio/irqbalance, each trying to do different things. Scenarios:
- If a pod gets launched with the annotation after tuned has started, at runtime or after a reboot - ok
- On a reboot, if tuned recovers after the guaranteed pod has been launched - broken
- If tuned restarts at runtime for any reason - broken
3) Lastly, the crio restore of the irqbalance mask needs to be removed. Disabling this should be part of the crio conf that is installed by the NTO.
Version-Release number of selected component (if applicable):
4.14 and likely earlier
How reproducible:
See description
Steps to Reproduce:
1.See description 2. 3.
Actual results:
Expected results:
Additional info:
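A minimal sketch for fix (1), assuming the script keeps its current structure: write an explicit zero mask instead of an empty value, so irqbalance never falls back to its own computed default.

# Write an all-zero banned mask ("ban nothing") rather than an empty value,
# then restart irqbalance to pick it up. The mask width scales with CPU
# count; a bare 0 is accepted as "no CPUs banned".
sed -i 's/^IRQBALANCE_BANNED_CPUS=.*/IRQBALANCE_BANNED_CPUS=0/' /etc/sysconfig/irqbalance
systemctl restart irqbalance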
This is a clone of issue OCPBUGS-33656. The following is the description of the original issue:
—
While running IPsec e2e tests in the CI, the data plane traffic is not flowing with the desired traffic type, esp or udp. For example, with ipsec mode External, the traffic type is seen as esp for EW traffic, but it's supposed to be geneve (udp) traffic.
This issue was reproducible on a local cluster after many attempts, and we noticed that IPsec states are not cleaned up on the node, a residue from a previous test run with ipsec full mode.
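If stale state is indeed the culprit, a manual cleanup sketch for re-testing (run on the affected node) would be flushing the leftover kernel IPsec state before the next run. Note this drops all xfrm state and policies, not only the OVN SAs:

# Flush IPsec SAs and policies left over from the previous (full-mode) run.
ip xfrm state flush
ip xfrm policy flush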
[peri@sdn-09 origin]$ kubectl get networks.operator.openshift.io cluster -o yaml
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  creationTimestamp: "2024-05-13T18:55:57Z"
  generation: 1362
  name: cluster
  resourceVersion: "593827"
  uid: 10f804c9-da46-41ee-91d5-37aff920bee4
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  defaultNetwork:
    ovnKubernetesConfig:
      egressIPConfig: {}
      gatewayConfig:
        ipv4: {}
        ipv6: {}
        routingViaHost: false
      genevePort: 6081
      ipsecConfig:
        mode: External
      mtu: 1400
      policyAuditConfig:
        destination: "null"
        maxFileSize: 50
        maxLogFiles: 5
        rateLimit: 20
        syslogFacility: local0
    type: OVNKubernetes
  deployKubeProxy: false
  disableMultiNetwork: false
  disableNetworkDiagnostics: false
  logLevel: Normal
  managementState: Managed
  observedConfig: null
  operatorLogLevel: Normal
  serviceNetwork:
  - 172.30.0.0/16
  unsupportedConfigOverrides: null
  useMultiNetworkPolicy: false
status:
  conditions:
  - lastTransitionTime: "2024-05-13T18:55:57Z"
    status: "False"
    type: ManagementStateDegraded
  - lastTransitionTime: "2024-05-14T10:13:12Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2024-05-13T18:55:57Z"
    status: "True"
    type: Upgradeable
  - lastTransitionTime: "2024-05-14T11:50:26Z"
    status: "False"
    type: Progressing
  - lastTransitionTime: "2024-05-13T18:57:13Z"
    status: "True"
    type: Available
  readyReplicas: 0
  version: 4.16.0-0.nightly-2024-05-08-222442
[peri@sdn-09 origin]$ oc debug node/worker-0
Starting pod/worker-0-debug-k6nlm ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.111.23
If you don't see a command prompt, try pressing enter.
sh-5.1# chroot /host
sh-5.1# toolbox
Checking if there is a newer version of registry.redhat.io/rhel9/support-tools available...
Container 'toolbox-root' already exists. Trying to start...
(To remove the container and start with a fresh toolbox, run: sudo podman rm 'toolbox-root')
toolbox-root
Container started successfully. To exit, type 'exit'.
[root@worker-0 /]# tcpdump -i enp2s0 -c 1 -v --direction=out esp and src 192.168.111.23 and dst 192.168.111.24
dropped privs to tcpdump
tcpdump: listening on enp2s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
16:07:01.854214 IP (tos 0x0, ttl 64, id 20451, offset 0, flags [DF], proto ESP (50), length 152)
worker-0 > worker-1: ESP(spi=0x52cc9c8d,seq=0xe1c5c), length 132
1 packet captured
6 packets received by filter
0 packets dropped by kernel
[root@worker-0 /]# exit
exit
sh-5.1# ipsec whack --trafficstatus
006 #20: "ovn-1184d9-0-in-1", type=ESP, add_time=1715687134, inBytes=206148172, outBytes=0, maxBytes=2^63B, id='@1184d960-3211-45c4-a482-d7b6fe995446'
006 #19: "ovn-1184d9-0-out-1", type=ESP, add_time=1715687112, inBytes=0, outBytes=40269835, maxBytes=2^63B, id='@1184d960-3211-45c4-a482-d7b6fe995446'
006 #27: "ovn-185198-0-in-1", type=ESP, add_time=1715687419, inBytes=71406656, outBytes=0, maxBytes=2^63B, id='@185198f6-7dde-4e9b-b2aa-52439d2beef5'
006 #26: "ovn-185198-0-out-1", type=ESP, add_time=1715687401, inBytes=0, outBytes=17201159, maxBytes=2^63B, id='@185198f6-7dde-4e9b-b2aa-52439d2beef5'
006 #14: "ovn-922aca-0-in-1", type=ESP, add_time=1715687004, inBytes=116384250, outBytes=0, maxBytes=2^63B, id='@922aca42-b893-496e-bb9b-0310884f4cc1'
006 #13: "ovn-922aca-0-out-1", type=ESP, add_time=1715686986, inBytes=0, outBytes=986900228, maxBytes=2^63B, id='@922aca42-b893-496e-bb9b-0310884f4cc1'
006 #6: "ovn-f72f26-0-in-1", type=ESP, add_time=1715686855, inBytes=115781441, outBytes=98, maxBytes=2^63B, id='@f72f2622-e7dc-414e-8369-6013752ea15b'
006 #5: "ovn-f72f26-0-out-1", type=ESP, add_time=1715686833, inBytes=9320, outBytes=29002449, maxBytes=2^63B, id='@f72f2622-e7dc-414e-8369-6013752ea15b'
sh-5.1# ip xfrm state; echo ' '; ip xfrm policy
src 192.168.111.21 dst 192.168.111.23
proto esp spi 0x7f7ddcf5 reqid 16413 mode transport
replay-window 0 flag esn
aead rfc4106(gcm(aes)) 0x6158d9a0f4a28598500e15f81a40ef715502b37ecf979feb11bbc488479c8804598011ee 128
lastused 2024-05-14 16:07:11
anti-replay esn context:
seq-hi 0x0, seq 0x18564, oseq-hi 0x0, oseq 0x0
replay_window 128, bitmap-length 4
ffffffff ffffffff ffffffff ffffffff
sel src 192.168.111.21/32 dst 192.168.111.23/32 proto udp dport 6081
src 192.168.111.23 dst 192.168.111.21
proto esp spi 0xda57e42e reqid 16413 mode transport
replay-window 0 flag esn
aead rfc4106(gcm(aes)) 0x810bebecef77951ae8bb9a46cf53a348a24266df8b57bf2c88d4f23244eb3875e88cc796 128
anti-replay esn context:
seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0
replay_window 128, bitmap-length 4
00000000 00000000 00000000 00000000
sel src 192.168.111.23/32 dst 192.168.111.21/32 proto udp sport 6081
src 192.168.111.21 dst 192.168.111.23
proto esp spi 0xf84f2fcf reqid 16417 mode transport
replay-window 0 flag esn
aead rfc4106(gcm(aes)) 0x0f242efb072699a0f061d4c941d1bb9d4eb7357b136db85a0165c3b3979e27b00ff20ac7 128
anti-replay esn context:
seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0
replay_window 128, bitmap-length 4
00000000 00000000 00000000 00000000
sel src 192.168.111.21/32 dst 192.168.111.23/32 proto udp sport 6081
src 192.168.111.23 dst 192.168.111.21
proto esp spi 0x9523c6ca reqid 16417 mode transport
replay-window 0 flag esn
aead rfc4106(gcm(aes)) 0xe075d39b6e53c033f5225f8be48efe537c3ba605cee2f5f5f3bb1cf16b6c53182ecf35f7 128
lastused 2024-05-14 16:07:11
anti-replay esn context:
seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x10fb2
replay_window 128, bitmap-length 4
00000000 00000000 00000000 00000000
sel src 192.168.111.23/32 dst 192.168.111.21/32 proto udp dport 6081
src 192.168.111.20 dst 192.168.111.23
proto esp spi 0x459d8516 reqid 16397 mode transport
replay-window 0 flag esn
aead rfc4106(gcm(aes)) 0xee778e6db2ce83fa24da3b18e028451bbfcf4259513bca21db832c3023e238a6b55fdacc 128
lastused 2024-05-14 16:07:13
anti-replay esn context:
seq-hi 0x0, seq 0x3ec45, oseq-hi 0x0, oseq 0x0
replay_window 128, bitmap-length 4
ffffffff ffffffff ffffffff ffffffff
sel src 192.168.111.20/32 dst 192.168.111.23/32 proto udp dport 6081
src 192.168.111.23 dst 192.168.111.20
proto esp spi 0x3142f53a reqid 16397 mode transport
replay-window 0 flag esn
aead rfc4106(gcm(aes)) 0x6238fea6dffdd36cbb909f6aab48425ba6e38f9d32edfa0c1e0fc6af8d4e3a5c11b5dfd1 128
anti-replay esn context:
seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0
replay_window 128, bitmap-length 4
00000000 00000000 00000000 00000000
sel src 192.168.111.23/32 dst 192.168.111.20/32 proto udp sport 6081
src 192.168.111.20 dst 192.168.111.23
proto esp spi 0xeda1ccb9 reqid 16401 mode transport
replay-window 0 flag esn
aead rfc4106(gcm(aes)) 0xef84a90993bd71df9c97db940803ad31c6f7d2e72a367a1ec55b4798879818a6341c38b6 128
anti-replay esn context:
seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0
replay_window 128, bitmap-length 4
00000000 00000000 00000000 00000000
sel src 192.168.111.20/32 dst 192.168.111.23/32 proto udp sport 6081
src 192.168.111.23 dst 192.168.111.20
proto esp spi 0x02c3c0dd reqid 16401 mode transport
replay-window 0 flag esn
aead rfc4106(gcm(aes)) 0x858ab7326e54b6d888825118724de5f0c0ad772be2b39133c272920c2cceb2f716d02754 128
lastused 2024-05-14 16:07:13
anti-replay esn context:
seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x26f8e
replay_window 128, bitmap-length 4
00000000 00000000 00000000 00000000
sel src 192.168.111.23/32 dst 192.168.111.20/32 proto udp dport 6081
src 192.168.111.24 dst 192.168.111.23
proto esp spi 0xc9535b47 reqid 16405 mode transport
replay-window 0 flag esn
aead rfc4106(gcm(aes)) 0xd7a83ff4bd6e7704562c597810d509c3cdd4e208daabf2ec074d109748fd1647ab2eff9d 128
lastused 2024-05-14 16:07:14
anti-replay esn context:
seq-hi 0x0, seq 0x53d4c, oseq-hi 0x0, oseq 0x0
replay_window 128, bitmap-length 4
ffffffff ffffffff ffffffff ffffffff
sel src 192.168.111.24/32 dst 192.168.111.23/32 proto udp dport 6081
src 192.168.111.23 dst 192.168.111.24
proto esp spi 0xb66203c8 reqid 16405 mode transport
replay-window 0 flag esn
aead rfc4106(gcm(aes)) 0xc207001a7f1ed7f114b3e327308ddbddc36de5272a11fe0661d03eaecc84b6761c7ec9c4 128
anti-replay esn context:
seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0
replay_window 128, bitmap-length 4
00000000 00000000 00000000 00000000
sel src 192.168.111.23/32 dst 192.168.111.24/32 proto udp sport 6081
src 192.168.111.24 dst 192.168.111.23
proto esp spi 0x2e4d4deb reqid 16409 mode transport
replay-window 0 flag esn
aead rfc4106(gcm(aes)) 0x91e399d83aa1c2626424b502d4b8dae07d4a170f7ef39f8d1baca8e92b8a1dee210e2502 128
anti-replay esn context:
seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0
replay_window 128, bitmap-length 4
00000000 00000000 00000000 00000000
sel src 192.168.111.24/32 dst 192.168.111.23/32 proto udp sport 6081
src 192.168.111.23 dst 192.168.111.24
proto esp spi 0x52cc9c8d reqid 16409 mode transport
replay-window 0 flag esn
aead rfc4106(gcm(aes)) 0xb605451f32f5dd7a113cae16e6f1509270c286d67265da2ad14634abccf6c90f907e5c00 128
lastused 2024-05-14 16:07:14
anti-replay esn context:
seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0xe2735
replay_window 128, bitmap-length 4
00000000 00000000 00000000 00000000
sel src 192.168.111.23/32 dst 192.168.111.24/32 proto udp dport 6081
src 192.168.111.22 dst 192.168.111.23
proto esp spi 0x973119c3 reqid 16389 mode transport
replay-window 0 flag esn
aead rfc4106(gcm(aes)) 0x87d13e67b948454671fb8463ec0cd4d9c38e5e2dd7f97cbb8f88b50d4965fb1f21b36199 128
lastused 2024-05-14 16:07:14
anti-replay esn context:
seq-hi 0x0, seq 0x2af9a, oseq-hi 0x0, oseq 0x0
replay_window 128, bitmap-length 4
ffffffff ffffffff ffffffff ffffffff
sel src 192.168.111.22/32 dst 192.168.111.23/32 proto udp dport 6081
src 192.168.111.23 dst 192.168.111.22
proto esp spi 0x4c3580ff reqid 16389 mode transport
replay-window 0 flag esn
aead rfc4106(gcm(aes)) 0x2c09750f51e86d60647a60e15606f8b312036639f8de2d7e49e733cda105b920baade029 128
lastused 2024-05-14 14:36:43
anti-replay esn context:
seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x1
replay_window 128, bitmap-length 4
00000000 00000000 00000000 00000000
sel src 192.168.111.23/32 dst 192.168.111.22/32 proto udp sport 6081
src 192.168.111.22 dst 192.168.111.23
proto esp spi 0xa3e469dc reqid 16393 mode transport
replay-window 0 flag esn
aead rfc4106(gcm(aes)) 0x1d5c5c232e6fd4b72f3dad68e8a4d523cbd297f463c53602fad429d12c0211d97ae26f47 128
lastused 2024-05-14 14:18:42
anti-replay esn context:
seq-hi 0x0, seq 0xb, oseq-hi 0x0, oseq 0x0
replay_window 128, bitmap-length 4
00000000 00000000 00000000 000007ff
sel src 192.168.111.22/32 dst 192.168.111.23/32 proto udp sport 6081
src 192.168.111.23 dst 192.168.111.22
proto esp spi 0xdee8476f reqid 16393 mode transport
replay-window 0 flag esn
aead rfc4106(gcm(aes)) 0x5895025ce5b192a7854091841c73c8e29e7e302f61becfa3feb44d071ac5c64ce54f5083 128
lastused 2024-05-14 16:07:14
anti-replay esn context:
seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x1f1a3
replay_window 128, bitmap-length 4
00000000 00000000 00000000 00000000
sel src 192.168.111.23/32 dst 192.168.111.22/32 proto udp dport 6081
src 192.168.111.23/32 dst 192.168.111.21/32 proto udp sport 6081
dir out priority 1360065 ptype main
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 16413 mode transport
src 192.168.111.21/32 dst 192.168.111.23/32 proto udp dport 6081
dir in priority 1360065 ptype main
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 16413 mode transport
src 192.168.111.23/32 dst 192.168.111.21/32 proto udp dport 6081
dir out priority 1360065 ptype main
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 16417 mode transport
src 192.168.111.21/32 dst 192.168.111.23/32 proto udp sport 6081
dir in priority 1360065 ptype main
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 16417 mode transport
src 192.168.111.23/32 dst 192.168.111.20/32 proto udp sport 6081
dir out priority 1360065 ptype main
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 16397 mode transport
src 192.168.111.20/32 dst 192.168.111.23/32 proto udp dport 6081
dir in priority 1360065 ptype main
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 16397 mode transport
src 192.168.111.23/32 dst 192.168.111.20/32 proto udp dport 6081
dir out priority 1360065 ptype main
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 16401 mode transport
src 192.168.111.20/32 dst 192.168.111.23/32 proto udp sport 6081
dir in priority 1360065 ptype main
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 16401 mode transport
src 192.168.111.23/32 dst 192.168.111.24/32 proto udp sport 6081
dir out priority 1360065 ptype main
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 16405 mode transport
src 192.168.111.24/32 dst 192.168.111.23/32 proto udp dport 6081
dir in priority 1360065 ptype main
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 16405 mode transport
src 192.168.111.23/32 dst 192.168.111.24/32 proto udp dport 6081
dir out priority 1360065 ptype main
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 16409 mode transport
src 192.168.111.24/32 dst 192.168.111.23/32 proto udp sport 6081
dir in priority 1360065 ptype main
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 16409 mode transport
src 192.168.111.23/32 dst 192.168.111.22/32 proto udp sport 6081
dir out priority 1360065 ptype main
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 16389 mode transport
src 192.168.111.22/32 dst 192.168.111.23/32 proto udp dport 6081
dir in priority 1360065 ptype main
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 16389 mode transport
src 192.168.111.23/32 dst 192.168.111.22/32 proto udp dport 6081
dir out priority 1360065 ptype main
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 16393 mode transport
src 192.168.111.22/32 dst 192.168.111.23/32 proto udp sport 6081
dir in priority 1360065 ptype main
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 16393 mode transport
src ::/0 dst ::/0
socket out priority 0 ptype main
src ::/0 dst ::/0
socket in priority 0 ptype main
src ::/0 dst ::/0
socket out priority 0 ptype main
src ::/0 dst ::/0
socket in priority 0 ptype main
src 0.0.0.0/0 dst 0.0.0.0/0
socket out priority 0 ptype main
src 0.0.0.0/0 dst 0.0.0.0/0
socket in priority 0 ptype main
src 0.0.0.0/0 dst 0.0.0.0/0
socket out priority 0 ptype main
src 0.0.0.0/0 dst 0.0.0.0/0
socket in priority 0 ptype main
src 0.0.0.0/0 dst 0.0.0.0/0
socket out priority 0 ptype main
src 0.0.0.0/0 dst 0.0.0.0/0
socket in priority 0 ptype main
src 0.0.0.0/0 dst 0.0.0.0/0
socket out priority 0 ptype main
src 0.0.0.0/0 dst 0.0.0.0/0
socket in priority 0 ptype main
src 0.0.0.0/0 dst 0.0.0.0/0
socket out priority 0 ptype main
src 0.0.0.0/0 dst 0.0.0.0/0
socket in priority 0 ptype main
src 0.0.0.0/0 dst 0.0.0.0/0
socket out priority 0 ptype main
src 0.0.0.0/0 dst 0.0.0.0/0
socket in priority 0 ptype main
src ::/0 dst ::/0 proto ipv6-icmp type 135
dir out priority 1 ptype main
src ::/0 dst ::/0 proto ipv6-icmp type 135
dir fwd priority 1 ptype main
src ::/0 dst ::/0 proto ipv6-icmp type 135
dir in priority 1 ptype main
src ::/0 dst ::/0 proto ipv6-icmp type 136
dir out priority 1 ptype main
src ::/0 dst ::/0 proto ipv6-icmp type 136
dir fwd priority 1 ptype main
src ::/0 dst ::/0 proto ipv6-icmp type 136
dir in priority 1 ptype main
sh-5.1# cat /etc/ipsec.conf
# /etc/ipsec.conf - Libreswan 4.0 configuration file
#
# see 'man ipsec.conf' and 'man pluto' for more information
#
# For example configurations and documentation, see https://libreswan.org/wiki/
config setup
# If logfile= is unset, syslog is used to send log messages too.
# Note that on busy VPN servers, the amount of logging can trigger
# syslogd (or journald) to rate limit messages.
#logfile=/var/log/pluto.log
#
# Debugging should only be used to find bugs, not configuration issues!
# "base" regular debug, "tmi" is excessive and "private" will log
# sensitive key material (not available in FIPS mode). The "cpu-usage"
# value logs timing information and should not be used with other
# debug options as it will defeat getting accurate timing information.
# Default is "none"
# plutodebug="base"
# plutodebug="tmi"
#plutodebug="none"
#
# Some machines use a DNS resolver on localhost with broken DNSSEC
# support. This can be tested using the command:
# dig +dnssec DNSnameOfRemoteServer
# If that fails but omitting '+dnssec' works, the system's resolver is
# broken and you might need to disable DNSSEC.
# dnssec-enable=no
#
# To enable IKE and IPsec over TCP for VPN server. Requires at least
# Linux 5.7 kernel or a kernel with TCP backport (like RHEL8 4.18.0-291)
# listen-tcp=yes
# To enable IKE and IPsec over TCP for VPN client, also specify
# tcp-remote-port=4500 in the client's conn section.
# if it exists, include system wide crypto-policy defaults
include /etc/crypto-policies/back-ends/libreswan.config
# It is best to add your IPsec connections as separate files
# in /etc/ipsec.d/
include /etc/ipsec.d/*.conf
sh-5.1# cat /etc/ipsec.d/openshift.conf
# Generated by ovs-monitor-ipsec...do not modify by hand!
config setup
uniqueids=yes
conn %default
keyingtries=%forever
type=transport
auto=route
ike=aes_gcm256-sha2_256
esp=aes_gcm256
ikev2=insist
conn ovn-f72f26-0-in-1
left=192.168.111.23
right=192.168.111.22
leftid=@cf36db5c-5c54-4329-9141-b83679b18ecc
rightid=@f72f2622-e7dc-414e-8369-6013752ea15b
leftcert="ovs_certkey_cf36db5c-5c54-4329-9141-b83679b18ecc"
leftrsasigkey=%cert
rightca=%same
leftprotoport=udp/6081
rightprotoport=udp
conn ovn-f72f26-0-out-1
left=192.168.111.23
right=192.168.111.22
leftid=@cf36db5c-5c54-4329-9141-b83679b18ecc
rightid=@f72f2622-e7dc-414e-8369-6013752ea15b
leftcert="ovs_certkey_cf36db5c-5c54-4329-9141-b83679b18ecc"
leftrsasigkey=%cert
rightca=%same
leftprotoport=udp
rightprotoport=udp/6081
conn ovn-1184d9-0-in-1
left=192.168.111.23
right=192.168.111.20
leftid=@cf36db5c-5c54-4329-9141-b83679b18ecc
rightid=@1184d960-3211-45c4-a482-d7b6fe995446
leftcert="ovs_certkey_cf36db5c-5c54-4329-9141-b83679b18ecc"
leftrsasigkey=%cert
rightca=%same
leftprotoport=udp/6081
rightprotoport=udp
conn ovn-1184d9-0-out-1
left=192.168.111.23
right=192.168.111.20
leftid=@cf36db5c-5c54-4329-9141-b83679b18ecc
rightid=@1184d960-3211-45c4-a482-d7b6fe995446
leftcert="ovs_certkey_cf36db5c-5c54-4329-9141-b83679b18ecc"
leftrsasigkey=%cert
rightca=%same
leftprotoport=udp
rightprotoport=udp/6081
conn ovn-922aca-0-in-1
left=192.168.111.23
right=192.168.111.24
leftid=@cf36db5c-5c54-4329-9141-b83679b18ecc
rightid=@922aca42-b893-496e-bb9b-0310884f4cc1
leftcert="ovs_certkey_cf36db5c-5c54-4329-9141-b83679b18ecc"
leftrsasigkey=%cert
rightca=%same
leftprotoport=udp/6081
rightprotoport=udp
conn ovn-922aca-0-out-1
left=192.168.111.23
right=192.168.111.24
leftid=@cf36db5c-5c54-4329-9141-b83679b18ecc
rightid=@922aca42-b893-496e-bb9b-0310884f4cc1
leftcert="ovs_certkey_cf36db5c-5c54-4329-9141-b83679b18ecc"
leftrsasigkey=%cert
rightca=%same
leftprotoport=udp
rightprotoport=udp/6081
conn ovn-185198-0-in-1
left=192.168.111.23
right=192.168.111.21
leftid=@cf36db5c-5c54-4329-9141-b83679b18ecc
rightid=@185198f6-7dde-4e9b-b2aa-52439d2beef5
leftcert="ovs_certkey_cf36db5c-5c54-4329-9141-b83679b18ecc"
leftrsasigkey=%cert
rightca=%same
leftprotoport=udp/6081
rightprotoport=udp
conn ovn-185198-0-out-1
left=192.168.111.23
right=192.168.111.21
leftid=@cf36db5c-5c54-4329-9141-b83679b18ecc
rightid=@185198f6-7dde-4e9b-b2aa-52439d2beef5
leftcert="ovs_certkey_cf36db5c-5c54-4329-9141-b83679b18ecc"
leftrsasigkey=%cert
rightca=%same
leftprotoport=udp
rightprotoport=udp/6081
sh-5.1#
Please review the following PR: https://github.com/openshift/image-customization-controller/pull/99
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
backport of https://issues.redhat.com//browse/OCPBUGS-27211
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/cluster-kube-storage-version-migrator-operator/pull/94
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Due to the way that the termination handler's unit tests are configured, it is possible in some cases for the counter of HTTP requests to the mock handler to cause the test to deadlock and time out. This happens randomly, as the ordering of the tests has an effect on when the bug occurs.
Version-Release number of selected component (if applicable):
4.13+
How reproducible:
It happens randomly when run in CI or when the full suite is run, but if the tests are focused it will happen every time. Focusing on "poll URL cannot be reached" will trigger the bug in the unit test.
Steps to Reproduce:
1. Add `-focus "poll URL cannot be reached"` to the unit test ginkgo arguments
2. Run `make unit`
Actual results:
The test suite hangs after this output:
"Handler Suite when running the handler when polling the termination endpoint and the poll URL cannot be reached should return an error /home/mike/dev/machine-api-provider-aws/pkg/termination/handler_test.go:197"
Expected results:
Tests pass
Additional info:
To fix this we need to isolate the test in its own context block; this patch should do the trick:
diff --git a/pkg/termination/handler_test.go b/pkg/termination/handler_test.go
index 2b98b08b..0f85feae 100644
--- a/pkg/termination/handler_test.go
+++ b/pkg/termination/handler_test.go
@@ -187,7 +187,9 @@ var _ = Describe("Handler Suite", func() {
 				Consistently(nodeMarkedForDeletion(testNode.Name)).Should(BeFalse())
 			})
 		})
 
+	})
+	Context("when the termination endpoint is not valid", func() {
 		Context("and the poll URL cannot be reached", func() {
 			BeforeEach(func() {
 				nonReachable := "abc#1://localhost"
Please review the following PR: https://github.com/openshift/cluster-image-registry-operator/pull/966
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-29068. The following is the description of the original issue:
—
Description of problem:
A user may provide a DNS domain that lives outside GCP; once custom DNS is enabled, the installer should skip the DNS zone validation:
level=fatal msg="failed to fetch Terraform Variables: failed to generate asset \"Terraform Variables\": failed to get GCP public zone: no matching public DNS Zone found"
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2024-02-03-192446 4.16.0-0.nightly-2024-02-03-221256
How reproducible:
Always
Steps to Reproduce:
1. Enable custom DNS on GCP: platform.gcp.userProvisionedDNS: Enabled and featureSet: TechPreviewNoUpgrade
2. Configure a baseDomain which does not exist on GCP.
Actual results:
See description.
Expected results:
Installer should skip the validation, as the custom domain may not exist on GCP
Additional info:
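A hypothetical guard matching the expected result; the field and function names are assumptions about the installer's internals, not its real API.

package gcpdns

// gcpPlatform stands in for the install-config's platform.gcp section.
type gcpPlatform struct {
	UserProvisionedDNS string // "Enabled" or "Disabled"
}

// needsPublicZoneValidation reports whether the public-zone lookup should
// run at all. With userProvisionedDNS enabled the base domain may live
// outside GCP, so "no matching public DNS Zone found" is not an error.
func needsPublicZoneValidation(p gcpPlatform) bool {
	return p.UserProvisionedDNS != "Enabled"
}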
This is a clone of issue OCPBUGS-44005. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-44003. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Hiding the version is a good security practice
Additional info:
Please review the following PR: https://github.com/openshift/azure-file-csi-driver-operator/pull/74
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Some regions have added PER capability but are not yet available in the installer.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Steps to Reproduce:
1. Try to deploy in eu-de-1, for example
2. The installer will fail
Actual results:
Expected results:
Additional info:
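Illustrative only: the installer consults a static table of supported Power VS regions and zones, so enabling a newly PER-capable region amounts to extending that table. The structure and entries below are assumptions, not the installer's actual definition.

package powervs

// Region is assumed to model one Power VS region entry.
type Region struct {
	Description string
	Zones       []string
}

// Regions is assumed to be the static lookup the installer consults; the
// fix would extend it with each newly PER-capable region.
var Regions = map[string]Region{
	// ...existing entries elided...
	"eu-de": {
		Description: "Frankfurt, Germany",
		Zones:       []string{"eu-de-1", "eu-de-2"},
	},
}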
After I installed a "Git" Task from ArtifactHub directly in the Pipeline Builder and searched for a "git" Task again, the Pipeline Builder crashes.
Steps to reproduce:
Actual behaviour
Page crashes
Expected behaviour
Page should not crash
Additional information
Created/Imported Task:
apiVersion: tekton.dev/v1 kind: Task metadata: annotations: openshift.io/installed-from: ArtifactHub tekton.dev/categories: Git tekton.dev/displayName: git tekton.dev/pipelines.minVersion: 0.38.0 tekton.dev/platforms: 'linux/amd64,linux/s390x,linux/ppc64le,linux/arm64' tekton.dev/tags: git resourceVersion: '50218855' name: git uid: 1b88150a-f2c1-4030-9849-c7806c0745d8 creationTimestamp: '2023-11-28T10:54:51Z' generation: 1 labels: app.kubernetes.io/version: 0.1.0 spec: description: | This Task represents Git and is able to initialize and clone a remote repository on the informed Workspace. It's likely to become the first `step` on a Pipeline. params: - description: | Git repository URL. name: URL type: string - default: main description: | Revision to checkout, an branch, tag, sha, ref, etc... name: REVISION type: string - default: '' description: | Repository `refspec` to fetch before checking out the revision. name: REFSPEC type: string - default: 'true' description: | Initialize and fetch Git submodules. name: SUBMODULES type: string - default: '1' description: | Number of commits to fetch, a "shallow clone" is a single commit. name: DEPTH type: string - default: 'true' description: | Sets the global `http.sslVerify` value, `false` is not advised unless you trust the remote repository. name: SSL_VERIFY type: string - default: ca-bundle.crt description: | Certificate Authority (CA) bundle filename on the `ssl-ca-directory` Workspace. name: CRT_FILENAME type: string - default: '' description: | Relative path to the `output` Workspace where the repository will be cloned. name: SUBDIRECTORY type: string - default: '' description: | List of directory patterns split by comma to perform "sparse checkout". name: SPARSE_CHECKOUT_DIRECTORIES type: string - default: 'true' description: | Clean out the contents of the `output` Workspace before cloning the repository, if data exists. name: DELETE_EXISTING type: string - default: '' description: | HTTP proxy server (non-TLS requests). name: HTTP_PROXY type: string - default: '' description: | HTTPS proxy server (TLS requests). name: HTTPS_PROXY type: string - default: '' description: | Opt out of proxying HTTP/HTTPS requests. name: NO_PROXY type: string - default: 'false' description: | Log the commands executed. name: VERBOSE type: string - default: /home/git description: | Absolute path to the Git user home directory. name: USER_HOME type: string results: - description: | The precise commit SHA digest cloned. name: COMMIT type: string - description: | The precise repository URL. name: URL type: string - description: | The epoch timestamp of the commit cloned. 
name: COMMITTER_DATE type: string stepTemplate: computeResources: limits: cpu: 100m memory: 256Mi requests: cpu: 100m memory: 256Mi env: - name: PARAMS_URL value: $(params.URL) - name: PARAMS_REVISION value: $(params.REVISION) - name: PARAMS_REFSPEC value: $(params.REFSPEC) - name: PARAMS_SUBMODULES value: $(params.SUBMODULES) - name: PARAMS_DEPTH value: $(params.DEPTH) - name: PARAMS_SSL_VERIFY value: $(params.SSL_VERIFY) - name: PARAMS_CRT_FILENAME value: $(params.CRT_FILENAME) - name: PARAMS_SUBDIRECTORY value: $(params.SUBDIRECTORY) - name: PARAMS_SPARSE_CHECKOUT_DIRECTORIES value: $(params.SPARSE_CHECKOUT_DIRECTORIES) - name: PARAMS_DELETE_EXISTING value: $(params.DELETE_EXISTING) - name: PARAMS_HTTP_PROXY value: $(params.HTTP_PROXY) - name: PARAMS_HTTPS_PROXY value: $(params.HTTPS_PROXY) - name: PARAMS_NO_PROXY value: $(params.NO_PROXY) - name: PARAMS_VERBOSE value: $(params.VERBOSE) - name: PARAMS_USER_HOME value: $(params.USER_HOME) - name: WORKSPACES_OUTPUT_PATH value: $(workspaces.output.path) - name: WORKSPACES_SSH_DIRECTORY_BOUND value: $(workspaces.ssh-directory.bound) - name: WORKSPACES_SSH_DIRECTORY_PATH value: $(workspaces.ssh-directory.path) - name: WORKSPACES_BASIC_AUTH_BOUND value: $(workspaces.basic-auth.bound) - name: WORKSPACES_BASIC_AUTH_PATH value: $(workspaces.basic-auth.path) - name: WORKSPACES_SSL_CA_DIRECTORY_BOUND value: $(workspaces.ssl-ca-directory.bound) - name: WORKSPACES_SSL_CA_DIRECTORY_PATH value: $(workspaces.ssl-ca-directory.path) - name: RESULTS_COMMITTER_DATE_PATH value: $(results.COMMITTER_DATE.path) - name: RESULTS_COMMIT_PATH value: $(results.COMMIT.path) - name: RESULTS_URL_PATH value: $(results.URL.path) securityContext: runAsNonRoot: true runAsUser: 65532 steps: - computeResources: {} image: 'gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/git-init:latest' name: load-scripts script: | printf '%s' 
"IyEvdXNyL2Jpbi9lbnYgc2gKCmV4cG9ydCBQQVJBTVNfVVJMPSIke1BBUkFNU19VUkw6LX0iCmV4cG9ydCBQQVJBTVNfUkVWSVNJT049IiR7UEFSQU1TX1JFVklTSU9OOi19IgpleHBvcnQgUEFSQU1TX1JFRlNQRUM9IiR7UEFSQU1TX1JFRlNQRUM6LX0iCmV4cG9ydCBQQVJBTVNfU1VCTU9EVUxFUz0iJHtQQVJBTVNfU1VCTU9EVUxFUzotfSIKZXhwb3J0IFBBUkFNU19ERVBUSD0iJHtQQVJBTVNfREVQVEg6LX0iCmV4cG9ydCBQQVJBTVNfU1NMX1ZFUklGWT0iJHtQQVJBTVNfU1NMX1ZFUklGWTotfSIKZXhwb3J0IFBBUkFNU19DUlRfRklMRU5BTUU9IiR7UEFSQU1TX0NSVF9GSUxFTkFNRTotfSIKZXhwb3J0IFBBUkFNU19TVUJESVJFQ1RPUlk9IiR7UEFSQU1TX1NVQkRJUkVDVE9SWTotfSIKZXhwb3J0IFBBUkFNU19TUEFSU0VfQ0hFQ0tPVVRfRElSRUNUT1JJRVM9IiR7UEFSQU1TX1NQQVJTRV9DSEVDS09VVF9ESVJFQ1RPUklFUzotfSIKZXhwb3J0IFBBUkFNU19ERUxFVEVfRVhJU1RJTkc9IiR7UEFSQU1TX0RFTEVURV9FWElTVElORzotfSIKZXhwb3J0IFBBUkFNU19IVFRQX1BST1hZPSIke1BBUkFNU19IVFRQX1BST1hZOi19IgpleHBvcnQgUEFSQU1TX0hUVFBTX1BST1hZPSIke1BBUkFNU19IVFRQU19QUk9YWTotfSIKZXhwb3J0IFBBUkFNU19OT19QUk9YWT0iJHtQQVJBTVNfTk9fUFJPWFk6LX0iCmV4cG9ydCBQQVJBTVNfVkVSQk9TRT0iJHtQQVJBTVNfVkVSQk9TRTotfSIKZXhwb3J0IFBBUkFNU19VU0VSX0hPTUU9IiR7UEFSQU1TX1VTRVJfSE9NRTotfSIKCmV4cG9ydCBXT1JLU1BBQ0VTX09VVFBVVF9QQVRIPSIke1dPUktTUEFDRVNfT1VUUFVUX1BBVEg6LX0iCmV4cG9ydCBXT1JLU1BBQ0VTX1NTSF9ESVJFQ1RPUllfQk9VTkQ9IiR7V09SS1NQQUNFU19TU0hfRElSRUNUT1JZX0JPVU5EOi19IgpleHBvcnQgV09SS1NQQUNFU19TU0hfRElSRUNUT1JZX1BBVEg9IiR7V09SS1NQQUNFU19TU0hfRElSRUNUT1JZX1BBVEg6LX0iCmV4cG9ydCBXT1JLU1BBQ0VTX0JBU0lDX0FVVEhfQk9VTkQ9IiR7V09SS1NQQUNFU19CQVNJQ19BVVRIX0JPVU5EOi19IgpleHBvcnQgV09SS1NQQUNFU19CQVNJQ19BVVRIX1BBVEg9IiR7V09SS1NQQUNFU19CQVNJQ19BVVRIX1BBVEg6LX0iCmV4cG9ydCBXT1JLU1BBQ0VTX1NTTF9DQV9ESVJFQ1RPUllfQk9VTkQ9IiR7V09SS1NQQUNFU19TU0xfQ0FfRElSRUNUT1JZX0JPVU5EOi19IgpleHBvcnQgV09SS1NQQUNFU19TU0xfQ0FfRElSRUNUT1JZX1BBVEg9IiR7V09SS1NQQUNFU19TU0xfQ0FfRElSRUNUT1JZX1BBVEg6LX0iCgpleHBvcnQgUkVTVUxUU19DT01NSVRURVJfREFURV9QQVRIPSIke1JFU1VMVFNfQ09NTUlUVEVSX0RBVEVfUEFUSDotfSIKZXhwb3J0IFJFU1VMVFNfQ09NTUlUX1BBVEg9IiR7UkVTVUxUU19DT01NSVRfUEFUSDotfSIKZXhwb3J0IFJFU1VMVFNfVVJMX1BBVEg9IiR7UkVTVUxUU19VUkxfUEFUSDotfSIKCiMgZnVsbCBwYXRoIHRvIHRoZSBjaGVja291dCBkaXJlY3RvcnksIHVzaW5nIHRoZSBvdXRwdXQgd29ya3NwYWNlIGFuZCBzdWJkaXJlY3RvciBwYXJhbWV0ZXIKZXhwb3J0IGNoZWNrb3V0X2Rpcj0iJHtXT1JLU1BBQ0VTX09VVFBVVF9QQVRIfS8ke1BBUkFNU19TVUJESVJFQ1RPUll9IgoKIwojIEZ1bmN0aW9ucwojCgpmYWlsKCkgewogICAgZWNobyAiRVJST1I6ICR7QH0iIDE+JjIKICAgIGV4aXQgMQp9CgpwaGFzZSgpIHsKICAgIGVjaG8gIi0tLT4gUGhhc2U6ICR7QH0uLi4iCn0KCiMgSW5zcGVjdCB0aGUgZW52aXJvbm1lbnQgdmFyaWFibGVzIHRvIGFzc2VydCB0aGUgbWluaW11bSBjb25maWd1cmF0aW9uIGlzIGluZm9ybWVkLgphc3NlcnRfcmVxdWlyZWRfY29uZmlndXJhdGlvbl9vcl9mYWlsKCkgewogICAgW1sgLXogIiR7UEFSQU1TX1VSTH0iIF1dICYmCiAgICAgICAgZmFpbCAiUGFyYW1ldGVyIFVSTCBpcyBub3Qgc2V0ISIKCiAgICBbWyAteiAiJHtXT1JLU1BBQ0VTX09VVFBVVF9QQVRIfSIgXV0gJiYKICAgICAgICBmYWlsICJPdXRwdXQgV29ya3NwYWNlIGlzIG5vdCBzZXQhIgoKICAgIFtbICEgLWQgIiR7V09SS1NQQUNFU19PVVRQVVRfUEFUSH0iIF1dICYmCiAgICAgICAgZmFpbCAiT3V0cHV0IFdvcmtzcGFjZSBkaXJlY3RvcnkgJyR7V09SS1NQQUNFU19PVVRQVVRfUEFUSH0nIG5vdCBmb3VuZCEiCgogICAgcmV0dXJuIDAKfQoKIyBDb3B5IHRoZSBmaWxlIGludG8gdGhlIGRlc3RpbmF0aW9uLCBjaGVja2luZyBpZiB0aGUgc291cmNlIGV4aXN0cy4KY29weV9vcl9mYWlsKCkgewogICAgbG9jYWwgX21vZGU9IiR7MX0iCiAgICBsb2NhbCBfc3JjPSIkezJ9IgogICAgbG9jYWwgX2RzdD0iJHszfSIKCiAgICBpZiBbWyAhIC1mICIke19zcmN9IiAmJiAhIC1kICIke19zcmN9IiBdXTsgdGhlbgogICAgICAgIGZhaWwgIlNvdXJjZSBmaWxlL2RpcmVjdG9yeSBpcyBub3QgZm91bmQgYXQgJyR7X3NyY30nIgogICAgZmkKCiAgICBpZiBbWyAtZCAiJHtfc3JjfSIgXV07IHRoZW4KICAgICAgICBjcCAtUnYgJHtfc3JjfSAke19kc3R9CiAgICAgICAgY2htb2QgLXYgJHtfbW9kZX0gJHtfZHN0fQogICAgZWxzZQogICAgICAgIGluc3RhbGwgLS12ZXJib3NlIC0tbW9kZT0ke19tb2RlfSAke19zcmN9ICR7X2RzdH0KICAgIGZpCn0KCiMgRGVsZXRlIGFueSBleGlzdGluZyBjb250ZW50cyBvZiB0aGUgcmVwby
BkaXJlY3RvcnkgaWYgaXQgZXhpc3RzLiBXZSBkb24ndCBqdXN0ICJybSAtcmYgPGRpcj4iCiMgYmVjYXVzZSBtaWdodCBiZSAiLyIgb3IgdGhlIHJvb3Qgb2YgYSBtb3VudGVkIHZvbHVtZS4KY2xlYW5fZGlyKCkgewogICAgbG9jYWwgX2Rpcj0iJHsxfSIKCiAgICBbWyAhIC1kICIke19kaXJ9IiBdXSAmJgogICAgICAgIHJldHVybiAwCgogICAgIyBEZWxldGUgbm9uLWhpZGRlbiBmaWxlcyBhbmQgZGlyZWN0b3JpZXMKICAgIHJtIC1yZnYgJHtfZGlyOj99LyoKICAgICMgRGVsZXRlIGZpbGVzIGFuZCBkaXJlY3RvcmllcyBzdGFydGluZyB3aXRoIC4gYnV0IGV4Y2x1ZGluZyAuLgogICAgcm0gLXJmdiAke19kaXJ9Ly5bIS5dKgogICAgIyBEZWxldGUgZmlsZXMgYW5kIGRpcmVjdG9yaWVzIHN0YXJ0aW5nIHdpdGggLi4gcGx1cyBhbnkgb3RoZXIgY2hhcmFjdGVyCiAgICBybSAtcmZ2ICR7X2Rpcn0vLi4/Kgp9CgojCiMgU2V0dGluZ3MKIwoKIyB3aGVuIHRoZSBrby1hcHAgZGlyZWN0b3J5IGlzIHByZXNlbnQsIG1ha2luZyBzdXJlIGl0J3MgcGFydCBvZiB0aGUgUEFUSApbWyAtZCAiL2tvLWFwcCIgXV0gJiYgZXhwb3J0IFBBVEg9IiR7UEFUSH06L2tvLWFwcCIKCiMgbWFraW5nIHRoZSBzaGVsbCB2ZXJib3NlIHdoZW4gdGhlIHBhcmFtdGVyIGlzIHNldApbWyAiJHtQQVJBTVNfVkVSQk9TRX0iID09ICJ0cnVlIiBdXSAmJiBzZXQgLXgKCnJldHVybiAw" |base64 -d >common.sh chmod +x "common.sh" printf '%s' "IyEvdXNyL2Jpbi9lbnYgc2gKIwojIEV4cG9ydHMgcHJveHkgYW5kIGN1c3RvbSBTU0wgQ0EgY2VydGlmaWNhdHMgaW4gdGhlIGVudmlyb21lbnQgYW5kIHJ1bnMgdGhlIGdpdC1pbml0IHdpdGggZmxhZ3MKIyBiYXNlZCBvbiB0aGUgdGFzayBwYXJhbWV0ZXJzLgojCgpzZXQgLWV1Cgpzb3VyY2UgJChDRFBBVEg9IGNkIC0tICIkKGRpcm5hbWUgLS0gJHswfSkiICYmIHB3ZCkvY29tbW9uLnNoCgphc3NlcnRfcmVxdWlyZWRfY29uZmlndXJhdGlvbl9vcl9mYWlsCgojCiMgQ0EgKGBzc2wtY2EtZGlyZWN0b3J5YCBXb3Jrc3BhY2UpCiMKCmlmIFtbICIke1dPUktTUEFDRVNfU1NMX0NBX0RJUkVDVE9SWV9CT1VORH0iID09ICJ0cnVlIiAmJiAtbiAiJHtQQVJBTVNfQ1JUX0ZJTEVOQU1FfSIgXV07IHRoZW4KCXBoYXNlICJJbnNwZWN0aW5nICdzc2wtY2EtZGlyZWN0b3J5JyB3b3Jrc3BhY2UgbG9va2luZyBmb3IgJyR7UEFSQU1TX0NSVF9GSUxFTkFNRX0nIGZpbGUiCgljcnQ9IiR7V09SS1NQQUNFU19TU0xfQ0FfRElSRUNUT1JZX1BBVEh9LyR7UEFSQU1TX0NSVF9GSUxFTkFNRX0iCglbWyAhIC1mICIke2NydH0iIF1dICYmCgkJZmFpbCAiQ1JUIGZpbGUgKFBBUkFNU19DUlRfRklMRU5BTUUpIG5vdCBmb3VuZCBhdCAnJHtjcnR9JyIKCglwaGFzZSAiRXhwb3J0aW5nIGN1c3RvbSBDQSBjZXJ0aWZpY2F0ZSAnR0lUX1NTTF9DQUlORk89JHtjcnR9JyIKCWV4cG9ydCBHSVRfU1NMX0NBSU5GTz0ke2NydH0KZmkKCiMKIyBQcm94eSBTZXR0aW5ncwojCgpwaGFzZSAiU2V0dGluZyB1cCBIVFRQX1BST1hZPScke1BBUkFNU19IVFRQX1BST1hZfSciCltbIC1uICIke1BBUkFNU19IVFRQX1BST1hZfSIgXV0gJiYgZXhwb3J0IEhUVFBfUFJPWFk9IiR7UEFSQU1TX0hUVFBfUFJPWFl9IgoKcGhhc2UgIlNldHR0aW5nIHVwIEhUVFBTX1BST1hZPScke1BBUkFNU19IVFRQU19QUk9YWX0nIgpbWyAtbiAiJHtQQVJBTVNfSFRUUFNfUFJPWFl9IiBdXSAmJiBleHBvcnQgSFRUUFNfUFJPWFk9IiR7UEFSQU1TX0hUVFBTX1BST1hZfSIKCnBoYXNlICJTZXR0aW5nIHVwIE5PX1BST1hZPScke1BBUkFNU19OT19QUk9YWX0nIgpbWyAtbiAiJHtQQVJBTVNfTk9fUFJPWFl9IiBdXSAmJiBleHBvcnQgTk9fUFJPWFk9IiR7UEFSQU1TX05PX1BST1hZfSIKCiMKIyBHaXQgQ2xvbmUKIwoKcGhhc2UgIlNldHRpbmcgb3V0cHV0IHdvcmtzcGFjZSBhcyBzYWZlIGRpcmVjdG9yeSAoJyR7V09SS1NQQUNFU19PVVRQVVRfUEFUSH0nKSIKZ2l0IGNvbmZpZyAtLWdsb2JhbCAtLWFkZCBzYWZlLmRpcmVjdG9yeSAiJHtXT1JLU1BBQ0VTX09VVFBVVF9QQVRIfSIKCnBoYXNlICJDbG9uaW5nICcke1BBUkFNU19VUkx9JyBpbnRvICcke2NoZWNrb3V0X2Rpcn0nIgpzZXQgLXgKZXhlYyBnaXQtaW5pdCBcCgktdXJsPSIke1BBUkFNU19VUkx9IiBcCgktcmV2aXNpb249IiR7UEFSQU1TX1JFVklTSU9OfSIgXAoJLXJlZnNwZWM9IiR7UEFSQU1TX1JFRlNQRUN9IiBcCgktcGF0aD0iJHtjaGVja291dF9kaXJ9IiBcCgktc3NsVmVyaWZ5PSIke1BBUkFNU19TU0xfVkVSSUZZfSIgXAoJLXN1Ym1vZHVsZXM9IiR7UEFSQU1TX1NVQk1PRFVMRVN9IiBcCgktZGVwdGg9IiR7UEFSQU1TX0RFUFRIfSIgXAoJLXNwYXJzZUNoZWNrb3V0RGlyZWN0b3JpZXM9IiR7UEFSQU1TX1NQQVJTRV9DSEVDS09VVF9ESVJFQ1RPUklFU30iCg==" |base64 -d >git-clone.sh chmod +x "git-clone.sh" printf '%s' 
"IyEvdXNyL2Jpbi9lbnYgc2gKIwojIFNldHMgdXAgdGhlIGJhc2ljIGFuZCBTU0ggYXV0aGVudGljYXRpb24gYmFzZWQgb24gaW5mb3JtZWQgd29ya3NwYWNlcywgYXMgd2VsbCBhcyBjbGVhbmluZyB1cCB0aGUKIyBwcmV2aW91cyBnaXQtY2xvbmUgc3RhbGUgZGF0YS4KIwoKc2V0IC1ldQoKc291cmNlICQoQ0RQQVRIPSBjZCAtLSAiJChkaXJuYW1lIC0tICR7MH0pIiAmJiBwd2QpL2NvbW1vbi5zaAoKYXNzZXJ0X3JlcXVpcmVkX2NvbmZpZ3VyYXRpb25fb3JfZmFpbAoKcGhhc2UgIlByZXBhcmluZyB0aGUgZmlsZXN5c3RlbSBiZWZvcmUgY2xvbmluZyB0aGUgcmVwb3NpdG9yeSIKCmlmIFtbICIke1dPUktTUEFDRVNfQkFTSUNfQVVUSF9CT1VORH0iID09ICJ0cnVlIiBdXTsgdGhlbgoJcGhhc2UgIkNvbmZpZ3VyaW5nIEdpdCBhdXRoZW50aWNhdGlvbiB3aXRoICdiYXNpYy1hdXRoJyBXb3Jrc3BhY2UgZmlsZXMiCgoJZm9yIGYgaW4gLmdpdC1jcmVkZW50aWFscyAuZ2l0Y29uZmlnOyBkbwoJCXNyYz0iJHtXT1JLU1BBQ0VTX0JBU0lDX0FVVEhfUEFUSH0vJHtmfSIKCQlwaGFzZSAiQ29weWluZyAnJHtzcmN9JyB0byAnJHtQQVJBTVNfVVNFUl9IT01FfSciCgkJY29weV9vcl9mYWlsIDQwMCAke3NyY30gIiR7UEFSQU1TX1VTRVJfSE9NRX0vIgoJZG9uZQpmaQoKaWYgW1sgIiR7V09SS1NQQUNFU19TU0hfRElSRUNUT1JZX0JPVU5EfSIgPT0gInRydWUiIF1dOyB0aGVuCglwaGFzZSAiQ29weWluZyAnLnNzaCcgZnJvbSBzc2gtZGlyZWN0b3J5IHdvcmtzcGFjZSAoJyR7V09SS1NQQUNFU19TU0hfRElSRUNUT1JZX1BBVEh9JykiCgoJZG90X3NzaD0iJHtQQVJBTVNfVVNFUl9IT01FfS8uc3NoIgoJY29weV9vcl9mYWlsIDcwMCAke1dPUktTUEFDRVNfU1NIX0RJUkVDVE9SWV9QQVRIfSAke2RvdF9zc2h9CgljaG1vZCAtUnYgNDAwICR7ZG90X3NzaH0vKgpmaQoKaWYgW1sgIiR7UEFSQU1TX0RFTEVURV9FWElTVElOR30iID09ICJ0cnVlIiBdXTsgdGhlbgoJcGhhc2UgIkRlbGV0aW5nIGFsbCBjb250ZW50cyBvZiBjaGVja291dC1kaXIgJyR7Y2hlY2tvdXRfZGlyfSciCgljbGVhbl9kaXIgJHtjaGVja291dF9kaXJ9IHx8IHRydWUKZmkKCmV4aXQgMA==" |base64 -d >prepare.sh chmod +x "prepare.sh" printf '%s' "IyEvdXNyL2Jpbi9lbnYgc2gKIwojIFNjYW4gdGhlIGNsb25lZCByZXBvc2l0b3J5IGluIG9yZGVyIHRvIHJlcG9ydCBkZXRhaWxzIHdyaXR0aW5nIHRoZSByZXN1bHQgZmlsZXMuCiMKCnNldCAtZXUKCnNvdXJjZSAkKENEUEFUSD0gY2QgLS0gIiQoZGlybmFtZSAtLSAkezB9KSIgJiYgcHdkKS9jb21tb24uc2gKCmFzc2VydF9yZXF1aXJlZF9jb25maWd1cmF0aW9uX29yX2ZhaWwKCnBoYXNlICJDb2xsZWN0aW5nIGNsb25lZCByZXBvc2l0b3J5IGluZm9ybWF0aW9uICgnJHtjaGVja291dF9kaXJ9JykiCgpjZCAiJHtjaGVja291dF9kaXJ9IiB8fCBmYWlsICJOb3QgYWJsZSB0byBlbnRlciBjaGVja291dC1kaXIgJyR7Y2hlY2tvdXRfZGlyfSciCgpwaGFzZSAiU2V0dGluZyBvdXRwdXQgd29ya3NwYWNlIGFzIHNhZmUgZGlyZWN0b3J5ICgnJHtXT1JLU1BBQ0VTX09VVFBVVF9QQVRIfScpIgpnaXQgY29uZmlnIC0tZ2xvYmFsIC0tYWRkIHNhZmUuZGlyZWN0b3J5ICIke1dPUktTUEFDRVNfT1VUUFVUX1BBVEh9IgoKcmVzdWx0X3NoYT0iJChnaXQgcmV2LXBhcnNlIEhFQUQpIgpyZXN1bHRfY29tbWl0dGVyX2RhdGU9IiQoZ2l0IGxvZyAtMSAtLXByZXR0eT0lY3QpIgoKcGhhc2UgIlJlcG9ydGluZyBsYXN0IGNvbW1pdCBkYXRlICcke3Jlc3VsdF9jb21taXR0ZXJfZGF0ZX0nIgpwcmludGYgIiVzIiAiJHtyZXN1bHRfY29tbWl0dGVyX2RhdGV9IiA+JHtSRVNVTFRTX0NPTU1JVFRFUl9EQVRFX1BBVEh9CgpwaGFzZSAiUmVwb3J0aW5nIHBhcnNlZCByZXZpc2lvbiBTSEEgJyR7cmVzdWx0X3NoYX0nIgpwcmludGYgIiVzIiAiJHtyZXN1bHRfc2hhfSIgPiR7UkVTVUxUU19DT01NSVRfUEFUSH0KCnBoYXNlICJSZXBvcnRpbmcgcmVwb3NpdG9yeSBVUkwgJyR7UEFSQU1TX1VSTH0nIgpwcmludGYgIiVzIiAiJHtQQVJBTVNfVVJMfSIgPiR7UkVTVUxUU19VUkxfUEFUSH0KCmV4aXQgMA==" |base64 -d >report.sh chmod +x "report.sh" volumeMounts: - mountPath: /scripts name: scripts-dir workingDir: /scripts - command: - /scripts/prepare.sh computeResources: {} image: 'gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/git-init:latest' name: prepare volumeMounts: - mountPath: /scripts name: scripts-dir - mountPath: $(params.USER_HOME) name: user-home - command: - /scripts/git-clone.sh computeResources: {} image: 'gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/git-init:latest' name: git-clone volumeMounts: - mountPath: /scripts name: scripts-dir - mountPath: $(params.USER_HOME) name: user-home - command: - /scripts/report.sh computeResources: {} image: 
'gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/git-init:latest' name: report volumeMounts: - mountPath: /scripts name: scripts-dir volumes: - emptyDir: {} name: user-home - emptyDir: {} name: scripts-dir workspaces: - description: | The Git repository directory, data will be placed on the root of the Workspace, or on the relative path defined by the SUBDIRECTORY parameter. name: output - description: | A `.ssh` directory with private key, `known_hosts`, `config`, etc. Copied to the Git user's home before cloning the repository, in order to serve as an authentication mechanism. Binding a Secret to this Workspace is strongly recommended over other volume types. name: ssh-directory optional: true - description: | A Workspace containing `.gitconfig` and `.git-credentials` files. These will be copied to the user's home before Git commands run. All other files in this Workspace are ignored. It is strongly recommended to use `ssh-directory` over `basic-auth` whenever possible, and to bind a Secret to this Workspace over other volume types. name: basic-auth optional: true - description: | A Workspace containing CA certificates, this will be used by Git to verify the peer when interacting with remote repositories using HTTPS. name: ssl-ca-directory optional: true
This is a clone of issue OCPBUGS-27445. The following is the description of the original issue:
—
Description of problem:
Client side throttling observed when running the metrics controller.
Steps to Reproduce:
1. Install an AWS cluster in mint mode 2. Enable debug log by editing cloudcredential/cluster 3. Wait for the metrics loop to run for a few times 4. Check CCO logs
Actual results:
// 7s consumed by metrics loop which is caused by client-side throttling time="2024-01-20T19:43:56Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics I0120 19:43:56.251278 1 request.go:629] Waited for 176.161298ms due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-cloud-network-config-controller/secrets/cloud-credentials I0120 19:43:56.451311 1 request.go:629] Waited for 197.182213ms due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-cloud-network-config-controller/secrets/cloud-credentials I0120 19:43:56.651313 1 request.go:629] Waited for 197.171082ms due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-cloud-network-config-controller/secrets/cloud-credentials I0120 19:43:56.850631 1 request.go:629] Waited for 195.251487ms due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-cloud-network-config-controller/secrets/cloud-credentials ... time="2024-01-20T19:44:03Z" level=info msg="reconcile complete" controller=metrics elapsed=7.231061324s
Expected results:
No client-side throttling when running the metrics controller.
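These waits come from client-go's default client-side rate limiter (QPS=5, Burst=10). Below is a minimal Go sketch of the usual mitigation, raising the limits on the rest.Config the controller's clients are built from; the values and call site are illustrative assumptions, not necessarily the fix that shipped:

package clientutil

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// newMetricsClient builds a client whose client-side rate limits can keep up
// with a metrics loop that GETs many secrets per tick (values illustrative).
func newMetricsClient(cfg *rest.Config) (kubernetes.Interface, error) {
	cfg.QPS = 50
	cfg.Burst = 100
	return kubernetes.NewForConfig(cfg)
}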
Description of problem:
The image registry operator in Azure runs two replicas by default. Every 5 minutes, each of those replicas makes a call to the StorageAccount List operation for the image registry storage account. Azure has published limits for storage account throttling operations: 100 calls to List operations every 5 minutes per subscription and region pair. Because of this, customers are limited to fewer than 50 clusters per subscription and region in Azure; the exact ceiling varies with the number of image registry replicas and with any other List storage account activity within that subscription and region. On the Azure Red Hat OpenShift managed service, we occasionally have customers exceeding these limits, including internal customers running demos, preventing them from creating new clusters within the subscription and region due to these scaling limits.
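To make the arithmetic explicit: 2 replicas x 1 List call each per 5-minute window = 2 calls per cluster per window, so 100 / 2 = 50 clusters exhaust the published limit; any other List activity in the subscription and region lowers the practical ceiling below 50.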
Version-Release number of selected component (if applicable):
N/A
How reproducible:
Always.
Steps to Reproduce:
1. Scale up the number of image registry pods to hit the 100 / 5 minute List limit (50 replicas, or enough clusters within a given subscription & region) 2. Attempt to create a new cluster 3. Cluster installation may fail due to image-registry cluster operator never going healthy, or the installer not being able to generate a storage account key for the bootstrap node to fetch its ignition config.
Actual results:
storage.AccountsClient#ListAccountSAS: Failure responding to request: StatusCode=429 -- Original Error: autorest/azure: Service returned an error. Status=429 Code="TooManyRequests" Message="The request is being throttled as the limit has been reached for operation type - Read_ObservationWindow_00:05:00. For more information, see - https://aka.ms/srpthrottlinglimits"
Expected results:
Cluster installs successfully
Additional info:
Raising this as a bug since this issue will be persistent across all cluster installations should one exceed the threshold. It will also impact the image-registry pod health.
Description of problem:
An assisted-service fix, https://issues.redhat.com//browse/MGMT-15340, resolved an issue in the nmstateconfig scripts to ensure VLAN names are < 15 characters. The same fix needs to be merged into the agent installer.
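The check itself is small; here is a minimal Go sketch, assuming it runs while the nmstateconfig is generated (the function name and message are illustrative; the 15-character ceiling is the Linux interface-name limit, IFNAMSIZ minus the NUL terminator):

package validate

import "fmt"

const maxIfaceNameLen = 15 // Linux IFNAMSIZ (16) minus the NUL terminator

// validateVLANInterfaceName rejects names like "<base>.<id>" that the kernel
// cannot create as network interfaces.
func validateVLANInterfaceName(name string) error {
	if len(name) > maxIfaceNameLen {
		return fmt.Errorf("interface name %q is %d characters; must be at most %d", name, len(name), maxIfaceNameLen)
	}
	return nil
}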
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Create an agent image with static networking using a vlan with a long name (greater than 15 characters) 2. Boot a host with the agent image
Actual results:
The installation will fail
Expected results:
The installation will succeed.
Additional info:
OCP 4.14.0-rc.0
advanced-cluster-management.v2.9.0-130
multicluster-engine.v2.4.0-154
After encountering https://issues.redhat.com/browse/OCPBUGS-18959
Attempted to forcefully delete the BMH by removing the finalizer.
Then deleted all the metal3 pods.
Attempted to re-create the bmh.
Result:
the bmh is stuck in
oc get bmh
NAME STATE CONSUMER ONLINE ERROR AGE
hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com registering true 15m
seeing this entry in the BMO log:
{"level":"info","ts":"2023-09-13T16:15:57Z","logger":"controllers.BareMetalHost","msg":"start","baremetalhost":{"name":"hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com","namespace":"kni-qe-65"}}
{"level":"info","ts":"2023-09-13T16:15:57Z","logger":"controllers.BareMetalHost","msg":"hardwareData is ready to be deleted","baremetalhost":{"name":"hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com","namespace":"kni-qe-65"}}
{"level":"info","ts":"2023-09-13T16:15:57Z","logger":"controllers.BareMetalHost","msg":"host ready to be powered off","baremetalhost":
,"provisioningState":"powering off before delete"}
{"level":"info","ts":"2023-09-13T16:15:57Z","logger":"provisioner.ironic","msg":"ensuring host is powered off (mode: hard)","host":"kni-qe-65~hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com"}{"level":"error","ts":"2023-09-13T16:15:57Z","msg":"Reconciler error","controller":"baremetalhost","controllerGroup":"metal3.io","controllerKind":"BareMetalHost","BareMetalHost":
{"name":"hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com","namespace":"kni-qe-65"},"namespace":"kni-qe-65","name":"hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com","reconcileID":"167061cc-7ab4-4c4a-ae45-8c19dfc3ac22","error":"action \"powering off before delete\" failed: failed to power off before deleting node: Host not registered","errorVerbose":"Host not registered\nfailed to power off before deleting node\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).actionPowerOffBeforeDeleting\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:493\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).handlePoweringOffBeforeDelete\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:585\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ReconcileState\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:202\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:225\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1598\naction \"powering off before delete\" 
failed\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:229\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1598","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226"}
This is a clone of issue OCPBUGS-25758. The following is the description of the original issue:
—
Description of problem:
router pod is in CrashLoopBackOff after y-stream upgrade from 4.13->4.14
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. create a cluster with 4.13 2. upgrade HC to 4.14 3.
Actual results:
router pod in CrashLoopBackoff
Expected results:
router pod is running after upgrade HC from 4.13->4.14
Additional info:
images: ====== HO image: 4.15 upgrade HC from 4.13.0-0.nightly-2023-12-19-114348 to 4.14.0-0.nightly-2023-12-19-120138 router pod log: ============== jiezhao-mac:hypershift jiezhao$ oc get pods router-9cfd8b89-plvtc -n clusters-jie-test NAME READY STATUS RESTARTS AGE router-9cfd8b89-plvtc 0/1 CrashLoopBackOff 11 (45s ago) 32m jiezhao-mac:hypershift jiezhao$ Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 27m default-scheduler Successfully assigned clusters-jie-test/router-9cfd8b89-plvtc to ip-10-0-42-36.us-east-2.compute.internal Normal AddedInterface 27m multus Add eth0 [10.129.2.82/23] from ovn-kubernetes Normal Pulling 27m kubelet Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3d2acba15f69ea3648b3c789111db34ff06d9230a4371c5949ebe3c6218e6ea3" Normal Pulled 27m kubelet Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3d2acba15f69ea3648b3c789111db34ff06d9230a4371c5949ebe3c6218e6ea3" in 14.309s (14.309s including waiting) Normal Created 26m (x3 over 27m) kubelet Created container private-router Normal Started 26m (x3 over 27m) kubelet Started container private-router Warning BackOff 26m (x5 over 27m) kubelet Back-off restarting failed container private-router in pod router-9cfd8b89-plvtc_clusters-jie-test(e6cf40ad-32cd-438c-8298-62d565cf6c6a) Normal Pulled 26m (x3 over 27m) kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3d2acba15f69ea3648b3c789111db34ff06d9230a4371c5949ebe3c6218e6ea3" already present on machine Warning FailedToRetrieveImagePullSecret 2m38s (x131 over 27m) kubelet Unable to retrieve some image pull secrets (router-dockercfg-q768b); attempting to pull the image may not succeed. jiezhao-mac:hypershift jiezhao$ jiezhao-mac:hypershift jiezhao$ oc logs router-9cfd8b89-plvtc -n clusters-jie-test [NOTICE] (1) : haproxy version is 2.6.13-234aa6d [NOTICE] (1) : path to executable is /usr/sbin/haproxy [ALERT] (1) : config : [/usr/local/etc/haproxy/haproxy.cfg:52] : 'server ovnkube_sbdb/ovnkube_sbdb' : could not resolve address 'None'. [ALERT] (1) : config : Failed to initialize server(s) addr. jiezhao-mac:hypershift jiezhao$ notes: ===== not sure if it has the same root cause as https://issues.redhat.com/browse/OCPBUGS-24627
Description of problem:
when use targetCatalog, mirror failed with error: error: error rebuilding catalog images from file-based catalogs: error copying image docker://registry.redhat.io/abc/redhat-operator-index:v4.13 to docker://localhost:5000/abc/redhat-operator-index:v4.13: initializing source docker://registry.redhat.io/abc/redhat-operator-index:v4.13: (Mirrors also failed: [localhost:5000/abc/redhat-operator-index:v4.13: pinging container registry localhost:5000: Get "https://localhost:5000/v2/": http: server gave HTTP response to HTTPS client]): registry.redhat.io/abc/redhat-operator-index:v4.13: reading manifest v4.13 in registry.redhat.io/abc/redhat-operator-index: unauthorized: access to the requested resource is not authorized
Version-Release number of selected component (if applicable):
oc-mirror 4.16
How reproducible:
always
Steps to Reproduce:
1) Use the following isc to do mirror2mirror for v1:

kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  local:
    path: /tmp/case60597
mirror:
  operators:
  - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.13
    targetCatalog: abc/redhat-operator-index
    packages:
    - name: servicemeshoperator

`oc-mirror --config config.yaml docker://localhost:5000 --dest-use-http`
Actual results:
1) mirror failed with error:
info: Mirroring completed in 420ms (0B/s)
error: error rebuilding catalog images from file-based catalogs: error copying image docker://registry.redhat.io/abc/redhat-operator-index:v4.13 to docker://localhost:5000/abc/redhat-operator-index:v4.13: initializing source docker://registry.redhat.io/abc/redhat-operator-index:v4.13: (Mirrors also failed: [localhost:5000/abc/redhat-operator-index:v4.13: pinging container registry localhost:5000: Get "https://localhost:5000/v2/": http: server gave HTTP response to HTTPS client]): registry.redhat.io/abc/redhat-operator-index:v4.13: reading manifest v4.13 in registry.redhat.io/abc/redhat-operator-index: unauthorized: access to the requested resource is not authorized
Expected results:
1) no error.
Additional information:
compared with oc-mirror 4.15.9, can't reproduce this issue
This is a clone of issue OCPBUGS-34117. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-33566. The following is the description of the original issue:
—
Description of problem:
When the cloud-credential operator is used in manual mode and awsSTSIAMRoleARN is not present in the secret, the operator pods throw aggressive errors every second. One customer concern is the sheer number of errors from the operator pods. Two errors per second: ============================ time="2024-05-10T00:43:45Z" level=error msg="error syncing credentials: an empty awsSTSIAMRoleARN was found so no Secret was created" controller=credreq cr=openshift-cloud-credential-operator/aws-ebs-csi-driver-operator secret=openshift-cluster-csi-drivers/ebs-cloud-credentials time="2024-05-10T00:43:46Z" level=error msg="errored with condition: CredentialsProvisionFailure" controller=credreq cr=openshift-cloud-credential-operator/aws-ebs-csi-driver-operator secret=openshift-cluster-csi-drivers/ebs-cloud-credentials
Version-Release number of selected component (if applicable):
4.15.3
How reproducible:
Always present in managed rosa clusters
Steps to Reproduce:
1.create a rosa cluster 2.check the errors of cloud credentials operator pods 3.
Actual results:
The CCO logs continually throw errors
Expected results:
The CCO logs should not be continually throwing these errors.
Additional info:
The focus of this bug is only to remove the error lines from the logs. The underlying issue, of continually attempting to reconcile the CRs will be handled by other bugs.
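A minimal Go sketch of the requested logging change, treating an empty awsSTSIAMRoleARN as an expected state rather than an error; the reconcile shape and names are assumptions, not the actual CCO code:

package credreq

import "github.com/go-logr/logr"

// syncSTSSecret is a hypothetical reconcile step: an empty role ARN is a
// normal condition in manual/STS mode, so it is logged quietly and skipped
// instead of producing error-level lines every reconcile.
func syncSTSSecret(log logr.Logger, roleARN string) error {
	if roleARN == "" {
		log.V(1).Info("no awsSTSIAMRoleARN provided; skipping Secret creation")
		return nil
	}
	// ... create or update the Secret from the role ARN ...
	return nil
}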
Please review the following PR: https://github.com/openshift/vertical-pod-autoscaler-operator/pull/147
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Important: ART has recorded in their product data that bugs for
this component should be opened against Jira project "OCPBUGS" and
component "Node / Autoscaler (HPA, VPA, CMA)". This project or component does not exist. Jira
should either be updated to include this component or @release-artists should be
notified of the proper mapping in the #forum-ocp-art Slack channel.
Component name: ose-vertical-pod-autoscaler-operator-container .
Jira mapping: https://github.com/openshift-eng/ocp-build-data/blob/main/product.yml
Description of problem:
After installing an OpenShift IPI vSphere cluster, the coredns-monitor containers in the "openshift-vsphere-infra" namespace continuously report the message: "Failed to read ip from file /run/nodeip-configuration/ipv4" error="open /run/nodeip-configuration/ipv4: no such file or directory". The file "/run/nodeip-configuration/ipv4" present on the nodes is not actually mounted into the coredns pods. This apparently has no impact on the functionality of the cluster, but a "failed" message on the container can trigger alarms or prompt a search for a problem in the cluster.
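Since the message is expected whenever the file is absent, a minimal Go sketch of tolerating it in the monitor loop (the path is from the report; the function shape and logging are assumptions about coredns-monitor):

package monitor

import (
	"errors"
	"io/fs"
	"log"
	"os"
)

// readNodeIPv4Hint returns the node IP hint if present; a missing file is an
// expected state (e.g. not mounted into the pod) and is not logged as an error.
func readNodeIPv4Hint(path string) (string, bool) {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return "", false
	}
	if err != nil {
		log.Printf("reading %s: %v", path, err)
		return "", false
	}
	return string(data), true
}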
Version-Release number of selected component (if applicable):
Any 4.12, 4.13, 4.14
How reproducible:
Always
Steps to Reproduce:
1. Install an OpenShift IPI vSphere cluster 2. Wait for the installation to complete 3. Read the logs of any coredns-monitor container in the "openshift-vsphere-infra" namespace
Actual results:
coredns-monitor continuously reports the failure message, misleading a cluster administrator into searching for a real issue.
Expected results:
coredns-monitor should not report this failure message when there is nothing to fix.
Additional info:
The same issue happens in Baremetal IPI clusters.
For OCP 4.14+, the `--enable-defaulting-webhook true` flag needs to be included in the hypershift install command in CI.
Reference: https://github.com/openshift/hypershift/pull/2922/files
Slack thread: https://redhat-internal.slack.com/archives/C014N2VLTQE/p1694090399430659
Please review the following PR: https://github.com/openshift/builder/pull/357
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-45593. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-43740. The following is the description of the original issue:
—
After changing internalJoinSubnet and internalTransitSwitchSubnet on day 2 and performing live migration, the ovnkube-node pod crashed.
The network part is as below; the service CIDR has the same subnet as the OVN default internalTransitSwitchSubnet:
clusterNetwork:
- cidr: 100.64.0.0/15
  hostPrefix: 23
serviceNetwork:
- 100.88.0.0/16
and then:
oc patch network.operator.openshift.io cluster --type='merge' -p='{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipv4":{"internalJoinSubnet": "100.82.0.0/16"}}}}}' oc patch network.operator.openshift.io cluster --type='merge' -p='{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipv4":{"internalTransitSwitchSubnet": "100.69.0.0/16"}}}}}'
with error:
start-ovnkube-node ${OVN_KUBE_LOG_LEVEL} 29103 29105 State: Waiting Reason: CrashLoopBackOff Last State: Terminated Reason: Error Message: EmptyLbEvents:false PodIP: RawNoHostSubnetNodes:migration.network.openshift.io/plugin= NoHostSubnetNodes:<nil> HostNetworkNamespace:openshift-host-network PlatformType:AWS HealthzBindAddress:0.0.0.0:10256 CompatMetricsBindAddress: CompatOVNMetricsBindAddress: CompatMetricsEnablePprof:false DNSServiceNamespace:openshift-dns DNSServiceName:dns-default} Metrics:{BindAddress: OVNMetricsBindAddress: ExportOVSMetrics:false EnablePprof:false NodeServerPrivKey: NodeServerCert: EnableConfigDuration:false EnableScaleMetrics:false} OvnNorth:{Address: PrivKey: Cert: CACert: CertCommonName: Scheme: ElectionTimer:0 northbound:false exec:<nil>} OvnSouth:{Address: PrivKey: Cert: CACert: CertCommonName: Scheme: ElectionTimer:0 northbound:false exec:<nil>} Gateway:{Mode:shared Interface: EgressGWInterface: NextHop: VLANID:0 NodeportEnable:true DisableSNATMultipleGWs:false V4JoinSubnet:100.64.0.0/16 V6JoinSubnet:fd98::/64 V4MasqueradeSubnet:100.254.0.0/17 V6MasqueradeSubnet:fd69::/125 MasqueradeIPs:
{V4OVNMasqueradeIP:169.254.169.1 V6OVNMasqueradeIP:fd69::1 V4HostMasqueradeIP:169.254.169.2 V6HostMasqueradeIP:fd69::2 V4HostETPLocalMasqueradeIP:169.254.169.3 V6HostETPLocalMasqueradeIP:fd69::3 V4DummyNextHopMasqueradeIP:169.254.169.4 V6DummyNextHopMasqueradeIP:fd69::4 V4OVNServiceHairpinMasqueradeIP:169.254.169.5 V6OVNServiceHairpinMasqueradeIP:fd69::5}DisablePacke
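The collision can be caught up front with an ordinary CIDR-overlap check; a minimal Go sketch (names illustrative, not the ovn-kubernetes implementation):

package netcheck

import "net"

// cidrsOverlap reports whether two CIDRs overlap: they do iff either
// network's base address is contained in the other network.
func cidrsOverlap(a, b string) (bool, error) {
	_, na, err := net.ParseCIDR(a)
	if err != nil {
		return false, err
	}
	_, nb, err := net.ParseCIDR(b)
	if err != nil {
		return false, err
	}
	return na.Contains(nb.IP) || nb.Contains(na.IP), nil
}

With the inputs above, cidrsOverlap("100.88.0.0/16", "100.88.0.0/16") returns true, which is exactly the serviceNetwork vs. default internalTransitSwitchSubnet clash this report describes.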
Please review the following PR: https://github.com/openshift/cluster-api-provider-ibmcloud/pull/58
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Tracker issue for bootimage bump in 4.15. This issue should block issues which need a bootimage bump to fix.
The previous bump was OCPBUGS-20356.
This is a clone of issue OCPBUGS-31725. The following is the description of the original issue:
—
The 4.13 CPO fails to reconcile
{"level":"error","ts":"2024-04-03T18:45:28Z","msg":"Reconciler error","controller":"hostedcontrolplane","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedControlPlane","hostedControlPlane":{"name":"sjenning-guest","namespace":"clusters-sjenning-guest"},"namespace":"clusters-sjenning-guest","name":"sjenning-guest","reconcileID":"35a91dd1-0066-4c81-a6a4-14770ffff61d","error":"failed to update control plane: failed to reconcile router: failed to reconcile router role: roles.rbac.authorization.k8s.io \"router\" is forbidden: user \"system:serviceaccount:clusters-sjenning-guest:control-plane-operator\" (groups=[\"system:serviceaccounts\" \"system:serviceaccounts:clusters-sjenning-guest\" \"system:authenticated\"]) is attempting to grant RBAC permissions not currently held:\n{APIGroups:[\"security.openshift.io\"], Resources:[\"securitycontextconstraints\"], ResourceNames:[\"hostnetwork\"], Verbs:[\"use\"]}","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234"}
Please review the following PR: https://github.com/openshift/operator-framework-rukpak/pull/34
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
After extensive debugging of HostedControlPlanes in dual-stack mode, we have discovered that the QE department has issues in dual-stack environments. In HyperShift/HostedControlPlane, there is an HAProxy in the dataplane (worker nodes of the HostedCluster). This HAProxy is unable to redirect calls to the KubeApiServer in the ControlPlane; it attempts to connect using both protocols, IPv6 first and then IPv4. The issue is that the HostedCluster is exposing services in NodePort mode, and it seems that the master nodes of the management cluster are not opening these NodePorts on IPv6, only on IPv4. Even though the master node shows this trace with netstat: tcp6 9 0 :::32272 :::* LISTEN 6086/ovnkube it seems the port is only actually open on IPv4, as it is not possible to connect to the API via IPv6 even locally. This only happens with dual stack; both IPv4 and IPv6 work correctly in single-stack mode.
Version-Release number of selected component (if applicable):
4.14.X 4.15.X
How reproducible:
100%
Steps to Reproduce:
1. Deploy an Openshift management cluster in dual stack mode 2. Deploy MCE 2.4 3. Deploy a HostedCluster in dual stack mode
Actual results:
- Many pods stuck in ContainerCreating state - The HostedCluster cannot be deployed, many COs blocked and clusterversion also stuck
Expected results:
HostedCluster deployment done
Additional info:
To reproduce the issue you could contact @jparrill or @Liangquan Li in slack, this will make things easier for the environment creation.
In order to evaluate solutions for https://issues.redhat.com/browse/RFE-3953 we need to investigate the root cause of the issue
If there is an issue, have a strategy to display external labels on alerts
The Managed OpenShift (OSD, ROSA) on-cluster console should have its update buttons greyed out (disabled) so that customers don't hit the error caused by webhooks blocking updates (OSD and ROSA require the OCM UI or ROSA CLI in order to do updates).
As managed services governs when we allow specific update versions, this change would support that without letting the user encounter an unnecessary error.
Description of problem: the per-node certificates should be a configurable duration
Cluster Fleet Evaluation allows us to send additional useful information to telemetry. Conditions are added to the operator status to propagate this additional information to the customers.
We will need to see if various options are turned on via the kubelet config (swap, etc).
Example of this functionality.
This would fit into the MCO within the operator status logic.
This is a clone of issue OCPBUGS-43046. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42974. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42873. The following is the description of the original issue:
—
Description of problem:
The openshift-apiserver, which sends traffic through the konnectivity proxy, is also sending traffic intended for the local audit-webhook service through that proxy. The audit-webhook service should be included in the NO_PROXY env var of the openshift-apiserver container.
Version-Release number of selected component (if applicable):
4.14.z, 4.15.z, 4.16.z
How reproducible:
Always
Steps to Reproduce:
1. Create a rosa hosted cluster 2. Observe logs of the konnectivity-proxy sidecar of openshift-apiserver 3.
Actual results:
Logs include requests to the audit-webhook local service
Expected results:
Logs do not include requests to audit-webhook
Additional info:
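A minimal Go sketch of the proposed change, adding the local service to the computed NO_PROXY value; the other list entries are illustrative assumptions, not HyperShift's actual set:

package proxy

import "strings"

// buildNoProxy assembles the NO_PROXY env var for the openshift-apiserver
// container; "audit-webhook" is the local service that must bypass the
// konnectivity proxy (the other entries here are placeholders).
func buildNoProxy(extra ...string) string {
	entries := append([]string{"kube-apiserver", "audit-webhook"}, extra...)
	return strings.Join(entries, ",")
}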
Description of problem:
Customer reported that keepalived pods crash and fail to start on worker nodes (Ingress VIP). The expectation is that the keepalived pod (labeled by app=kni-infra-vrrp) should start. This affects everyone using OCP v4.13 together with Ingress VIP and could be a potential bug in the nodeip-configuration service in v4.13.
More details as below:
-> There are 2 problems in OCP v4.13. The regexp expression won't match and the chroot command will fail because of missing ldd libraries inside the container. This has been fixed on 4.14, but not on 4.13.
-> The nodeip-configuration service creates the /run/nodeip-configuration/remote-worker file based on onPremPlatformAPIServerInternalIPs (apiVIP) and ignores the onPremPlatformIngressIPs (ingressVIP) as can be seen in source code.
-> Then the keepalived process won't start because the remote-worker file exists.
-> The liveness probes will fail because the keepalived process does not exist.
The fix is quite simple (as highlighted by the customer): the nodeip-configuration.service template needs to be extended to consider the Ingress VIPs as well. This is the source code where changes need to be done.
As per the following code snippet, the node-ip invocation ranges only over the onPremPlatformAPIServerInternalIPs and ignores the onPremPlatformIngressIPs.
node-ip \
  set \
  --platform {{ .Infra.Status.PlatformStatus.Type }} \
  {{if not (isOpenShiftManagedDefaultLB .) -}}
  --user-managed-lb \
  {{end -}}
  {{if or (eq .IPFamilies "IPv6") (eq .IPFamilies "DualStackIPv6Primary") -}}
  --prefer-ipv6 \
  {{end -}}
  --retry-on-failure \
  {{ range onPremPlatformAPIServerInternalIPs . }}{{.}} {{end}}; \
do \
  sleep 5; \
done"
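A runnable Go sketch of the proposed template change, emulating the MCO's Go-template rendering with stub functions so the argument list also ranges over onPremPlatformIngressIPs; the stub IPs and FuncMap wiring are assumptions for illustration only:

package main

import (
	"os"
	"text/template"
)

func main() {
	// Stubs standing in for the MCO's real template functions.
	funcs := template.FuncMap{
		"onPremPlatformAPIServerInternalIPs": func(_ interface{}) []string { return []string{"192.168.111.5"} },
		"onPremPlatformIngressIPs":           func(_ interface{}) []string { return []string{"192.168.111.4"} },
	}
	const args = `node-ip set --retry-on-failure ` +
		`{{ range onPremPlatformAPIServerInternalIPs . }}{{.}} {{end}}` +
		`{{ range onPremPlatformIngressIPs . }}{{.}} {{end}}`
	t := template.Must(template.New("node-ip").Funcs(funcs).Parse(args))
	_ = t.Execute(os.Stdout, struct{}{})
	// Output: node-ip set --retry-on-failure 192.168.111.5 192.168.111.4
}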
The difference between OCP v4.12 and v4.13 related to the keepalived pod is also indicated in the attached image.
Version-Release number of selected component (if applicable):
v4.13
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
The keepalived pods crash and fail to start on the worker node (Ingress VIP)
Expected results:
The expectation is that the keepalived pod (labeled by app=kni-infra-vrrp) should start.
Additional info:
This is a clone of issue OCPBUGS-31013. The following is the description of the original issue:
—
Description of problem:
When trying to deploy with an Internal publish strategy, DNS will fail because the proxy VM cannot launch.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Set publishStrategy: Internal 2. Fail 3.
Actual results:
terraform fails
Expected results:
private cluster launches
Additional info:
This is a clone of issue OCPBUGS-32487. The following is the description of the original issue:
—
Description of problem:
The olm-operator pod has initialization errors in the logs in a HyperShift deployment. It appears that the --writePackageServerStatusName="" passed in as an argument is being interpreted as \"\" instead of an empty string.
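A runnable Go sketch of why this happens: container args are passed to the process verbatim (there is no shell to strip quotes), so the two quote characters in --writePackageServerStatusName="" become the flag's literal value. The flag name matches the report; the parsing harness is illustrative:

package main

import (
	"flag"
	"fmt"
)

func main() {
	fs := flag.NewFlagSet("olm-operator", flag.ContinueOnError)
	name := fs.String("writePackageServerStatusName", "", "ClusterOperator name to manage")
	// Simulate the args exactly as a container runtime would pass them.
	_ = fs.Parse([]string{`--writePackageServerStatusName=""`})
	fmt.Printf("value=%q isEmpty=%v\n", *name, *name == "")
	// Prints: value="\"\"" isEmpty=false -- the operator then tries to manage
	// a ClusterOperator literally named "" (with quotes), failing RFC 1123 validation.
}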
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
$ kubectl -n master-coh67vr100a3so6e7erg logs olm-operator-75474cfd48-w2fp5
Actual results:
Several errors that look like this time="2024-04-19T12:41:32Z" level=error msg="initialization error - failed to ensure name=\"\" - ClusterOperator.config.openshift.io \"\\\"\\\"\" is invalid: metadata.name: Invalid value: \"\\\"\\\"\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')" monitor=clusteroperator
Expected results:
No errors
Additional info:
This is a clone of issue OCPBUGS-32947. The following is the description of the original issue:
—
Description of problem:
[vSphere] network.devices, template and workspace will be cleared when deleting the controlplanemachineset; updating these fields will not trigger an update
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-04-23-032717
How reproducible:
Always
Steps to Reproduce:
1.Install a vSphere 4.16 cluster, we use automated template: ipi-on-vsphere/versioned-installer liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.16.0-0.nightly-2024-04-23-032717 True False 24m Cluster version is 4.16.0-0.nightly-2024-04-23-032717 2.Check the controlplanemachineset, you can see network.devices, template and workspace have value. liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset NAME DESIRED CURRENT READY UPDATED UNAVAILABLE STATE AGE cluster 3 3 3 3 Active 51m liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset cluster -oyaml apiVersion: machine.openshift.io/v1 kind: ControlPlaneMachineSet metadata: creationTimestamp: "2024-04-25T02:52:11Z" finalizers: - controlplanemachineset.machine.openshift.io generation: 1 labels: machine.openshift.io/cluster-api-cluster: huliu-vs425c-f5tfl name: cluster namespace: openshift-machine-api resourceVersion: "18273" uid: f340d9b4-cf57-4122-b4d4-0f45f20e4d79 spec: replicas: 3 selector: matchLabels: machine.openshift.io/cluster-api-cluster: huliu-vs425c-f5tfl machine.openshift.io/cluster-api-machine-role: master machine.openshift.io/cluster-api-machine-type: master state: Active strategy: type: RollingUpdate template: machineType: machines_v1beta1_machine_openshift_io machines_v1beta1_machine_openshift_io: failureDomains: platform: VSphere vsphere: - name: generated-failure-domain metadata: labels: machine.openshift.io/cluster-api-cluster: huliu-vs425c-f5tfl machine.openshift.io/cluster-api-machine-role: master machine.openshift.io/cluster-api-machine-type: master spec: lifecycleHooks: {} metadata: {} providerSpec: value: apiVersion: machine.openshift.io/v1beta1 credentialsSecret: name: vsphere-cloud-credentials diskGiB: 120 kind: VSphereMachineProviderSpec memoryMiB: 16384 metadata: creationTimestamp: null network: devices: - networkName: devqe-segment-221 numCPUs: 4 numCoresPerSocket: 4 snapshot: "" template: huliu-vs425c-f5tfl-rhcos-generated-region-generated-zone userDataSecret: name: master-user-data workspace: datacenter: DEVQEdatacenter datastore: /DEVQEdatacenter/datastore/vsanDatastore folder: /DEVQEdatacenter/vm/huliu-vs425c-f5tfl resourcePool: /DEVQEdatacenter/host/DEVQEcluster/Resources server: vcenter.devqe.ibmc.devcluster.openshift.com status: conditions: - lastTransitionTime: "2024-04-25T02:59:37Z" message: "" observedGeneration: 1 reason: AsExpected status: "False" type: Error - lastTransitionTime: "2024-04-25T03:03:45Z" message: "" observedGeneration: 1 reason: AllReplicasAvailable status: "True" type: Available - lastTransitionTime: "2024-04-25T03:03:45Z" message: "" observedGeneration: 1 reason: AsExpected status: "False" type: Degraded - lastTransitionTime: "2024-04-25T03:01:04Z" message: "" observedGeneration: 1 reason: AllReplicasUpdated status: "False" type: Progressing observedGeneration: 1 readyReplicas: 3 replicas: 3 updatedReplicas: 3 3.Delete the controlplanemachineset, it will recreate a new one, but those three fields that had values before are now cleared. 
liuhuali@Lius-MacBook-Pro huali-test % oc delete controlplanemachineset cluster controlplanemachineset.machine.openshift.io "cluster" deleted liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset NAME DESIRED CURRENT READY UPDATED UNAVAILABLE STATE AGE cluster 3 3 3 3 Inactive 6s liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset cluster -oyaml apiVersion: machine.openshift.io/v1 kind: ControlPlaneMachineSet metadata: creationTimestamp: "2024-04-25T03:45:51Z" finalizers: - controlplanemachineset.machine.openshift.io generation: 1 name: cluster namespace: openshift-machine-api resourceVersion: "46172" uid: 45d966c9-ec95-42e1-b8b0-c4945ea58566 spec: replicas: 3 selector: matchLabels: machine.openshift.io/cluster-api-cluster: huliu-vs425c-f5tfl machine.openshift.io/cluster-api-machine-role: master machine.openshift.io/cluster-api-machine-type: master state: Inactive strategy: type: RollingUpdate template: machineType: machines_v1beta1_machine_openshift_io machines_v1beta1_machine_openshift_io: failureDomains: platform: VSphere vsphere: - name: generated-failure-domain metadata: labels: machine.openshift.io/cluster-api-cluster: huliu-vs425c-f5tfl machine.openshift.io/cluster-api-machine-role: master machine.openshift.io/cluster-api-machine-type: master spec: lifecycleHooks: {} metadata: {} providerSpec: value: apiVersion: machine.openshift.io/v1beta1 credentialsSecret: name: vsphere-cloud-credentials diskGiB: 120 kind: VSphereMachineProviderSpec memoryMiB: 16384 metadata: creationTimestamp: null network: devices: null numCPUs: 4 numCoresPerSocket: 4 snapshot: "" template: "" userDataSecret: name: master-user-data workspace: {} status: conditions: - lastTransitionTime: "2024-04-25T03:45:51Z" message: "" observedGeneration: 1 reason: AsExpected status: "False" type: Error - lastTransitionTime: "2024-04-25T03:45:51Z" message: "" observedGeneration: 1 reason: AllReplicasAvailable status: "True" type: Available - lastTransitionTime: "2024-04-25T03:45:51Z" message: "" observedGeneration: 1 reason: AsExpected status: "False" type: Degraded - lastTransitionTime: "2024-04-25T03:45:51Z" message: "" observedGeneration: 1 reason: AllReplicasUpdated status: "False" type: Progressing observedGeneration: 1 readyReplicas: 3 replicas: 3 updatedReplicas: 3 4.I active the controlplanemachineset and it does not trigger an update, I continue to add these field values back and it does not trigger an update, I continue to edit these fields to add a second network device and it still does not trigger an update. network: devices: - networkName: devqe-segment-221 - networkName: devqe-segment-222 By the way, I can create worker machines with other network device or two network devices. huliu-vs425c-f5tfl-worker-0a-ldbkh Running 81m huliu-vs425c-f5tfl-worker-0aa-r8q4d Running 70m
Actual results:
network.devices, template and workspace will be cleared when deleting the controlplanemachineset; updating these fields will not trigger an update
Expected results:
The field values should not be changed when deleting the controlplanemachineset, and updating these fields should trigger an update; alternatively, if these fields must not be modified, then modifying them in the controlplanemachineset should not take effect, as the current inconsistency seems confusing.
Additional info:
Must gather: https://drive.google.com/file/d/1mHR31m8gaNohVMSFqYovkkY__t8-E30s/view?usp=sharing
This is a clone of issue OCPBUGS-25055. The following is the description of the original issue:
—
Description of problem:
No detailed failure is reported when signature verification fails for the target release payload during upgrade. It is unclear to the user what action should be taken for the failure, for example: checking whether a wrong configmap is set, whether the default store is unavailable, or whether there is an issue with a custom store. # ./oc adm upgrade Cluster version is 4.15.0-0.nightly-2023-12-08-202155 Upgradeable=False Reason: FeatureGates_RestrictedFeatureGates_TechPreviewNoUpgrade Message: Cluster operator config-operator should not be upgraded between minor versions: FeatureGatesUpgradeable: "TechPreviewNoUpgrade" does not allow updates ReleaseAccepted=False Reason: RetrievePayload Message: Retrieving payload failed version="4.15.0-0.nightly-2023-12-09-012410" image="registry.ci.openshift.org/ocp/release@sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7" failure=The update cannot be verified: unable to verify sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7 against keyrings: verifier-public-key-redhat Upstream: https://amd64.ocp.releases.ci.openshift.org/graph Channel: stable-4.15 Recommended updates: VERSION IMAGE 4.15.0-0.nightly-2023-12-09-012410 registry.ci.openshift.org/ocp/release@sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7 # ./oc -n openshift-cluster-version logs cluster-version-operator-6b7b5ff598-vxjrq|grep "verified"|tail -n4 I1211 09:28:22.755834 1 request.go:629 sync_worker.go:434] loadUpdatedPayload syncPayload err=The update cannot be verified: unable to verify sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7 against keyrings: verifier-public-key-redhat I1211 09:28:22.755974 1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterVersion", Namespace:"openshift-cluster-version", Name:"version", UID:"", APIVersion:"config.openshift.io/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'RetrievePayloadFailed' Retrieving payload failed version="4.15.0-0.nightly-2023-12-09-012410" image="registry.ci.openshift.org/ocp/release@sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7" failure=The update cannot be verified: unable to verify sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7 against keyrings: verifier-public-key-redhat I1211 09:28:37.817102 1 sync_worker.go:434] loadUpdatedPayload syncPayload err=The update cannot be verified: unable to verify sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7 against keyrings: verifier-public-key-redhat I1211 09:28:37.817488 1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterVersion", Namespace:"openshift-cluster-version", Name:"version", UID:"", APIVersion:"config.openshift.io/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'RetrievePayloadFailed' Retrieving payload failed version="4.15.0-0.nightly-2023-12-09-012410" image="registry.ci.openshift.org/ocp/release@sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7" failure=The update cannot be verified: unable to verify sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7 against keyrings: verifier-public-key-redhat
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-12-08-202155
How reproducible:
always
Steps to Reproduce:
1. trigger an fresh installation with tp enabled(no spec.signaturestores property set by default) 2.trigger an upgrade against a nightly build(no signature available in default signature store) 3.
Actual results:
no detail log on signature verification failure
Expected results:
include detail failure on signature verification in the cvo log
Additional info:
https://github.com/openshift/cluster-version-operator/pull/1003
Description of problem:
There is a regression issue with ovnkube-trace compatibility. On 4.13.6, the ovnkube-trace binary can be used on RHEL 8.6; it only has the 'pip3 not available' issue, same as https://issues.redhat.com/browse/OCPBUGS-15914. But on 4.13.7, the ovnkube-trace binary can no longer be used on RHEL 8.6, failing with the glibc error below: ./ovnkube-trace: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by ./ovnkube-trace) ./ovnkube-trace: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by ./ovnkube-trace)
Version-Release number of selected component (if applicable):
4.13.7
How reproducible:
always
Steps to Reproduce:
1. install OCP4.13.7 2. copy ovnkube-trace binary file from ovnkube-master pod to local $ POD=$(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-master -o name | head -1 | awk -F '/' '{print $NF}') $ oc cp -n openshift-ovn-kubernetes $POD:/usr/bin/ovnkube-trace ovnkube-trace Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker tar: Removing leading `/' from member names $ chmod +x ovnkube-trace $ ls -l ovnkube-trace -rwxrwxr-x. 1 cloud-user cloud-user 45947136 Sep 14 03:10 ovnkube-trace 3. run ovnkube-trace help $ ./ovnkube-trace -h
Actual results:
$ ./ovnkube-trace -h ./ovnkube-trace: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by ./ovnkube-trace) ./ovnkube-trace: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by ./ovnkube-trace)
Expected results:
ovnkube-trace can be used on RHEL8.6
Additional info:
Description of problem:
The shutdown-delay-duration argument for the openshift-apiserver is set to 3s in hypershift, but set to 15s in core openshift. Hypershift should update the value to match.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Diff the openshift-apiserver configs
Actual results:
https://github.com/openshift/hypershift/blob/3a42e77041535c8ac8012856d279bc782efcaf3c/control-plane-operator/controllers/hostedcontrolplane/oapi/config.go#L59C1-L60C1
Expected results:
https://github.com/openshift/cluster-openshift-apiserver-operator/commit/cad9746b62abf3b3230592d45f7f60bcecc96dac
Additional info:
Description of problem:
It seems something might be wrong with the logic for the new defaultChannel property. After initially syncing an operator to a tarball, subsequent runs complain the catalog is invalid, as if defaultChannel was never set.
Version-Release number of selected component (if applicable):
I tried oc-mirror v4.14.16 and v4.15.2
How reproducible:
100%
Steps to Reproduce:
1. Write this yaml config to an isc.yaml file in an empty dir. (It is worth noting that right now the default channel for this operator is of course something else – currently `latest`.)
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  local:
    path: ./operator-images
mirror:
  operators:
  - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.14
    packages:
    - name: openshift-pipelines-operator-rh
      defaultChannel: pipelines-1.11
      channels:
      - name: pipelines-1.11
        minVersion: 1.11.3
        maxVersion: 1.11.3
2. Using oc-mirror v4.14.16 or v4.15.2, run:
oc-mirror -c ./isc.yaml file://operator-images
3. Without the defaultChannel property and a recent version of oc-mirror, that would have failed. Assuming it succeeds, run the same command a second time (with or without the --dry-run option) and note that it now fails. It seems nothing can be done. oc-mirror says the catalog is invalid.
Actual results:
$ oc-mirror -c ./isc.yaml file://operator-images Creating directory: operator-images/oc-mirror-workspace/src/publish Creating directory: operator-images/oc-mirror-workspace/src/v2 Creating directory: operator-images/oc-mirror-workspace/src/charts Creating directory: operator-images/oc-mirror-workspace/src/release-signatures No metadata detected, creating new workspace wrote mirroring manifests to operator-images/oc-mirror-workspace/operators.1711523827/manifests-redhat-operator-indexTo upload local images to a registry, run: oc adm catalog mirror file://redhat/redhat-operator-index:v4.14 REGISTRY/REPOSITORY <dir> openshift-pipelines/pipelines-chains-controller-rhel8 blobs: registry.redhat.io/openshift-pipelines/pipelines-chains-controller-rhel8 sha256:b06cce9e748bd5e1687a8d2fb11e5e01dd8b901eeeaa1bece327305ccbd62907 11.51KiB registry.redhat.io/openshift-pipelines/pipelines-chains-controller-rhel8 sha256:e5897b8264878f1f63f6eceed870b939ff39993b05240ce8292f489e68c9bd19 11.52KiB ... stats: shared=12 unique=274 size=24.71GiB ratio=0.98 info: Mirroring completed in 9m45.86s (45.28MB/s) Creating archive operator-images/mirror_seq1_000000.tar $ oc-mirror -c ./isc.yaml file://operator-images Found: operator-images/oc-mirror-workspace/src/publish Found: operator-images/oc-mirror-workspace/src/v2 Found: operator-images/oc-mirror-workspace/src/charts Found: operator-images/oc-mirror-workspace/src/release-signatures The current default channel was not valid, so an attempt was made to automatically assign a new default channel, which has failed. The failure occurred because none of the remaining channels contain an "olm.channel" priority property, so it was not possible to establish a channel to use as the default channel. This can be resolved by one of the following changes: 1) assign an "olm.channel" property on the appropriate channels to establish a channel priority 2) modify the default channel manually in the catalog 3) by changing the ImageSetConfiguration to filter channels or packages in such a way that it will include a package version that exists in the current default channel The rendered catalog is invalid. Run "oc-mirror list operators --catalog CATALOG-NAME --package PACKAGE-NAME" for more information. error: error generating diff: the current default channel "latest" for package "openshift-pipelines-operator-rh" could not be determined... ensure that your ImageSetConfiguration filtering criteria results in a package version that exists in the current default channel or use the 'defaultChannel' field
Expected results:
It should NOT throw that error and instead should either update (if you've added more to the imagesetconfig) or gracefully print the "No new images" message.
Description of problem:
The kube-apiserver has a container called audit-logs that keeps audit records stored in the logs of the container (just prints to stdout). We would like the ability to disable this container whenever the None policy is used on the cluster. As of today, this consumes about 1gb of storage for each apiserver pod on the system. As you scale up, that 1gb per master adds up. https://github.com/openshift/hypershift/issues/3764
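A minimal Go sketch of the requested behavior, skipping the sidecar when the audit profile is None; the types and field names are assumptions, not HyperShift's API:

package kas

import corev1 "k8s.io/api/core/v1"

// buildContainers appends the audit-logs sidecar only when auditing is
// enabled; with profile "None" no records are produced, so nothing needs to
// be streamed to stdout and stored.
func buildContainers(auditProfile string, base []corev1.Container, auditLogs corev1.Container) []corev1.Container {
	if auditProfile == "None" {
		return base
	}
	return append(base, auditLogs)
}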
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
[Jira:"Test Framework"] monitor test azure-metrics-collector collection failure in https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/28395/pull-ci-openshift-origin-master-e2e-agnostic-ovn-cmd/1724427658311241728
Looks like Azure is throttling our requests. We should probably add some retry mechanism.
Relevant thread: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1699977299650309
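A minimal retry sketch for the 429 responses, assuming a doRequest helper; it uses capped exponential backoff, and a real fix should also honor Azure's Retry-After header:

package collector

import (
	"net/http"
	"time"
)

// getWithBackoff retries a request while it errors or is throttled (429),
// doubling the delay between attempts.
func getWithBackoff(doRequest func() (*http.Response, error)) (*http.Response, error) {
	var resp *http.Response
	var err error
	delay := time.Second
	for attempt := 0; attempt < 5; attempt++ {
		resp, err = doRequest()
		if err == nil && resp.StatusCode != http.StatusTooManyRequests {
			return resp, nil
		}
		time.Sleep(delay)
		delay *= 2
	}
	return resp, err
}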
Description of problem: runtime zero namespaces ("default", "kube-system", "kube-public") are not excluded from pod security admission in hypershift guest cluster.
In OCP, these runtime zero namespaces are excluded from PSA.
How reproducible: Always
Steps to Reproduce:
1. Install a fresh 4.14 hypershift cluster 2. Check the labels under default, kube-system, kube-public namespaces 3. Try to change the PSA value on these namespaces in hypershift guest cluster and the values are getting updated.
Actual results:
$ oc get ns default -oyaml --kubeconfig=guest.kubeconfig ... labels: kubernetes.io/metadata.name: default name: default ... $ oc label ns default pod-security.kubernetes.io/enforce=restricted --overwrite --kubeconfig=guest.kubeconfig namespace/default labeled $ oc get ns default -oyaml --kubeconfig=guest.kubeconfig ... labels: kubernetes.io/metadata.name: default pod-security.kubernetes.io/enforce: restricted name: default
Expected results:
Runtime zero namespaces ("default", "kube-system", "kube-public") are excluded from pod security admission
Additional info:
The kube-system namespace is excluded from PSA in the guest cluster, but when trying to update the security.openshift.io/scc.podSecurityLabelSync value with true/false, it is not updated, whereas in the management cluster the podSecurityLabelSync value does get updated.
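A minimal Go sketch of the expected exclusion: a PSA/label-sync controller should skip the runtime-zero namespaces entirely (function name is illustrative):

package psa

// runtimeZeroNamespaces are the namespaces that must be excluded from pod
// security admission label syncing, matching OCP behavior.
var runtimeZeroNamespaces = map[string]struct{}{
	"default":     {},
	"kube-system": {},
	"kube-public": {},
}

func isRuntimeZero(namespace string) bool {
	_, ok := runtimeZeroNamespaces[namespace]
	return ok
}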
This is a clone of issue OCPBUGS-30132. The following is the description of the original issue:
—
Description of problem:
In OCP 4.14 the catalog pods in openshift-marketplace where defined as: $ oc get pods -n openshift-marketplace redhat-operators-4bnz4 -o yaml apiVersion: v1 kind: Pod metadata: ... labels: olm.catalogSource: redhat-operators olm.pod-spec-hash: 658b699dc name: redhat-operators-4bnz4 namespace: openshift-marketplace ... spec: containers: - image: registry.redhat.io/redhat/redhat-operator-index:v4.14 imagePullPolicy: Always Now on OCP 4.15 they are defined as: apiVersion: v1 kind: Pod metadata: ... name: redhat-operators-44wxs namespace: openshift-marketplace ownerReferences: - apiVersion: operators.coreos.com/v1alpha1 blockOwnerDeletion: false controller: true kind: CatalogSource name: redhat-operators uid: 3b41ac7b-7ad1-4d58-a62f-4a9e667ae356 resourceVersion: "877589" uid: 65ad927c-3764-4412-8d34-82fd856a4cbc spec: containers: - args: - serve - /extracted-catalog/catalog - --cache-dir=/extracted-catalog/cache command: - /bin/opm ... image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7259b65d8ae04c89cf8c4211e4d9ddc054bb8aebc7f26fac6699b314dc40dbe3 imagePullPolicy: Always ... initContainers: ... - args: - --catalog.from=/configs - --catalog.to=/extracted-catalog/catalog - --cache.from=/tmp/cache - --cache.to=/extracted-catalog/cache command: - /utilities/copy-content image: registry.redhat.io/redhat/redhat-operator-index:v4.15 imagePullPolicy: IfNotPresent ... And due to `imagePullPolicy: IfNotPresent` on the initContainer used to extract the index image (referenced by tag) content, they are never really updated.
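A minimal Go sketch of the fix direction: the extract initContainer runs the tag-referenced index image, so it must use PullAlways (as the 4.14 catalog pods effectively did); otherwise a stale local image is reused forever. The container shape is abbreviated from the pod spec quoted above, and the container name here is assumed:

package catalog

import corev1 "k8s.io/api/core/v1"

// extractContentInitContainer builds the initContainer that copies the
// catalog out of the index image; PullAlways forces a re-pull of the
// tag-referenced index so new catalog content is actually picked up.
func extractContentInitContainer(indexImage string) corev1.Container {
	return corev1.Container{
		Name:            "extract-content",
		Image:           indexImage, // e.g. registry.redhat.io/redhat/redhat-operator-index:v4.15
		ImagePullPolicy: corev1.PullAlways,
		Command:         []string{"/utilities/copy-content"},
		Args: []string{
			"--catalog.from=/configs",
			"--catalog.to=/extracted-catalog/catalog",
			"--cache.from=/tmp/cache",
			"--cache.to=/extracted-catalog/cache",
		},
	}
}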
Version-Release number of selected component (if applicable):
OCP 4.15.0
How reproducible:
100%
Steps to Reproduce:
1. wait for the next version of a released operator on OCP 4.15 2. 3.
Actual results:
Operator catalogs are never really refreshed due to imagePullPolicy: IfNotPresent for the index image
Expected results:
Operator catalogs are periodically (every 10 minutes by default) refreshed
Additional info:
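For illustration only (this is not the actual catalog-operator code; extractInitContainer and its parameters are made up): the fix direction would be to use PullAlways for the extract initContainer whenever the index image is referenced by a movable tag.

package catalog

import (
	corev1 "k8s.io/api/core/v1"
)

// extractInitContainer sketches building the content-extraction initContainer.
// Tag references (e.g. :v4.15) can move, so they must be re-pulled each time.
func extractInitContainer(indexImage string, referencedByTag bool) corev1.Container {
	policy := corev1.PullIfNotPresent
	if referencedByTag {
		policy = corev1.PullAlways // re-pull moving tags on every pod creation
	}
	return corev1.Container{
		Name:            "extract-content",
		Image:           indexImage,
		ImagePullPolicy: policy,
		Command:         []string{"/utilities/copy-content"},
		Args: []string{
			"--catalog.from=/configs",
			"--catalog.to=/extracted-catalog/catalog",
		},
	}
}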
Description of problem:
Given this nmstate inside the agent-config
- name: bond0.10
  type: vlan
  state: up
  vlan:
    base-iface: bond0
    id: 10
  ipv4:
    address:
    - ip: 10.10.10.116
      prefix-length: 24
    dhcp: false
    enabled: true
  ipv6:
    enabled: true
    autoconf: true
    dhcp: true
    auto-dns: false
    auto-gateway: true
    auto-routes: true
The installation fails due to the assisted-service validation
"message": "No connectivity to the majority of hosts in the cluster"
The validation appears to miss the L2 connectivity for the IPv6 part (unconfirmed).
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-31484. The following is the description of the original issue:
—
Description of problem:
All images have been removed from quay.io/centos7, and the oc new-app unit tests rely heavily on these images and have started failing. See https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_oc/1716/pull-ci-openshift-oc-master-unit/1773203483667730432
Version-Release number of selected component (if applicable):
probably all
How reproducible:
Open a PR and see that pre-submit unit test fails
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-25696. The following is the description of the original issue:
—
Description of problem:
When deploying an HCP KubeVirt cluster using the hcp CLI's --node-selector arg, that node selector is not applied to the "kubevirt-cloud-controller-manager" pods within the HCP namespace. This makes it impossible to pin the entire set of HCP pods to specific nodes.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
100%
Steps to Reproduce:
1. deploy an hcp kubevirt cluster with the --node-selector cli option 2. 3.
Actual results:
the node selector is not applied to cloud provider kubevirt pod
Expected results:
the node selector should be applied to cloud provider kubevirt pod.
Additional info:
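A minimal sketch of the expected fix direction, assuming a controller that stamps HCP Deployments (applyNodeSelector is illustrative, not the actual HyperShift code):

package hcp

import (
	appsv1 "k8s.io/api/apps/v1"
)

// applyNodeSelector propagates the HostedCluster's nodeSelector onto a
// control-plane Deployment's pod template, merging with any existing keys.
func applyNodeSelector(deploy *appsv1.Deployment, nodeSelector map[string]string) {
	if len(nodeSelector) == 0 {
		return
	}
	if deploy.Spec.Template.Spec.NodeSelector == nil {
		deploy.Spec.Template.Spec.NodeSelector = map[string]string{}
	}
	for k, v := range nodeSelector {
		deploy.Spec.Template.Spec.NodeSelector[k] = v
	}
}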
Description of problem:
Attempting to destroy an AWS cluster can result in an error such as:
2023-11-21T15:04:15Z INFO Deleted role {"role": "53375835bafc21240c89-mgmt-worker-role"}
2023-11-21T15:04:15Z INFO Deleting Secrets {"namespace": "clusters"}
2023-11-21T15:04:15Z INFO Deleted CLI generated secrets
2023-11-21T15:04:15Z ERROR Failed to destroy cluster {"error": "failed to remove finalizer: HostedCluster.hypershift.openshift.io \"53375835bafc21240c89-mgmt\" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{\"hypershift.io/aws-oidc-discovery\"}"}
github.com/spf13/cobra.(*Command).execute
	/hypershift/vendor/github.com/spf13/cobra/command.go:916
github.com/spf13/cobra.(*Command).ExecuteC
	/hypershift/vendor/github.com/spf13/cobra/command.go:1044
github.com/spf13/cobra.(*Command).Execute
	/hypershift/vendor/github.com/spf13/cobra/command.go:968
github.com/spf13/cobra.(*Command).ExecuteContext
	/hypershift/vendor/github.com/spf13/cobra/command.go:961
main.main
	/hypershift/main.go:70
runtime.main
	/usr/local/go/src/runtime/proc.go:250
Error: failed to remove finalizer: HostedCluster.hypershift.openshift.io "53375835bafc21240c89-mgmt" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{"hypershift.io/aws-oidc-discovery"}
failed to remove finalizer: HostedCluster.hypershift.openshift.io "53375835bafc21240c89-mgmt" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{"hypershift.io/aws-oidc-discovery"}
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Occasionally
Steps to Reproduce:
1. Create a hosted AWS cluster
2. Destroy the cluster with `hypershift destroy cluster aws`
Actual results:
In some cases, the destroy will fail with the message in the description
Expected results:
The destroy does not fail while removing the destroy finalizer
Additional info:
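A minimal sketch of the safe finalizer-handling pattern with controller-runtime (removeOIDCFinalizer is illustrative; the point is to only ever remove, never add, finalizers once deletion has started, since the API server rejects new finalizers on a deleting object):

package hostedcluster

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

const oidcFinalizer = "hypershift.io/aws-oidc-discovery"

// removeOIDCFinalizer removes the finalizer if present, avoiding no-op
// updates that could accidentally re-add it during deletion.
func removeOIDCFinalizer(ctx context.Context, c client.Client, hc client.Object) error {
	if !controllerutil.ContainsFinalizer(hc, oidcFinalizer) {
		return nil // nothing to do
	}
	controllerutil.RemoveFinalizer(hc, oidcFinalizer)
	return c.Update(ctx, hc)
}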
Please review the following PR: https://github.com/openshift/vertical-pod-autoscaler-operator/pull/149
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-19628. The following is the description of the original issue:
—
Description of problem:
The nodeip-configuration service does not log to the serial console, which makes it difficult to debug problems when networking is not available and there is no access to the node.
Version-Release number of selected component (if applicable):
Reported against 4.13, but present in all releases
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-29587. The following is the description of the original issue:
—
Description of problem:
A Power VS workspace created after February 14th, 2024 will not be found by the installer when deploying to it.
Version-Release number of selected component (if applicable):
How reproducible:
Easily.
Steps to Reproduce:
1. Create a Power VS Workspace
2. Specify it in the install config
3. Attempt to deploy
4. Fail with "...is not a valid guid" error.
Actual results:
Failure to deploy to service instance
Expected results:
Should deploy to service instance
Additional info:
OCP 4.11 ships the alertrelabelconfigs CRD as a techpreview feature. Before graduating to GA we need to have e2e tests in the CMO repository.
AC:
Description of problem:
documentationBaseURL still points to 4.14
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. Check documentationBaseURL on a 4.15 cluster:
# oc get configmap console-config -n openshift-console -o yaml | grep documentationBaseURL
documentationBaseURL: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/
2. 3.
Actual results:
1. documentationBaseURL is still pointing to 4.14
Expected results:
1. documentationBaseURL should point to 4.15
Additional info:
After Patternfly5 Update: Knative Service Name Bar not visible in Topology view
Refer this:
https://drive.google.com/file/d/1_KAotzs4WC8g2oW0OymTA_cGm-xabXlq/view?usp=sharing
This is a clone of issue OCPBUGS-24716. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
apbexternalroute and egressfirewall status shows empty on hypershift hosted cluster
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-12-17-173511
How reproducible:
always
Steps to Reproduce:
1. Setup hypershift, login to the hosted cluster:
% oc get node
NAME                                         STATUS   ROLES    AGE    VERSION
ip-10-0-128-55.us-east-2.compute.internal    Ready    worker   125m   v1.28.4+7aa0a74
ip-10-0-129-197.us-east-2.compute.internal   Ready    worker   125m   v1.28.4+7aa0a74
ip-10-0-135-106.us-east-2.compute.internal   Ready    worker   125m   v1.28.4+7aa0a74
ip-10-0-140-89.us-east-2.compute.internal    Ready    worker   125m   v1.28.4+7aa0a74
2. Create a new project test:
% oc new-project test
3. Create apbexternalroute and egressfirewall on the hosted cluster.
apbexternalroute yaml file:
---
apiVersion: k8s.ovn.org/v1
kind: AdminPolicyBasedExternalRoute
metadata:
  name: apbex-route-policy
spec:
  from:
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: test
  nextHops:
    static:
    - ip: "172.18.0.8"
    - ip: "172.18.0.9"
% oc apply -f apbexroute.yaml
adminpolicybasedexternalroute.k8s.ovn.org/apbex-route-policy created
egressfirewall yaml file:
---
apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
spec:
  egress:
  - type: Allow
    to:
      cidrSelector: 0.0.0.0/0
% oc apply -f egressfw.yaml
egressfirewall.k8s.ovn.org/default created
4. oc get apbexternalroute and oc get egressfirewall
Actual results:
The status shows empty:
% oc get apbexternalroute
NAME                 LAST UPDATE   STATUS
apbex-route-policy   49s                     <--- status is empty
% oc describe apbexternalroute apbex-route-policy | tail -n 8
Status:
  Last Transition Time: 2023-12-19T06:54:17Z
  Messages:
    ip-10-0-135-106.us-east-2.compute.internal: configured external gateway IPs: 172.18.0.8,172.18.0.9
    ip-10-0-129-197.us-east-2.compute.internal: configured external gateway IPs: 172.18.0.8,172.18.0.9
    ip-10-0-128-55.us-east-2.compute.internal: configured external gateway IPs: 172.18.0.8,172.18.0.9
    ip-10-0-140-89.us-east-2.compute.internal: configured external gateway IPs: 172.18.0.8,172.18.0.9
Events: <none>
% oc get egressfirewall
NAME      EGRESSFIREWALL STATUS
default                            <--- status is empty
% oc describe egressfirewall default | tail -n 8
    Type: Allow
Status:
  Messages:
    ip-10-0-129-197.us-east-2.compute.internal: EgressFirewall Rules applied
    ip-10-0-128-55.us-east-2.compute.internal: EgressFirewall Rules applied
    ip-10-0-140-89.us-east-2.compute.internal: EgressFirewall Rules applied
    ip-10-0-135-106.us-east-2.compute.internal: EgressFirewall Rules applied
Events: <none>
Expected results:
the status can be shown correctly
Additional info:
This is a clone of issue OCPBUGS-33060. The following is the description of the original issue:
—
Description of problem:
HCP has audit log configuration for Kube API server, OpenShift API server, OAuth API server (like OCP), but does not have audit for oauth-openshift (OAuth server). Discussed with Standa in https://redhat-internal.slack.com/archives/CS05TR7BK/p1714124297376299 , oauth-openshift needs audit too in HCP.
Version-Release number of selected component (if applicable):
4.11 ~ 4.16
How reproducible:
Always
Steps to Reproduce:
1. Launch HCP env.
2. Check audit log configuration:
$ oc get deployment -n clusters-hypershift-ci-279389 kube-apiserver openshift-apiserver openshift-oauth-apiserver oauth-openshift -o yaml | grep -e '^ name:' -e 'audit\.log'
Actual results:
2. It outputs that oauth-openshift (OAuth server) has no audit:
name: kube-apiserver
- /var/log/kube-apiserver/audit.log
name: openshift-apiserver
- /var/log/openshift-apiserver/audit.log
name: openshift-oauth-apiserver
- --audit-log-path=/var/log/openshift-oauth-apiserver/audit.log
- /var/log/openshift-oauth-apiserver/audit.log
name: oauth-openshift
Expected results:
2. oauth-openshift (OAuth server) needs to have audit too.
Additional info:
OCP has audit for OAuth server since 4.11 AUTH-6 https://docs.openshift.com/container-platform/4.11/security/audit-log-view.html saying "You can view the logs for the OpenShift API server, Kubernetes API server, OpenShift OAuth API server, and OpenShift OAuth server".
Component Readiness has found a potential regression in [sig-arch] events should not repeat pathologically for ns/openshift-multus.
Probability of significant regression: 99.96%
Sample (being evaluated) Release: 4.16
Start Time: 2024-02-19T00:00:00Z
End Time: 2024-03-04T23:59:59Z
Success Rate: 53.33%
Successes: 8
Failures: 7
Flakes: 0
Base (historical) Release: 4.15
Start Time: 2024-02-19T00:00:00Z
End Time: 2024-03-04T23:59:59Z
Success Rate: 100.00%
Successes: 24
Failures: 0
Flakes: 0
Description of problem:
Installing an IPv6 agent-based hosted cluster in a disconnected environment. The hosted control plane is available, but when using its kubeconfig to run oc commands on the hosted cluster, I'm getting:
E1009 08:05:34.000946 115216 memcache.go:265] couldn't get current server API group list: Get "https://fd2e:6f44:5dd8::58:31765/api?timeout=32s": dial tcp [fd2e:6f44:5dd8::58]:31765: i/o timeout
Version-Release number of selected component (if applicable):
OCP 4.14.0-rc.4
How reproducible:
100%
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
I can use oc commands against the hosted cluster
Additional info:
Description of problem:
On Alibaba, some volume snapshots never become ready.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-11-06-182702
How reproducible: sometimes
Steps to Reproduce:
Actual results:
$ oc get volumesnapshot
NAME          READYTOUSE   SOURCEPVC   ...
mysnapl587m   false        myclaim     ...
Expected results:
The VolumeSnapshot becomes ready in ~1 minute or less (for small volumes)
Additional info:
There seems to be something odd between the external-snapshotter and the CSI driver. From the snapshotter logs:
This sequence is very timing sensitive: sometimes the cloud finishes the snapshot at step 2, so the driver gets a snapshot that is ready at step 3, and then everything works OK.
(Sorry, I lost the full logs...)
This is a clone of issue OCPBUGS-28625. The following is the description of the original issue:
—
Description of problem:
HCP does not honor the oauthMetadata field of hc.spec.configuration.authentication, making the console crash and oc login fail.
Version-Release number of selected component (if applicable):
HyperShift management cluster: 4.16.0-0.nightly-2024-01-29-233218 HyperShift hosted cluster: 4.16.0-0.nightly-2024-01-29-233218
How reproducible:
Always
Steps to Reproduce:
1. Install HCP env. Export KUBECONFIG:
$ export KUBECONFIG=/path/to/hosted-cluster/kubeconfig
2. Create keycloak applications. Then get the route:
$ KEYCLOAK_HOST=https://$(oc get -n keycloak route keycloak --template='{{ .spec.host }}')
$ echo $KEYCLOAK_HOST
https://keycloak-keycloak.apps.hypershift-ci-18556.xxx
$ curl -sSk "$KEYCLOAK_HOST/realms/master/.well-known/openid-configuration" > oauthMetadata
$ cat oauthMetadata
{"issuer":"https://keycloak-keycloak.apps.hypershift-ci-18556.xxx/realms/master"
$ oc create configmap oauth-meta --from-file ./oauthMetadata -n clusters --kubeconfig /path/to/management-cluster/kubeconfig
...
3. Set hc.spec.configuration.authentication:
$ CLIENT_ID=openshift-test-aud
$ oc patch hc hypershift-ci-18556 -n clusters --kubeconfig /path/to/management-cluster/kubeconfig --type=merge -p="
spec:
  configuration:
    authentication:
      oauthMetadata:
        name: oauth-meta
      oidcProviders:
      - claimMappings:
        ...
        issuer:
          audiences:
          - $CLIENT_ID
          issuerCertificateAuthority:
            name: keycloak-oidc-ca
          issuerURL: $KEYCLOAK_HOST/realms/master
        name: keycloak-oidc-test
      type: OIDC
"
Check KAS indeed already picks up the setting:
$ oc logs -c kube-apiserver kube-apiserver-5c976d59f5-zbrwh -n clusters-hypershift-ci-18556 --kubeconfig /path/to/management-cluster/kubeconfig | grep "oidc-"
...
I0130 08:07:24.266247 1 flags.go:64] FLAG: --oidc-ca-file="/etc/kubernetes/certs/oidc-ca/ca.crt"
I0130 08:07:24.266251 1 flags.go:64] FLAG: --oidc-client-id="openshift-test-aud"
...
I0130 08:07:24.266261 1 flags.go:64] FLAG: --oidc-issuer-url="https://keycloak-keycloak.apps.hypershift-ci-18556.xxx/realms/master"
...
Wait about 15 mins.
4. Check COs and check oc login. Both show the same error:
$ oc get co | grep -v 'True.*False.*False'
NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console   4.16.0-0.nightly-2024-01-29-233218   True        True          False      4h57m   SyncLoopRefreshProgressing: Working toward version 4.16.0-0.nightly-2024-01-29-233218, 1 replicas available
$ oc get po -n openshift-console
NAME                       READY   STATUS             RESTARTS        AGE
console-547cf6bdbb-l8z9q   1/1     Running            0               4h55m
console-54f88749d7-cv7ht   0/1     CrashLoopBackOff   9 (3m18s ago)   14m
console-54f88749d7-t7x96   0/1     CrashLoopBackOff   9 (3m32s ago)   14m
$ oc logs console-547cf6bdbb-l8z9q -n openshift-console
I0130 03:23:36.788951 1 metrics.go:156] usage.Metrics: Update console users metrics: 0 kubeadmin, 0 cluster-admins, 0 developers, 0 unknown/errors (took 406.059196ms)
E0130 06:48:32.745179 1 asynccache.go:43] failed a caching attempt: request to OAuth issuer endpoint https://:0/oauth/token failed: Head "https://:0": dial tcp :0: connect: connection refused
E0130 06:53:32.757881 1 asynccache.go:43] failed a caching attempt: request to OAuth issuer endpoint https://:0/oauth/token failed: Head "https://:0": dial tcp :0: connect: connection refused
...
$ oc login --exec-plugin=oc-oidc --client-id=openshift-test-aud --extra-scopes=email,profile --callback-port=8080
error: oidc authenticator error: oidc discovery error: Get "https://:0/.well-known/openid-configuration": dial tcp :0: connect: connection refused
error: oidc authenticator error: oidc discovery error: Get "https://:0/.well-known/openid-configuration": dial tcp :0: connect: connection refused
Unable to connect to the server: getting credentials: exec: executable oc failed with exit code 1
5. Check the root cause; the configured oauthMetadata is not picked up well:
$ curl -k https://a6e149f24f8xxxxxx.elb.ap-east-1.amazonaws.com:6443/.well-known/oauth-authorization-server
{
  "issuer": "https://:0",
  "authorization_endpoint": "https://:0/oauth/authorize",
  "token_endpoint": "https://:0/oauth/token",
...
}
Actual results:
As shown in steps 4 and 5 above, the configured oauthMetadata is not picked up well, causing the console and oc login to hit the error.
Expected results:
The configured oauthMetadata is picked up well. No error.
Additional info:
For oc, if I manually use `oc config set-credentials oidc --exec-api-version=client.authentication.k8s.io/v1 --exec-command=oc --exec-arg=get-token --exec-arg="--issuer-url=$KEYCLOAK_HOST/realms/master" ...` instead of using `oc login --exec-plugin=oc-oidc ...`, oc authentication works well. This means my configuration is correct.
$ oc whoami
Please visit the following URL in your browser: http://localhost:8080
oidc-user-test:xxia@redhat.com
Description of problem:
It would make debugging easier if we included the namespace in the message for these alerts: https://github.com/openshift/cluster-ingress-operator/blob/master/manifests/0000_90_ingress-operator_03_prometheusrules.yaml#L69
Version-Release number of selected component (if applicable):
4.12.x
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
No namespace in the alert message
Expected results:
Additional info:
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Once a user selects a log component from the master node's log section, they are unable to change or select a different log component from the dropdown.
To select a different log component, the user needs to revisit the logs section under the master node again; this refreshes the pane and reloads the default options.
Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-08-15-152346
How reproducible:
Always
Steps to Reproduce:
Actual results:
Unable to select or change the log component once the user already made a selection from the dropdown under master nodes' logs section.
Expected results:
Users should be allowed to change/select the log component from master node's logs section whenever required with the help of available dropdown.
Additional info:
Reproduced in both chrome[103.0.5060.114 (Official Build) (64-bit)] and firefox[91.11.0esr (64-bit)] browsers
Attached screen capture for the same: ScreenRecorder_2022-08-16_26457662-aea5-4a00-aeb4-0fbddf8f16f0.mp4
Description of problem:
When I execute the following two tag commands in a row on OCP 4.14.0-ec.3, Multi-Arch:
oc tag $IMAGE@$DIGEST_MANIFEST test-1:tag-manifest
sleep 0
oc tag $IMAGE@$DIGEST_MANIFEST test-1:tag-manifest-preserve-original --import-mode=PreserveOriginal
Then wrong data is written to the .image.dockerImageMetadata record. If there is a delay between these two commands, e.g. sleep 5, then the image.dockerImageMetadata contains correct data.
Version-Release number of selected component (if applicable):
How reproducible:
Run the below script and you see the error. If you change the SLEEP_TIME=5, then the script passes. No problem.
Steps to Reproduce:
#!/usr/bin/env bash
set -e
SLEEP_TIME=0 # Test will fail, when sleep time is 0, use delay of 3 sec or more to pass this test
IMAGE="quay.io/podman/hello"
podman pull $IMAGE:latest
DIGEST_MANIFEST=$(podman inspect quay.io/podman/hello:latest | jq -r '.[0].Digest')
oc new-project "ir-test-001"
oc create imagestream test-1
oc import-image test-1 --from="${IMAGE}@${DIGEST_MANIFEST}" --import-mode='PreserveOriginal'
oc tag $IMAGE@$DIGEST_MANIFEST test-1:tag-manifest
sleep "${SLEEP_TIME}"
oc tag $IMAGE@$DIGEST_MANIFEST test-1:tag-manifest-preserve-original --import-mode=PreserveOriginal
sleep 5
[[ $(oc get istag test-1:tag-manifest-preserve-original -o json | jq -r '.image.dockerImageMetadata.Architecture') == "null" ]] && echo "pass: tag-manifest-preserve-original has no architecture" || echo "fail: tag-preserve-original has architecture and should not"
Actual results:
fail: tag-preserve-original has architecture and should not
oc get istag test-1:tag-manifest-preserve-original -o json | jq -r '.image.dockerImageMetadata.Architecture'
amd64
Expected results:
pass: tag-manifest-preserve-original has no architecture
oc get istag test-1:tag-manifest-preserve-original -o json | jq -r '.image.dockerImageMetadata.Architecture'
null
Additional info:
This was tested with OC command on x86_64
This is a clone of issue OCPBUGS-34213. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-34054. The following is the description of the original issue:
—
The OCM-operator's imagePullSecretCleanupController attempts to prevent new pods from using an image pull secret that needs to be deleted, but this results in the OCM creating a new image pull secret in the meantime.
The overlap occurs when OCM-operator has detected the registry is removed, simultaneously triggering the imagePullSecretCleanup controller to start deleting and updating the OCM config to stop creating, but the OCM behavior change is delayed until its pods are restarted.
In 4.16 this churn is minimized due to the OCM naming the image pull secrets consistently, but the churn can occur during an upgrade given that the OCM-operator is updated first.
This is a clone of issue OCPBUGS-27261. The following is the description of the original issue:
—
Description of problem:
Environment file /etc/kubernetes/node.env is overwritten after a node restart. There is a typo in https://github.com/openshift/machine-config-operator/blob/master/templates/common/aws/files/usr-local-bin-aws-kubelet-nodename.yaml: the variable should be changed to NODEENV wherever NODENV is found.
Version-Release number of selected component (if applicable):
How reproducible:
Easy
Steps to Reproduce:
1. Change contents of /etc/kubernetes/node.env
2. Restart the node
3. Notice the changes are lost
Actual results:
Expected results:
/etc/kubernetes/node.env should not be changed after restart of a node
Additional info:
Please review the following PR: https://github.com/openshift/cluster-api-provider-gcp/pull/200
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/machine-api-provider-ibmcloud/pull/24
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
When mirroring a multiarch release payload through oc adm release mirror --keep-manifest-list --to-image-stream into an image stream of a cluster's internal registry, the cluster does not import the image as a manifest list.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always
Steps to Reproduce:
1. oc adm release mirror \
     --from=quay.io/openshift-release-dev/ocp-release:4.14.0-rc.5-multi \
     --to-image-stream=release \
     --keep-manifest-list=true
2. oc get istag release:installer -o yaml
3.
Actual results:
apiVersion: image.openshift.io/v1 generation: 1 image: dockerImageLayers: - mediaType: application/vnd.docker.image.rootfs.diff.tar.gzip name: sha256:97da74cc6d8fa5d1634eb1760fd1da5c6048619c264c23e62d75f3bf6b8ef5c4 size: 79524639 - mediaType: application/vnd.docker.image.rootfs.diff.tar.gzip name: sha256:d8190195889efb5333eeec18af9b6c82313edd4db62989bd3a357caca4f13f0e size: 1438 - mediaType: application/vnd.docker.image.rootfs.diff.tar.gzip name: sha256:09c3f3b6718f2df2ee9cd3a6c2e19ddb73ca777f216d310eaf4e0420407ea7c7 size: 59044444 - mediaType: application/vnd.docker.image.rootfs.diff.tar.gzip name: sha256:cf84754d71b4b704c30abd45668882903e3eaa1355857b605e1dbb25ecf516d7 size: 11455659 - mediaType: application/vnd.docker.image.rootfs.diff.tar.gzip name: sha256:2e20a50f4b685b3976028637f296ae8839c18a9505b5f58d6e4a0f03984ef1e8 size: 433281528 dockerImageManifestMediaType: application/vnd.docker.distribution.manifest.v2+json dockerImageMetadata: Architecture: amd64 Config: Entrypoint: - /bin/openshift-install Env: - container=oci - GODEBUG=x509ignoreCN=0,madvdontneed=1 - __doozer=merge - BUILD_RELEASE=202310100645.p0.gc926532.assembly.stream - BUILD_VERSION=v4.15.0 - OS_GIT_MAJOR=4 - OS_GIT_MINOR=15 - OS_GIT_PATCH=0 - OS_GIT_TREE_STATE=clean - OS_GIT_VERSION=4.15.0-202310100645.p0.gc926532.assembly.stream-c926532 - SOURCE_GIT_TREE_STATE=clean - __doozer_group=openshift-4.15 - __doozer_key=ose-installer - OS_GIT_COMMIT=c926532 - SOURCE_DATE_EPOCH=1696907019 - SOURCE_GIT_COMMIT=c926532cd50b6ef4974f14dfe3d877a0f7707972 - SOURCE_GIT_TAG=agent-installer-v4.11.0-dev-preview-2-2165-gc926532cd5 - SOURCE_GIT_URL=https://github.com/openshift/installer - PATH=/bin - HOME=/output Labels: License: GPLv2+ architecture: x86_64 build-date: 2023-10-10T10:01:18 com.redhat.build-host: cpt-1001.osbs.prod.upshift.rdu2.redhat.com com.redhat.component: ose-installer-container com.redhat.license_terms: https://www.redhat.com/agreements description: This is the base image from which all OpenShift Container Platform images inherit. distribution-scope: public io.buildah.version: 1.29.0 io.k8s.description: This is the base image from which all OpenShift Container Platform images inherit. io.k8s.display-name: OpenShift Container Platform RHEL 8 Base io.openshift.build.commit.id: c926532cd50b6ef4974f14dfe3d877a0f7707972 io.openshift.build.commit.url: https://github.com/openshift/installer/commit/c926532cd50b6ef4974f14dfe3d877a0f7707972 io.openshift.build.source-location: https://github.com/openshift/installer io.openshift.expose-services: "" io.openshift.maintainer.component: Installer / openshift-installer io.openshift.maintainer.project: OCPBUGS io.openshift.release.operator: "true" io.openshift.tags: openshift,base maintainer: Red Hat, Inc. name: openshift/ose-installer release: 202310100645.p0.gc926532.assembly.stream summary: Provides the latest release of the Red Hat Extended Life Base Image. url: https://access.redhat.com/containers/#/registry.access.redhat.com/openshift/ose-installer/images/v4.15.0-202310100645.p0.gc926532.assembly.stream vcs-ref: d40a2800e169f6c2d63897467af22d59933e8811 vcs-type: git vendor: Red Hat, Inc. 
version: v4.15.0 User: 1000:1000 WorkingDir: /output ContainerConfig: {} Created: "2023-10-10T10:59:36Z" Id: sha256:ae4c47d3c08de5d57b5d4fa8a30497ac097c05abab4e284c91eae389e512f202 Size: 583326767 apiVersion: image.openshift.io/1.0 kind: DockerImage dockerImageMetadataVersion: "1.0" dockerImageReference: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:67d35b2185c9f267523f86e54f403d0d2561c9098b7bb81fa3bfd6fd8a121d04 metadata: annotations: image.openshift.io/dockerLayersOrder: ascending creationTimestamp: "2023-10-11T10:56:53Z" name: sha256:67d35b2185c9f267523f86e54f403d0d2561c9098b7bb81fa3bfd6fd8a121d04 resourceVersion: "740341" uid: 17dede63-ca3b-47ad-a157-c78f38c1df7d kind: ImageStreamTag lookupPolicy: local: true metadata: creationTimestamp: "2023-10-12T09:32:10Z" name: release:installer namespace: okd-fcos resourceVersion: "1329147" uid: d6cfcd4d-3f9c-4bb1-bc56-04bf5e926628 tag: annotations: null from: kind: DockerImage name: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c510f0e2bd29f7b9bf45146fbc212e893634179cc029cd54a135f05f9ae1df52 generation: 12 importPolicy: importMode: Legacy name: installer referencePolicy: type: Source
Expected results:
apiVersion: image.openshift.io/v1
generation: 12
image:
  dockerImageManifestMediaType: application/vnd.docker.distribution.manifest.list.v2+json
  dockerImageManifests:
  - architecture: amd64
    digest: sha256:67d35b2185c9f267523f86e54f403d0d2561c9098b7bb81fa3bfd6fd8a121d04
    manifestSize: 1087
    mediaType: application/vnd.docker.distribution.manifest.v2+json
    os: linux
  - architecture: arm64
    digest: sha256:a602c3e4b5f8f747b2813ed2166f366417f638fc6884deecebdb04e18431fcd6
    manifestSize: 1087
    mediaType: application/vnd.docker.distribution.manifest.v2+json
    os: linux
  - architecture: ppc64le
    digest: sha256:04296057a8f037f20d4b1ca20bcaac5bdca5368cdd711a3f37bd05d66c9fdaec
    manifestSize: 1087
    mediaType: application/vnd.docker.distribution.manifest.v2+json
    os: linux
  - architecture: s390x
    digest: sha256:5fda4ea09bfd2026b7d6acd80441b2b7c51b1cf440fd46e0535a7320b67894fb
    manifestSize: 1087
    mediaType: application/vnd.docker.distribution.manifest.v2+json
    os: linux
  dockerImageMetadata:
    ContainerConfig: {}
    Created: "2023-10-12T09:32:03Z"
    Id: sha256:c510f0e2bd29f7b9bf45146fbc212e893634179cc029cd54a135f05f9ae1df52
    apiVersion: image.openshift.io/1.0
    kind: DockerImage
  dockerImageMetadataVersion: "1.0"
  dockerImageReference: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c510f0e2bd29f7b9bf45146fbc212e893634179cc029cd54a135f05f9ae1df52
  metadata:
    creationTimestamp: "2023-10-12T09:32:10Z"
    name: sha256:c510f0e2bd29f7b9bf45146fbc212e893634179cc029cd54a135f05f9ae1df52
    resourceVersion: "1327949"
    uid: 4d78c9ba-12b2-414f-a173-b926ae019ab0
kind: ImageStreamTag
lookupPolicy:
  local: true
metadata:
  creationTimestamp: "2023-10-12T09:32:10Z"
  name: release:installer
  namespace: okd-fcos
  resourceVersion: "1329147"
  uid: d6cfcd4d-3f9c-4bb1-bc56-04bf5e926628
tag:
  annotations: null
  from:
    kind: DockerImage
    name: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c510f0e2bd29f7b9bf45146fbc212e893634179cc029cd54a135f05f9ae1df52
  generation: 12
  importPolicy:
    importMode: PreserveOriginal
  name: installer
  referencePolicy:
    type: Source
Additional info:
This is a clone of issue OCPBUGS-5113. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-25440. The following is the description of the original issue:
—
Description of problem:
iam:TagInstanceProfile is not listed in the official document [1]; IPI install would fail if the iam:TagInstanceProfile permission is missing:
level=error msg=Error: creating IAM Instance Profile (ci-op-4hw2rz1v-49c30-zt9vx-worker-profile): AccessDenied: User: arn:aws:iam::301721915996:user/ci-op-4hw2rz1v-49c30-minimal-perm is not authorized to perform: iam:TagInstanceProfile on resource: arn:aws:iam::301721915996:instance-profile/ci-op-4hw2rz1v-49c30-zt9vx-worker-profile because no identity-based policy allows the iam:TagInstanceProfile action
level=error msg= status code: 403, request id: bb0641f5-d01c-4538-b333-261a804ddb59
[1] https://docs.openshift.com/container-platform/4.14/installing/installing_aws/installing-aws-account.html#installation-aws-permissions_installing-aws-account
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-12-14-115151
How reproducible:
Always
Steps to Reproduce:
1. install a common IPI cluster with minimal permission provided in official document 2. 3.
Actual results:
Install failed.
Expected results:
Additional info:
The installer does a precheck for iam:TagInstanceProfile.
This is a clone of issue OCPBUGS-31365. The following is the description of the original issue:
—
Description of problem:
make verify uses the latest version of setup-envtest, regardless of what go version the repo is currently on
How reproducible:
100%
Steps to Reproduce:
Running `make verify` without a local copy of setup-envtest should cause the issue.
Actual results:
go: sigs.k8s.io/controller-runtime/tools/setup-envtest@latest: sigs.k8s.io/controller-runtime/tools/setup-envtest@v0.0.0-20240323114127-e08b286e313e requires go >= 1.22.0 (running go 1.21.7; GOTOOLCHAIN=local) Go compliance shim [5685] [rhel-8-golang-1.21][openshift-golang-builder]: Exited with: 1
Expected results:
make verify should be able to run without build errors
Additional info:
This is a clone of issue OCPBUGS-33495. The following is the description of the original issue:
—
The wait-for-ceo cmd is used during bootstrap to wait until the bootstrap completion conditions are met, i.e. etcd has scaled up to 3 members + bootstrap.
https://github.com/openshift/installer/blob/d08c982cdbb7f66b810f71aa9608bf51cce8c38c/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template#L569-L576
Currently this cmd won't return errors in the following two places:
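(The two call sites referenced above are elided here.) As a hedged sketch of the general fix, assuming a readiness condition function, the wait should propagate its error so bootkube.sh sees a non-zero exit code; waitForEtcdScaleUp and the timeouts are illustrative, not the actual cluster-etcd-operator code:

package waitforceo

import (
	"context"
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForEtcdScaleUp polls the given readiness check and, crucially, returns
// the error instead of swallowing it, so the caller can exit non-zero.
func waitForEtcdScaleUp(ctx context.Context, ready func(context.Context) (bool, error)) error {
	err := wait.PollUntilContextTimeout(ctx, 10*time.Second, 30*time.Minute, true, ready)
	if err != nil {
		return fmt.Errorf("etcd never reached 3 members + bootstrap: %w", err)
	}
	return nil
}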
This is a clone of issue OCPBUGS-41845. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-41500. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-39081. The following is the description of the original issue:
—
If the network to the bootstrap VM is slow, the extract-machine-os.service can time out (after 180s). If this happens, it will be restarted, but services that depend on it (like ironic) will never be started, even once it succeeds. systemd added support for Restart=on-failure for Type=oneshot services, but they still don't behave the same way as other types of services.
This can be simulated in dev-scripts by doing:
sudo tc qdisc add dev ostestbm root netem rate 33Mbit
Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/121
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
ovn-ipsec pods crash when the IPSec NS extension/svc is enabled on any $ROLE nodes. The IPSec extension and service were enabled for 2 workers only, and their corresponding ovn-ipsec pods are in CLBO:
[root@dell-per740-36 ipsec]# oc get pods
NAME                                       READY   STATUS             RESTARTS         AGE
dell-per740-14rhtsengpek2redhatcom-debug   1/1     Running            0                3m37s
ovn-ipsec-bptr6                            0/1     CrashLoopBackOff   26 (3m58s ago)   130m
ovn-ipsec-bv88z                            1/1     Running            0                3h5m
ovn-ipsec-pre414-6pb25                     1/1     Running            0                3h5m
ovn-ipsec-pre414-b6vzh                     1/1     Running            0                3h5m
ovn-ipsec-pre414-jzwcm                     1/1     Running            0                3h5m
ovn-ipsec-pre414-vgwqx                     1/1     Running            3                132m
ovn-ipsec-pre414-xl4hb                     1/1     Running            3                130m
ovn-ipsec-qb2bj                            1/1     Running            0                3h5m
ovn-ipsec-r4dfw                            1/1     Running            0                3h5m
ovn-ipsec-xhdpw                            0/1     CrashLoopBackOff   28 (116s ago)    132m
ovnkube-control-plane-698c9845b8-4v58f     2/2     Running            0                3h5m
ovnkube-control-plane-698c9845b8-nlgs8     2/2     Running            0                3h5m
ovnkube-control-plane-698c9845b8-wfkd4     2/2     Running            0                3h5m
ovnkube-node-l6sr5                         8/8     Running            27 (66m ago)     130m
ovnkube-node-mj8bs                         8/8     Running            27 (75m ago)     132m
ovnkube-node-p24x8                         8/8     Running            0                178m
ovnkube-node-rlpbh                         8/8     Running            0                178m
ovnkube-node-wdxbg                         8/8     Running            0                178m
[root@dell-per740-36 ipsec]#
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-09-12-024050
How reproducible:
Always
Steps to Reproduce:
1. Install OVN IPSec cluster (East-West)
2. Enable IPSec OS extension for North-South
3. Enable IPSec service for North-South
Actual results:
ovn-ipsec pods in CLBO state
Expected results:
All pods under ovn-kubernetes ns should be Running fine
Additional info:
One of the ovn-ipsec CLBO pod's logs:
# oc logs ovn-ipsec-bptr6
Defaulted container "ovn-ipsec" out of: ovn-ipsec, ovn-keys (init)
+ rpm --dbpath=/usr/share/rpm -q libreswan
libreswan-4.9-4.el9_2.x86_64
+ counter=0
+ '[' -f /etc/cni/net.d/10-ovn-kubernetes.conf ']'
+ echo 'ovnkube-node has configured node.'
ovnkube-node has configured node.
+ ip x s flush
+ ip x p flush
+ ulimit -n 1024
+ /usr/libexec/ipsec/addconn --config /etc/ipsec.conf --checkconfig
+ /usr/libexec/ipsec/_stackmanager start
+ /usr/sbin/ipsec --checknss
+ /usr/libexec/ipsec/pluto --leak-detective --config /etc/ipsec.conf --logfile /var/log/openvswitch/libreswan.log
FATAL ERROR: /usr/libexec/ipsec/pluto: lock file "/run/pluto/pluto.pid" already exists
leak: string logger, item size: 48
leak: string logger prefix, item size: 27
leak detective found 2 leaks, total size 75
journalctl -u ipsec here: https://privatebin.corp.redhat.com/?216142833d016b3c#2Es8ACSyM3VWvwi85vTaYtSx8X3952ahxCvSHeY61UtT
Description of problem:
Change the UI to a non en_US locale. Navigate to Builds - BuildConfigs. Click on the kebab menu; 'Start last run' is in English.
Version-Release number of selected component (if applicable):
4.14.0-rc.2
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Content is in English
Expected results:
Content should be localized
Additional info:
Reference screenshot https://drive.google.com/file/d/1XrQwpJxftcsvE8rPGvItTaCZ4Sr1Rj1l/view?usp=sharing
Please review the following PR: https://github.com/openshift/cluster-autoscaler-operator/pull/302
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
To ensure pods run in separate zones for a hypershift cluster, a PodAntiAffinity spec should be provided.
Version-Release number of selected component (if applicable):
4.12, 4.13, 4.14
How reproducible:
Always
Steps to Reproduce:
1. Create a hypershift control plane in HA mode.
2. Observe the multus admission controller pods.
3.
Actual results:
Not all pods scheduled on separate zones.
Expected results:
Pods scheduled on separate zones.
Additional info:
Please review the following PR: https://github.com/openshift/coredns/pull/106
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Issue 30 from https://docs.google.com/spreadsheets/d/1TR3ENY-GE_LQL9F-xH6NahtRHdu6IrYGhLqDDA8y0EI/edit#gid=1035185624
On the Helm page, on clicking the README link, margin spacing is missing in all directions.
Screenshot: https://drive.google.com/file/d/1pYFsVxJrB4m2s7pYuw1QeTW3j38A_fRT/view?usp=drive_link
This is a clone of issue OCPBUGS-27465. The following is the description of the original issue:
—
Description of problem:
The test implementation in https://github.com/openshift/origin/commit/5487414d8f5652c301a00617ee18e5ca8f339cb4#L56 assumes there is just one kubelet service, or at least that it is always the first one in the MCP. That just changed in https://github.com/openshift/machine-config-operator/pull/4124, and the test is failing.
Version-Release number of selected component (if applicable):
master branch of 4.16
How reproducible:
always during test
Steps to Reproduce:
1. Test with https://github.com/openshift/machine-config-operator/pull/4124 applied
Actual results:
Test detects a wrong service and fails
Expected results:
Test finds the proper kubelet.service and passes
Additional info:
The reference to the quay image needs to be replaced with
RUN curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b /usr/bin v1.53.2
If we leave this unchanged, the skipper build container fails with an error about a missing image.
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info:
openshift-install makes many calls to OpenStack APIs when installing OpenShift on OpenStack. Currently all of these calls use the same default User-Agent header gophercloud/x.y.z, where x.y.z is the version of the gophercloud that openshift-install was built with.
Keystone logs the User-Agent string, as do other OpenStack services, and it can provide important information about who is interacting with the cloud. As recently seen in OCPBUGS-14049, it can also be useful when debugging issues with components.
We should configure the User-Agent header for openshift-install and all other OpenShift components that talk to OpenStack APIs.
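A sketch of what this could look like with gophercloud (the version string and function name are illustrative; gophercloud's ProviderClient exposes a UserAgent that supports Prepend, which keeps the default gophercloud/x.y.z suffix while identifying the caller):

package openstackclient

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack"
)

// newProviderClient authenticates and stamps a descriptive User-Agent so
// Keystone and other OpenStack service logs show who made the call.
func newProviderClient(opts gophercloud.AuthOptions) (*gophercloud.ProviderClient, error) {
	provider, err := openstack.AuthenticatedClient(opts)
	if err != nil {
		return nil, err
	}
	provider.UserAgent.Prepend("openshift-installer/4.15") // version is illustrative
	return provider, nil
}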
https://github.com/openshift/cluster-kube-apiserver-operator/pull/1392
configured HSTS for the KAS in standalone and we need to follow
This is a clone of issue OCPBUGS-45245. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-44169. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42000. The following is the description of the original issue:
—
Description of problem:
1. We are making 2 API calls to get the logs for the PipelineRuns. Instead, we can make use of the `results.tekton.dev/record` annotation and replace `records` in the annotation's value with `logs` to get the logs of the PipelineRuns (see the sketch below).
2. Tekton Results will return only the v1 version of PipelineRun and TaskRun from Pipelines 1.16, so the data type has to be v1 for Pipelines 1.16 and v1beta1 for lower versions.
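A minimal sketch of the annotation rewrite from item 1, assuming the record path has the documented .../records/<uid> shape (the example path is made up):

package main

import (
	"fmt"
	"strings"
)

// logsPathFromRecord derives the Tekton Results logs path from the value of
// the results.tekton.dev/record annotation, avoiding a second API call.
func logsPathFromRecord(record string) string {
	return strings.Replace(record, "/records/", "/logs/", 1)
}

func main() {
	rec := "default/results/04e2fbf2/records/6d1f4881" // hypothetical record path
	fmt.Println(logsPathFromRecord(rec))               // default/results/04e2fbf2/logs/6d1f4881
}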
Description of problem:
WebhookConfiguration caBundle injection is incorrect when some webhooks are already configured with a caBundle. The behavior seems to be that only the first n webhooks in the `.webhooks` array have caBundle injected, where n is the number of webhooks that do not have caBundle set.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Create a validatingwebhookconfigurations or mutatingwebhookconfigurations with the `service.beta.openshift.io/inject-cabundle: "true"` annotation.
2. oc edit validatingwebhookconfigurations (or oc edit mutatingwebhookconfigurations)
3. Add a new webhook to the end of the list `.webhooks`. It will not have caBundle set manually, as service-ca should inject it.
4. Observe the new webhook does not get caBundle injected.
Note: it is important in step 3 that the new webhook is added to the end of the list.
Actual results:
Only the first n webhooks have caBundle injected where n is the number of webhooks without caBundle set.
Expected results:
All webhooks have caBundle injected when they do not have it set.
Additional info:
Open PR here: https://github.com/openshift/service-ca-operator/pull/207
The issue seems to be a mistake with Go's for-range syntax: the loop variable is the position within an array of indices, not the webhook index itself. tl;dr: the code should use the index value stored in the array, not the position of that value in the array (see the reduction sketched below).
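An illustrative reduction of that bug class (not the exact service-ca-operator code):

package servingcert

import (
	admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
)

// injectCABundle fills in the caBundle on every webhook that lacks one.
func injectCABundle(webhooks []admissionregistrationv1.ValidatingWebhook, caBundle []byte) {
	var toUpdate []int
	for i, wh := range webhooks {
		if len(wh.ClientConfig.CABundle) == 0 {
			toUpdate = append(toUpdate, i)
		}
	}
	// Buggy shape: `for i := range toUpdate { webhooks[i]... }` writes to
	// positions 0..n-1, i.e. only the first n webhooks, regardless of which
	// ones actually needed injection.
	for _, idx := range toUpdate {
		webhooks[idx].ClientConfig.CABundle = caBundle // fixed: use the stored index value
	}
}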
Please review the following PR: https://github.com/openshift/machine-api-provider-openstack/pull/84
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-24834. The following is the description of the original issue:
—
Background:
CCO was made optional in https://issues.redhat.com/browse/OCPEDGE-69. CloudCredential was introduced as a new capability in openshift/api. We need to bump the openshift/api dependency in oc to include the CloudCredential capability so oc adm release extract works correctly.
Description of problem:
Some relevant CredentialsRequests are not extracted by the following command: oc adm release extract --credentials-requests --included --install-config=install-config.yaml ... where install-config.yaml looks like the following: ... capabilities: baselineCapabilitySet: None additionalEnabledCapabilities: - MachineAPI - CloudCredential platform: aws: ...
Logs:
... I1209 19:57:25.968783 79037 extract.go:418] Found manifest 0000_50_cloud-credential-operator_05-iam-ro-credentialsrequest.yaml I1209 19:57:25.968902 79037 extract.go:429] Excluding Group: "cloudcredential.openshift.io" Kind: "CredentialsRequest" Namespace: "openshift-cloud-credential-operator" Name: "cloud-credential-operator-iam-ro": unrecognized capability names: CloudCredential ...
This is a clone of issue OCPBUGS-29745. The following is the description of the original issue:
—
Description of problem:
"When" expressions using CEL are an alpha feature of Pipelines; they are not handled or supported in the UI console, so the UI breaks.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Create a Pipeline with when expression using the CEL expression 2. Run the pipeline and navigate to the PipelineRun details page 3.
Actual results:
UI breaks
Expected results:
UI should not break
Additional info:
CEL expression doc https://github.com/tektoncd/pipeline/blob/main/docs/pipelines.md#use-cel-expression-in-whenexpression
How reproducible:
Always
Steps to Reproduce:
1. The Kubernetes API introduces a new Pod template parameter (`ephemeral`).
2. This parameter is not in the allowed list of the default SCC.
3. The customer is not allowed to edit the default SCCs, nor do we have a mechanism in place to update the built-in SCCs AFAIK.
4. Users of existing clusters cannot use the new parameter without creating manual SCCs and assigning this SCC to service accounts themselves, which looks clunky. This is documented in https://access.redhat.com/articles/6967808
Actual results:
Users of existing clusters cannot use ephemeral volumes after an upgrade
Expected results:
Users of existing clusters *can* use ephemeral volumes after an upgrade
Current status
The cluster-version operator should not crash while trying to evaluate a bogus condition.
4.10 and later are exposed to the bug. It's possible that the OCPBUGS-19512 series increases exposure.
Unclear.
1. Create a cluster.
2. Point it at https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge.json (you may need to adjust version strings and digests for your test-cluster's release).
3. Wait around 30 minutes.
4. Point it at https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge-invalid-promql.json (again, may need some customization).
$ grep -B1 -A15 'too fresh' previous.log
I0927 12:07:55.594222 1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge-invalid-promql.json?arch=amd64&channel=stable-4.15&id=dc628f75-7778-457a-bb69-6a31a243c3a9&version=4.15.0-0.test-2023-09-27-091926-ci-ln-01zw7kk-latest
I0927 12:07:55.726463 1 cache.go:118] {"type":"PromQL","promql":{"promql":"0 * group(cluster_version)"}} is the most stale cached cluster-condition match entry, but it is too fresh (last evaluated on 2023-09-27 11:37:25.876804482 +0000 UTC m=+175.082381015). However, we don't have a cached evaluation for {"type":"PromQL","promql":{"promql":"group(cluster_version_available_updates{channel=buggy})"}}, so attempt to evaluate that now.
I0927 12:07:55.726602 1 cache.go:129] {"type":"PromQL","promql":{"promql":"0 * group(cluster_version)"}} is stealing this cluster-condition match call for {"type":"PromQL","promql":{"promql":"group(cluster_version_available_updates{channel=buggy})"}}, because its last evaluation completed 30m29.849594461s ago
I0927 12:07:55.758573 1 cvo.go:703] Finished syncing available updates "openshift-cluster-version/version" (170.074319ms)
E0927 12:07:55.758847 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 194 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1c4df00?, 0x32abc60})
	/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc001489d40?})
	/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75
panic({0x1c4df00, 0x32abc60})
	/usr/lib/golang/src/runtime/panic.go:884 +0x213
github.com/openshift/cluster-version-operator/pkg/clusterconditions/promql.(*PromQL).Match(0xc0004860e0, {0x220ded8, 0xc00041e550}, 0x0)
	/go/src/github.com/openshift/cluster-version-operator/pkg/clusterconditions/promql/promql.go:134 +0x419
github.com/openshift/cluster-version-operator/pkg/clusterconditions/cache.(*Cache).Match(0xc0002d3ae0, {0x220ded8, 0xc00041e550}, 0xc0033948d0)
	/go/src/github.com/openshift/cluster-version-operator/pkg/clusterconditions/cache/cache.go:132 +0x982
github.com/openshift/cluster-version-operator/pkg/clusterconditions.(*conditionRegistry).Match(0xc000016760, {0x220ded8, 0xc00041e550}, {0xc0033948a0, 0x1, 0x0?})
No panics.
I'm still not entirely clear on how OCPBUGS-19512 would have increased exposure.
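A hedged sketch of the defensive check implied by the panic above (Match is invoked with a nil condition, third argument 0x0); the types here are simplified stand-ins for the CVO's own:

package promql

import "fmt"

// PromQLQuery and ClusterCondition are simplified stand-ins for the CVO types.
type PromQLQuery struct {
	PromQL string
}

type ClusterCondition struct {
	Type   string
	PromQL *PromQLQuery
}

// match guards against nil input before dereferencing, returning an error
// instead of panicking on a bogus condition.
func match(condition *ClusterCondition) (bool, error) {
	if condition == nil || condition.PromQL == nil {
		return false, fmt.Errorf("PromQL cluster condition requires a non-nil promql field")
	}
	// ... evaluate condition.PromQL.PromQL against the in-cluster Prometheus ...
	return false, nil
}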
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-28203. The following is the description of the original issue:
—
Description of problem:
If you allow the installer to provision a Power VS Workspace instead of bringing your own, it can sometimes fail when creating a network. This is because the Power Edge Router can sometimes take up to a minute to configure.
Version-Release number of selected component (if applicable):
How reproducible:
Infrequent, but will probably hit it within 50-100 runs
Steps to Reproduce:
1. Install on Power VS with IPI with serviceInstanceGUID not set in the install-config.yaml 2. Occasionally you'll observe a failure due to the workspace not being ready for networks
Actual results:
Failure
Expected results:
Success
Additional info:
Not consistently reproducible
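A sketch of the likely fix direction, assuming an isWorkspaceReady check against the IBM Cloud API (both the function and the timeouts are illustrative, not the installer's actual code):

package powervs

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForWorkspace polls until the workspace (including its Power Edge
// Router) reports ready, before any network-creation calls are made.
// PER configuration was observed to take up to a minute; poll for up to 5.
func waitForWorkspace(ctx context.Context, isWorkspaceReady func(context.Context) (bool, error)) error {
	return wait.PollUntilContextTimeout(ctx, 10*time.Second, 5*time.Minute, true, isWorkspaceReady)
}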
Description of problem:
The customer uses the Azure File CSI driver, and without this they cannot make use of the Azure Workload Identity work, which was one of the banner features of OCP 4.14. This feature is currently available in 4.16; however, it will take the customer 3-6 months to validate 4.16 and start its rollout, putting their plans to complete a large migration to Azure by end of 2024 at risk. Could you please backport either the 1.29.3 feature for Azure Workload Identity or rebase our Azure File CSI driver in 4.14 and 4.15 to at least 1.29.3, which includes the desired feature.
Version-Release number of selected component (if applicable):
azure-file-csi-driver in 4.14 and 4.15:
- In 4.14, azure-file-csi-driver is version 1.28.1
- In 4.15, azure-file-csi-driver is version 1.29.2
How reproducible:
Always
Steps to Reproduce:
1. Install OCP 4.14 with Azure Workload Managed Identity
2. Try to configure Managed Workload Identity with Azure File CSI: https://github.com/kubernetes-sigs/azurefile-csi-driver/blob/master/docs/workload-identity-static-pv-mount.md
Actual results:
Is not usable
Expected results:
Azure Workload Identity should be manage with Azure File CSi as part of the whole feature
Additional info:
This is a clone of issue OCPBUGS-37725. The following is the description of the original issue:
—
Backport OWNERS file changes
Description of problem:
The kube-controller-manager pod in the openshift-kube-controller-manager namespace keeps reporting "failed to synchronize namespace" after deleting the namespace.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
The namespace was deleted long ago, but the kube-controller-manager pod in the openshift-kube-controller-manager namespace still keeps reporting "failed to synchronize namespace".
Expected results:
It should not report errors for a deleted namespace.
Additional info:
Please review the following PR: https://github.com/openshift/platform-operators/pull/104
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
ci/prow/security is failing on google.golang.org/protobuf/internal/encoding/json
Version-Release number of selected component (if applicable):
4.15
How reproducible:
always
Steps to Reproduce:
1. trigger ci/prow/security on a pull request. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-38015. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-37837. The following is the description of the original issue:
—
In our vertical scaling test, after we delete a machine, we rely on the `status.readyReplicas` field of the ControlPlaneMachineSet (CPMS) to indicate that it has successfully created a new machine that let's us scale up before we scale down.
https://github.com/openshift/origin/blob/3deedee4ae147a03afdc3d4ba86bc175bc6fc5a8/test/extended/etcd/vertical_scaling.go#L76-L87
As we've seen in the past, that status field isn't a reliable indicator of machine scale-up: status.readyReplicas might stay at 3 because the soon-to-be-removed node that is pending deletion can go Ready=Unknown, as in runs such as the following: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-etcd-operator/1286/pull-ci-openshift-cluster-etcd-operator-master-e2e-aws-ovn-etcd-scaling/1808186565449486336
The test then ends up timing out waiting for status.readyReplicas=4 while the scale-up and scale-down may already have happened.
This shows up across scaling tests on all platforms as:
fail [github.com/openshift/origin/test/extended/etcd/vertical_scaling.go:81]: Unexpected error: <*errors.withStack | 0xc002182a50>: scale-up: timed out waiting for CPMS to show 4 ready replicas: timed out waiting for the condition { error: <*errors.withMessage | 0xc00304c3a0>{ cause: <wait.errInterrupted>{ cause: <*errors.errorString | 0xc0003ca800>{ s: "timed out waiting for the condition", }, }, msg: "scale-up: timed out waiting for CPMS to show 4 ready replicas", },
In hindsight, all we care about is whether the deleted machine's member is replaced by another machine's member; we can ignore the flapping of node and machine statuses while we wait for the scale-up and then scale-down of members to happen. So we can relax or replace the check on status.readyReplicas with just looking at the membership change (see the sketch below).
PS: We can also update the outdated Godoc comments for the test to mention that it relies on the CPMSO to create a machine for us: https://github.com/openshift/origin/blob/3deedee4ae147a03afdc3d4ba86bc175bc6fc5a8/test/extended/etcd/vertical_scaling.go#L34-L38
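Checking membership directly is straightforward from the etcd pods; the pod name below is a placeholder:
~~~
# List current etcd members; repeat after the machine deletion to watch the
# deleted machine's member get replaced.
oc -n openshift-etcd exec etcd-<control-plane-node> -c etcdctl -- etcdctl member list -w table
~~~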
cluster-capi-operator is incorrectly updating the container command to /bin/cluster-api-provider-openstack-manager. It should leave it alone because it is already correct.
This is a clone of issue OCPBUGS-33011. The following is the description of the original issue:
—
Issue customer is experiencing:
Despite manually removing the alternate service (old) and saving the configuration from the UI, the alternate service did not get removed from the route, and the changes did not take effect.
From the UI, if they use the Form view, select Remove Alternate Service, and click Save, then refresh the route information, it still shows the route configuration with the alternate service defined.
If they use the YAML view, and remove the entry from there and save it's gone properly.
If they use the CLI and edit the route, and remove the alternate service section, it also works properly.
Tests:
I have tested this scenario in my test cluster with OCP v4.13
Description of problem:
There is no kubernetes service associated with the kube-scheduler, so it does not require a readiness probe.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
# In the control plane:
kubectl get services | grep scheduler
kubectl get deploy kube-scheduler | grep readiness
Actual results:
Probe exists, but no service
Expected results:
No probe or service
Additional info:
This is a clone of issue OCPBUGS-41905. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-39110. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-29528. The following is the description of the original issue:
—
Description of problem:
Camel K provides a list of Kamelets that are able to act as an event source or sink for a Knative eventing message broker. Usually, the list of Kamelets installed with the Camel K operator is displayed in the Developer Catalog's list of available event sources with the provider "Apache Software Foundation" or "Red Hat Integration". When a user adds a custom Kamelet custom resource to the user namespace, the list of default Kamelets coming from the Camel K operator is gone. The Developer Catalog event source list then only displays the custom Kamelet but not the default ones.
Version-Release number of selected component (if applicable):
How reproducible:
Apply a custom Kamelet custom resource to the user namespace and open the list of available event sources in Dev Console Developer Catalog.
Steps to Reproduce:
1. Install the global Camel K operator in an operator namespace (e.g. openshift-operators)
2. List all available event sources in the "default" user namespace and see all Kamelets listed as event sources/sinks
3. Add a custom Kamelet custom resource to the default namespace (a minimal example follows these steps)
4. See the list of available event sources only listing the custom Kamelet; the default Kamelets are gone from that list
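For step 3, a minimal custom Kamelet along these lines reproduces the issue; the name and template are illustrative, and the apiVersion assumes Camel K 2.x:
~~~
apiVersion: camel.apache.org/v1
kind: Kamelet
metadata:
  name: my-custom-timer-source   # hypothetical name
  labels:
    camel.apache.org/kamelet.type: source
spec:
  definition:
    title: My Custom Timer Source
  template:
    from:
      uri: timer:tick
      steps:
        - to: kamelet:sink
~~~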
Actual results:
Default Kamelets that act as event source/sink are only displayed in the Developer Catalog when there is no custom Kamelet added to a namespace.
Expected results:
Default Kamelets coming with the Camel K operator (installed in the operator namespace) should always be part of the Developer Catalog's list of available event sources/sinks. When the user adds more custom Kamelets, these should be listed too.
Additional info:
Reproduced with Camel K operator 2.2 and OCP 4.14.8
screenshots: https://drive.google.com/drive/folders/1mTpr1IrASMT76mWjnOGuexFr9-mP0y3i?usp=drive_link
The test:
[sig-network] pods should successfully create sandboxes by adding pod to network
Failed a couple of payloads today with 1-2 failures in batches of 10 aggregated jobs. I looked at the most recent errors and they often seem to be the same:
1 failures to create the sandbox ns/openshift-monitoring pod/prometheus-k8s-1 node/ip-10-0-24-217.us-west-1.compute.internal - 475.52 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-k8s-1_openshift-monitoring_c712fc61-5a1e-4cec-b6fa-18c8f2e91c0a_0(46df8384ffeb433fc0e4864262aa52f2ede570265c43bf8b0900f184b27b10f1): error adding pod openshift-monitoring_prometheus-k8s-1 to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): failed to send CNI request: Post "http://dummy/cni": EOF
This http://dummy/cni URL looked interesting and seemed worthy of a bug.
The problem is a rare failure overall but is happening quite frequently day to day; search.ci indicates lots of hits over the last two days in both 4.14 and 4.15, and seemingly on both ovn and sdn:
Some of these will show as flakes as the test gets retried at times and then passes.
Additionally in 4.14 we are seeing similar failures reporting
No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
4.14.0-0.nightly-2023-10-12-015817 shows pod sandbox errors for both azure and aws; both show a drop from the 10th, which comes after our force accept.
4.14.0-0.nightly-2023-10-11-141212 had a host of failures, but it is what killed aws sdn.
4.14.0-0.nightly-2023-10-11-200059 hit aws sdn as well and also shows up in azure.
Description of problem:
Check the oauth page (/k8s/cluster/config.openshift.io~v1~OAuth/cluster); there is no table row for the IDP list now.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-11-22-204142
How reproducible:
Always
Steps to Reproduce:
1. Check the oauth page (/k8s/cluster/config.openshift.io~v1~OAuth/cluster)
Actual results:
1. The table row for the IDP list is missing
Expected results:
1. The IDP table should be shown
Additional info:
screenshot: https://drive.google.com/file/d/1xmF5_RYZtAfcfY57kWi9ttcahKFFd_Kc/view?usp=sharing
This is a clone of issue OCPBUGS-21869. The following is the description of the original issue:
—
Description of problem:
Since 10/17, the Hypershift e2e job has been blocking the 4.14 CI payload. Here is a link for the job failures: https://sippy.dptools.openshift.org/sippy-ng/jobs/4.14/analysis?filters=%7B%22items%22:%5B%7B%22columnField%22:%22name%22,%22operatorValue%22:%22equals%22,%22value%22:%22periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn%22%7D%5D%7D
Main tests that failed:
- TestNodePool/ValidateHostedCluster/EnsurePSANotPrivileged (0s)
- TestNodePool/ValidateHostedCluster (5m8s)
- TestNodePool
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
On an SNO, a new CA certificate is not loaded after updating the user-ca-bundle configmap, and as a result the cluster cannot pull images from a registry with a certificate signed by the new CA.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Update ca-bundle.crt (replace with a new certificate if applicable) in the `user-ca-bundle` configmap under the openshift-config namespace
* On the node, ensure that /etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt was updated with the new certificate
2. Create a pod which uses an image from a registry whose certificate is signed by the new CA cert provided in ca-bundle.crt
Actual results:
Pod fails to pull the image:
Failed to pull image "registry.ztp-hub-00.mobius.lab.eng.rdu2.redhat.com:5000/centos/centos:8": rpc error: code = Unknown desc = pinging container registry registry.ztp-hub-00.mobius.lab.eng.rdu2.redhat.com:5000: Get "https://registry.ztp-hub-00.mobius.lab.eng.rdu2.redhat.com:5000/v2/": tls: failed to verify certificate: x509: certificate signed by unknown authority
* On the node, trying to reach the registry via curl https://registry.ztp-hub-00.mobius.lab.eng.rdu2.redhat.com:5000 fails certificate validation: curl: (60) SSL certificate problem: self-signed certificate (more details: https://curl.se/docs/sslcerts.html)
To be able to create a pod I had to:
* Run `sudo update-ca-trust`. After that, curl https://registry.ztp-hub-00.mobius.lab.eng.rdu2.redhat.com:5000 worked without issues, but pod creation still failed with the tls: failed to verify certificate: x509: certificate signed by unknown authority error
* Run `sudo systemctl restart crio`. After that, pod creation succeeded and could pull the image
Expected results:
Additional info:
Attaching must gather
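For reference, the manual workaround described above, run on the node:
~~~
# Refresh the host trust store so it picks up the updated
# /etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt
sudo update-ca-trust
# Restart CRI-O so image pulls see the new CA (per the report, it does not
# pick up the refreshed trust store on its own)
sudo systemctl restart crio
~~~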
This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were not completed when this image was assembled
For ingress controllers that are exposed via LBs, there are considerations for external and internal publishing scope. Requesting support for the ability to specify the LB scope on the HostedCluster at initial create time.
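For reference, the standalone IngressController API already expresses this scope today; how it would map onto the HostedCluster API is the open question. A sketch of the existing shape:
~~~
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  endpointPublishingStrategy:
    type: LoadBalancerService
    loadBalancer:
      scope: Internal   # or External
~~~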
Description of problem:
Install an IPI cluster with confidential VMs. The installer should pre-check the VM type, disk encryption type, etc. to avoid the installation failing during infrastructure creation.
1. VM type: different VM types support different security types. For example, if platform.azure.defaultMachinePlatform.type is set to Standard_DC8ads_v5 and platform.azure.defaultMachinePlatform.settings.securityType to TrustedLaunch, installation fails because Standard_DC8ads_v5 only supports the security type ConfidentialVM:
ERROR Error: creating Linux Virtual Machine: (Name "jimaconf1-89qmp-bootstrap" / Resource Group "jimaconf1-89qmp-rg"): compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="The VM size 'Standard_DC16ads_v5' is not supported for creation of VMs and Virtual Machine Scale Set with 'TrustedLaunch' security type."
2. Disk encryption set: when installing a cluster with ConfidentialVM + securityEncryptionType: DiskWithVMGuestState and using a customer-managed key, the DES encryption type must be ConfidentialVmEncryptedWithCustomerKey; otherwise the installer throws an error like:
08-31 10:12:54.443 level=error msg=Error: creating Linux Virtual Machine: (Name "jima30confa-vtrm2-bootstrap" / Resource Group "jima30confa-vtrm2-rg"): compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="The type of the Disk Encryption Set in the request is 'ConfidentialVmEncryptedWithCustomerKey', but this Disk Encryption Set was created with type 'EncryptionAtRestWithCustomerKey'." Target="/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima30confa-vtrm2-rg/providers/Microsoft.Compute/disks/jima30confa-vtrm2-bootstrap_OSDisk"
The installer should check the VM type and the DES's encryption type to make sure that the expected DES is set.
Version-Release number of selected component (if applicable):
4.14 nightly build
How reproducible:
Always
Steps to Reproduce:
1. Prepare an install-config with one of the following (see the sketch after these steps):
1) confidentialVM enabled, but a VM type which does not support Confidential VM
2) TrustedLaunch enabled, but a VM type which only supports ConfidentialVM
3) confidentialVM + securityEncryptionType: DiskWithVMGuestState with a customer-managed key to encrypt the managed disk, but the customer-managed key's encryption type is the default "EncryptionAtRestWithPlatformKey"
2. Create the cluster
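A sketch of the install-config fragment involved, assuming the field layout of recent installers; the DES values are placeholders:
~~~
platform:
  azure:
    defaultMachinePlatform:
      type: Standard_DC8ads_v5
      settings:
        securityType: ConfidentialVM   # or TrustedLaunch
      osDisk:
        securityProfile:
          securityEncryptionType: DiskWithVMGuestState
        diskEncryptionSet:
          # Placeholders: this DES must be created with the type
          # ConfidentialVmEncryptedWithCustomerKey for this combination
          resourceGroup: my-rg
          subscriptionId: 00000000-0000-0000-0000-000000000000
          name: my-des
~~~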
Actual results:
Installation failed when creating infrastructure
Expected results:
Installer should have pre-check for those scenarios, and exit with expected error message.
Additional info:
This is a clone of issue OCPBUGS-42605. The following is the description of the original issue:
—
Description of problem:
We are in a live migration scenario.
If a project has a networkpolicy to allow traffic from the host network (more concretely, to allow traffic from the ingress controllers, which are in the host network), traffic doesn't work during the live migration between any ingress controller node (either migrated or not migrated) and an already migrated application node.
I'll expand later in the description and internal comments, but the TL;DR is that the IPs of the tun0 interface of not-yet-migrated source nodes and the IPs of the ovn-k8s-mp0 interface of migrated source nodes are not added to the address sets related to the networkpolicy ACL on the target OVN-Kubernetes node, so the traffic is not allowed.
Version-Release number of selected component (if applicable):
4.16.13
How reproducible:
Always
Steps to Reproduce:
1. Before the migration: have a project with a networkpolicy that allows traffic from the ingress controller, with the ingress controller in the host network. Everything must work properly at this point.
2. Start the migration
3. During the migration, check connectivity from the host network of either a migrated node or a non-migrated node. Both will fail (checking from the same node doesn't fail)
Actual results:
The pod on the worker node is not reachable from the host network of the ingress controller node (unless the pod is on the same node as the ingress controller), which causes the ingress controller routes to return a 503 error.
Expected results:
The pod on the worker node should be reachable from the ingress controller node, even when the ingress controller node has not migrated yet and the application node has.
Additional info:
This is not a duplicate of OCPBUGS-42578. This bug refers to the host-to-pod communication path while the other one doesn't.
This is a customer issue. More details to be included in private comments for privacy.
Workaround: create a networkpolicy that explicitly allows traffic from the tun0 and ovn-k8s-mp0 interfaces (a sketch follows). However, note that this workaround can be problematic for clusters with hundreds or thousands of projects. Another possible workaround is to temporarily delete all the networkpolicies of the projects, but again, this may be problematic (and a security risk).
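A minimal sketch of the first workaround; the CIDR is a placeholder and would need to cover the per-node tun0/ovn-k8s-mp0 gateway addresses in your cluster's host subnets:
~~~
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-tun0-and-ovn-k8s-mp0   # hypothetical name
spec:
  podSelector: {}
  ingress:
    - from:
        - ipBlock:
            cidr: 10.128.0.0/14   # placeholder; the per-node gateway IPs live here
~~~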
The https://github.com/pkg/errors repo was archived in Dec 2021. See also https://github.com/pkg/errors/issues/245.
We should probably use the standard library's `fmt.Errorf("... %w", err)` wrapping instead.
Description of problem:
Demo dynamic plugin tests are not working when running in dev mode because changes were made to the plugin table structure.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Configuring mTLS on the default IngressController breaks the ingress canary check and the console health checks, which in turn puts the ingress and console cluster operators into a degraded state.
OpenShift release version:
OCP-4.9.5
Cluster Platform:
UPI on Baremetal (Disconnected cluster)
How reproducible:
Configure mutual TLS (mTLS) using the default IngressController as described in the doc: https://docs.openshift.com/container-platform/4.9/networking/ingress-operator.html#nw-mutual-tls-auth_configuring-ingress
Steps to Reproduce (in detail):
1. Create a config map that is in the openshift-config namespace.
2. Edit the IngressController resource in the openshift-ingress-operator project
3. Add the spec.clientTLS field and subfields to configure mutual TLS:
~~~
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  clientTLS:
    clientCertificatePolicy: Required
    clientCA:
      name: router-ca-certs-default
    allowedSubjectPatterns:
~~~
Expected results:
mTLS setup should work properly without degrading the Ingress and Console operators.
Impact of the problem:
Unstable cluster with the Ingress and Console operators in a Degraded state.
Additional info:
The following is the Error message for your reference:
The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
// Canary checks looking for required tls certificate.
2021-11-19T17:17:58.237Z ERROR operator.canary_controller wait/wait.go:155 error performing canary route check
// Console operator:
RouteHealthDegraded: failed to GET route (https://console-openshift-console.apps.bruce.openshift.local): Get "https://console-openshift-console.apps.bruce.openshift.local": remote error: tls: certificate required
David mentioned this issue here: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1702312628947029
duplicated_event_patterns: I think it creates a blackout range (events are OK during time X) and then checks the time range itself, but it doesn't appear to exclude the change in counts.
The count of the last event within the allowed range should be subtracted from the first event that is outside of the allowed time range for pathological event test calculation.
David has a demonstration of the count here: https://github.com/openshift/origin/pull/28456, but to fix it you have to invert testDuplicatedEvents to iterate through the event registry, not the events.
When collecting onprem events, we want to be able to distinguish among the various onprem deployments:
We should also make sure this info is forwarded when collecting events
We should also define a human-friendly version for each
Slack thread about the supported deployment types
https://redhat-internal.slack.com/archives/CUPJTHQ5P/p1706209886659329
Environment variables for setting the deployment type (one of: podman, operator, ACM, MCE, ABI) and for setting the release version (if applicable)
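A sketch of what that could look like; these variable names are hypothetical, not an agreed interface:
~~~
# Hypothetical variable names, for illustration only
export DEPLOYMENT_TYPE=ACM        # one of: podman, operator, ACM, MCE, ABI
export RELEASE_VERSION=4.15.0     # if applicable
~~~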
This is a clone of issue OCPBUGS-29067. The following is the description of the original issue:
—
Description of problem:
Bootstrap process failed because API_URL and API_INT_URL are not resolvable:
Feb 06 06:41:49 yunjiang-dn16d-657jf-bootstrap systemd[1]: bootkube.service: Main process exited, code=exited, status=1/FAILURE
Feb 06 06:41:49 yunjiang-dn16d-657jf-bootstrap systemd[1]: bootkube.service: Failed with result 'exit-code'.
Feb 06 06:41:49 yunjiang-dn16d-657jf-bootstrap systemd[1]: bootkube.service: Consumed 1min 457ms CPU time.
Feb 06 06:41:54 yunjiang-dn16d-657jf-bootstrap systemd[1]: bootkube.service: Scheduled restart job, restart counter is at 1.
Feb 06 06:41:54 yunjiang-dn16d-657jf-bootstrap systemd[1]: Stopped Bootstrap a Kubernetes cluster.
Feb 06 06:41:54 yunjiang-dn16d-657jf-bootstrap systemd[1]: bootkube.service: Consumed 1min 457ms CPU time.
Feb 06 06:41:54 yunjiang-dn16d-657jf-bootstrap systemd[1]: Started Bootstrap a Kubernetes cluster.
Feb 06 06:41:58 yunjiang-dn16d-657jf-bootstrap bootkube.sh[7781]: Check if API and API-Int URLs are resolvable during bootstrap
Feb 06 06:41:58 yunjiang-dn16d-657jf-bootstrap bootkube.sh[7781]: Checking if api.yunjiang-dn16d.qe.gcp.devcluster.openshift.com of type API_URL is resolvable
Feb 06 06:41:58 yunjiang-dn16d-657jf-bootstrap bootkube.sh[7781]: Starting stage resolve-api-url
Feb 06 06:41:58 yunjiang-dn16d-657jf-bootstrap bootkube.sh[7781]: Unable to resolve API_URL api.yunjiang-dn16d.qe.gcp.devcluster.openshift.com
Feb 06 06:41:58 yunjiang-dn16d-657jf-bootstrap bootkube.sh[7781]: Checking if api-int.yunjiang-dn16d.qe.gcp.devcluster.openshift.com of type API_INT_URL is resolvable
Feb 06 06:41:58 yunjiang-dn16d-657jf-bootstrap bootkube.sh[7781]: Starting stage resolve-api-int-url
Feb 06 06:41:58 yunjiang-dn16d-657jf-bootstrap bootkube.sh[7781]: Unable to resolve API_INT_URL api-int.yunjiang-dn16d.qe.gcp.devcluster.openshift.com
Feb 06 06:41:58 yunjiang-dn16d-657jf-bootstrap bootkube.sh[8905]: https://localhost:2379 is healthy: successfully committed proposal: took = 7.880477ms
Feb 06 06:41:58 yunjiang-dn16d-657jf-bootstrap bootkube.sh[7781]: Starting cluster-bootstrap...
Feb 06 06:41:59 yunjiang-dn16d-657jf-bootstrap bootkube.sh[8989]: Starting temporary bootstrap control plane...
Feb 06 06:41:59 yunjiang-dn16d-657jf-bootstrap bootkube.sh[8989]: Waiting up to 20m0s for the Kubernetes API
Feb 06 06:42:00 yunjiang-dn16d-657jf-bootstrap bootkube.sh[8989]: API is up
install logs:
...
time="2024-02-06T06:54:28Z" level=debug msg="Unable to connect to the server: dial tcp: lookup api-int.yunjiang-dn16d.qe.gcp.devcluster.openshift.com on 169.254.169.254:53: no such host"
time="2024-02-06T06:54:28Z" level=debug msg="Log bundle written to /var/home/core/log-bundle-20240206065419.tar.gz"
time="2024-02-06T06:54:29Z" level=error msg="Bootstrap failed to complete: timed out waiting for the condition"
time="2024-02-06T06:54:29Z" level=error msg="Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane."
...
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-02-05-184957, openshift/machine-config-operator#4165
How reproducible:
Always.
Steps to Reproduce:
1. Enable custom DNS on GCP: platform.gcp.userProvisionedDNS: Enabled and featureSet: TechPreviewNoUpgrade
2. Create cluster
Actual results:
Failed to complete bootstrap process.
Expected results:
See description.
Additional info:
I believe 4.15 is affected as well once https://github.com/openshift/machine-config-operator/pull/4165 is backported to 4.15; currently, it fails at an earlier phase, see https://issues.redhat.com/browse/OCPBUGS-28969
OKD's samples operator uses a different set of images; specifically for mysql, it imports them from quay.io.
As a result, the "Only known images used by tests" test from the e2e suite frequently fails.
registry.redhat.io/rhel8/mysql-80:latest from pods: ns/e2e-test-oc-builds-57lj7 pod/database-1-9zdgz node/ip-10-0-95-30.ec2.internal
This is a clone of issue OCPBUGS-31482. The following is the description of the original issue:
—
Description of problem:
In the tested HCP external OIDC env, when issuerCertificateAuthority is set, console pods are stuck in ContainerCreating status. The reason is that the CA configmap is not propagated to the openshift-console namespace by the console operator.
Version-Release number of selected component (if applicable):
Latest 4.16 and 4.15 nightly payloads
How reproducible:
Always
Steps to Reproduce:
1. Configure an HCP external OIDC env with issuerCertificateAuthority set (a sketch follows these steps)
2. Check `oc get pods -A`
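A sketch of the configuration in question, assuming the structured external OIDC authentication API (on a HostedCluster this shape lives under spec.configuration.authentication); all names and URLs are placeholders:
~~~
apiVersion: config.openshift.io/v1
kind: Authentication
metadata:
  name: cluster
spec:
  type: OIDC
  oidcProviders:
    - name: my-oidc-provider                  # placeholder
      issuer:
        issuerURL: https://oidc.example.com   # placeholder
        audiences:
          - my-client-id                      # placeholder
        issuerCertificateAuthority:
          name: oidc-ca                       # configmap holding the CA bundle
~~~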
Actual results:
2. Before OCPBUGS-31319 is fixed, console pods are in CrashLoopBackOff status. After OCPBUGS-31319 is fixed, or after manually copying the CA configmap to the openshift-config namespace as a workaround, console pods are stuck in ContainerCreating status until the CA configmap is manually copied to the openshift-console namespace too. Console login is affected.
Expected results:
2. Console operator should be responsible to copy the CA to openshift-console namespace. And console login should succeed.
Additional info:
In https://redhat-internal.slack.com/archives/C060D1W96LB/p1711548626625499 , Seth from the HyperShift Dev side requested creating this separate console bug to unblock the PR merge and backport for OCPBUGS-31319, hence this bug.
As a developer of CPMS I want to ensure unhealthy nodes can be replaced so that we can recommend to users to use CPMS
QE have some manual test cases that test a couple of unhappy scenarios for the CPMS, that should result in automatic recovery.
I would like to see these automated as part of the periodic suite for CPMS.
The behaviour itself isn't really dependent on CPMS, but the whole workflow is.
The behaviour is primarily based on other components and how they react, but it blocks CPMS from operating as expected.
The two cases I would like to see added are:
Along with disruption monitoring via an external endpoint, we should add in-cluster monitors which run the same checks over:
These tests should be implemented as deployments with anti-affinity so they land on different nodes. Deployments are chosen so that the nodes can properly be drained. These deployments write to the host disk, and on restart the pod will pick up the existing data. When a special configmap is created, the pod will stop collecting disruption data.
The external part of the test will create the deployments (and the necessary RBAC objects) when the test starts, create the stop configmap when it ends, and collect the data from the nodes. The test will expose the results on the intervals chart so that the data can be used to find the source of disruption. A minimal sketch of such a monitor deployment follows.
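A minimal sketch of the monitor deployment described above; the image and paths are hypothetical:
~~~
apiVersion: apps/v1
kind: Deployment
metadata:
  name: disruption-monitor        # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: disruption-monitor
  template:
    metadata:
      labels:
        app: disruption-monitor
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: disruption-monitor
              topologyKey: kubernetes.io/hostname   # spread pods across nodes
      containers:
        - name: monitor
          image: quay.io/example/disruption-monitor:latest   # hypothetical image
          volumeMounts:
            - name: data
              mountPath: /var/lib/disruption-data
      volumes:
        - name: data
          hostPath:
            path: /var/lib/disruption-data   # host disk, so a restarted pod picks up existing data
~~~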