kubelet: Do not mutate pods in the pod manager #116482

Merged
merged 1 commit into from Apr 12, 2023

Conversation

smarterclayton
Contributor

@smarterclayton smarterclayton commented Mar 10, 2023

The pod manager is a cache and modifying objects returned from the pod manager can cause race conditions in the Kubelet. In this case, it causes static pod status from the mirror pod to leak back to the config source, which means a static pod whose mirror pod is set to a terminal phase (succeeded or failed) cannot restart.

/kind bug
/sig node
/priority critical-urgent

Setting a mirror pod's phase to Succeeded or Failed can prevent the corresponding static pod from restarting due to mutation of a Kubelet cache.
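
For illustration only, here is a minimal sketch of the failure mode described above. It is not the kubelet's code: `Pod`, `podManager`, `mutateInPlace`, and `copyThenMutate` are hypothetical stand-ins, and the real pod manager and `v1.Pod` types are considerably richer. The sketch assumes a cache that hands out pointers to its stored objects, so a caller that writes through the pointer changes what every later reader sees.

```go
package main

import "fmt"

// Pod is a simplified, hypothetical stand-in for a pod object.
type Pod struct {
	Name  string
	Phase string
}

// DeepCopy returns an independent copy of the pod.
func (p *Pod) DeepCopy() *Pod {
	c := *p
	return &c
}

// podManager is a toy cache that returns pointers to the objects it stores.
type podManager struct {
	pods map[string]*Pod
}

func newPodManager(pods ...*Pod) *podManager {
	m := &podManager{pods: map[string]*Pod{}}
	for _, p := range pods {
		m.pods[p.Name] = p
	}
	return m
}

// GetPodByName returns the cached pointer, not a copy.
func (m *podManager) GetPodByName(name string) (*Pod, bool) {
	p, ok := m.pods[name]
	return p, ok
}

// mutateInPlace shows the problematic pattern: it writes through the
// pointer returned by the cache, so the cached object changes too.
func mutateInPlace(m *podManager, name, phase string) *Pod {
	p, ok := m.GetPodByName(name)
	if !ok {
		return nil
	}
	p.Phase = phase
	return p
}

// copyThenMutate shows the safer pattern: copy first, then modify the copy.
func copyThenMutate(m *podManager, name, phase string) *Pod {
	p, ok := m.GetPodByName(name)
	if !ok {
		return nil
	}
	p = p.DeepCopy()
	p.Phase = phase
	return p
}

func main() {
	buggy := newPodManager(&Pod{Name: "static-web", Phase: "Running"})
	mutateInPlace(buggy, "static-web", "Failed")
	cached, _ := buggy.GetPodByName("static-web")
	fmt.Println("cached phase after in-place mutation:", cached.Phase)

	safe := newPodManager(&Pod{Name: "static-web", Phase: "Running"})
	copyThenMutate(safe, "static-web", "Failed")
	cached, _ = safe.GetPodByName("static-web")
	fmt.Println("cached phase after copy-then-mutate:", cached.Phase)
}
```

In the first case the terminal phase ends up stored in the cache itself, which is the kind of leak that can stop a static pod from restarting. In the second case the cached object is untouched.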

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Mar 10, 2023
@k8s-ci-robot
Contributor

@smarterclayton: The label(s) priority/important-critical cannot be applied, because the repository doesn't have them.


@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 10, 2023
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 10, 2023
@smarterclayton
Contributor Author

@SergeyKanzhelev

I may need to add a test for this. I'm testing it in #116479, which suggested this may be an issue.

@smarterclayton
Contributor Author

/priority critical-urgent

@k8s-ci-robot k8s-ci-robot added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 10, 2023
@SergeyKanzhelev
Member

/priority critical-urgent

Is it a regression?

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: smarterclayton, Tusenka

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@smarterclayton
Contributor Author

This regressed whenever the cache mutation was introduced, but it is not a change in 1.27.

@bart0sh bart0sh added this to Triage in SIG Node PR Triage Mar 10, 2023
@bart0sh
Contributor

bart0sh commented Mar 10, 2023

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 10, 2023
@bart0sh bart0sh moved this from Triage to Needs Reviewer in SIG Node PR Triage Mar 10, 2023
@bart0sh
Contributor

bart0sh commented Apr 6, 2023

/lgtm
/assign @mrunalp @derekwaynecarr @dchen1107

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 6, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: a4d1277516d8409766d0cb5e9dc2e4fafc0c59a5

@bart0sh bart0sh moved this from Needs Reviewer to Needs Approver in SIG Node PR Triage Apr 6, 2023
@k8s-triage-robot

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does not have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@pacoxu
Member

pacoxu commented Apr 12, 2023

It should be cherry-picked to v1.24~v1.27.

/retest-required

@k8s-triage-robot

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does not have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@k8s-ci-robot k8s-ci-robot merged commit 2f1db33 into kubernetes:master Apr 12, 2023
11 checks passed
SIG Node PR Triage automation moved this from Needs Approver to Done Apr 12, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.28 milestone Apr 12, 2023
k8s-ci-robot added a commit that referenced this pull request Apr 13, 2023
…of-#116482-upstream-release-1.27

Automated cherry pick of #116482: kubelet: Do not mutate pods in the pod manager
k8s-ci-robot added a commit that referenced this pull request Apr 13, 2023
…of-#116482-upstream-release-1.26

Automated cherry pick of #116482: kubelet: Do not mutate pods in the pod manager
k8s-ci-robot added a commit that referenced this pull request Apr 13, 2023
…of-#116482-upstream-release-1.25

Automated cherry pick of #116482: kubelet: Do not mutate pods in the pod manager
k8s-ci-robot added a commit that referenced this pull request Apr 13, 2023
…of-#116482-upstream-release-1.24

Automated cherry pick of #116482: kubelet: Do not mutate pods in the pod manager
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
10 participants