Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compare PodCIDRs in NodeRouteController #7034

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

antoninbas
Copy link
Contributor

When determining whether a Node's configuration has changed.

Note that when writing a controller, we should never assume that when the object name is the same, all immutable fields are guaranteed to be the same, and as a result we should always check all of them. This applies even for controllers without a workqueue (which is not the case here), as a Delete followed by a Create with the same name can show as an Update event. It is more likely to happen when the controller uses a workqueue, as in that case, even if the Delete and Add event handlers are called sequentially, the key may only be processed once, after the informer store has been updated with the new object.

Fixes #6965

When determining whether a Node's configuration has changed.

Note that when writing a controller, we should never assume that when
the object name is the same, all immutable fields are guaranteed to be
the same, and as a result we should always check all of them. This
applies even for controllers without a workqueue (which is not the case
here), as a Delete followed by a Create with the same name can show as
an Update event. It is more likely to happen when the controller uses a
workqueue, as in that case, even if the Delete and Add event handlers
are called sequentially, the key may only be processed once, after the
informer store has been updated with the new object.

Fixes antrea-io#6965

Signed-off-by: Antonin Bas <[email protected]>
@antoninbas antoninbas requested a review from tnqn February 27, 2025 23:01
Comment on lines +553 to +556
if len(podCIDRStrs) == 0 {
// If no valid PodCIDR is configured in Node.Spec, return immediately.
return nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the previous routes be deleted if there are? otherwise the following situation may cause an issue:

  1. Node A (uid 1) had a PodCIDR 10.0.0.0/24;
  2. Node A (uid 1) was deleted, PodCIDR 10.0.0.0/24 was released;
  3. Node B was created, PodCIDR 10.0.0.0/24 was allocated to Node B;
  4. Node A (uid 2) was created, no PodCIDR was allocated to it.

Node C observed two events: Node B was updated with PodCIDR 10.0.0.0/24, Node A was updated with empty PodCIDR.

When reconciling Node B, the controller won't create new route to 10.0.0.0/24 because of the following nodesHaveSamePodCIDR check.
When reconciling Node B, the controller won't delete old route to 10.0.0.0/24 because it returns immediately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right that's an issue.
I think I was assuming that a Node wouldn't stay without PodCIDR, and that it would be resolved, but I guess this situation can last if the cluster has run out of Node CIDRs.

Comment on lines +569 to +570
// Route is already added for this Node and Node MAC, transport IP, podCIDRs and WireGuard
// public key are not changed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously it didn't need to delete stale routes first because the PodCIDR is used as key in routes/flows. Since it could change now, we should delete stale routes first when it changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix that. Old flows should already be deleted because our OF client uses the node Name as the key (and I think we handle this case already). But there is definitely an issue with routes as they do use the PodCIDR as the key.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NodeRouteController should account for PodCIDR updates
2 participants