-
Notifications
You must be signed in to change notification settings - Fork 388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compare PodCIDRs in NodeRouteController #7034
base: main
Are you sure you want to change the base?
Compare PodCIDRs in NodeRouteController #7034
Conversation
When determining whether a Node's configuration has changed. Note that when writing a controller, we should never assume that when the object name is the same, all immutable fields are guaranteed to be the same, and as a result we should always check all of them. This applies even for controllers without a workqueue (which is not the case here), as a Delete followed by a Create with the same name can show as an Update event. It is more likely to happen when the controller uses a workqueue, as in that case, even if the Delete and Add event handlers are called sequentially, the key may only be processed once, after the informer store has been updated with the new object. Fixes antrea-io#6965 Signed-off-by: Antonin Bas <[email protected]>
if len(podCIDRStrs) == 0 { | ||
// If no valid PodCIDR is configured in Node.Spec, return immediately. | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the previous routes be deleted if there are? otherwise the following situation may cause an issue:
- Node A (uid 1) had a PodCIDR 10.0.0.0/24;
- Node A (uid 1) was deleted, PodCIDR 10.0.0.0/24 was released;
- Node B was created, PodCIDR 10.0.0.0/24 was allocated to Node B;
- Node A (uid 2) was created, no PodCIDR was allocated to it.
Node C observed two events: Node B was updated with PodCIDR 10.0.0.0/24, Node A was updated with empty PodCIDR.
When reconciling Node B, the controller won't create new route to 10.0.0.0/24 because of the following nodesHaveSamePodCIDR
check.
When reconciling Node B, the controller won't delete old route to 10.0.0.0/24 because it returns immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right that's an issue.
I think I was assuming that a Node wouldn't stay without PodCIDR, and that it would be resolved, but I guess this situation can last if the cluster has run out of Node CIDRs.
// Route is already added for this Node and Node MAC, transport IP, podCIDRs and WireGuard | ||
// public key are not changed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously it didn't need to delete stale routes first because the PodCIDR is used as key in routes/flows. Since it could change now, we should delete stale routes first when it changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will fix that. Old flows should already be deleted because our OF client uses the node Name as the key (and I think we handle this case already). But there is definitely an issue with routes as they do use the PodCIDR as the key.
When determining whether a Node's configuration has changed.
Note that when writing a controller, we should never assume that when the object name is the same, all immutable fields are guaranteed to be the same, and as a result we should always check all of them. This applies even for controllers without a workqueue (which is not the case here), as a Delete followed by a Create with the same name can show as an Update event. It is more likely to happen when the controller uses a workqueue, as in that case, even if the Delete and Add event handlers are called sequentially, the key may only be processed once, after the informer store has been updated with the new object.
Fixes #6965