This repository was archived by the owner on Jan 2, 2025. It is now read-only.

RoundRobin Load Balancing: When a node is down, all 1/N requests always fail. #312

Open
lseelenbinder opened this issue Feb 8, 2020 · 7 comments
Comments

@lseelenbinder

Because of how RoundRobin(Sync) is configured, whenever one of the backing nodes is down for an outage or maintenance, every request that would be routed to that node's R2D2 pool fails (that pool has no live connections and cannot create any more).

In my opinion, this is a blocking bug for using the RoundRobin load balancing mechanism, since it removes all possibility of failover to another node unless the client implements somewhat complex logic itself.

Was this a known limitation that I overlooked, or should we look into adjusting the implementation so that the collection of known nodes is used equally and, when one is down, the others are used instead?
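
To make the requested behaviour concrete, here is a minimal sketch of a round-robin selection that skips pools known to be down. It is not how CDRS implements RoundRobin: `NodePool`, `FailoverRoundRobin` and the `alive` flag are hypothetical names used only for illustration, and how liveness would actually be detected is left open.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for one per-node connection pool; not a CDRS type.
pub struct NodePool {
    pub addr: String,
    pub alive: bool,
}

/// Hypothetical round-robin balancer that skips pools marked as down.
pub struct FailoverRoundRobin {
    pools: Vec<NodePool>,
    next: AtomicUsize,
}

impl FailoverRoundRobin {
    pub fn new(pools: Vec<NodePool>) -> Self {
        FailoverRoundRobin {
            pools,
            next: AtomicUsize::new(0),
        }
    }

    /// Returns the next live pool, probing each pool at most once per call.
    /// Returns `None` only when the list is empty or every pool is down.
    pub fn next_live(&self) -> Option<&NodePool> {
        let n = self.pools.len();
        if n == 0 {
            return None;
        }
        // Advance the shared cursor once per request, as plain round robin does.
        let start = self.next.fetch_add(1, Ordering::Relaxed);
        (0..n)
            .map(|offset| &self.pools[start.wrapping_add(offset) % n])
            .find(|pool| pool.alive)
    }
}
```

With this approach the pool that follows a dead node in the list absorbs that node's share of requests, so the load is no longer perfectly even, but no request fails while at least one node is up.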

@AlexPikalov
Owner

Hi @lseelenbinder ,
Yes, it was overlooked. Will try to come up with a solution for that bug.
Thanks for reporting

@AlexPikalov AlexPikalov added the bug label Feb 8, 2020
@AlexPikalov AlexPikalov self-assigned this Feb 8, 2020
@lseelenbinder
Author

No problem, @AlexPikalov.

After I realized what was happening, I knew it was a bit of an edge case that wouldn't be too easy to accidentally replicate during testing, but quite common in production because machines are always coming and going during maintenance.

We're just going to revert to SingleNode and use HAProxy to load balance the actual instances, so this isn't a blocker for us to go into production.

@AlexPikalov
Owner

@lseelenbinder
Good to know that it doesn't block you. However, I think this is a good occasion to implement a feature that was requested almost 2 years ago: #113. The solution could be based on Cassandra server events, namely the topology change event for a removed node. If the load balancer removes a node in reaction to this event, it may help avoid the situation where the load balancer returns a dead node.
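
A rough sketch of that idea, assuming the event has already been decoded into a simple enum: `TopologyChange` and `ActiveNodes` are hypothetical names, not the CDRS event types, and they only mirror the two change kinds (new node, removed node) that a topology change event carries.

```rust
use std::net::SocketAddr;

/// Hypothetical mirror of the two topology change kinds; not the CDRS event type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TopologyChange {
    NewNode(SocketAddr),
    RemovedNode(SocketAddr),
}

/// Hypothetical list of nodes the load balancer is allowed to hand out.
#[derive(Debug, Default)]
pub struct ActiveNodes {
    addrs: Vec<SocketAddr>,
}

impl ActiveNodes {
    pub fn new(addrs: Vec<SocketAddr>) -> Self {
        ActiveNodes { addrs }
    }

    /// Applies one event: removed nodes are dropped from the list,
    /// new nodes are appended unless they are already present.
    pub fn apply(&mut self, change: TopologyChange) {
        match change {
            TopologyChange::NewNode(addr) => {
                if !self.addrs.contains(&addr) {
                    self.addrs.push(addr);
                }
            }
            TopologyChange::RemovedNode(addr) => self.addrs.retain(|a| *a != addr),
        }
    }

    /// Nodes currently eligible for load balancing.
    pub fn addrs(&self) -> &[SocketAddr] {
        &self.addrs
    }
}
```

The load balancer would then pick only from `addrs()`, so a node reported as removed is never returned again until an event re-adds it.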

@lseelenbinder
Author

@AlexPikalov, that's a great idea!

My only concern is keeping the ability to limit which nodes a specific config would ever connect to, regardless of added or removed nodes (even if that means it has no live nodes to talk to).
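
As an illustration of that constraint, here is a small sketch in which a fixed allow-list bounds what topology events can do. `RestrictedNodes` and its methods are hypothetical and not part of CDRS; the only assumption is that added and removed nodes arrive as socket addresses.

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

/// Hypothetical node set restricted to a fixed allow-list; not a CDRS type.
pub struct RestrictedNodes {
    allowed: HashSet<SocketAddr>,
    live: HashSet<SocketAddr>,
}

impl RestrictedNodes {
    /// Every configured node starts out both allowed and live.
    pub fn new(allowed: impl IntoIterator<Item = SocketAddr>) -> Self {
        let allowed: HashSet<SocketAddr> = allowed.into_iter().collect();
        let live = allowed.clone();
        RestrictedNodes { allowed, live }
    }

    /// Marks a node as live only if it is on the allow-list.
    /// Returns true when the address is allowed (and therefore live afterwards).
    pub fn node_added(&mut self, addr: SocketAddr) -> bool {
        if self.allowed.contains(&addr) {
            self.live.insert(addr);
            true
        } else {
            false
        }
    }

    /// Drops a node from the live set; the allow-list itself never changes.
    pub fn node_removed(&mut self, addr: SocketAddr) {
        self.live.remove(&addr);
    }

    pub fn is_live(&self, addr: SocketAddr) -> bool {
        self.live.contains(&addr)
    }

    pub fn live_count(&self) -> usize {
        self.live.len()
    }
}
```

The live set can shrink to empty, but it can never grow beyond the nodes named in the original config.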

@AlexPikalov
Owner

AlexPikalov commented Mar 2, 2020

Hi @lseelenbinder ,
I've just completed a draft implementation of dynamic clusters in #313.

These changes remove a dead node from cluster load balancing based on a Topology Change event received from a node. I'm about to test it, but I would really appreciate it if you could check from your end whether it solves your case. Here is the new session factory function that includes this logic: https://github.com/AlexPikalov/cdrs/blob/feat/113/src/cluster/session.rs#L236

Compared to new, it has an extra argument, event_src: NodeTcpConfig<'a, A>, which is the configuration for the node that will be used as the event source.

@AlexPikalov
Owner

So far, I've found some issues with the proposed solution. Fixing them.

@lseelenbinder
Author

Hi @AlexPikalov,

Thanks for fixing this! I won't have a chance to test it for a few days, but one thing about the design confuses me.

NodeTcpConfig implies a single node is the source for the events, which, in my mind, doesn't actually help us any, since we still have a single point of failure. If that node happens to fail (or go down for maintenance, in the more likely scenario), we're still in the same position as before where one node failing causes issues across the cluster. Am I missing something in how it's intended to be used or how it works?

Our method of using HAProxy to balance local DC nodes is working quite well, and it looks like this method would probably require us to continue doing that for the event source.
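
For illustration only, one way the event source could avoid being a single point of failure is to accept several candidate nodes and use the first one that connects. The sketch below is an assumption, not the design in #313: `connect_first_available` and its `connect` callback are hypothetical, and the callback stands in for whatever actually opens the event connection.

```rust
/// Tries each candidate event source in order and returns the index and
/// connection of the first one that connects, or `None` if all fail.
/// `connect` is a hypothetical callback, not a CDRS function.
pub fn connect_first_available<C, T, E>(
    candidates: &[C],
    mut connect: impl FnMut(&C) -> Result<T, E>,
) -> Option<(usize, T)> {
    candidates
        .iter()
        .enumerate()
        // Connection errors are discarded here; a real client would likely log them.
        .find_map(|(i, candidate)| connect(candidate).ok().map(|conn| (i, conn)))
}
```

The same function could be called again whenever the current event connection drops, so the listener moves on to the next live node.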

2 participants