Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runner/rbd: do not reopen device if blacklisted #384

Closed
wants to merge 1 commit into from

Conversation

mpatlasov
Copy link

tcmu_acquire_dev_lock() must not skip tcmu_notify_conn_lost()
if handler told us explicitly that we are blaclisted.

Otherwise, kernel lio won't be flushed, and a stale request
sitting somewhere in kernel queues may come to us later, when
we have reacquired the lock.

Signed-off-by: Maxim Patlasov [email protected]

tcmu_acquire_dev_lock() must not skip tcmu_notify_conn_lost()
if handler told us explicitly that we are blaclisted.

Otherwise, kernel lio won't be flushed, and a stale request
sitting somewhere in kernel queues may come to us later, when
we have reacquired the lock.

Signed-off-by: Maxim Patlasov <[email protected]>
@@ -299,7 +299,8 @@ int tcmu_acquire_dev_lock(struct tcmu_device *dev)
retries++;
goto retry;
}

/* fall through */
case TCMUR_LOCK_BLACKLISTED:
tcmu_dev_dbg(dev, "Fail handler device connection.\n");
tcmu_notify_conn_lost(dev);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch and debugging this.

We used to do something similar. The problem was that for failback or failover with N > 2 nodes, this would cause the initiator to failover extra times because the conn lost call kills the session the initiator is trying to use causing it to try another path then later switch back over when the tpg is enabled. I think that is definitely better than data corruption though.

Let me do some more testing in the morning.

@mikechristie
Copy link
Collaborator

Hey,

We are going to try and go with the STPG based approach I posted on the list. That should prevent us from hitting the ping pong storms during failback with multi-lun targets.

I will update the patches I posted on the list with vmware/windows support and fix some bugs in the next data.

@mpatlasov
Copy link
Author

Mike,

Does STPG based approach assume that tcmu-runner won't support implicit ALUA? Because otherwise, it seems to be a bug to skip tcmu_notify_conn_lost() on acquiring the lock. What do you think?

@mikechristie
Copy link
Collaborator

runner reports in the RTPG what we support and the initiator follows that. However, if there were to be a bug in the initiator where it ignores our settings, then runner will not perform implicit failovers when in explicit mode (the ceph-iscsi-config patch is what switches runner to use explicit instead of implicit mode).

So for your test it would work like this:

  1. STPG successfully executed on node1.
  2. WRITEs get stuck on node1.
  3. Failover to node2. WRITEs execute ok on this node.
  4. If the WRITEs are unjammed at this time they are just failed, because we will hit the blacklist checks or unlocked checks.
  5. If node2 were to fail while the commands were still stuck then node1's iscsi session would normally have dropped and not allowing new logins due to the stuck WRITEs on node1. If the initiator did not escalate to session level recovery though, then before doing new IO the initiator would send a STPG and that would be stuck behind the stuck WRITEs from step 2. Before we can execute the STPG then we have to wait for the stuck commands before it to unjam and get failed.
  6. Once the WRITEs unjam and are failed the STPG is executed. If the STPG successful that is reported to the initiator and it will start sending IO.

Note that for the tools/configfs we set explicit and implicit just because the latter is required for the tools to interact with the configfs interface. Those values do not get reported to the initiator for tcmu though, so in the tcmu runner RTPG handler in the posted patch we only report explicit support.

@mikechristie
Copy link
Collaborator

So just to be clear I will probably just delete the implicit support code, because with explicit it is never used and I am not sure if a user would ever want to turn it on.

@mpatlasov
Copy link
Author

Thank you for detailed explanation. Seems STPG plays the role of barrier:

If the initiator did not escalate to session level recovery though, then before doing new IO the initiator would send a STPG and that would be stuck behind the stuck WRITEs from step 2. Before we can execute the STPG then we have to wait for the stuck commands before it to unjam and get failed.

That's exactly I wanted to clarify, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants