runner/rbd: do not reopen device if blacklisted #384

mpatlasov · 2018-03-14T23:37:58Z

tcmu_acquire_dev_lock() must not skip tcmu_notify_conn_lost()
if the handler told us explicitly that we are blacklisted.

Otherwise, kernel lio won't be flushed, and a stale request
sitting somewhere in kernel queues may come to us later, when
we have reacquired the lock.

Signed-off-by: Maxim Patlasov [email protected]

mikechristie · 2018-03-15T08:47:47Z

tcmur_device.c

@@ -299,7 +299,8 @@ int tcmu_acquire_dev_lock(struct tcmu_device *dev)
 			retries++;
 			goto retry;
 		}
-
+		/* fall through */
+	case TCMUR_LOCK_BLACKLISTED:
 		tcmu_dev_dbg(dev, "Fail handler device connection.\n");
 		tcmu_notify_conn_lost(dev);
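
For readers who want to see the control flow the hunk changes, here is a minimal, self-contained sketch. Only `TCMUR_LOCK_BLACKLISTED`, `tcmu_notify_conn_lost()` and the retry/`goto` shape come from the diff; the enum, the struct, the `results` array and the function name `acquire_lock_sketch` are hypothetical stand-ins and are not the tcmu-runner code. It shows that a blacklisted result reaches the same "notify connection lost" path as an exhausted retry loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock outcomes; only the "blacklisted" case is named in the diff. */
enum lock_result {
	LOCK_OK,
	LOCK_TRANSIENT_FAILURE,
	LOCK_BLACKLISTED,
};

/* Hypothetical device state, just enough to observe what happened. */
struct sketch_dev {
	int conn_lost_notifications;
	int retries;
};

/* Stand-in for tcmu_notify_conn_lost(): only counts the calls. */
static void notify_conn_lost(struct sketch_dev *dev)
{
	dev->conn_lost_notifications++;
}

/*
 * results[i] is the outcome of lock attempt i. The caller must supply
 * at least max_retries + 1 entries unless an earlier entry ends the loop.
 */
static bool acquire_lock_sketch(struct sketch_dev *dev,
				const enum lock_result *results,
				int max_retries)
{
	int attempt = 0;

	assert(dev != NULL && results != NULL);
retry:
	switch (results[attempt]) {
	case LOCK_OK:
		return true;
	case LOCK_TRANSIENT_FAILURE:
		if (attempt < max_retries) {
			attempt++;
			dev->retries++;
			goto retry;
		}
		/* fall through */
	case LOCK_BLACKLISTED:
		/* Both paths end here, so the connection-lost notice is never skipped. */
		notify_conn_lost(dev);
		return false;
	}
	return false;
}
```

In this sketch a single `LOCK_BLACKLISTED` result triggers exactly one notification with no retries, while a run of transient failures only triggers it after the retry budget is spent.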


Thanks for the patch and debugging this.

We used to do something similar. The problem was that, for failback or failover with N > 2 nodes, it caused the initiator to fail over extra times: the conn lost call kills the session the initiator is trying to use, so the initiator tries another path and then switches back later when the tpg is enabled. I think that is definitely better than data corruption though.

Let me do some more testing in the morning.

mikechristie · 2018-03-15T17:32:11Z

Hey,

We are going to try to go with the STPG-based approach I posted on the list. That should prevent us from hitting the ping-pong storms during failback with multi-lun targets.

I will update the patches I posted on the list with vmware/windows support and fix some bugs in the next day.

mpatlasov · 2018-03-15T20:08:16Z

Mike,

Does the STPG-based approach assume that tcmu-runner won't support implicit ALUA? Because otherwise, it seems to be a bug to skip tcmu_notify_conn_lost() on acquiring the lock. What do you think?

mikechristie · 2018-03-15T21:02:27Z

runner reports in the RTPG what we support and the initiator follows that. However, if there were to be a bug in the initiator where it ignores our settings, then runner will not perform implicit failovers when in explicit mode (the ceph-iscsi-config patch is what switches runner to use explicit instead of implicit mode).

So for your test it would work like this:

1. STPG successfully executed on node1.
2. WRITEs get stuck on node1.
3. Failover to node2. WRITEs execute ok on this node.
4. If the WRITEs are unjammed at this time, they are just failed, because we will hit the blacklist checks or unlocked checks.
5. If node2 were to fail while the commands were still stuck, then node1's iscsi session would normally have dropped and would not be allowing new logins, due to the stuck WRITEs on node1. If the initiator did not escalate to session level recovery though, then before doing new IO the initiator would send a STPG, and that would be stuck behind the stuck WRITEs from step 2. Before we can execute the STPG, we have to wait for the stuck commands ahead of it to unjam and get failed.
6. Once the WRITEs unjam and are failed, the STPG is executed. If the STPG is successful, that is reported to the initiator and it will start sending IO.
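
The ordering argument in the last two steps can be illustrated with a small model. This is only a sketch under stated assumptions: commands on one node are processed strictly in submission order, a WRITE is failed whenever the node does not hold the lock, and the STPG always succeeds in reacquiring it. The types `cmd`, `node` and the function `drain_in_order` are hypothetical names invented for the illustration and do not exist in tcmu-runner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cmd_type { CMD_WRITE, CMD_STPG };
enum cmd_status { ST_PENDING, ST_OK, ST_FAILED };

/* Hypothetical command record: what was sent and how it completed. */
struct cmd {
	enum cmd_type type;
	enum cmd_status status;
};

/* Hypothetical node state: whether this node currently holds the lock. */
struct node {
	bool has_lock;
};

/*
 * Complete queued commands strictly in submission order. Anything queued
 * ahead of the STPG completes (here: fails) before the STPG takes the lock.
 */
static void drain_in_order(struct node *n, struct cmd *q, size_t len)
{
	assert(n != NULL);

	for (size_t i = 0; i < len; i++) {
		if (q[i].type == CMD_STPG) {
			/* Assumption: lock reacquisition succeeds. */
			n->has_lock = true;
			q[i].status = ST_OK;
		} else {
			/* Stale WRITEs hit the "unlocked" check and are failed. */
			q[i].status = n->has_lock ? ST_OK : ST_FAILED;
		}
	}
}
```

With a queue of two stuck WRITEs, then the STPG, then a new WRITE, on a node that has lost the lock, the two stale WRITEs fail, the STPG succeeds, and only the WRITE sent after it is executed.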

Note that for the tools/configfs we set explicit and implicit just because the latter is required for the tools to interact with the configfs interface. Those values do not get reported to the initiator for tcmu though, so in the tcmu runner RTPG handler in the posted patch we only report explicit support.

mikechristie · 2018-03-15T21:05:31Z

So just to be clear I will probably just delete the implicit support code, because with explicit it is never used and I am not sure if a user would ever want to turn it on.

mpatlasov · 2018-03-15T21:11:53Z

Thank you for the detailed explanation. It seems STPG plays the role of a barrier:

If the initiator did not escalate to session level recovery though, then before doing new IO the initiator would send a STPG and that would be stuck behind the stuck WRITEs from step 2. Before we can execute the STPG then we have to wait for the stuck commands before it to unjam and get failed.

That's exactly what I wanted to clarify, thank you!

mikechristie reviewed Mar 15, 2018

mpatlasov closed this Mar 15, 2018

mikechristie mentioned this pull request May 21, 2018

data consistency #420

Closed

mikechristie mentioned this pull request Aug 7, 2018

vmware clustering issues with ceph/rbd iscsi HA support #341

Open
