Explicit fo #407
Conversation
One note: I left implicit failover support in, but I am planning on removing it once I figure out whether we need to make the explicit failover operation execute in a different thread than the cmdproc one, because that would use the same locking/threading as the implicit support.
@mikechristie We also need to consider the upgrade path from implicit configurations -> explicit. How would an initiator detect that it should use explicit ALUA on an active session? Would we have to upgrade and restart tcmu-runners across all gateway nodes prior to upgrading ceph-iscsi-config?
case -ENOENT:
	/* No lock holder yet */
	break;
case -ESHUTDOWN:
Nit: seems to be leaking RBD-specific error codes back to the generic handler. Could you just re-use the existing lock state enums (a la TCMUR_DEV_LOCK_FENCED)?
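To illustrate, a minimal sketch of the kind of mapping being suggested. Only the two TCMUR_DEV_LOCK_* names mentioned in this thread come from the discussion; the enum definition and the exact error-to-state choices below are assumptions, not tcmu-runner's actual code:

#include <errno.h>

/* Hypothetical stand-in for tcmu-runner's lock state enum; only the two
 * members named in this thread are taken from the discussion. */
enum demo_lock_state {
	TCMUR_DEV_LOCK_UNLOCKED,
	TCMUR_DEV_LOCK_FENCED,
};

/* Translate handler-specific errno values into generic lock states so the
 * RBD codes do not leak into the core code. */
enum demo_lock_state rbd_err_to_lock_state(int ret)
{
	switch (ret) {
	case -ENOENT:
		/* No lock holder yet */
		return TCMUR_DEV_LOCK_UNLOCKED;
	case -ESHUTDOWN:
		/* Assumed: this client has been blacklisted, so treat as fenced */
		return TCMUR_DEV_LOCK_FENCED;
	default:
		return TCMUR_DEV_LOCK_UNLOCKED;
	}
}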
I am going to use the TCMUR_DEV_LOCK codes as suggested above.
Note: off list I said I was going to use the TCMU error codes being added in 403, but that is not done yet. Will convert to them later.
I flip-flopped again on this. In this PR I left the -Exyz codes. In this patchset I went and made all the core and callout IO path code, including the above, use common TCMU_ status codes.
rbd.c
Outdated
@@ -74,6 +74,10 @@
#endif
#endif

#define TCMU_RBD_LOCKER_TAG_KEY "tcmu_rbd_locker_tag"
#define TCMU_RBD_LOCKER_TAG_FMT "tcmu_tag=%hu,rbd_client=%s"
#define TCMU_RBD_LCOKER_BUF_LEN 256
Nit: typo
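For context, a small self-contained sketch of how a tag in this format might be built and parsed back. The macro names follow the hunk above (with the typo fixed), while the sample tag and client values are made up:

#include <stdio.h>
#include <string.h>

#define TCMU_RBD_LOCKER_TAG_KEY "tcmu_rbd_locker_tag"
#define TCMU_RBD_LOCKER_TAG_FMT "tcmu_tag=%hu,rbd_client=%s"
#define TCMU_RBD_LOCKER_BUF_LEN 256

int main(void)
{
	char buf[TCMU_RBD_LOCKER_BUF_LEN];
	unsigned short tag = 1;             /* made-up ALUA group id */
	const char *client = "client.4151"; /* made-up rbd client name */
	char *metadata_owner;

	/* Build the value stored under TCMU_RBD_LOCKER_TAG_KEY. */
	snprintf(buf, sizeof(buf), TCMU_RBD_LOCKER_TAG_FMT, tag, client);

	/* Parse it back the same way the next hunk does with strstr(). */
	metadata_owner = strstr(buf, "rbd_client=");
	if (metadata_owner)
		printf("locker: %s\n", metadata_owner + strlen("rbd_client="));
	return 0;
}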
rbd.c
Outdated
metadata_owner = strstr(buf, "rbd_client=");
if (!metadata_owner) {
	tcmu_dev_err(dev, "Invalid lock tag %s.\n", buf);
	/* Force initaitor to retry STPG */
Nit: typo
tcmu_dev_dbg(dev, "owner mismatch %s %s\n", metadata_owner,
	     owners[0]);
/*
 * A node could be in the middle of grabbing the lock or it
Any concern about this potential race? I had originally thought of having each target register a metadata key of something like tcmu_target_group_<group id>=<client #> just after opening the image. Then, once the lock is acquired, the owner is already in the format of client.<client #>, so you would just need to query the 2 - 4 metadata tags for each group to find the matching client id. Your solution is nice since it only involves a single metadata key lookup.
... apparently never mind. I thought that the librbd API provided the client name but it looks like it only provides the IP address of the owner. I guess that will be something to fix when I add shared lock support (along w/ the ability to store user-provided tags on the lock).
rbd.c
Outdated
if (tcmu_rbd_has_lock(dev) != 1)
	return TCMUR_DEV_LOCK_UNLOCKED;

ret = rbd_metadata_remove(state->image, TCMU_RBD_LOCKER_TAG_KEY);
Is it possible this could race w/ another client stealing the lock? Perhaps just leave it set to the old value and let the mismatch code above detect that it's out-of-date?
OK, sounds good. I was thinking that if another client took the lock we would be blacklisted by the time we got here. But yeah, your suggestion is nicer because we have that other mismatch code, and it saves the worry here. Will drop.
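A minimal sketch of the approach being agreed on here (a hypothetical helper, not the PR code): leave the old tag in place on unlock and let the owner-mismatch check decide whether it is stale:

#include <stdio.h>
#include <string.h>

/* Return 1 if the stored tag still names the current lock owner, else 0.
 * The tag format ("...,rbd_client=<owner>") follows the defines above;
 * the comparison itself is a simplified stand-in for the PR's mismatch
 * handling. */
int tag_matches_owner(const char *tag_buf, const char *owner)
{
	const char *metadata_owner = strstr(tag_buf, "rbd_client=");

	if (!metadata_owner)
		return 0;		/* malformed or missing tag */
	metadata_owner += strlen("rbd_client=");
	return strcmp(metadata_owner, owner) == 0;
}

int main(void)
{
	/* Made-up values: the tag was left behind by a previous locker. */
	const char *stale_tag = "tcmu_tag=1,rbd_client=client.4151";

	printf("matches old owner: %d\n",
	       tag_matches_owner(stale_tag, "client.4151"));
	printf("matches new owner: %d\n",
	       tag_matches_owner(stale_tag, "client.9928"));
	return 0;
}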
The next patches will not want infinite retries during STPG handling. Make this configurable. Signed-off-by: Mike Christie <[email protected]>
In this first simple pass we will run tcmu_acquire_dev_lock from the STPG handler. In this case we cannot and do not want to flush the ring because we would end up waiting on ourselves. Signed-off-by: Mike Christie <[email protected]>
Signed-off-by: Mike Christie <[email protected]>
Add more lock states. Signed-off-by: Mike Christie <[email protected]>
Signed-off-by: Mike Christie <[email protected]>
tcmu_acquire_dev_lock should only try to reopen if the device is fenced. If we cannot reach the cluster due to a network issue then there is no point in trying to do a reopen. Also just return a new lock state instead of having the extra TCMUR_LOCK return value. Signed-off-by: Mike Christie <[email protected]>
Add back explicit failover support. Signed-off-by: Mike Christie <[email protected]>
We need to report the remote ports' ALUA state, but the rbd handler is not able to notify remote nodes when a lock has been taken. This has handlers like rbd set the ALUA active/optimized group when the lock is taken. When a RTPG is sent we then get this info and update our local state. Signed-off-by: Mike Christie <[email protected]>
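A rough, compilable sketch of the flow this commit describes; every name here is a placeholder invented for illustration, not tcmu-runner's actual API:

#include <stdint.h>
#include <stdio.h>

/* Placeholder device state: which ALUA group is active/optimized. */
struct demo_dev {
	uint16_t ao_group;
};

/* A handler (e.g. rbd) would call something like this after taking the
 * image lock, since it cannot notify remote nodes directly. */
void demo_lock_acquired(struct demo_dev *dev, uint16_t my_group)
{
	dev->ao_group = my_group;
}

/* When an RTPG arrives, the recorded group is read back and the local
 * ALUA state refreshed so remote ports are reported correctly. */
void demo_handle_rtpg(struct demo_dev *dev)
{
	printf("active/optimized group: %u\n", dev->ao_group);
}

int main(void)
{
	struct demo_dev dev = { .ao_group = 0 };

	demo_lock_acquired(&dev, 1);
	demo_handle_rtpg(&dev);
	return 0;
}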
This has rbd get/set the lock tag in the image metadata. It also stores the current locker with the tag, so if we should fail during the lock/metadata_set operation we will know if the tag is valid. Signed-off-by: Mike Christie <[email protected]>
get_lock_state is no longer used, so remove it. Signed-off-by: Mike Christie <[email protected]>
I am still working on upgrade instructions. In some cases it is going to be messy because the switch in supported failover types is not automatically detected. It will take some manual commands.
Add back explicit failover support. It requires the ceph-iscsi-config changes here:
ceph/ceph-iscsi-config#54