[DocDB][PITR] yb-admin command is timing out with PITR+Tablet splitting #13022
Labels
area/docdb
YugabyteDB core features
kind/bug
This issue is a bug
pitr
priority/medium
Medium priority issue
Comments
sanketkedia added a commit that referenced this issue Jul 21, 2022

…time in the past when tablet splitting was ongoing

Summary: This diff adds support for restoring to points in time in the past when a tablet split was ongoing. Briefly, the following algorithm is used:
1. If either of the child tablets (or both) is not registered on the master as of the time to which we are restoring, then we restore the parent tablet and hide the child tablets.
2. If both child tablets are registered on the master, then we restore the child tablets and hide the parent.
This works because, at the time when the restore was initiated, we are waiting for splits to complete. Thus, at the current time the split children are ready, so it's safe to restore the children and use the hybrid time filter added as part of PITR to ensure only restored rows are visible.

Test Plan: Different phases:
1. Restore before the middle key is fetched: ybd --cxx_test yb-admin-snapshot-schedule-test --gtest-filter YbAdminRestoreDuringSplit.RestoreBeforeGetSplitKey
2. Restore after only one child is registered with the master: ybd --cxx_test yb-admin-snapshot-schedule-test --gtest-filter YbAdminRestoreDuringSplit.RestoreAfterOneChildRegistered
3. Restore after both children are registered but SPLIT_OP is not applied: ybd --cxx_test yb-admin-snapshot-schedule-test --gtest-filter YbAdminRestoreDuringSplit.RestoreBeforeSplitOpIsApplied
4. Restore after children are RUNNING but parent is not HIDDEN: ybd --cxx_test yb-admin-snapshot-schedule-test --gtest-filter YbAdminRestoreDuringSplit.RestoreBeforeParentHidden
5. Restore after children are RUNNING and parent is HIDDEN: ybd --cxx_test yb-admin-snapshot-schedule-test --gtest-filter YbAdminSnapshotScheduleTest.VerifyRestoreWithDeletedTablets

Reviewers: slingam, timur, sergei, asrivastava, zdrudi
Reviewed By: asrivastava, zdrudi
Subscribers: bogdan, ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D18299
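The two-case rule in the summary above can be sketched as a small decision function. This is only an illustration under stated assumptions: `SplitState`, `RestorePlan`, and `PlanRestore` are hypothetical names made up for this sketch and are not YugabyteDB's actual types or functions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical view of one tablet split as of the restore time.
// These types are illustrative and not taken from the YugabyteDB sources.
struct SplitState {
  std::string parent_id;
  std::string child_ids[2];
  bool child_registered[2];  // Was each child registered on the master at the restore time?
};

struct RestorePlan {
  std::vector<std::string> restore;  // Tablets whose data is restored.
  std::vector<std::string> hide;     // Tablets that are hidden after the restore.
};

RestorePlan PlanRestore(const SplitState& split) {
  RestorePlan plan;
  if (split.child_registered[0] && split.child_registered[1]) {
    // Case 2: both children were registered, so restore them and hide the parent.
    plan.restore = {split.child_ids[0], split.child_ids[1]};
    plan.hide = {split.parent_id};
  } else {
    // Case 1: at least one child was not registered yet, so restore the parent
    // and hide both children.
    plan.restore = {split.parent_id};
    plan.hide = {split.child_ids[0], split.child_ids[1]};
  }
  // Every tablet involved in the split ends up either restored or hidden.
  assert(plan.restore.size() + plan.hide.size() == 3);
  return plan;
}
```

Under this sketch, a split in which only one child was registered at the restore time yields a plan that restores the parent, which corresponds to the RestoreAfterOneChildRegistered phase in the test plan.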
sanketkedia added a commit that referenced this issue Jul 22, 2022

Summary: Commit 675d486 introduced compilation errors on gcc11. This diff fixes them. The following errors were noticed:
```
/-------------------------------------------------------------------------------
[2022-07-21T06:22:35.385Z] | COMPILATION FAILED
[2022-07-21T06:22:35.385Z] |-------------------------------------------------------------------------------
[2022-07-21T06:22:35.385Z] ent/src/yb/master/restore_sys_catalog_state.cc:282:5: error: multi-line comment [-Werror=comment]
[2022-07-21T06:22:35.385Z] 282 | // / \
[2022-07-21T06:22:35.385Z] | ^
[2022-07-21T06:22:35.385Z] ent/src/yb/master/restore_sys_catalog_state.cc:284:5: error: multi-line comment [-Werror=comment]
[2022-07-21T06:22:35.385Z] 284 | // / \ / \
[2022-07-21T06:22:35.385Z] | ^
[2022-07-21T06:22:35.385Z] cc1plus: all warnings being treated as errors
[2022-07-21T06:22:35.385Z]
[2022-07-21T06:22:35.385Z] Input files:
[2022-07-21T06:22:35.385Z] src/yb/master/CMakeFiles/master.dir/__/__/__/ent/src/yb/master/restore_sys_catalog_state.cc.o
[2022-07-21T06:22:35.385Z] ent/src/yb/master/restore_sys_catalog_state.cc
[2022-07-21T06:22:35.385Z] Output file (from -o): src/yb/master/CMakeFiles/master.dir/__/__/__/ent/src/yb/master/restore_sys_catalog_state.cc.o
[2022-07-21T06:22:35.385Z] \-------------------------------------------------------------------------------
```
```
/-------------------------------------------------------------------------------
[2022-07-21T06:22:25.963Z] | COMPILATION FAILED
[2022-07-21T06:22:25.963Z] |-------------------------------------------------------------------------------
[2022-07-21T06:22:25.963Z] src/yb/master/master_snapshot_coordinator.cc: In member function 'yb::Status yb::master::MasterSnapshotCoordinator::Impl::RestoreSysCatalogReplicated(int64_t, const yb::tablet::SnapshotOperation&, yb::Status*)':
[2022-07-21T06:22:25.963Z] src/yb/master/master_snapshot_coordinator.cc:466:5: error: missing initializer for member 'yb::master::SnapshotScheduleRestoration::non_system_tablets_to_restore' [-Werror=missing-field-initializers]
[2022-07-21T06:22:25.963Z] 466 | });
[2022-07-21T06:22:25.963Z] | ^
[2022-07-21T06:22:25.963Z] cc1plus: all warnings being treated as errors
[2022-07-21T06:22:25.963Z]
[2022-07-21T06:22:25.963Z] Input files:
[2022-07-21T06:22:25.963Z] src/yb/master/CMakeFiles/master.dir/master_snapshot_coordinator.cc.o
[2022-07-21T06:22:25.963Z] src/yb/master/master_snapshot_coordinator.cc
[2022-07-21T06:22:25.963Z] Output file (from -o): src/yb/master/CMakeFiles/master.dir/master_snapshot_coordinator.cc.o
[2022-07-21T06:22:25.963Z] \-------------------------------------------------------------------------------
```

Test Plan: Jenkins
Reviewers: asrivastava
Reviewed By: asrivastava
Subscribers: jason, zyu, ybase, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D18482
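For readers unfamiliar with the two diagnostics in the logs above: GCC's `-Wcomment` fires when a `//` comment line ends in a backslash (as an ASCII tree diagram easily does), and `-Wmissing-field-initializers` fires when an aggregate initializer leaves out a member. The sketch below shows one way to avoid both. It is not the actual fix from the diff; the `Restoration` struct, its fields, and `MakeRestoration` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

/*
 * A diagram placed in a block comment may end lines with a backslash.
 * GCC documents -Wcomment as warning about backslash-newline only inside
 * a `//` comment:
 *
 *        parent
 *        /    \
 *   child1    child2
 */

// Hypothetical stand-in for a restoration struct; the field names are
// illustrative and not copied from YugabyteDB.
struct Restoration {
  std::string schedule_id;
  int64_t restore_at_micros;
  std::vector<std::string> tablets_to_restore;
};

// Every member gets an explicit initializer, so there is no missing field
// for -Wmissing-field-initializers to report.
Restoration MakeRestoration(const std::string& schedule_id, int64_t restore_at_micros) {
  return Restoration{schedule_id, restore_at_micros, {}};
}
```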
sanketkedia added a commit that referenced this issue Jul 25, 2022

…rent points of time in the past when tablet splitting was ongoing

Summary: This diff adds support for restoring to points in time in the past when a tablet split was ongoing. Briefly, the following algorithm is used:
1. If either of the child tablets (or both) is not registered on the master as of the time to which we are restoring, then we restore the parent tablet and hide the child tablets.
2. If both child tablets are registered on the master, then we restore the child tablets and hide the parent.
This works because, at the time when the restore was initiated, we are waiting for splits to complete. Thus, at the current time the split children are ready, so it's safe to restore the children and use the hybrid time filter added as part of PITR to ensure only restored rows are visible.

Original commit: 675d486 / D18299

Test Plan:
Jenkins: rebase: 2.14
Jenkins: urgent
Different phases:
1. Restore before the middle key is fetched: ybd --cxx_test yb-admin-snapshot-schedule-test --gtest-filter YbAdminRestoreDuringSplit.RestoreBeforeGetSplitKey
2. Restore after only one child is registered with the master: ybd --cxx_test yb-admin-snapshot-schedule-test --gtest-filter YbAdminRestoreDuringSplit.RestoreAfterOneChildRegistered
3. Restore after both children are registered but SPLIT_OP is not applied: ybd --cxx_test yb-admin-snapshot-schedule-test --gtest-filter YbAdminRestoreDuringSplit.RestoreBeforeSplitOpIsApplied
4. Restore after children are RUNNING but parent is not HIDDEN: ybd --cxx_test yb-admin-snapshot-schedule-test --gtest-filter YbAdminRestoreDuringSplit.RestoreBeforeParentHidden
5. Restore after children are RUNNING and parent is HIDDEN: ybd --cxx_test yb-admin-snapshot-schedule-test --gtest-filter YbAdminSnapshotScheduleTest.VerifyRestoreWithDeletedTablets

Reviewers: zdrudi, asrivastava
Reviewed By: asrivastava
Subscribers: ybase, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D18494
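The summary relies on a hybrid time filter so that child tablets, which exist at the current time, expose only rows as of the restore time. A rough sketch of that idea follows. It assumes the filter simply hides any row written after the restore time; the `Row` type, the integer timestamps, and `VisibleAfterRestore` are hypothetical simplifications, not YugabyteDB's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical row with the time at which it was written. A plain integer
// stands in for a hybrid time here.
struct Row {
  std::string key;
  uint64_t write_time;
};

// Assumed filter semantics: keep only rows written at or before the restore time.
std::vector<Row> VisibleAfterRestore(const std::vector<Row>& rows, uint64_t restore_time) {
  std::vector<Row> visible;
  std::copy_if(rows.begin(), rows.end(), std::back_inserter(visible),
               [restore_time](const Row& row) { return row.write_time <= restore_time; });
  return visible;
}
```

With such a filter in place, it does not matter that a child tablet also holds rows written after the restore time, because those rows are not visible to readers.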
Jira Link: DB-2738
Description
Version: 2.14.0.0-b61
Steps:
1. Create a database and a table.
2. Record the tablet count, create a snapshot schedule, and record the current time (t1).
3. Load data and observe the tablet count increase.
4. Stop or kill the workload, restore to the recorded time (t1), and measure how long the snapshot schedule restore takes.
With a small number of tablets, the yb-admin command (restore_snapshot_schedule) takes more than 30 seconds and less than 60 seconds.
With more tablets, the yb-admin command may fail with a timeout error, for example:
Error running restore_snapshot_schedule: Timed out (yb/tools/yb-admin_client_ent.cc:561): Timed out waiting for tablet splitting to complete.
This timeout error was also observed once with fewer than 10 tablets per table: the yb-admin command (restore_snapshot_schedule) got stuck and kept timing out even with a 10-minute wait time (some debug gflags were enabled).
Note: It would be good to document this behaviour (restore takes longer when tablet splitting is in progress).
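The error message indicates that the restore waits for ongoing tablet splits to finish and gives up at a deadline. The general pattern can be sketched as a poll-until-deadline loop. This is an illustration only: `WaitFor`, its parameters, and the predicate are hypothetical and do not reflect how yb-admin is actually written.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Polls `done` until it returns true or `timeout` elapses.
// Returns true on success and false on timeout. Hypothetical helper.
bool WaitFor(const std::function<bool()>& done,
             std::chrono::milliseconds timeout,
             std::chrono::milliseconds poll_interval) {
  assert(poll_interval.count() > 0);
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (true) {
    if (done()) {
      return true;  // For example: no tablet split is in progress any more.
    }
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;  // The caller would report a "Timed out waiting ..." error here.
    }
    std::this_thread::sleep_for(poll_interval);
  }
}
```

Under this pattern, the more splits are in flight when the restore starts, the longer the predicate stays false, which is consistent with the longer restore times and occasional timeouts reported above.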