[DocDB][PITR] yb-admin command is timing out with PITR+Tablet splitting #13022

Closed
Arjun-yb opened this issue Jun 23, 2022 · 0 comments
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug pitr priority/medium Medium priority issue

Arjun-yb commented Jun 23, 2022

Jira Link: DB-2738

Description

Version: 2.14.0.0-b61

Steps:

  1. Create a database and a table.

  2. Record the tablet count, create a snapshot schedule, and record the current time (t1).

  3. Load data and observe the tablet count increase as tablets split.

  4. Stop or kill the workload, restore to the recorded time (t1), and measure how long the snapshot schedule restore takes.

  5. With a small number of tablets, the yb-admin command (restore_snapshot_schedule) takes more than 30 seconds but less than 60 seconds.

  6. With more tablets, the yb-admin command may fail with a timeout error, for example:

Error running restore_snapshot_schedule: Timed out (yb/tools/yb-admin_client_ent.cc:561): Timed out waiting for tablet splitting to complete.

This timeout was also observed once with fewer than 10 tablets per table: the yb-admin command (restore_snapshot_schedule) got stuck and timed out even with a 10-minute wait time (some debug gflags were enabled).

Note: This behaviour should be documented (restore takes longer when tablet splitting is in progress).
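
The "Timed out waiting for tablet splitting to complete" message is what a deadline-bounded wait would report when splits are still running at the deadline. The following is a minimal sketch of such a wait loop, not the actual yb-admin code: `WaitUntil`, its parameters, and the polling approach are all assumptions made for illustration.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not from the YugabyteDB source): polls `is_done`
// until it returns true or `timeout` elapses.
// Returns true if the condition was met, false if the deadline passed first.
bool WaitUntil(const std::function<bool()>& is_done,
               std::chrono::milliseconds timeout,
               std::chrono::milliseconds poll_interval) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (true) {
    if (is_done()) {
      return true;
    }
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;  // The caller would surface this as a "Timed out" status.
    }
    std::this_thread::sleep_for(poll_interval);
  }
}
```

Under this model, the more splits are in flight when the restore starts, the longer the condition stays false, which would match the observation that more tablets make the timeout more likely.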

@Arjun-yb Arjun-yb added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Jun 23, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 23, 2022
@Arjun-yb Arjun-yb added the pitr label Jun 23, 2022
@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Jun 28, 2022
sanketkedia added a commit that referenced this issue Jul 21, 2022
…time in the past when tablet splitting was ongoing

Summary:
This diff adds support for restoring to a point in time in the past at which tablet splitting
was ongoing. Briefly, the following algorithm is used:
1. If either of the child tablets (or both) is not registered on the master as of the time to which we are restoring, then we
restore the parent tablet and hide the child tablets.
2. If both child tablets are registered on the master, then we restore the child tablets and hide the parent.

This works because, when the restore is initiated, we wait for splits to complete.
Thus, at the current time the split children are ready, so it's safe to restore the children and
use the hybrid time filter added as part of PITR to ensure that only restored rows are visible.
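
The two rules above can be sketched as a small decision function. This is only an illustration of the rule as described in this summary: the type and function names (`SplitStateAtRestoreTime`, `RestoreChoice`, `ChooseTabletsToRestore`) are hypothetical and are not taken from the diff.

```cpp
// Hypothetical view of one tablet split as of the restore time.
// These names are illustrative only; they do not come from the actual diff.
struct SplitStateAtRestoreTime {
  bool first_child_registered;   // Child 1 registered on the master at restore time.
  bool second_child_registered;  // Child 2 registered on the master at restore time.
};

enum class RestoreChoice {
  kRestoreParentHideChildren,
  kRestoreChildrenHideParent,
};

// Rule 2: both children registered -> restore the children, hide the parent.
// Rule 1: otherwise (one or both missing) -> restore the parent, hide the children.
RestoreChoice ChooseTabletsToRestore(const SplitStateAtRestoreTime& state) {
  if (state.first_child_registered && state.second_child_registered) {
    return RestoreChoice::kRestoreChildrenHideParent;
  }
  return RestoreChoice::kRestoreParentHideChildren;
}
```

For example, a restore time that falls after only one child was registered maps to `kRestoreParentHideChildren`, which corresponds to the RestoreAfterOneChildRegistered phase in the test plan.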

Test Plan:
Tests cover restores at different phases of a tablet split:
1. Restore before the middle key is fetched: ybd --cxx_test yb-admin-snapshot-schedule-test
--gtest-filter YbAdminRestoreDuringSplit.RestoreBeforeGetSplitKey
2. Restore after only one child is registered with the master: ybd --cxx_test yb-admin-snapshot-schedule-test
--gtest-filter YbAdminRestoreDuringSplit.RestoreAfterOneChildRegistered
3. Restore after both the children registered but SPLIT_OP not applied: ybd --cxx_test yb-admin-snapshot-schedule-test
--gtest-filter YbAdminRestoreDuringSplit.RestoreBeforeSplitOpIsApplied
4. Restore after children RUNNING but parent not HIDDEN: ybd --cxx_test yb-admin-snapshot-schedule-test
--gtest-filter YbAdminRestoreDuringSplit.RestoreBeforeParentHidden
5. Restore after children RUNNING and parent HIDDEN: ybd --cxx_test yb-admin-snapshot-schedule-test
--gtest-filter YbAdminSnapshotScheduleTest.VerifyRestoreWithDeletedTablets

Reviewers: slingam, timur, sergei, asrivastava, zdrudi

Reviewed By: asrivastava, zdrudi

Subscribers: bogdan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D18299
sanketkedia added a commit that referenced this issue Jul 22, 2022
Summary:
Commit 675d486 introduced compilation errors on gcc11. This
diff fixes them. The following errors were noticed:

```
/-------------------------------------------------------------------------------
[2022-07-21T06:22:35.385Z] | COMPILATION FAILED
[2022-07-21T06:22:35.385Z] |-------------------------------------------------------------------------------
[2022-07-21T06:22:35.385Z] ent/src/yb/master/restore_sys_catalog_state.cc:282:5: error: multi-line comment [-Werror=comment]
[2022-07-21T06:22:35.385Z]   282 |     //                               /   \
[2022-07-21T06:22:35.385Z]       |     ^
[2022-07-21T06:22:35.385Z] ent/src/yb/master/restore_sys_catalog_state.cc:284:5: error: multi-line comment [-Werror=comment]
[2022-07-21T06:22:35.385Z]   284 |     //                              / \    /  \
[2022-07-21T06:22:35.385Z]       |     ^
[2022-07-21T06:22:35.385Z] cc1plus: all warnings being treated as errors
[2022-07-21T06:22:35.385Z]
[2022-07-21T06:22:35.385Z] Input files:
[2022-07-21T06:22:35.385Z]   src/yb/master/CMakeFiles/master.dir/__/__/__/ent/src/yb/master/restore_sys_catalog_state.cc.o
[2022-07-21T06:22:35.385Z]   ent/src/yb/master/restore_sys_catalog_state.cc
[2022-07-21T06:22:35.385Z] Output file (from -o): src/yb/master/CMakeFiles/master.dir/__/__/__/ent/src/yb/master/restore_sys_catalog_state.cc.o
[2022-07-21T06:22:35.385Z] \-------------------------------------------------------------------------------
```
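
For context on the `-Werror=comment` failure above: a `//` comment line that ends in a backslash is spliced with the next line, so ASCII-art trees drawn in `//` comments can silently extend the comment. The actual fix in the diff is not shown here. The sketch below shows one common workaround, assuming the art can be moved into a block comment; `ChildrenPerSplit` is a hypothetical function used only to make the example testable.

```cpp
// Hypothetical example function; not from the YugabyteDB source.
int ChildrenPerSplit() {
  int children = 0;
  // The tree is drawn in a block comment. A trailing backslash inside a
  // block comment is harmless, whereas in a `//` comment it would splice
  // the following line into the comment and trigger -Wcomment.
  /*
   *        parent
   *        /    \
   *    child1  child2
   */
  children = 2;  // Still executed: this line is not swallowed by the comment.
  return children;
}
```

Another option is to keep `//` comments and end each art line with a character other than a backslash.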

```
/-------------------------------------------------------------------------------
[2022-07-21T06:22:25.963Z] | COMPILATION FAILED
[2022-07-21T06:22:25.963Z] |-------------------------------------------------------------------------------
[2022-07-21T06:22:25.963Z] src/yb/master/master_snapshot_coordinator.cc: In member function 'yb::Status yb::master::MasterSnapshotCoordinator::Impl::RestoreSysCatalogReplicated(int64_t, const yb::tablet::SnapshotOperation&, yb::Status*)':
[2022-07-21T06:22:25.963Z] src/yb/master/master_snapshot_coordinator.cc:466:5: error: missing initializer for member 'yb::master::SnapshotScheduleRestoration::non_system_tablets_to_restore' [-Werror=missing-field-initializers]
[2022-07-21T06:22:25.963Z]   466 |     });
[2022-07-21T06:22:25.963Z]       |     ^
[2022-07-21T06:22:25.963Z] cc1plus: all warnings being treated as errors
[2022-07-21T06:22:25.963Z]
[2022-07-21T06:22:25.963Z] Input files:
[2022-07-21T06:22:25.963Z]   src/yb/master/CMakeFiles/master.dir/master_snapshot_coordinator.cc.o
[2022-07-21T06:22:25.963Z]   src/yb/master/master_snapshot_coordinator.cc
[2022-07-21T06:22:25.963Z] Output file (from -o): src/yb/master/CMakeFiles/master.dir/master_snapshot_coordinator.cc.o
[2022-07-21T06:22:25.963Z] \-------------------------------------------------------------------------------
```
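
The `-Werror=missing-field-initializers` failure above typically appears when a member is added to an aggregate and an existing brace initializer does not list it. A minimal sketch of the pattern and one way to fix it follows. `RestorationSketch` and `MakeRestoration` are hypothetical stand-ins: only the member name `non_system_tablets_to_restore` comes from the log, and its type here is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an aggregate such as SnapshotScheduleRestoration.
struct RestorationSketch {
  std::string schedule_id;
  // Newly added member (name taken from the error log, type assumed).
  std::vector<std::string> non_system_tablets_to_restore;
};

RestorationSketch MakeRestoration(const std::string& schedule_id) {
  // `RestorationSketch{schedule_id}` would leave the second member without
  // an initializer, which can trigger -Wmissing-field-initializers.
  // Listing every member explicitly avoids the warning.
  return RestorationSketch{schedule_id, {}};
}
```

Listing every member keeps the initializer in sync with the struct definition, at the cost of touching each call site when a member is added.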

Test Plan: Jenkins

Reviewers: asrivastava

Reviewed By: asrivastava

Subscribers: jason, zyu, ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D18482
sanketkedia added a commit that referenced this issue Jul 25, 2022
…rent points of time in the past when tablet splitting was ongoing

Summary:
This diff adds support for restoring to a point in time in the past at which tablet splitting
was ongoing. Briefly, the following algorithm is used:
1. If either of the child tablets (or both) is not registered on the master as of the time to which we are restoring, then we
restore the parent tablet and hide the child tablets.
2. If both child tablets are registered on the master, then we restore the child tablets and hide the parent.

This works because, when the restore is initiated, we wait for splits to complete.
Thus, at the current time the split children are ready, so it's safe to restore the children and
use the hybrid time filter added as part of PITR to ensure that only restored rows are visible.

Original commit: 675d486 / D18299

Test Plan:
Jenkins: rebase: 2.14
Jenkins: urgent

Tests cover restores at different phases of a tablet split:
1. Restore before the middle key is fetched: ybd --cxx_test yb-admin-snapshot-schedule-test
--gtest-filter YbAdminRestoreDuringSplit.RestoreBeforeGetSplitKey
2. Restore after only one child is registered with the master: ybd --cxx_test yb-admin-snapshot-schedule-test
--gtest-filter YbAdminRestoreDuringSplit.RestoreAfterOneChildRegistered
3. Restore after both the children registered but SPLIT_OP not applied: ybd --cxx_test yb-admin-snapshot-schedule-test
--gtest-filter YbAdminRestoreDuringSplit.RestoreBeforeSplitOpIsApplied
4. Restore after children RUNNING but parent not HIDDEN: ybd --cxx_test yb-admin-snapshot-schedule-test
--gtest-filter YbAdminRestoreDuringSplit.RestoreBeforeParentHidden
5. Restore after children RUNNING and parent HIDDEN: ybd --cxx_test yb-admin-snapshot-schedule-test
--gtest-filter YbAdminSnapshotScheduleTest.VerifyRestoreWithDeletedTablets

Reviewers: zdrudi, asrivastava

Reviewed By: asrivastava

Subscribers: ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D18494