
lfs: add support for rate limit retries #340

Merged: 1 commit into iterative:main, Mar 15, 2024

Conversation

@sisp (Collaborator) commented Mar 13, 2024

I've added support for handling rate limit responses and retrying requests accordingly.

Typical rate limit responses use the status code 429 and include one of the following headers:

  • Retry-After: <delay in seconds> is used by, e.g., GitLab
  • RateLimit-Reset: <timestamp> is used by, e.g., GitLab
  • X-RateLimit-Reset: <timestamp> is used by, e.g., GitHub
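For illustration, a rate-limited response might look like one of these (values made up):

    HTTP/1.1 429 Too Many Requests
    Retry-After: 30

    HTTP/1.1 429 Too Many Requests
    X-RateLimit-Reset: 1710331200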

The implementation is inspired by the corresponding implementation in the python-gitlab library and based on the GitHub documentation on rate limits.

@sisp force-pushed the lfs/rate-limit-retry branch from 28ace21 to 42f0798 on March 13, 2024 11:01
@skshetry (Member)

@sisp, does this honor attempts?

@sisp (Collaborator, Author) commented Mar 13, 2024

@skshetry Do you mean the attempt argument of the get_timeout() method?

@codecov-commenter commented Mar 13, 2024

Codecov Report

Attention: Patch coverage is 30.76923%, with 9 lines in your changes missing coverage. Please review.

Project coverage is 77.38%. Comparing base (a96c366) to head (42f0798).

Files                          | Patch %  | Lines
src/scmrepo/git/lfs/client.py  | 30.76%   | 9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #340      +/-   ##
==========================================
- Coverage   77.50%   77.38%   -0.12%     
==========================================
  Files          39       39              
  Lines        5068     5081      +13     
  Branches      909      915       +6     
==========================================
+ Hits         3928     3932       +4     
- Misses        985      994       +9     
  Partials      155      155              


@skshetry (Member)

I am not sure if it is the same as the one we pass to ExponentialRetry, but we have retries set to 5 attempts:

_SESSION_RETRIES = 5

What happens if a request gets Retry-After on all 5 attempts? Does that fail, or does it keep on retrying?
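For reference, the retry options are constructed roughly like this (a sketch, not the exact scmrepo code):

    from aiohttp_retry import ExponentialRetry, RetryClient

    _SESSION_RETRIES = 5

    # attempts caps the total number of tries; get_timeout() only decides
    # how long to sleep before the next one
    retry_options = ExponentialRetry(attempts=_SESSION_RETRIES)
    session = RetryClient(retry_options=retry_options)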

@sisp (Collaborator, Author) commented Mar 13, 2024

Ah, now I see what you mean. If a request gets Retry-After on all 5 attempts, then no further attempts are made. That's because get_timeout() only computes the delay from the attempt counter; the attempts themselves are counted and managed outside this class, in aiohttp_retry.client._RequestContext._do_request(). So it doesn't keep retrying beyond 5 in this case.

@skshetry (Member)

That's great. The only concern I have with this PR is that a malicious server could return a large value, which might lead the user to believe dvc is frozen.

So I am not sure if this should be the default, or should be enabled via some config.

@skshetry (Member)

How does git-lfs work?

@sisp (Collaborator, Author) commented Mar 13, 2024

According to the Git LFS config documentation:

  • lfs.transfer.maxretries

    Specifies how many retries LFS will attempt per OID before marking the transfer as failed. Must be an integer which is at least one. If the value is not an integer, is less than one, or is not given, a value of eight will be used instead.

  • lfs.transfer.maxretrydelay

    Specifies the maximum time in seconds LFS will wait between each retry attempt. LFS uses exponential backoff for retries, doubling the time between each retry until reaching this limit. If a server requests a delay using the Retry-After header, the header value overrides the exponential delay for that attempt and is not limited by this option.
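For reference, these are ordinary git config options, e.g. (8 is the documented default; the 60-second cap is an arbitrary example):

    git config lfs.transfer.maxretries 8
    git config lfs.transfer.maxretrydelay 60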

It sounds to me as if rate limit retries can't be disabled, but I haven't been able to verify this in the official Git LFS implementation.

Although I consider it unlikely, I understand your concern. How about applying the same maximum timeout from ExponentialRetry to the rate limit delay?

 class _ExponentialRetry(ExponentialRetry):
     def get_timeout(
         self, attempt: int, response: Optional[aiohttp.ClientResponse] = None
     ) -> float:
         if response is not None and response.status == HTTPStatus.TOO_MANY_REQUESTS:
+            timeout: Optional[float] = None
             if "Retry-After" in response.headers:
                 with suppress(ValueError):
-                    return float(response.headers["Retry-After"])
-            for k in ["RateLimit-Reset", "X-RateLimit-Reset"]:
-                if k in response.headers:
-                    with suppress(ValueError):
-                        return float(response.headers[k]) - time()
+                    timeout = float(response.headers["Retry-After"])
+            else:
+                for k in ["RateLimit-Reset", "X-RateLimit-Reset"]:
+                    if k in response.headers:
+                        with suppress(ValueError):
+                            timeout = float(response.headers[k]) - time()
+                            break
+            if timeout is not None:
+                return min(timeout, self._max_timeout)
         return super().get_timeout(attempt, response)
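This subclass would then be a drop-in replacement wherever the plain ExponentialRetry is constructed today, e.g. (sketch):

    session = RetryClient(retry_options=_ExponentialRetry(attempts=_SESSION_RETRIES))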

@skshetry requested a review from @shcheklein on March 13, 2024 16:11
@skshetry (Member)

@pmrowla, do you have any thoughts? 👋🏽

@skshetry (Member)

> The only concern I have with this PR is that a malicious server could return a large value, which might lead the user to believe dvc is frozen.

We do assume that the remotes are trusted by the users, so it may not be as big an issue as I am making it out to be.

@shcheklein (Member) left a review

Looks good to me, a few questions to clarify. Thanks, @sisp!

Can we also add some tests for this please?

@sisp (Collaborator, Author) commented Mar 14, 2024

@shcheklein There aren't any integration tests against an LFS test server yet, so testing rate limit retries would mean significant test setup work first. I'd defer this to follow-up PRs. WDYT?

@shcheklein (Member)

@sisp yes, absolutely, I didn't mean proper integration tests. I was primarily thinking about unit tests with mocks in this case, just to cover this new retry logic.
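Something along these lines, perhaps (a sketch using unittest.mock; _ExponentialRetry and the module path come from this PR, the rest is illustrative):

    from http import HTTPStatus
    from unittest import mock

    from scmrepo.git.lfs.client import _ExponentialRetry

    def test_get_timeout_honors_retry_after():
        retry = _ExponentialRetry(attempts=5)
        response = mock.Mock()
        response.status = HTTPStatus.TOO_MANY_REQUESTS
        response.headers = {"Retry-After": "7"}
        # the server-requested delay should override exponential backoff
        assert retry.get_timeout(attempt=1, response=response) == 7.0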

@shcheklein (Member) left a review

Your call on unit tests. Thanks, @sisp!

@sisp (Collaborator, Author) commented Mar 14, 2024

I'll look into adding unit tests tomorrow.

@shcheklein (Member)

@sisp you should also be able to merge PRs now (I've added you as a collaborator to the project).

@sisp (Collaborator, Author) commented Mar 14, 2024

Wow, thanks, I appreciate your trust. 🙏

@skshetry merged commit 5d237d7 into iterative:main on Mar 15, 2024
13 checks passed
@sisp deleted the lfs/rate-limit-retry branch on March 15, 2024 08:25