
lfs: add support for rate limit retries #340

Merged: 1 commit into iterative:main, Mar 15, 2024

Conversation

@sisp (Collaborator) commented Mar 13, 2024

I've added support for handling rate limit responses and retrying requests accordingly.

Typical rate limit responses use the status code 429 and include one of the following headers:

  • Retry-After: <delay in seconds> is used by, e.g., GitLab
  • RateLimit-Reset: <timestamp> is used by, e.g., GitLab
  • X-RateLimit-Reset: <timestamp> is used by, e.g., GitHub
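For illustration, a rate-limited response might look like one of these (values made up):

    HTTP/1.1 429 Too Many Requests
    Retry-After: 30

    HTTP/1.1 429 Too Many Requests
    X-RateLimit-Reset: 1710331200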

The implementation is inspired by the corresponding implementation in the python-gitlab library and based on the GitHub documentation on rate limits.

@sisp force-pushed the lfs/rate-limit-retry branch from 28ace21 to 42f0798 on March 13, 2024 11:01
@skshetry (Member)

@sisp, does this honor attempts?

@sisp (Collaborator, Author) commented Mar 13, 2024

@skshetry Do you mean the attempt argument of the get_timeout() method?

@codecov-commenter commented Mar 13, 2024

Codecov Report

Attention: Patch coverage is 30.76923%, with 9 lines in your changes missing coverage. Please review.

Project coverage is 77.38%. Comparing base (a96c366) to head (42f0798).

Files                          | Patch %  | Lines
src/scmrepo/git/lfs/client.py  | 30.76%   | 9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #340      +/-   ##
==========================================
- Coverage   77.50%   77.38%   -0.12%     
==========================================
  Files          39       39              
  Lines        5068     5081      +13     
  Branches      909      915       +6     
==========================================
+ Hits         3928     3932       +4     
- Misses        985      994       +9     
  Partials      155      155              


@skshetry (Member)

I am not sure if it is the same as the one we pass to ExponentialRetry, but we have retries set to 5 attempts:

_SESSION_RETRIES = 5

What happens if a request gets Retry-After on all 5 attempts? Does that fail, or does it keep on retrying?
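For reference, the retry options are constructed roughly like this (a sketch, not the exact scmrepo code):

    from aiohttp_retry import ExponentialRetry, RetryClient

    _SESSION_RETRIES = 5

    # attempts caps the total number of tries; get_timeout() only decides
    # how long to sleep before the next one
    retry_options = ExponentialRetry(attempts=_SESSION_RETRIES)
    session = RetryClient(retry_options=retry_options)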

@sisp (Collaborator, Author) commented Mar 13, 2024

Ah, now I see what you mean. If a request gets Retry-After on all 5 attempts, then no further attempts are made. That's because get_timeout() only computes the delay from the attempt counter; the attempts themselves are counted and managed outside this class, in aiohttp_retry.client._RequestContext._do_request(). So it doesn't keep retrying beyond 5 in this case.

@skshetry (Member)

That's great. The only concern I have with this PR is that a malicious server could return a large value, which might lead the user to believe dvc is frozen.

So I am not sure if this should be the default, or should be enabled via some config.

@skshetry (Member)

How does git-lfs work?

@sisp (Collaborator, Author) commented Mar 13, 2024

According to the Git LFS config documentation:

  • lfs.transfer.maxretries

    Specifies how many retries LFS will attempt per OID before marking the transfer as failed. Must be an integer which is at least one. If the value is not an integer, is less than one, or is not given, a value of eight will be used instead.

  • lfs.transfer.maxretrydelay

    Specifies the maximum time in seconds LFS will wait between each retry attempt. LFS uses exponential backoff for retries, doubling the time between each retry until reaching this limit. If a server requests a delay using the Retry-After header, the header value overrides the exponential delay for that attempt and is not limited by this option.
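For reference, these are ordinary git config options, e.g. (8 is the documented default; the 60-second cap is an arbitrary example):

    git config lfs.transfer.maxretries 8
    git config lfs.transfer.maxretrydelay 60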

It sounds to me as if rate limit retries can't be disabled, but I haven't been able to verify this in the official Git LFS implementation.

Although I consider it unlikely, I understand your concern. How about applying the same maximum timeout from ExponentialRetry to the rate limit delay?

 class _ExponentialRetry(ExponentialRetry):
     def get_timeout(
         self, attempt: int, response: Optional[aiohttp.ClientResponse] = None
     ) -> float:
         if response is not None and response.status == HTTPStatus.TOO_MANY_REQUESTS:
+            timeout: Optional[float] = None
             if "Retry-After" in response.headers:
                 with suppress(ValueError):
-                    return float(response.headers["Retry-After"])
-            for k in ["RateLimit-Reset", "X-RateLimit-Reset"]:
-                if k in response.headers:
-                    with suppress(ValueError):
-                        return float(response.headers[k]) - time()
+                    timeout = float(response.headers["Retry-After"])
+            else:
+                for k in ["RateLimit-Reset", "X-RateLimit-Reset"]:
+                    if k in response.headers:
+                        with suppress(ValueError):
+                            timeout = float(response.headers[k]) - time()
+                            break
+            if timeout is not None:
+                return min(timeout, self._max_timeout)
         return super().get_timeout(attempt, response)
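This subclass would then be a drop-in replacement wherever the plain ExponentialRetry is constructed today, e.g. (sketch):

    session = RetryClient(retry_options=_ExponentialRetry(attempts=_SESSION_RETRIES))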

@skshetry requested a review from @shcheklein on March 13, 2024 16:11
@skshetry (Member)

@pmrowla, do you have any thoughts? 👋🏽

@skshetry (Member)

> The only concern I have with this PR is that a malicious server could return a large value, which might lead the user to believe dvc is frozen.

We do assume that the remotes are trusted by the users, so it may not be as big an issue as I am making it out to be.

@shcheklein (Member) left a review

Looks good to me, a few questions to clarify. Thanks, @sisp!

Can we also add some tests for this please?

@sisp (Collaborator, Author) commented Mar 14, 2024

@shcheklein There aren't any integration tests against an LFS test server yet, so testing rate limit retries would mean significant test setup work first. I'd defer this to follow-up PRs. WDYT?

@shcheklein (Member)

@sisp yes, absolutely, I didn't mean proper integration tests. I was primarily thinking about unit tests with mocks in this case, just to cover this new retry logic.
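Something along these lines, perhaps (a sketch using unittest.mock; _ExponentialRetry and the module path come from this PR, the rest is illustrative):

    from http import HTTPStatus
    from unittest import mock

    from scmrepo.git.lfs.client import _ExponentialRetry

    def test_get_timeout_honors_retry_after():
        retry = _ExponentialRetry(attempts=5)
        response = mock.Mock()
        response.status = HTTPStatus.TOO_MANY_REQUESTS
        response.headers = {"Retry-After": "7"}
        # the server-requested delay should override exponential backoff
        assert retry.get_timeout(attempt=1, response=response) == 7.0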

@shcheklein (Member) left a review

Your call on unit tests. Thanks, @sisp!

@sisp (Collaborator, Author) commented Mar 14, 2024

I'll look into adding unit tests tomorrow.

@shcheklein (Member)

@sisp you should also be able to merge PRs now (I've added you as a collaborator to the project).

@sisp (Collaborator, Author) commented Mar 14, 2024

Wow, thanks, I appreciate your trust. 🙏

@skshetry merged commit 5d237d7 into iterative:main on Mar 15, 2024
13 checks passed
@sisp deleted the lfs/rate-limit-retry branch on March 15, 2024 08:25