
[perf] Refact tos downloader in Runtime #510

Merged: 13 commits merged on Dec 10, 2024


@brosoul (Collaborator) commented Dec 9, 2024

Pull Request Description

  • Refactor the TOS downloader in the Runtime to use boto3
  • Fix a bug where the download target could contain a directory
  • Add the env vars DOWNLOADER_S3_MAX_IO_QUEUE and DOWNLOADER_S3_IO_CHUNKSIZE to control how much memory can be used during download
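The directory fix in the list above can be illustrated with a small key filter. This is a minimal sketch, assuming the downloader works from the flat list of object keys returned by a prefix listing (for example, the `Key` fields of an S3 `ListObjectsV2` response); the helper name `select_file_keys` is hypothetical and not taken from the PR. It drops placeholder keys ending in `/`, which S3-compatible stores such as TOS can return for "folders", and by default keeps only files directly under the prefix.

```python
from typing import Iterable, List


def select_file_keys(keys: Iterable[str], prefix: str, recursive: bool = False) -> List[str]:
    """Return keys under `prefix` that name files rather than directories.

    Hypothetical helper: object stores have no real directories, but listings
    may include zero-byte marker keys ending in "/". Treating such a key as a
    download target would try to write a directory as a file, so it is skipped.
    """
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    selected = []
    for key in keys:
        if not key.startswith(prefix):
            continue  # outside the requested folder
        relative = key[len(prefix):]
        if relative == "" or relative.endswith("/"):
            continue  # the folder marker itself, or a nested directory marker
        if not recursive and "/" in relative:
            continue  # file inside a sub-directory; only kept when recursive
        selected.append(key)
    return selected
```

Filtering before any local path is built keeps the download loop itself free of special cases for directory entries.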

TODO: find suitable settings for DOWNLOADER_S3_MAX_IO_QUEUE and DOWNLOADER_S3_IO_CHUNKSIZE.
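As a starting point for the TODO: boto3's `TransferConfig` documents `max_io_queue` as the maximum number of downloaded parts queued in memory waiting to be written, each at most `io_chunksize` bytes, so their product approximates the memory held by that queue. The sketch below reads the two env vars and computes that bound. Falling back to boto3's documented defaults (100 and 256 KiB) is an assumption here, not necessarily what aibrix does.

```python
import os

# boto3's documented TransferConfig defaults, used as fallbacks (assumption:
# aibrix's own defaults for these env vars may differ).
DEFAULT_MAX_IO_QUEUE = 100
DEFAULT_IO_CHUNKSIZE = 256 * 1024  # 262144 bytes


def _env_int(name: str, default: int) -> int:
    """Read a positive integer env var, falling back to `default` when unset or empty."""
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)
    if value <= 0:
        raise ValueError(f"{name} must be a positive integer, got {value}")
    return value


def io_queue_settings() -> dict:
    """Keyword arguments suitable for boto3.s3.transfer.TransferConfig."""
    return {
        "max_io_queue": _env_int("DOWNLOADER_S3_MAX_IO_QUEUE", DEFAULT_MAX_IO_QUEUE),
        "io_chunksize": _env_int("DOWNLOADER_S3_IO_CHUNKSIZE", DEFAULT_IO_CHUNKSIZE),
    }


def io_queue_memory_bound(settings: dict) -> int:
    """Approximate upper bound, in bytes, on data buffered in the download I/O queue."""
    return settings["max_io_queue"] * settings["io_chunksize"]
```

With the defaults the bound is 100 × 256 KiB = 25 MiB. Note that it covers only the transfer manager's queue; data already handed to the OS still sits in the page cache until it is flushed to disk.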

Related Issues

Resolves: #320, #358

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

@brosoul force-pushed the linhui/refact-tos-download branch from 4bbc9a1 to 5b2f729 on December 9, 2024 15:47
Review comment (Collaborator) on the diff context:

    }


    class TOSDownloader(S3BaseDownloader):

BTW, can we keep the tos file and move TOSDownloader to tos.py?

@Jeffwan (Collaborator) left a comment

Overall looks good to me. I left some comments on legacy code and on which file should contain TOSDownloader.

@brosoul (Collaborator, Author) commented Dec 10, 2024

cc @Jeffwan
The following figure compares download speed with the TOS SDK and the Boto3 SDK, using 32 download threads (DOWNLOADER_NUM_THREADS=32) and a part size of 64 MiB (DOWNLOADER_PART_CHUNKSIZE=67108864). We downloaded the safetensors weights of deepseek-coder-6.7b-instruct, 12.6 GB in total.
[Figure: sdk_compare, download speed of the TOS SDK vs. the Boto3 SDK]
Note:

  • For the TOS SDK, chunk_size in the figure is the chunk_size parameter of the utils.copy_and_verify_length function. It is hard-coded and cannot be changed through environment variables or function parameters.
  • For the Boto3 SDK, chunk_size in the figure is io_chunksize in TransferConfig, which can be set as a parameter.
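For reference, the benchmark settings above can be expressed as boto3 `TransferConfig` keyword arguments. This is a sketch assuming DOWNLOADER_NUM_THREADS maps to `max_concurrency` and DOWNLOADER_PART_CHUNKSIZE to `multipart_chunksize`; the helper functions are hypothetical. It also counts how many ranged GET parts a file is split into at a given part size.

```python
def transfer_config_kwargs(num_threads: int, part_chunksize: int, io_chunksize: int) -> dict:
    """Map downloader settings to boto3 TransferConfig keyword arguments.

    Hypothetical mapping: DOWNLOADER_NUM_THREADS -> max_concurrency and
    DOWNLOADER_PART_CHUNKSIZE -> multipart_chunksize are assumptions.
    """
    return {
        "max_concurrency": num_threads,         # worker threads issuing ranged GETs
        "multipart_chunksize": part_chunksize,  # bytes requested per part
        "io_chunksize": io_chunksize,           # bytes per read from a part's body stream
    }


def num_parts(file_size: int, part_chunksize: int) -> int:
    """Number of ranged GET parts for a file of `file_size` bytes (integer ceiling)."""
    return max(1, (file_size + part_chunksize - 1) // part_chunksize)


# Benchmark settings from the comment: 32 threads, 64 MiB parts.
# io_chunksize is left at boto3's documented 256 KiB default (assumption).
BENCH_KWARGS = transfer_config_kwargs(32, 64 * 1024 * 1024, 256 * 1024)

# With boto3 installed (not needed to run this sketch):
#   import boto3
#   from boto3.s3.transfer import TransferConfig
#   client = boto3.client("s3", endpoint_url="https://<tos-s3-endpoint>")
#   client.download_file(bucket, key, local_path, Config=TransferConfig(**BENCH_KWARGS))
```

For example, with 64 MiB parts a 5 GiB shard becomes 80 ranged requests shared among the 32 threads.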

From this comparison, we draw the following conclusions:

  • After parameter tuning, the Boto3 SDK reaches TOS's 10 Gbps bandwidth limit (for small model files), outperforming the TOS SDK.
  • Boto3 makes parameter settings easier to adjust.
  • Because disk I/O is slower than network I/O, downloaded data is temporarily held in buff/cache (the page cache). Conjecture: when downloading large model files, once the memory available for caching downloaded data is exhausted, download speed eventually approaches disk bandwidth. Screenshots of memory and bandwidth usage during the 200b download may illustrate this point.
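The conjecture in the last point can be made concrete with a deliberately simplified two-phase model (an illustrative assumption, not a measurement): data arrives at network rate N while the disk drains it at rate D < N, so the cache grows at N − D until its capacity C is exhausted; after that, throughput falls to D. The hypothetical helper below estimates total download time under that model, in any consistent units.

```python
def estimated_download_seconds(size: float, net_rate: float, disk_rate: float, cache: float) -> float:
    """Two-phase estimate of download time under a simplified page-cache model.

    Phase 1: data arrives at net_rate while the disk drains at disk_rate, so
    buffered data grows at (net_rate - disk_rate) until `cache` is full.
    Phase 2: with the cache full, new data is accepted only at disk_rate.
    The download counts as finished once all data is received, even if some
    of it is still waiting in the cache to be flushed.
    """
    if net_rate <= disk_rate:
        return size / net_rate  # disk keeps up; the cache never fills
    fill_time = cache / (net_rate - disk_rate)  # time until the cache is full
    phase1_bytes = net_rate * fill_time          # data received before the cache fills
    if size <= phase1_bytes:
        return size / net_rate                   # finishes while still at network speed
    return fill_time + (size - phase1_bytes) / disk_rate
```

With ample cache the model reduces to size / network rate: 12.6 GB at 10 Gbps (1.25 GB/s) takes about 10.1 s, a lower bound for the benchmark above. Real behavior is more complex; for example, Linux throttles writers once dirty pages exceed `vm.dirty_ratio`.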

[Screenshot: memory and bandwidth usage during the 200b download]

@Jeffwan changed the title from [Misc] Refact tos downloader in Runtime to [perf] Refact tos downloader in Runtime on Dec 10, 2024
@Jeffwan (Collaborator) commented Dec 10, 2024

Great, this is excellent!

@Jeffwan (Collaborator) left a comment

It seems all comments have been addressed. LGTM

@Jeffwan merged commit 83d2f03 into main on Dec 10, 2024
10 checks passed
@Jeffwan deleted the linhui/refact-tos-download branch on December 10, 2024 18:30
gangmuk pushed a commit that referenced this pull request Jan 25, 2025
* refact: refact TOSDownloader use boto

* test: fix test case after refact

* fix: download all the directory failed

* style

* feat: set max memory available while download

* fix dependency

* refact: move the action that filter file from folder into

* style

* refact: keep origin tos downloader implement and add version control env

* test: fix test

* fix test

* style

* misc: use better envs config
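The commit "keep origin tos downloader implement and add version control env" implies both implementations stay available, with an environment variable choosing between them. A minimal sketch of such a switch follows; the env var name `DOWNLOADER_TOS_VERSION`, the values `"v1"`/`"v2"`, the default, and the two stub classes are hypothetical and may not match the PR.

```python
import os
from typing import Dict, Type


class TOSDownloaderV1:
    """Stand-in for the original implementation on the TOS SDK (hypothetical)."""
    backend = "tos-sdk"


class TOSDownloaderV2:
    """Stand-in for the refactored implementation on boto3 (hypothetical)."""
    backend = "boto3"


# Hypothetical env var and values; the PR's actual names may differ.
TOS_VERSION_ENV = "DOWNLOADER_TOS_VERSION"
_IMPLEMENTATIONS: Dict[str, Type] = {"v1": TOSDownloaderV1, "v2": TOSDownloaderV2}


def select_tos_downloader(default: str = "v2") -> Type:
    """Return the downloader class named by the env var, or `default` if unset."""
    version = os.environ.get(TOS_VERSION_ENV, default).strip().lower()
    try:
        return _IMPLEMENTATIONS[version]
    except KeyError:
        raise ValueError(
            f"{TOS_VERSION_ENV}={version!r} is not one of {sorted(_IMPLEMENTATIONS)}"
        ) from None
```

Keeping the old implementation behind a switch lets users fall back to it if the boto3 path misbehaves against a particular TOS endpoint, without a code change.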
Development

Successfully merging this pull request may close these issues:

  • Refact the implementation of download from tos with boto3

2 participants