
An Activation Offloading Framework to SSDs for Faster Large Language Model Training



Use

First, set the PYTHONPATH environment variable so that the megatron package can locate this package:

export PYTHONPATH=/path/to/FlashTrain/third_party/Megatron-DeepSpeed:/home/kunwu2/FlashTrain:$PYTHONPATH
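
As a quick sanity check (assuming the top-level module names flashtrain and megatron), both packages should now be resolvable:

import importlib.util

# Both modules should resolve once PYTHONPATH contains the two paths above.
for name in ("flashtrain", "megatron"):
    assert importlib.util.find_spec(name) is not None, f"{name} is not on PYTHONPATH"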

Avoid Excessive Threads Launched by import deepspeed

In deepspeed/__init__.py, move from .runtime.hybrid_engine import DeepSpeedHybridEngine into the if clause that uses it, so the module is only imported when the hybrid engine is actually needed.
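
The intended edit looks roughly like the following sketch; the exact surrounding code in deepspeed/__init__.py varies across DeepSpeed versions, and the condition shown is hypothetical:

# deepspeed/__init__.py (sketch only)
#
# 1. Delete the eager top-level import:
#
#     from .runtime.hybrid_engine import DeepSpeedHybridEngine
#
# 2. Re-add it lazily inside the branch that actually instantiates the engine:
#
#     if config.hybrid_engine.enabled:  # hypothetical condition name
#         from .runtime.hybrid_engine import DeepSpeedHybridEngine
#         engine = DeepSpeedHybridEngine(...)
#
# The module (and the extra threads its import launches) is then only loaded
# when the hybrid engine is actually requested.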

Dependencies

Install GPUDirect Storage

Check FlashTrain/docs/GPUDIRECT_STORAGE.md.
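
Once KvikIO (next section) is installed, one way to confirm that GPUDirect Storage is actually usable is to check that KvikIO is not silently falling back to its POSIX compatibility mode; a minimal sketch:

import kvikio.defaults

# A truthy compat mode means cuFile/GDS could not be used and KvikIO is
# falling back to ordinary POSIX I/O. (The return type changed from bool to
# an enum in newer releases.)
print("KvikIO compat mode:", kvikio.defaults.compat_mode())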

Install KvikIO

Follow the installation instructions in the KvikIO repository (https://github.com/rapidsai/kvikio).

Note that version >=24.06 is required for the new raw_read_async/raw_write_async API. 24.06 is currently a nightly release, so install it through the nightly channel as instructed there.

To install a nightly build (here into a fresh environment), the commands are something like:

conda create --name dev_flashtrain python==3.11
conda activate dev_flashtrain
conda search rapidsai-nightly::kvikio
conda install -c rapidsai-nightly -c conda-forge kvikio==24.08.00a libkvikio==24.08.00a
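
A minimal smoke test for the installed version follows; the file path is an arbitrary placeholder, and raw_write_async is only probed via hasattr since its exact signature is best taken from the KvikIO documentation:

import cupy
import kvikio

print(kvikio.__version__)  # expect a 24.06+ build

# The stream-ordered async API only exists in 24.06 and newer.
assert hasattr(kvikio.CuFile, "raw_write_async"), "kvikio is too old"

buf = cupy.arange(1024, dtype=cupy.float32)
f = kvikio.CuFile("/tmp/kvikio-smoke-test.bin", "w")  # placeholder path
fut = f.pwrite(buf)   # stable nonblocking API; returns an IOFuture
nbytes = fut.get()    # block until the write completes
f.close()
assert nbytes == buf.nbytes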

Install Python Package Dependencies

pip install -r requirements.txt
pip install -r requirements_torch.txt

Install apex

Follow the instructions in the NVIDIA apex repository (https://github.com/NVIDIA/apex) to install from source. Do not install apex directly from PyPI: the Megatron code that depends on apex's compiled extensions will not work in that case.
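
To verify that apex was built with its CUDA extensions (a plain PyPI install lacks them), try importing the compiled modules that Megatron relies on; the module names below come from the apex source tree:

# These compiled extension modules only exist when apex was built from
# source with its C++/CUDA extensions enabled.
import amp_C                  # fused multi-tensor kernels
import fused_layer_norm_cuda  # fused LayerNorm used by Megatron
print("apex CUDA extensions are available")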

Set PATH to the CUDA toolkit version that matches your PyTorch build, e.g.:

export PATH=/usr/local/cuda-12.1/bin:$PATH

If a crypt.h not found error occurs, install libxcrypt and point CPATH at the conda environment's include directory:

conda install --channel=conda-forge libxcrypt
export CPATH=/path/to/conda/envs/<env_name>/include/

Install Megatron-DeepSpeed

Go to third_party/Megatron-DeepSpeed and execute the following command.

pip install .

Building and Using the cuFile Malloc Hook

We created a simple CUDA malloc hook that registers every allocated buffer with cuFile, which yields the optimized pinned-GPU-memory transfer performance without requiring a custom PyTorch allocator or alterations to the PyTorch runtime binary. Build it by executing flashtrain/malloc_hook/make.sh.

You will need to point the following two places at the actual location of the built hook.so (see the sketch after this list):

- The LD_PRELOAD path in custom scripts such as third_party/Megatron-DeepSpeed/examples/pretrain_bert_distributed.sh.

- The hard-coded ctypes.CDLL path in flashtrain/tensor_cache/__init__.py.
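
For reference, the Python-side load is roughly as follows; the path is a placeholder for wherever make.sh placed hook.so:

import ctypes

# Placeholder: replace with the actual location of the built hook.so.
HOOK_SO = "/path/to/FlashTrain/flashtrain/malloc_hook/hook.so"

# Loading with RTLD_GLOBAL makes the hook's symbols visible process-wide.
# Note that intercepting cudaMalloc calls from libraries loaded at process
# startup still requires LD_PRELOAD to be set before launch, e.g.:
#   LD_PRELOAD=/path/to/hook.so python <training script> ...
ctypes.CDLL(HOOK_SO, mode=ctypes.RTLD_GLOBAL)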

Install Transformer-Engine (Optional)

pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable

If cuDNN is not found, download the cuDNN tarball and retry with the CUDNN_PATH environment variable set to the folder containing the cuDNN library.

If CUDA::cublas is not found by CMake, install the CUDA libraries via conda:

conda install nvidia/label/cuda-12.2.0::cuda-libraries
conda install nvidia/label/cuda-12.2.0::cuda-libraries-dev
conda install nvidia/label/cuda-12.2.0::cuda-tools

If CUDA::nvToolsExt is not found, replace it with CUDA::nvtx3 in transformer_engine/common/CMakeLists.txt.

Reference: NVIDIA/TransformerEngine#879

Contact

Kun Wu, kunwu2 (at) illinois (dot) edu
