Feat: add shm configuration to the helm chart to support tensor parallelism #97
Labels
feature request (New feature or request)
good first issue (Good for newcomers)
help wanted (Extra attention is needed)
Describe the feature
To run vLLM with tensor parallelism (TP > 1), the pod needs sufficient shared memory; otherwise NCCL fails to allocate shared memory (as described in vllm-project/vllm#6574).
To address this, we need to allocate shared memory when starting the vLLM pod, for example by mounting a memory-backed volume at /dev/shm, as sketched below.
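For illustration, a minimal pod-level sketch of the idea (the pod name, image, and the 2Gi size limit are placeholders, not values from this chart):

```yaml
# Back /dev/shm with a memory-backed emptyDir so NCCL has enough
# shared memory when TP > 1. Names and the size limit are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: vllm
spec:
  containers:
    - name: vllm
      image: vllm/vllm-openai:latest
      volumeMounts:
        - name: shm
          mountPath: /dev/shm
  volumes:
    - name: shm
      emptyDir:
        medium: Memory
        sizeLimit: 2Gi
```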
Ideally, users should not need to configure shared memory by default; whether to mount it should be decided by the helm template (e.g., mount the volume when the number of GPUs is greater than 1, or add a dedicated TP setting to vllmConfig); see the template sketch after this paragraph. It would also be great to have a new tutorial on setting up a multi-GPU vLLM instance.
Why do you need this feature?
No response
Additional context
Related issues: #44, #50, #95