
LLaVA-onevision-qwen2-7b model-parallel multi-GPU inference error #1084

Open
GraygoodsEiko opened this issue Feb 28, 2025 · 1 comment
@GraygoodsEiko

System: Ubuntu 20.04.6 LTS
GPUs: 4x NVIDIA GeForce RTX 3090, CUDA 12.4
Paddle environment: paddlemix 0.1.0; paddlenlp 3.0.0b3; paddlepaddle-gpu 3.0.0rc

Code to reproduce:

# test.py
import paddle
from paddle.distributed import fleet
from paddlenlp.transformers import Qwen2Tokenizer
# The two imports below follow the PaddleMIX LLaVA examples; adjust the paths if your installed version differs.
from paddlemix.models.llava.language_model.llava_qwen import LlavaQwenForCausalLM
from paddlemix.models.llava.multimodal_encoder.siglip_encoder import SigLipImageProcessor


def main():
    # Set up a 4-way model-parallel (tensor-parallel) group on a single node.
    strategy = fleet.DistributedStrategy()
    strategy.hybrid_configs = {
        "dp_degree": 1,
        "mp_degree": 4,
        "pp_degree": 1,
        "sharding_degree": 1,
    }
    fleet.init(is_collective=True, strategy=strategy)
    hcg = fleet.get_hybrid_communicate_group()
    tensor_parallel_rank = hcg.get_model_parallel_rank()
    paddle.seed(seed=0)

    model_name = "lmms-lab/llava-onevision-qwen2-7b-si"
    compute_dtype = "float16"

    model = LlavaQwenForCausalLM.from_pretrained(model_name, tensor_parallel_degree=4, tensor_parallel_rank=tensor_parallel_rank, dtype=compute_dtype).eval()  # error is raised here
    tokenizer = Qwen2Tokenizer.from_pretrained(model_name)
    image_processor = SigLipImageProcessor()


if __name__ == "__main__":
    main()

Launch command: python -m paddle.distributed.launch --gpus="0,1,2,3" test.py

The following error is raised while loading the checkpoint, reporting weight shape mismatches:

Loading checkpoint shards: 100%|██████████| 4/4 [01:56<00:00, 29.08s/it]
Traceback (most recent call last):
  File "/home/mengxy/Chat/PaddleMIX/test.py", line 75, in <module>
    main()
  File "/home/mengxy/Chat/PaddleMIX/test.py", line 40, in main
    model = LlavaQwenForCausalLM.from_pretrained(model_name, tensor_parallel_degree=4, tensor_parallel_rank=tensor_parallel_rank, dtype=compute_dtype).eval()
  File "/home/mengxy/anaconda3/envs/paddlemix/lib/python3.10/site-packages/paddlenlp/transformers/model_utils.py", line 2529, in from_pretrained
    model, missing_keys, unexpected_keys, mismatched_keys = cls._load_pretrained_model(
  File "/home/mengxy/anaconda3/envs/paddlemix/lib/python3.10/site-packages/paddlenlp/transformers/model_utils.py", line 2216, in _load_pretrained_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for LlavaQwenForCausalLM:
        Skip loading for qwen2.embed_tokens.weight. qwen2.embed_tokens.weight receives a shape [152064, 3584], but the expected shape is [38016, 3584].
        Skip loading for qwen2.layers.0.self_attn.q_proj.weight. qwen2.layers.0.self_attn.q_proj.weight receives a shape [3584, 3584], but the expected shape is [3584, 896].
        Skip loading ......
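
For reference, every "expected" shape in the error is the checkpoint shape divided by the model-parallel degree along one axis, which suggests the network is being built with tensor-parallel (sharded) layers while the checkpoint is loaded with the full, unsplit weights. A small sanity check of the arithmetic, assuming mp_degree = 4 as configured in test.py:

# Shapes taken from the error message above; mp_degree = 4 as in the script.
mp_degree = 4
assert 152064 // mp_degree == 38016  # qwen2.embed_tokens.weight: vocabulary rows sharded across ranks
assert 3584 // mp_degree == 896      # qwen2.layers.0.self_attn.q_proj.weight: output columns sharded across ranks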

Could you advise how the code should be set up? Thanks!

@luyao-cv (Collaborator) commented Mar 4, 2025

This network does not support multi-GPU inference. You can migrate to Qwen2-VL / Qwen2.5-VL for multi-GPU inference.
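
A minimal sketch of what that migration might look like, reusing the same fleet setup as test.py above. The Qwen2VLForConditionalGeneration import path and the checkpoint name are assumptions based on the PaddleMIX qwen2_vl examples and may differ in your installed version:

# qwen2vl_mp_test.py -- hypothetical sketch, not verified against a specific PaddleMIX release
import paddle
from paddle.distributed import fleet
# Assumed import path, mirroring the PaddleMIX qwen2_vl examples; adjust if your version differs.
from paddlemix.models.qwen2_vl.modeling_qwen2_vl import Qwen2VLForConditionalGeneration


def main():
    # Same 4-way model-parallel setup as in test.py above.
    strategy = fleet.DistributedStrategy()
    strategy.hybrid_configs = {"dp_degree": 1, "mp_degree": 4, "pp_degree": 1, "sharding_degree": 1}
    fleet.init(is_collective=True, strategy=strategy)
    tensor_parallel_rank = fleet.get_hybrid_communicate_group().get_model_parallel_rank()
    paddle.seed(0)

    # Per the comment above, this network supports multi-GPU inference, so from_pretrained
    # should be able to shard the weights across the 4 model-parallel ranks.
    model = Qwen2VLForConditionalGeneration.from_pretrained(
        "Qwen/Qwen2-VL-7B-Instruct",  # example checkpoint name; substitute the one you need
        tensor_parallel_degree=4,
        tensor_parallel_rank=tensor_parallel_rank,
        dtype="float16",
    ).eval()


if __name__ == "__main__":
    main()

Launched the same way as before: python -m paddle.distributed.launch --gpus="0,1,2,3" qwen2vl_mp_test.py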
