[New Model]: Qwen2-VL #246

Open · Yikun opened this issue Mar 5, 2025 · 1 comment
Yikun (Collaborator) commented Mar 5, 2025

The model to consider.

https://huggingface.co/Qwen/Qwen2-VL-2B
https://huggingface.co/Qwen/Qwen2-VL-7B

The closest model vllm already supports.

No response

What's your difficulty of supporting the model you want?

No response

Yikun added the new model label Mar 5, 2025
Yikun mentioned this issue Mar 5, 2025
Potabk (Contributor) commented Mar 6, 2025

In one of my tests I found that qwen2_vl already appears to be supported. I'm not sure whether this is a correct way to test it; the script is derived from the vLLM example:

python examples/offline_inference/vision_language.py 
INFO 03-06 11:04:11 [__init__.py:30] Available plugins for group vllm.platform_plugins:
INFO 03-06 11:04:11 [__init__.py:32] name=ascend, value=vllm_ascend:register
INFO 03-06 11:04:11 [__init__.py:34] all available plugins for group vllm.platform_plugins will be loaded.
INFO 03-06 11:04:11 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 03-06 11:04:11 [__init__.py:44] plugin ascend loaded.
INFO 03-06 11:04:11 [__init__.py:247] Platform plugin ascend is activated
INFO 03-06 11:04:26 [config.py:576] This model supports multiple tasks: {'classify', 'reward', 'embed', 'generate', 'score'}. Defaulting to 'generate'.
INFO 03-06 11:04:26 [llm_engine.py:235] Initializing a V0 LLM engine (v0.7.3.dev245+gcd1f843f) with config: model='/root/wl/cache/modelscope/models/Qwen/Qwen2-VL-2B-Instruct', speculative_config=None, tokenizer='/root/wl/cache/modelscope/models/Qwen/Qwen2-VL-2B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto,  device_config=npu, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/root/wl/cache/modelscope/models/Qwen/Qwen2-VL-2B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs={'min_pixels': 784, 'max_pixels': 1003520}, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[8,4,2,1],"max_capture_size":8}, use_cached_outputs=False, 
INFO 03-06 11:04:28 [importing.py:16] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 03-06 11:04:28 [utils.py:2300] Methods add_lora,add_prompt_adapter,cache_config,compilation_config,current_platform,list_loras,list_prompt_adapters,load_config,pin_lora,pin_prompt_adapter,remove_lora,remove_prompt_adapter not implemented in <vllm_ascend.worker.worker.NPUWorker object at 0xfffd0fd9bbe0>
INFO 03-06 11:04:36 [parallel_state.py:948] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0
WARNING 03-06 11:04:36 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
INFO 03-06 11:04:36 [config.py:3164] cudagraph sizes specified by model runner [1, 2, 4, 8] is overridden by config [8, 1, 2, 4]
Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:00<00:00,  2.25it/s]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:00<00:00,  2.25it/s]

INFO 03-06 11:04:38 [loader.py:422] Loading weights took 1.12 seconds
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
Keyword argument `min_pixels` is not a valid argument for this processor and will be ignored.
Keyword argument `max_pixels` is not a valid argument for this processor and will be ignored.
It looks like you are trying to rescale already rescaled images. If the input images have pixel values between 0 and 1, set `do_rescale=False` to avoid rescaling them again.
/root/wl/oldfiles/vllm-project/vllm/vllm/model_executor/models/qwen2_vl.py:626: UserWarning: current tensor is running as_strided, don't perform inplace operations on the returned value. If you encounter this warning and have precision issues, you can try torch.npu.config.allow_internal_format = False to resolve precision issues. (Triggered internally at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:123.)
  x = x.unsqueeze(1)
INFO 03-06 11:04:49 [executor_base.py:111] # npu blocks: 112772, # CPU blocks: 9362
INFO 03-06 11:04:49 [executor_base.py:116] Maximum concurrency for 4096 tokens per request: 440.52x
INFO 03-06 11:04:50 [llm_engine.py:441] init engine (profile, create kv cache, warmup model) took 11.37 seconds
WARNING 03-06 11:04:57 [utils.py:1485] The following intended overrides are not keyword-only args and and will be dropped: {'min_pixels', 'max_pixels'}
WARNING 03-06 11:04:57 [utils.py:1485] The following intended overrides are not keyword-only args and and will be dropped: {'min_pixels', 'max_pixels'}
WARNING 03-06 11:04:57 [utils.py:1485] The following intended overrides are not keyword-only args and and will be dropped: {'min_pixels', 'max_pixels'}
WARNING 03-06 11:04:57 [utils.py:1485] The following intended overrides are not keyword-only args and and will be dropped: {'min_pixels', 'max_pixels'}
Processed prompts: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:04<00:00,  1.10s/it, est. speed input: 1158.04 toks/s, output: 54.45 toks/s]
The image shows a close-up view of cherry blossoms in full bloom, with the Tokyo Skytree tower visible through the branches. The sky is clear and blue, and the cherry blossoms are in shades of pink and white, creating a beautiful contrast against the blue sky.
The image shows a beautiful view of a cherry blossom tree with pink flowers in full bloom, set against a clear blue sky. In the background, there is a tall tower with a white and gray structure, possibly a telecommunications or observation tower. The cherry blossoms frame the tower, creating a picturesque and serene scene.
The image shows a close-up view of cherry blossoms in full bloom, with the Tokyo Skytree tower visible through the branches. The sky is clear and blue, and the cherry blossoms are in shades of pink and white. The Tokyo Skytree is a famous landmark in Tokyo, Japan, and is known for its
The image shows a close-up view of cherry blossoms in full bloom, with the Tokyo Skytree tower visible through the branches. The sky is clear and blue, and the cherry blossoms are in shades of pink and white, creating a beautiful contrast against the blue background.
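
For reference, a minimal standalone sketch of what the example script does for qwen2_vl. The model path is a placeholder, and the prompt template and pixel bounds mirror the vLLM example (28*28 = 784 and 1280*28*28 = 1003520 match the mm_processor_kwargs in the log above); this is an illustration, not the exact example code:

from vllm import LLM, SamplingParams
from vllm.assets.image import ImageAsset

# Placeholder model path; substitute your local checkpoint location.
# Note the log above shows min_pixels/max_pixels being dropped by the
# slow image processor, so these bounds may not take effect.
llm = LLM(
    model="Qwen/Qwen2-VL-2B-Instruct",
    max_model_len=4096,
    mm_processor_kwargs={"min_pixels": 28 * 28, "max_pixels": 1280 * 28 * 28},
)

# Qwen2-VL chat template with a single image placeholder.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>"
    "What is the content of this image?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# vLLM's bundled "cherry_blossom" test image (the Tokyo Skytree photo
# described in the outputs above).
image = ImageAsset("cherry_blossom").pil_image.convert("RGB")

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0.2, max_tokens=64),
)
print(outputs[0].outputs[0].text)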
