[Installation]: XPU dependencies are missing #11173
Comments
I have the exact same problem with almost the exact same installation process. Instead of removing the AWS URLs, I used this as `requirements-xpu.txt`:

    # Common dependencies
    -r requirements-common.txt
    ray >= 2.9
    #torch @ https://intel-optimized-pytorch.s3.cn-north-1.amazonaws.com.cn/ipex_dev/xpu/torch-2.5.0a0%2Bgite84e33f-cp310-cp310-linux_x86_64.whl
    triton-xpu == 3.0.0b1

and then used this:

    python -m pip install torch==2.5.1+cxx11.abi torchvision==0.20.1+cxx11.abi torchaudio==2.5.1+cxx11.abi intel-extension-for-pytorch==2.5.10+xpu oneccl_bind_pt==2.5.0+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/

As for the system I am using:

    [W1227 23:08:36.920706038 OperatorEntry.cpp:155] Warning: Warning only once for all operators, other operators may also be overridden.
    OS: Ubuntu 24.10 (x86_64)
    Python version: 3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0] (64-bit runtime)
    CPU:
    Versions of relevant libraries:
    LD_LIBRARY_PATH=/home/quentin/miniconda3/envs/vllm_env310/lib/python3.10/site-packages/cv2/../../lib64:/opt/intel/oneapi/tcm/1.2/lib:/opt/intel/oneapi/umf/0.9/lib:/opt/intel/oneapi/tbb/2022.0/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/pti/0.10/lib:/opt/intel/oneapi/mpi/2021.14/opt/mpi/libfabric/lib:/opt/intel/oneapi/mpi/2021.14/lib:/opt/intel/oneapi/mkl/2025.0/lib:/opt/intel/oneapi/ippcp/2025.0/lib/:/opt/intel/oneapi/ipp/2022.0/lib:/opt/intel/oneapi/dnnl/2025.0/lib:/opt/intel/oneapi/debugger/2025.0/opt/debugger/lib:/opt/intel/oneapi/dal/2025.0/lib:/opt/intel/oneapi/compiler/2025.0/opt/compiler/lib:/opt/intel/oneapi/compiler/2025.0/lib:/opt/intel/oneapi/ccl/2021.14/lib/:/home/quentin/miniconda3/envs/vllm_env310/lib/libfabric:
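A minimal sanity check I use (a sketch only, relying on the public torch/IPEX APIs; the expected version strings are simply the ones from the pip command above) to confirm the installed wheels are the matching +xpu builds and that a device is visible:

```python
# Hedged sketch: verify torch and intel-extension-for-pytorch are the matching
# XPU builds and that an XPU device is actually detected.
import torch
import intel_extension_for_pytorch as ipex

print("torch:", torch.__version__)   # expected here: 2.5.1+cxx11.abi
print("ipex:", ipex.__version__)     # expected here: 2.5.10+xpu
print("xpu available:", torch.xpu.is_available())
if torch.xpu.is_available():
    print("device:", torch.xpu.get_device_name(0))
```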
Some logs:
sudo python3 -m vllm.entrypoints.openai.api_server
Traceback (most recent call last):
File "<frozen runpy>", line 189, in _run_module_as_main
File "<frozen runpy>", line 112, in _get_module_details
File "/home/a5770/rm05/develop/vllmrun/vllm/__init__.py", line 3, in <module>
from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
File "/home/a5770/rm05/develop/vllmrun/vllm/engine/arg_utils.py", line 8, in <module>
import torch
ModuleNotFoundError: No module named 'torch'
(vllm) a5770@a5770-PA602-12900K:~/rm05/develop/vllmrun$ python3 -m vllm.entrypoints.openai.api_server
INFO 01-02 04:16:20 api_server.py:705] vLLM API server version 0.6.6.post2.dev47+ga115ac46
INFO 01-02 04:16:20 api_server.py:706] args: Namespace(host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin='', model='facebook/opt-125m', task='auto', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=None, guided_decoding_backend='xgrammar', logits_processor_pattern=None, distributed_executor_backend=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=None, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', generation_config=None, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False)
DEBUG 01-02 04:16:20 __init__.py:26] No plugins for group vllm.platform_plugins found.
[W102 04:16:21.808120355 OperatorEntry.cpp:155] Warning: Warning only once for all operators, other operators may also be overridden.
Overriding a previously registered kernel for the same operator and the same dispatch key
operator: aten::_cummax_helper(Tensor self, Tensor(a!) values, Tensor(b!) indices, int dim) -> ()
registered at /build/pytorch/build/aten/src/ATen/RegisterSchema.cpp:6
dispatch key: XPU
previous kernel: registered at /build/pytorch/build/aten/src/ATen/RegisterCPU.cpp:30476
new kernel: registered at /build/intel-pytorch-extension/build/Release/csrc/gpu/csrc/aten/generated/ATen/RegisterXPU.cpp:2960 (function operator())
INFO 01-02 04:16:22 __init__.py:179] Automatically detected platform xpu.
DEBUG 01-02 04:16:22 __init__.py:26] No plugins for group vllm.general_plugins found.
DEBUG 01-02 04:16:22 api_server.py:171] Multiprocessing frontend to use ipc:///tmp/f6342432-5d7d-4ae8-8cdf-eaa7f53f4c5c for IPC Path.
INFO 01-02 04:16:22 api_server.py:190] Started engine process with PID 4024903
DEBUG 01-02 04:16:23 __init__.py:26] No plugins for group vllm.platform_plugins found.
[W102 04:16:23.401228885 OperatorEntry.cpp:155] Warning: Warning only once for all operators, other operators may also be overridden.
Overriding a previously registered kernel for the same operator and the same dispatch key
operator: aten::_cummax_helper(Tensor self, Tensor(a!) values, Tensor(b!) indices, int dim) -> ()
registered at /build/pytorch/build/aten/src/ATen/RegisterSchema.cpp:6
dispatch key: XPU
previous kernel: registered at /build/pytorch/build/aten/src/ATen/RegisterCPU.cpp:30476
new kernel: registered at /build/intel-pytorch-extension/build/Release/csrc/gpu/csrc/aten/generated/ATen/RegisterXPU.cpp:2960 (function operator())
INFO 01-02 04:16:24 __init__.py:179] Automatically detected platform xpu.
DEBUG 01-02 04:16:24 __init__.py:26] No plugins for group vllm.general_plugins found.
INFO 01-02 04:16:29 config.py:517] This model supports multiple tasks: {'embed', 'generate', 'score', 'reward', 'classify'}. Defaulting to 'generate'.
WARNING 01-02 04:16:29 _logger.py:68] CUDA graph is not supported on XPU, fallback to the eager mode.
INFO 01-02 04:16:31 config.py:517] This model supports multiple tasks: {'reward', 'classify', 'generate', 'score', 'embed'}. Defaulting to 'generate'.
WARNING 01-02 04:16:31 _logger.py:68] CUDA graph is not supported on XPU, fallback to the eager mode.
INFO 01-02 04:16:31 llm_engine.py:234] Initializing an LLM engine (v0.6.6.post2.dev47+ga115ac46) with config: model='facebook/opt-125m', speculative_config=None, tokenizer='facebook/opt-125m', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=xpu, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=facebook/opt-125m, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"candidate_compile_sizes":[],"compile_sizes":[],"capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=True,
INFO 01-02 04:16:32 xpu.py:26] Cannot use _Backend.FLASH_ATTN backend on XPU.
INFO 01-02 04:16:32 selector.py:151] Using IPEX attention backend.
WARNING 01-02 04:16:32 _logger.py:68] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
INFO 01-02 04:16:32 importing.py:14] Triton not installed or not compatible; certain GPU-related functions will not be available.
DEBUG 01-02 04:16:32 parallel_state.py:959] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://192.168.0.231:57029 backend=ccl
2025:01:02-04:16:32:(4024903) |CCL_WARN| value of CCL_ATL_TRANSPORT changed to be ofi (default:mpi)
2025:01:02-04:16:32:(4024903) |CCL_WARN| value of CCL_LOCAL_RANK changed to be 0 (default:-1)
2025:01:02-04:16:32:(4024903) |CCL_WARN| value of CCL_LOCAL_SIZE changed to be 1 (default:-1)
2025:01:02-04:16:32:(4024903) |CCL_WARN| value of CCL_PROCESS_LAUNCHER changed to be none (default:hydra)
DEBUG 01-02 04:16:32 decorators.py:105] Inferred dynamic dimensions for forward method of <class 'vllm.model_executor.models.opt.OPTModel'>: ['input_ids', 'positions', 'intermediate_tensors', 'inputs_embeds']
DEBUG 01-02 04:16:32 config.py:3325] enabled custom ops: Counter()
DEBUG 01-02 04:16:32 config.py:3327] disabled custom ops: Counter()
INFO 01-02 04:16:33 weight_utils.py:251] Using model weights format ['*.bin']
Loading pt checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
/home/a5770/rm05/develop/vllmrun/vllm/model_executor/model_loader/weight_utils.py:450: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
state = torch.load(bin_file, map_location="cpu")
Loading pt checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 3.18it/s]
Loading pt checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 3.18it/s]
WARNING 01-02 04:16:34 _logger.py:68] Pin memory is not supported on XPU.
INFO 01-02 04:16:34 xpu_model_runner.py:415] Loading model weights took 0.2389 GB
ERROR 01-02 04:16:34 engine.py:366] varlen_fwd() takes 14 positional arguments but 15 were given
ERROR 01-02 04:16:34 engine.py:366] Traceback (most recent call last):
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/rm05/develop/vllmrun/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
ERROR 01-02 04:16:34 engine.py:366] engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/rm05/develop/vllmrun/vllm/engine/multiprocessing/engine.py", line 119, in from_engine_args
ERROR 01-02 04:16:34 engine.py:366] return cls(ipc_path=ipc_path,
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/rm05/develop/vllmrun/vllm/engine/multiprocessing/engine.py", line 71, in __init__
ERROR 01-02 04:16:34 engine.py:366] self.engine = LLMEngine(*args, **kwargs)
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/rm05/develop/vllmrun/vllm/engine/llm_engine.py", line 276, in __init__
ERROR 01-02 04:16:34 engine.py:366] self._initialize_kv_caches()
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/rm05/develop/vllmrun/vllm/engine/llm_engine.py", line 416, in _initialize_kv_caches
ERROR 01-02 04:16:34 engine.py:366] self.model_executor.determine_num_available_blocks())
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/rm05/develop/vllmrun/vllm/executor/gpu_executor.py", line 68, in determine_num_available_blocks
ERROR 01-02 04:16:34 engine.py:366] return self.driver_worker.determine_num_available_blocks()
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 01-02 04:16:34 engine.py:366] return func(*args, **kwargs)
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/rm05/develop/vllmrun/vllm/worker/xpu_worker.py", line 104, in determine_num_available_blocks
ERROR 01-02 04:16:34 engine.py:366] self.model_runner.profile_run()
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 01-02 04:16:34 engine.py:366] return func(*args, **kwargs)
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/rm05/develop/vllmrun/vllm/worker/xpu_model_runner.py", line 492, in profile_run
ERROR 01-02 04:16:34 engine.py:366] self.execute_model(model_input, kv_caches, intermediate_tensors)
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 01-02 04:16:34 engine.py:366] return func(*args, **kwargs)
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/rm05/develop/vllmrun/vllm/worker/xpu_model_runner.py", line 566, in execute_model
ERROR 01-02 04:16:34 engine.py:366] hidden_or_intermediate_states = model_executable(
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 01-02 04:16:34 engine.py:366] return self._call_impl(*args, **kwargs)
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 01-02 04:16:34 engine.py:366] return forward_call(*args, **kwargs)
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/rm05/develop/vllmrun/vllm/model_executor/models/opt.py", line 372, in forward
ERROR 01-02 04:16:34 engine.py:366] hidden_states = self.model(input_ids, positions, kv_caches,
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/rm05/develop/vllmrun/vllm/compilation/decorators.py", line 168, in __call__
ERROR 01-02 04:16:34 engine.py:366] return self.forward(*args, **kwargs)
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/rm05/develop/vllmrun/vllm/model_executor/models/opt.py", line 323, in forward
ERROR 01-02 04:16:34 engine.py:366] return self.decoder(input_ids,
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 01-02 04:16:34 engine.py:366] return self._call_impl(*args, **kwargs)
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 01-02 04:16:34 engine.py:366] return forward_call(*args, **kwargs)
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/rm05/develop/vllmrun/vllm/model_executor/models/opt.py", line 280, in forward
ERROR 01-02 04:16:34 engine.py:366] hidden_states = layer(hidden_states,
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 01-02 04:16:34 engine.py:366] return self._call_impl(*args, **kwargs)
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 01-02 04:16:34 engine.py:366] return forward_call(*args, **kwargs)
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/rm05/develop/vllmrun/vllm/model_executor/models/opt.py", line 173, in forward
ERROR 01-02 04:16:34 engine.py:366] hidden_states = self.self_attn(hidden_states=hidden_states,
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 01-02 04:16:34 engine.py:366] return self._call_impl(*args, **kwargs)
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 01-02 04:16:34 engine.py:366] return forward_call(*args, **kwargs)
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/rm05/develop/vllmrun/vllm/model_executor/models/opt.py", line 113, in forward
ERROR 01-02 04:16:34 engine.py:366] attn_output = self.attn(q, k, v, kv_cache, attn_metadata)
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 01-02 04:16:34 engine.py:366] return self._call_impl(*args, **kwargs)
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 01-02 04:16:34 engine.py:366] return forward_call(*args, **kwargs)
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/rm05/develop/vllmrun/vllm/attention/layer.py", line 134, in forward
ERROR 01-02 04:16:34 engine.py:366] return self.impl.forward(query,
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/rm05/develop/vllmrun/vllm/attention/backends/ipex_attn.py", line 244, in forward
ERROR 01-02 04:16:34 engine.py:366] ipex_ops.varlen_attention(
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/rm05/develop/vllmrun/vllm/_ipex_ops.py", line 188, in varlen_attention
ERROR 01-02 04:16:34 engine.py:366] ipex.llm.functional.varlen_attention(query.contiguous(),
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/intel_extension_for_pytorch/llm/functional/fusions.py", line 283, in varlen_attention
ERROR 01-02 04:16:34 engine.py:366] return VarlenAttention.apply_function(
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/intel_extension_for_pytorch/llm/modules/mha_fusion.py", line 379, in apply_function
ERROR 01-02 04:16:34 engine.py:366] ).apply_function(
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/intel_extension_for_pytorch/transformers/models/xpu/fusions/mha_fusion.py", line 237, in apply_function
ERROR 01-02 04:16:34 engine.py:366] _IPEXVarlenScaledDotProductXPU.apply_function_flash_varlen(
ERROR 01-02 04:16:34 engine.py:366] File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/intel_extension_for_pytorch/transformers/models/xpu/fusions/mha_fusion.py", line 311, in apply_function_flash_varlen
ERROR 01-02 04:16:34 engine.py:366] torch.xpu.varlen_fwd(
ERROR 01-02 04:16:34 engine.py:366] TypeError: varlen_fwd() takes 14 positional arguments but 15 were given
Process SpawnProcess-1:
Traceback (most recent call last):
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/a5770/rm05/develop/vllmrun/vllm/engine/multiprocessing/engine.py", line 368, in run_mp_engine
raise e
File "/home/a5770/rm05/develop/vllmrun/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
File "/home/a5770/rm05/develop/vllmrun/vllm/engine/multiprocessing/engine.py", line 119, in from_engine_args
return cls(ipc_path=ipc_path,
File "/home/a5770/rm05/develop/vllmrun/vllm/engine/multiprocessing/engine.py", line 71, in __init__
self.engine = LLMEngine(*args, **kwargs)
File "/home/a5770/rm05/develop/vllmrun/vllm/engine/llm_engine.py", line 276, in __init__
self._initialize_kv_caches()
File "/home/a5770/rm05/develop/vllmrun/vllm/engine/llm_engine.py", line 416, in _initialize_kv_caches
self.model_executor.determine_num_available_blocks())
File "/home/a5770/rm05/develop/vllmrun/vllm/executor/gpu_executor.py", line 68, in determine_num_available_blocks
return self.driver_worker.determine_num_available_blocks()
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/a5770/rm05/develop/vllmrun/vllm/worker/xpu_worker.py", line 104, in determine_num_available_blocks
self.model_runner.profile_run()
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/a5770/rm05/develop/vllmrun/vllm/worker/xpu_model_runner.py", line 492, in profile_run
self.execute_model(model_input, kv_caches, intermediate_tensors)
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/a5770/rm05/develop/vllmrun/vllm/worker/xpu_model_runner.py", line 566, in execute_model
hidden_or_intermediate_states = model_executable(
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/a5770/rm05/develop/vllmrun/vllm/model_executor/models/opt.py", line 372, in forward
hidden_states = self.model(input_ids, positions, kv_caches,
File "/home/a5770/rm05/develop/vllmrun/vllm/compilation/decorators.py", line 168, in __call__
return self.forward(*args, **kwargs)
File "/home/a5770/rm05/develop/vllmrun/vllm/model_executor/models/opt.py", line 323, in forward
return self.decoder(input_ids,
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/a5770/rm05/develop/vllmrun/vllm/model_executor/models/opt.py", line 280, in forward
hidden_states = layer(hidden_states,
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/a5770/rm05/develop/vllmrun/vllm/model_executor/models/opt.py", line 173, in forward
hidden_states = self.self_attn(hidden_states=hidden_states,
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/a5770/rm05/develop/vllmrun/vllm/model_executor/models/opt.py", line 113, in forward
attn_output = self.attn(q, k, v, kv_cache, attn_metadata)
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/a5770/rm05/develop/vllmrun/vllm/attention/layer.py", line 134, in forward
return self.impl.forward(query,
File "/home/a5770/rm05/develop/vllmrun/vllm/attention/backends/ipex_attn.py", line 244, in forward
ipex_ops.varlen_attention(
File "/home/a5770/rm05/develop/vllmrun/vllm/_ipex_ops.py", line 188, in varlen_attention
ipex.llm.functional.varlen_attention(query.contiguous(),
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/intel_extension_for_pytorch/llm/functional/fusions.py", line 283, in varlen_attention
return VarlenAttention.apply_function(
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/intel_extension_for_pytorch/llm/modules/mha_fusion.py", line 379, in apply_function
).apply_function(
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/intel_extension_for_pytorch/transformers/models/xpu/fusions/mha_fusion.py", line 237, in apply_function
_IPEXVarlenScaledDotProductXPU.apply_function_flash_varlen(
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/intel_extension_for_pytorch/transformers/models/xpu/fusions/mha_fusion.py", line 311, in apply_function_flash_varlen
torch.xpu.varlen_fwd(
TypeError: varlen_fwd() takes 14 positional arguments but 15 were given
DEBUG 01-02 04:16:40 client.py:252] Shutting down MQLLMEngineClient output handler.
Traceback (most recent call last):
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/a5770/rm05/develop/vllmrun/vllm/entrypoints/openai/api_server.py", line 767, in <module>
uvloop.run(run_server(args))
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
return loop.run_until_complete(wrapper())
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
return await main
File "/home/a5770/rm05/develop/vllmrun/vllm/entrypoints/openai/api_server.py", line 733, in run_server
async with build_async_engine_client(args) as engine_client:
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/home/a5770/rm05/develop/vllmrun/vllm/entrypoints/openai/api_server.py", line 120, in build_async_engine_client
async with build_async_engine_client_from_engine_args(
File "/home/a5770/miniforge3/envs/vllm/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/home/a5770/rm05/develop/vllmrun/vllm/entrypoints/openai/api_server.py", line 214, in build_async_engine_client_from_engine_args
raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.
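For what it's worth, the `TypeError: varlen_fwd() takes 14 positional arguments but 15 were given` suggests the vLLM checkout and the installed intel-extension-for-pytorch disagree on the `varlen_attention` API. A quick way to compare (a sketch using only the public symbols that already appear in the traceback above):

```python
# Hedged sketch: print the signature of the installed IPEX varlen_attention so
# it can be compared with the arguments vllm/_ipex_ops.py passes.
import inspect
import intel_extension_for_pytorch as ipex

sig = inspect.signature(ipex.llm.functional.varlen_attention)
print(len(sig.parameters), "parameters:", list(sig.parameters))
```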
I'm facing the exact same issue. Any updates?
Are you using an Intel Arc graphics card? There is a bug in the dispatch code on Arc cards. A quick solution is …
I'm also having issues with the Arc A770, but I think I've solved the …
If you want the older oneAPI 2024.2 installer, you can still get it by finding the download page through the Wayback Machine.
Related issue?
Also note: from https://github.com/intel/intel-xpu-backend-for-triton, it looks like Triton and intel-extension-for-pytorch are mutually exclusive.
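If it helps, here is a quick check for whether both ended up in the same environment (a sketch; apart from `triton-xpu` and `intel-extension-for-pytorch` from this thread, the other package names are guesses at common spellings):

```python
# Hedged sketch: report which Triton-related wheels are installed alongside
# intel-extension-for-pytorch in the current environment.
from importlib import metadata

for name in ("triton", "triton-xpu", "pytorch-triton-xpu",
             "intel-extension-for-pytorch"):
    try:
        print(f"{name}=={metadata.version(name)}")
    except metadata.PackageNotFoundError:
        print(f"{name}: not installed")
```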
|
Your current environment
How you are installing vllm
after removing the AWS URLs, this works:
but there appears to be a version mismatch: