Do you have any pointers on resource to help with installation of the models after MAX install and Model install and trying to execute. I am persistently getting FlashAttention error while trying to execute (after installing and loading MSFT 3.5-vision-instruct model) - RuntimeError: FlashAttention only support fp16 and bf16 data type
Stack (most recent call last):
File “”, line 1, in
File “/usr/lib/python3.10/multiprocessing/spawn.py”, line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File “/usr/lib/python3.10/multiprocessing/spawn.py”, line 129, in _main
return self._bootstrap(parent_sentinel)
File “/usr/lib/python3.10/multiprocessing/process.py”, line 314, in _bootstrap
self.run()
File “/usr/lib/python3.10/multiprocessing/process.py”, line 108, in run
self._target(*self._args, **self._kwargs)
File “/home/bsam/phi35_vision/.venv/lib/python3.10/site-packages/max/profiler/tracing.py”, line 85, in wrapper
return func(*args, **kwargs)
File “/home/bsam/phi35_vision/.venv/lib/python3.10/site-packages/max/serve/pipelines/model_worker.py”, line 204, in call
logger.exception(
[2025-06-28 12:37:06] ERROR queues.py:143: Model worker process is not healthy
Task completed with error. Stopping
Traceback (most recent call last):
File “/home/bsam/phi35_vision/.venv/lib/python3.10/site-packages/max/serve/queue/zmq_queue.py”, line 261, in _pull_from_socket
msg = self.pull_socket.recv(**kwargs)
File “zmq/backend/cython/_zmq.py”, line 1203, in zmq.backend.cython._zmq.Socket.recv
File “zmq/backend/cython/_zmq.py”, line 1238, in zmq.backend.cython._zmq.Socket.recv
File “zmq/backend/cython/_zmq.py”, line 1398, in zmq.backend.cython._zmq._recv_copy
File “zmq/backend/cython/_zmq.py”, line 1393, in zmq.backend.cython._zmq._recv_copy
File “zmq/backend/cython/_zmq.py”, line 183, in zmq.backend.cython._zmq._check_rc
zmq.error.Again: Resource temporarily unavailable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File “/home/bsam/phi35_vision/.venv/lib/python3.10/site-packages/max/serve/scheduler/queues.py”, line 124, in response_worker
responses_list = self.response_pull_socket.get_nowait()
File “/home/bsam/phi35_vision/.venv/lib/python3.10/site-packages/max/profiler/tracing.py”, line 85, in wrapper
return func(*args, **kwargs)
File “/home/bsam/phi35_vision/.venv/lib/python3.10/site-packages/max/serve/queue/zmq_queue.py”, line 287, in get_nowait
return self.get(flags=zmq.NOBLOCK, **kwargs)
File “/home/bsam/phi35_vision/.venv/lib/python3.10/site-packages/max/serve/queue/zmq_queue.py”, line 283, in get
return self._pull_from_socket(**kwargs)
File “/home/bsam/phi35_vision/.venv/lib/python3.10/site-packages/max/serve/queue/zmq_queue.py”, line 264, in _pull_from_socket
raise queue.Empty()
_queue.Empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File “/home/bsam/phi35_vision/.venv/lib/python3.10/site-packages/max/serve/scheduler/queues.py”, line 145, in response_worker
raise Exception(“Worker failed!”)
Exception: Worker failed!
[2025-06-28 12:37:06] INFO server.py:264: Shutting down
[2025-06-28 12:37:06] INFO server.py:299: Waiting for connections to close. (CTRL+C to force quit)
For reference, I did the following
(1) This is the code that I loaded the model :
unset MAX_SERVE_PORT
unset MAX_SERVE_USE_HEARTBEAT
max serve --model-path=microsoft/Phi-3.5-vision-instruct --trust-remote-code --torch-dtype=bfloat16 --disable-telemetry --port=9999
(2) This is the output when the server is launched:
2025-06-28 12:35:13] WARNING hf_pipeline.py:89: eos_token_id provided in huggingface config (2), does not match provided eos_token_id (32000), using provided eos_token_id 12:35:13.355 WARNING: 7971 MainThread: max.pipelines: eos_token_id provided in huggingface config (2), does not match provided eos_token_id (32000), using provided eos_token_id [2025-06-28 12:35:13] INFO api_server.py:153:
********** Server ready on http://0.0.0.0:9999 (Press CTRL+C to quit) **********
[2025-06-28 12:35:13] ERROR metrics.py:195: instrument maxserve.pipeline_load is not one of the supported sdk types [2025-06-28 12:35:13] INFO on.py:62: Application startup complete. [2025-06-28 12:35:13] INFO server.py:216: Uvicorn running on http://0.0.0.0:9999 (Press CTRL+C to quit)
(3) I try testing the model with both text and image inputs and got the FlashAttention issue due to variable type mismatch. Below are the two test codes
curl -X POST http://localhost:9999/v1/chat/completions
-H “Content-Type: application/json”
-d ‘{
“model”: “microsoft/Phi-3.5-vision-instruct”,
“messages”: [{“role”: “user”, “content”: [
{“type”: “text”, “text”: “What do you see in this image?”},
{“type”: “image_url”, “image_url”: {“url”: “_BASE64_IMAGE”}}
]}],
“max_tokens”: 100
}’
curl -X POST http://localhost:9999/v1/chat/completions /v1/chat/completions
-H “Content-Type: application/json”
-d ‘{
“model”: “microsoft/Phi-3.5-vision-instruct”,
“messages”: [{“role”: “user”, “content”: “Hello, how are you?”}],
“max_tokens”: 50
}’
Both the above gives following error and trace logs:
ERROR queues.py:143: Model worker process is not healthy
Task completed with error. Stopping
Traceback (most recent call last):
File “/home/bsam/phi35_vision/.venv/lib/python3.10/site-packages/max/serve/queue/zmq_queue.py”, line 261, in _pull_from_socket
msg = self.pull_socket.recv(**kwargs)
File “zmq/backend/cython/_zmq.py”, line 1203, in zmq.backend.cython._zmq.Socket.recv
File “zmq/backend/cython/_zmq.py”, line 1238, in zmq.backend.cython._zmq.Socket.recv
File “zmq/backend/cython/_zmq.py”, line 1398, in zmq.backend.cython._zmq._recv_copy
File “zmq/backend/cython/_zmq.py”, line 1393, in zmq.backend.cython._zmq._recv_copy
File “zmq/backend/cython/_zmq.py”, line 183, in zmq.backend.cython._zmq._check_rc
zmq.error.Again: Resource temporarily unavailable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File “/home/bsam/phi35_vision/.venv/lib/python3.10/site-packages/max/serve/scheduler/queues.py”, line 124, in response_worker
responses_list = self.response_pull_socket.get_nowait()
File “/home/bsam/phi35_vision/.venv/lib/python3.10/site-packages/max/profiler/tracing.py”, line 85, in wrapper
return func(*args, **kwargs)
File “/home/bsam/phi35_vision/.venv/lib/python3.10/site-packages/max/serve/queue/zmq_queue.py”, line 287, in get_nowait
return self.get(flags=zmq.NOBLOCK, **kwargs)
File “/home/bsam/phi35_vision/.venv/lib/python3.10/site-packages/max/serve/queue/zmq_queue.py”, line 283, in get
return self._pull_from_socket(**kwargs)
File “/home/bsam/phi35_vision/.venv/lib/python3.10/site-packages/max/serve/queue/zmq_queue.py”, line 264, in _pull_from_socket
raise queue.Empty()
_queue.Empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File “/home/bsam/phi35_vision/.venv/lib/python3.10/site-packages/max/serve/scheduler/queues.py”, line 145, in response_worker
raise Exception(“Worker failed!”)
Exception: Worker failed!
[2025-06-28 12:37:06] INFO server.py:264: Shutting down