Package          Version
max              26.1.0.dev2026010820
max-core         26.1.0.dev2026010820
max-mojo-libs    26.1.0.dev2026010820
max-shmem-libs   26.1.0.dev2026010820
1: Tensor Allocates GPU Memory on All Devices
When creating a single small tensor (CPU or GPU), the process allocates approximately 616 MiB of GPU memory on every available GPU (8 GPUs in my case), even though only one device was requested.
Reproduction Code:
from max.dtype import DType
from max.experimental.tensor import Tensor
from max.driver import CPU, Accelerator
gpu_tensor = Tensor.ones([2,3], dtype=DType.float32, device=Accelerator())
# cpu_tensor = Tensor.ones([2,3], dtype=DType.float32, device=CPU())  # same behavior as the GPU tensor
After running this code, nvidia-smi shows that the Python process has allocated memory on all 8 GPUs:
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 843022 C python 3894MiB |
| 1 N/A N/A 843022 C python 616MiB |
| 2 N/A N/A 843022 C python 616MiB |
| 3 N/A N/A 843022 C python 616MiB |
| 4 N/A N/A 843022 C python 616MiB |
| 5 N/A N/A 843022 C python 616MiB |
| 6 N/A N/A 843022 C python 616MiB |
| 7 N/A N/A 843022 C python 616MiB |
+-----------------------------------------------------------------------------------------+
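The per-GPU usage above was read off the nvidia-smi table manually; a small helper can confirm it programmatically. Below is a sketch that parses the CSV form of the same data (from nvidia-smi's documented `--query-compute-apps=pid,used_memory --format=csv` mode); the `parse_compute_apps` function is a hypothetical helper written for this report, not part of MAX.

```python
from collections import defaultdict

def parse_compute_apps(csv_text: str) -> dict[int, list[int]]:
    """Parse `nvidia-smi --query-compute-apps=pid,used_memory --format=csv`
    output into {pid: [used MiB on each GPU the process touches]}."""
    usage: dict[int, list[int]] = defaultdict(list)
    for line in csv_text.strip().splitlines()[1:]:  # skip the CSV header row
        pid_s, mem_s = (field.strip() for field in line.split(","))
        usage[int(pid_s)].append(int(mem_s.split()[0]))  # "616 MiB" -> 616
    return dict(usage)

# Live query (requires an NVIDIA driver):
#   import subprocess
#   csv_text = subprocess.run(
#       ["nvidia-smi", "--query-compute-apps=pid,used_memory", "--format=csv"],
#       capture_output=True, text=True).stdout

# Sample mirroring the table above: one row per GPU touched by PID 843022.
sample = "pid, used_memory [MiB]\n843022, 3894 MiB\n" + "843022, 616 MiB\n" * 7
per_pid = parse_compute_apps(sample)
print(len(per_pid[843022]))  # 8 -> the process holds memory on all 8 GPUs
```

With the sample above, `per_pid[843022]` comes back as one 3894 MiB entry plus seven 616 MiB entries, matching the table.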
2: Setting CUDA_VISIBLE_DEVICES Causes Failure
When CUDA_VISIBLE_DEVICES is set, the code fails with the error below, whether a CPU or an Accelerator device tensor is requested:
$ CUDA_VISIBLE_DEVICES=1 python tests/tensor_test.py
Traceback (most recent call last):
File "/home/jovyan/eunikpark/modular_workspace/tests/tensor_test.py", line 8, in <module>
cpu_tensor = Tensor.ones([2,3], dtype=DType.float32, device=CPU())
File "/home/jovyan/eunikpark/.conda/envs/max/lib/python3.13/site-packages/max/experimental/tensor.py", line 768, in ones
return cls.full(shape, value=1, dtype=dtype, device=device)
~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovyan/eunikpark/.conda/envs/max/lib/python3.13/site-packages/max/experimental/tensor.py", line 612, in full
cls.constant(value, dtype=dtype, device=device), shape
~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovyan/eunikpark/.conda/envs/max/lib/python3.13/site-packages/max/experimental/tensor.py", line 570, in constant
return F.constant(value, dtype, device)
~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovyan/eunikpark/.conda/envs/max/lib/python3.13/site-packages/max/experimental/functional.py", line 185, in wrapped
with contextlib.ExitStack() as stack:
~~~~~~~~~~~~~~~~~~~~^^
File "/home/jovyan/eunikpark/.conda/envs/max/lib/python3.13/contextlib.py", line 619, in __exit__
raise exc
File "/home/jovyan/eunikpark/.conda/envs/max/lib/python3.13/contextlib.py", line 604, in __exit__
if cb(*exc_details):
~~^^^^^^^^^^^^^^
File "/home/jovyan/eunikpark/.conda/envs/max/lib/python3.13/site-packages/max/experimental/realization_context.py", line 315, in __exit__
F._run(self.realize_all())
~~~~~~^^^^^^^^^^^^^^^^^^^^
File "/home/jovyan/eunikpark/.conda/envs/max/lib/python3.13/site-packages/max/experimental/functional.py", line 79, in _run
return asyncio.run(coro)
~~~~~~~~~~~^^^^^^
File "/home/jovyan/eunikpark/.conda/envs/max/lib/python3.13/asyncio/runners.py", line 195, in run
return runner.run(main)
~~~~~~~~~~^^^^^^
File "/home/jovyan/eunikpark/.conda/envs/max/lib/python3.13/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
File "/home/jovyan/eunikpark/.conda/envs/max/lib/python3.13/asyncio/base_events.py", line 725, in run_until_complete
return future.result()
~~~~~~~~~~~~~^^
File "/home/jovyan/eunikpark/.conda/envs/max/lib/python3.13/site-packages/max/experimental/realization_context.py", line 224, in realize_all
model = _session().load(graph)
~~~~~~~~^^
File "/home/jovyan/eunikpark/.conda/envs/max/lib/python3.13/site-packages/max/experimental/realization_context.py", line 128, in _session
devices = driver.load_devices(device_specs)
File "/home/jovyan/eunikpark/.conda/envs/max/lib/python3.13/site-packages/max/driver/driver.py", line 94, in load_devices
devices.append(Accelerator(device_spec.id))
~~~~~~~~~~~^^^^^^^^^^^^^^^^
ValueError: failed to create device: No supported "gpu" device available.
CUDA information: CUDA call failed: CUDA_ERROR_INVALID_DEVICE (invalid device ordinal)
To get more accurate error information, set MODULAR_DEVICE_CONTEXT_SYNC_MODE=true.
HIP information: Failed to open library "libamdhip64.so": libamdhip64.so: cannot open shared object file: No such file or directory
To get more accurate error information, set MODULAR_DEVICE_CONTEXT_SYNC_MODE=true.
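The `CUDA_ERROR_INVALID_DEVICE (invalid device ordinal)` detail suggests that `load_devices` may be passing pre-remap device ids to `Accelerator(...)`. CUDA's documented behavior is that `CUDA_VISIBLE_DEVICES` renumbers the visible devices starting from 0, so with `CUDA_VISIBLE_DEVICES=1` only ordinal 0 is valid. The sketch below models just that remapping rule to illustrate the hypothesis; `visible_ordinals` is a hypothetical helper for this report, not a CUDA or MAX API.

```python
def visible_ordinals(total_gpus: int, env: dict[str, str]) -> list[int]:
    """Model of CUDA's CUDA_VISIBLE_DEVICES remapping: the devices listed
    in the variable become visible and are renumbered 0..N-1."""
    spec = env.get("CUDA_VISIBLE_DEVICES")
    if spec is None:
        return list(range(total_gpus))  # no masking: all ordinals valid
    picked = [int(tok) for tok in spec.split(",") if tok.strip()]
    visible = [i for i in picked if 0 <= i < total_gpus]
    return list(range(len(visible)))  # remapped ordinals always start at 0

# With CUDA_VISIBLE_DEVICES=1 on an 8-GPU host, only ordinal 0 is valid:
print(visible_ordinals(8, {"CUDA_VISIBLE_DEVICES": "1"}))  # [0]
# Opening ordinal 1 (the pre-remap id of the visible GPU) would then fail
# with CUDA_ERROR_INVALID_DEVICE, consistent with the traceback above.
```

If this hypothesis holds, the fix would be for the driver to enumerate devices after the mask is applied rather than iterating the unmasked device specs.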