How to run Bazel tests on GPU

Using the instructions from @BradLarson, I can build Mojo and run the CPU tests. I am not familiar with Bazel at all, though, so I’m not sure what the recommended way to run the GPU tests is. If I try

./bazelw test //max/kernels/test/gpu/layout:all

the tests are skipped:

INFO: Invocation ID: 58eb8d4f-9e2e-4324-bea5-71743bfb836d
INFO: Analyzed 46 targets (7 packages loaded, 23 targets configured).
INFO: Found 14 targets and 32 test targets…
INFO: Elapsed time: 0.764s, Critical Path: 0.03s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action

Executed 0 out of 32 tests: 32 were skipped.

If this is documented somewhere then please point me to it and I’ll close the topic.

There are some instructions for Bazel in the Modular GitHub repository:

I’m currently experimenting a bit and trying to build the modular repo using Bazel.
Which GPU do you have? I’m using an NVIDIA RTX 3090 and a GTX 1050, and I noticed that a few things are important for GPU detection.

For example, you need NVIDIA driver version 550 or later installed:

sudo apt install nvidia-driver-550

Additionally, Bazel must be able to execute the nvidia-smi command. In my case, this didn’t always work out of the box. However, once I passed my shell’s PATH to Bazel with --repo_env=PATH=$PATH, it started working.

You can check whether nvidia-smi is being called by using the MOJO_VERBOSE_GPU_DETECT environment variable like this:

./bazelw test --repo_env=MOJO_VERBOSE_GPU_DETECT=1

If nvidia-smi is not found, you’ll see a debug message like this:

DEBUG: .... nvidia-smi path: None, rocm-smi path: None, amd-smi path: None

If nvidia-smi is found and can be executed successfully, the output looks like this:

DEBUG: ... nvidia-smi path: /run/current-system/sw/bin/nvidia-smi, rocm-smi path: None, amd-smi path: None
DEBUG: 
------ /run/current-system/sw/bin/nvidia-smi:
exit status: 0
stdout: NVIDIA GeForce RTX 3090

stderr:
------ end /run/current-system/sw/bin/nvidia-smi info
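
Before involving Bazel, it can help to confirm which of these tools the current shell can see. The following is a minimal sketch, not part of the Modular tooling: the function name `check_gpu_tools` is hypothetical, and the three tool names are taken from the debug output above.

```shell
# Report which GPU query tools are visible on the current PATH.
# check_gpu_tools is a hypothetical helper, not a Modular or Bazel command.
check_gpu_tools() {
    for tool in nvidia-smi rocm-smi amd-smi; do
        if tool_path=$(command -v "$tool" 2>/dev/null); then
            echo "$tool: $tool_path"
        else
            echo "$tool: not found"
        fi
    done
}

check_gpu_tools
```

If a tool shows up here but Bazel still reports its path as None, that suggests Bazel is not seeing the same PATH, which is the case where forwarding it with --repo_env=PATH=$PATH helped in this thread.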

Thank you for your detailed reply. Adding my GPU using the instructions you linked to and including the path to nvidia-smi as you suggested worked!


I have two additional questions if you have time:

  1. I have to run ./bazelw test with the explicit path to the module I want to test

    ./bazelw test //max/kernels/test/gpu/layout:all --repo_env=MOJO_VERBOSE_GPU_DETECT=1
    

    How or where can I call the command without the explicit path to the module? If I call

    ./bazelw test --repo_env=MOJO_VERBOSE_GPU_DETECT=1
    

    from the root directory I get the following errors.

    If the definition of 'repository @@aspect_rules_js+' was updated, verify that the hashes were also updated.
    ERROR: /home/b/.cache/bazel/_bazel_b/9c06a8210864b248e851345905ea6514/external/bazel_tools/tools/build_defs/repo/http.bzl:139:45: An error occurred during the fetch of repository 'aspect_rules_js+':
       Traceback (most recent call last):
            File "/home/b/.cache/bazel/_bazel_b/9c06a8210864b248e851345905ea6514/external/bazel_tools/tools/build_defs/repo/http.bzl", line 139, column 45, in _http_archive_impl
                    download_info = ctx.download_and_extract(
    Error in download_and_extract: java.io.IOException: Error extracting /home/b/.cache/bazel/_bazel_b/9c06a8210864b248e851345905ea6514/external/aspect_rules_js+/temp7219254014548800952/rules_js-v2.3.8.tar.gz to /home/b/.cache/bazel/_bazel_b/9c06a8210864b248e851345905ea6514/external/aspect_rules_js+/temp7219254014548800952: [unix_jni.cc:281] /home/b/.cache/bazel/_bazel_b/9c06a8210864b248e851345905ea6514/external/aspect_rules_js+/js/private/test/image/non_ascii/empty empty.?? (No such file or directory)
    ERROR: Skipping '//...': error loading package under directory '': no such package '@@aspect_rules_js+//js': java.io.IOException: Error extracting /home/b/.cache/bazel/_bazel_b/9c06a8210864b248e851345905ea6514/external/aspect_rules_js+/temp7219254014548800952/rules_js-v2.3.8.tar.gz to /home/b/.cache/bazel/_bazel_b/9c06a8210864b248e851345905ea6514/external/aspect_rules_js+/temp7219254014548800952: [unix_jni.cc:281] /home/b/.cache/bazel/_bazel_b/9c06a8210864b248e851345905ea6514/external/aspect_rules_js+/js/private/test/image/non_ascii/empty empty.?? (No such file or directory)
    ERROR: error loading package under directory '': no such package '@@aspect_rules_js+//js': java.io.IOException: Error extracting /home/b/.cache/bazel/_bazel_b/9c06a8210864b248e851345905ea6514/external/aspect_rules_js+/temp7219254014548800952/rules_js-v2.3.8.tar.gz to /home/b/.cache/bazel/_bazel_b/9c06a8210864b248e851345905ea6514/external/aspect_rules_js+/temp7219254014548800952: [unix_jni.cc:281] /home/b/.cache/bazel/_bazel_b/9c06a8210864b248e851345905ea6514/external/aspect_rules_js+/js/private/test/image/non_ascii/empty empty.?? (No such file or directory)
    INFO: Elapsed time: 0.554s
    INFO: 0 processes.
    ERROR: Build did NOT complete successfully
    ERROR: Couldn't start the build. Unable to run tests
    FAILED:
        Fetching repository @@rules_multirun+; Patching repository
        Fetching /home/b/.cache/bazel/_bazel_b/9c06a8210864b248e851345905ea6514/external/aspect_rules_js+; Extracting rules_js-v2.3.8.tar.gz
    
  2. I would also like to run the tests for all GPU modules. How do you use the ./bazelw test command to do this?

I think this command should do it:

./bazelw test --repo_env=MOJO_VERBOSE_GPU_DETECT=1 --repo_env=PATH=$PATH

I ran into the same aspect_rules_js extraction error, and in my case the following fixed it:

sudo apt install locales
sudo locale-gen en_US.UTF-8
sudo update-locale LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8

Restart your shell after updating the locales.
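
The path that fails to extract in the error contains a non-ASCII file name, which is consistent with a missing UTF-8 locale. A quick way to see whether the locale from the commands above is actually available is sketched below. The function name `has_utf8_locale` is hypothetical, and the check assumes `locale -a` lists the generated locales, as it does on typical glibc-based systems.

```shell
# Check whether an en_US UTF-8 locale has been generated on this system.
# has_utf8_locale is a hypothetical helper used only for illustration.
has_utf8_locale() {
    # locale -a prints one available locale per line, e.g. en_US.utf8
    locale -a 2>/dev/null | grep -qiE '^en_US\.utf-?8$'
}

if has_utf8_locale; then
    echo "en_US.UTF-8 is available"
else
    echo "en_US.UTF-8 is missing; generate it with locale-gen first"
fi

# Show what the current shell is actually using.
echo "LANG=${LANG:-unset} LC_ALL=${LC_ALL:-unset}"
```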

You can filter for GPU tests with this option: --test_tag_filters=gpu
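
Putting the flags from this thread together, a full invocation might look like the sketch below. The helper `gpu_test_cmd` is hypothetical and only prints the command so it can be inspected before running. The default target `//...` is the pattern that appears in the error output above, and `//max/kernels/test/gpu/...` is an assumed recursive pattern based on the `layout` target used earlier, not something confirmed in the thread.

```shell
# Print a ./bazelw test command that combines the flags from this thread.
# gpu_test_cmd is a hypothetical helper; it prints the command and does not run it.
gpu_test_cmd() {
    targets="${1:-//...}"   # default to every target in the workspace
    printf '%s\n' "./bazelw test $targets --test_tag_filters=gpu --repo_env=MOJO_VERBOSE_GPU_DETECT=1 --repo_env=PATH=\$PATH"
}

# All GPU-tagged tests in the workspace:
gpu_test_cmd

# Only tests under the GPU kernel test directory (assumed layout):
gpu_test_cmd //max/kernels/test/gpu/...
```

Copy the printed line into a shell at the repository root to run it. The verbose detection flag can be dropped once GPU detection works.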

I must be missing some additional config. If I specify the module to test, everything works, but if I don’t and run the command from the root directory, I still get the error:

ERROR: /home/b/.cache/bazel/_bazel_b/9c06a8210864b248e851345905ea6514/external/bazel_tools/tools/build_defs/repo/http.bzl:139:45: An error occurred during the fetch of repository 'aspect_rules_js+':
   Traceback (most recent call last):
        File "/home/b/.cache/bazel/_bazel_b/9c06a8210864b248e851345905ea6514/external/bazel_tools/tools/build_defs/repo/http.bzl", line 139, column 45, in _http_archive_impl
                download_info = ctx.download_and_extract(
Error in download_and_extract: java.io.IOException: Error extracting /home/b/.cache/bazel/_bazel_b/9c06a8210864b248e851345905ea6514/external/aspect_rules_js+/temp8154037439045774578/rules_js-v2.3.8.tar.gz to /home/b/.cache/bazel/_bazel_b/9c06a8210864b248e851345905ea6514/external/aspect_rules_js+/temp8154037439045774578: [unix_jni.cc:281] /home/b/.cache/bazel/_bazel_b/9c06a8210864b248e851345905ea6514/external/aspect_rules_js+/js/private/test/image/non_ascii/empty empty.?? (No such file or directory)
ERROR: error loading package under directory '': no such package '@@aspect_rules_js+//js': java.io.IOException: Error extracting /home/b/.cache/bazel/_bazel_b/9c06a8210864b248e851345905ea6514/external/aspect_rules_js+/temp8154037439045774578/rules_js-v2.3.8.tar.gz to /home/b/.cache/bazel/_bazel_b/9c06a8210864b248e851345905ea6514/external/aspect_rules_js+/temp8154037439045774578: [unix_jni.cc:281] /home/b/.cache/bazel/_bazel_b/9c06a8210864b248e851345905ea6514/external/aspect_rules_js+/js/private/test/image/non_ascii/empty empty.?? (No such file or directory)
INFO: Elapsed time: 0.408s
INFO: 0 processes.
ERROR: Build did NOT complete successfully
ERROR: Couldn't start the build. Unable to run tests
FAILED:
    Fetching repository @@rules_multirun+; Patching repository
    Fetching /home/b/.cache/bazel/_bazel_b/9c06a8210864b248e851345905ea6514/external/aspect_rules_js+; Extracting rules_js-v2.3.8.tar.gz
