Using conv2d op on RGB image

As someone who doesn’t have a lot of experience working with tensors, I’m a bit confused about the expected shape of the filter argument in the built-in conv2d operation.

I’m trying to apply a Gaussian blur to an RGB image. I understand that I should be able to use a simple NxN matrix applied to each color channel of each pixel, but I can’t figure out how the in/out-channels components of the filter shape relate to the groups arg of the conv2d op in order to achieve that.

Hey there! To apply an NxN filter to an RGB image, we’d expect the filter to have shape [N, N, 3, 3], where 3 is the number of input channels (RGB) and also the number of output channels (still RGB), per the docs. If you would like to apply the same filter to each channel, you can broadcast the [N, N] filter to the expected [N, N, 3, 3].
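
For reference, here’s a minimal sketch of the shapes involved (plain NumPy arrays with a made-up image size, just to illustrate the layout):

import numpy as np

image = np.zeros((1, 256, 256, 3))  # input layout: [batch, height, width, in_channels]
N = 5
filt = np.zeros((N, N, 3, 3))       # filter layout: [N, N, in_channels, out_channels]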


Hello Austin, thanks! I have the snippet below. It does work if N=3 (using the term “work” loosely, since the output is a bit odd, which could be a symptom of me doing this wrong), but if I try any other value of N the broadcast fails.

# Add a batch dim: [H, W, 3] -> [1, H, W, 3], and work in float64
image = orig_image.broadcast_to((1,) + tuple(orig_image.shape.static_dims)).cast(DType.float64)
N = 3
arr = gkern(N, sig=1.)  # N x N Gaussian kernel as a NumPy array
kernel = ops.constant(arr, dtype=DType.float64)
kernel = kernel.broadcast_to((N, N, 3, 3))  # this is the broadcast that fails for other N
res = ops.conv2d(
    image,
    kernel
)[0].tensor

Interesting! Could you share the implementation of the gkern function and also an example of what the output is when you try a different value of N?

I’ve just inlined the content of gkern here:

def gaussian_blur(orig_image: TensorValue, sigma: float=1.) -> TensorValue:
    assert_rgb(orig_image)

    orig_dtype = orig_image.dtype

    # Add a batch dim: [H, W, 3] -> [1, H, W, 3], and work in float64
    image = orig_image.broadcast_to((1,) + tuple(orig_image.shape.static_dims)).cast(DType.float64)
    N = 6

    # Inlined gkern: build an N x N Gaussian kernel and normalize it to sum to 1
    ax = np.linspace(-(N - 1) / 2., (N - 1) / 2., N)
    gauss = np.exp(-0.5 * np.square(ax) / np.square(sigma))
    kernel = np.outer(gauss, gauss)
    arr = kernel / np.sum(kernel)

    kernel = ops.constant(arr, dtype=DType.float64)
    kernel = kernel.broadcast_to((N, N, 3, 3))  # fails here when N=6 (error below)
    res = ops.conv2d(
        image,
        kernel
    )[0].tensor

    return res.cast(orig_dtype)

This is the error I get when it tries to broadcast the kernel with N=6:

  File "/Users/bgreni/Coding/max-cv/.magic/envs/showcase/lib/python3.12/site-packages/max/graph/graph.py", line 314, in _add_op_get_op_with_results
    raise ValueError(
ValueError: Failed to create op 'broadcast_to':
Inputs:
    input = TensorValue(dtype=float64, shape=[Dim(6), Dim(6)], device=None)
    new_shape = Attribute(#mosh<ape[6, 6, 3, 3]> : !mosh.ape)

Diagnostics:
    [broadcast_to] input dim must be either 1 or equal to corresponding output dim starting from the rightmost dim
Operation creation failed

Awesome, thanks for the context! I see what is happening here, and I think it might be a bug. I’m going to follow up with the team and get it fixed, but you should be able to work around the issue if you change your code to the following:

    kernel = ops.constant(arr, dtype=DType.float64)
    kernel = kernel.reshape(N, N, 1, 1)  # explicitly create the trailing size-1 dims
    kernel = kernel.broadcast_to((N, N, 3, 3))

EDIT: Actually, this is expected behavior; there’s no bug to fix here 🙂 Broadcasts only implicitly create leading dimensions, so any trailing dimensions have to be created explicitly via reshape. Other frameworks (e.g. PyTorch) behave the same way.
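
For reference, PyTorch enforces the same rule; here’s a quick illustrative sketch (PyTorch code, not MAX):

import torch

k = torch.ones(6, 6)
# torch.broadcast_to(k, (6, 6, 3, 3))      # fails: shapes are matched from the rightmost dim
k = k.reshape(6, 6, 1, 1)                  # create the trailing size-1 dims explicitly
k = torch.broadcast_to(k, (6, 6, 3, 3))    # now legal: the size-1 dims expand to 3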


Hi @bgreni

The offending line is:
kernel = kernel.broadcast_to((N, N, 3, 3))

Here you are trying to broadcast shape (6, 6) to (6, 6, 3, 3), which is not considered legal. The MAX stack implements the broadcasting rules found in NumPy/PyTorch, and under those frameworks this fails too. The NumPy broadcasting documentation is a good place to learn the mechanics fully.

I think I see what you want to do, though, and there are a variety of solutions. I think the simplest is to reshape your (6, 6) tensor to shape (6, 6, 1, 1), then broadcast (6, 6, 1, 1) → (6, 6, 3, 3).

This detail comes down to how broadcasting “pads” missing dimensions: they are added on the leading side rather than the trailing side, so the (6, 6) kernel is treated as (1, 1, 6, 6) when matched against a 4-D shape, and the rightmost dims then fail to line up.
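
To make the padding direction concrete, here’s a small NumPy sketch of the same mechanics (illustrative only):

import numpy as np

np.broadcast_shapes((3, 3), (6, 6, 3, 3))        # OK -> (6, 6, 3, 3): missing leading dims are padded with 1
np.broadcast_shapes((6, 6, 1, 1), (6, 6, 3, 3))  # OK -> (6, 6, 3, 3): size-1 dims expand
# np.broadcast_shapes((6, 6), (6, 6, 3, 3))      # ValueError: 6 vs 3 mismatch at the rightmost dims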


Yup that works, thank you so much!