Lifetime/performance questions

spluta · September 28, 2025, 3:44pm

I am writing an audio engine in mojo. FYI - mojo is great for dsp!

Audio graphs run 48,000 or more times per second, so I am trying to write efficient code that isn’t constantly allocating and freeing memory.

Question 1: This is a question about memory allocation and the efficiency of different approaches to using variables in a function. The following are 3 versions of the same one-pole filter function, which lives inside a struct. I would like to know which is the best approach when the function is called over and over again, many thousands of times per second:

# new sample gets created in the fn and returned
fn next0(
    mut self, sample: SIMD[DType.float64, N], coef: SIMD[DType.float64, N]
) -> SIMD[DType.float64, N]:
    loc_samp = (1 - abs(coef)) * sample + coef * self.last_samp
    self.last_samp = loc_samp
    return loc_samp

# sample gets mutated in the fn and returned - not sure if this achieves anything
fn next1(
    mut self, mut sample: SIMD[DType.float64, N], coef: SIMD[DType.float64, N]
) -> SIMD[DType.float64, N]:
    sample = (1 - abs(coef)) * sample + coef * self.last_samp
    self.last_samp = sample
    return sample

# sample just gets mutated with no return. no variable declaration at all.
fn next2(
    mut self, mut sample: SIMD[DType.float64, N], coef: SIMD[DType.float64, N]
):
    sample = (1 - abs(coef)) * sample + coef * self.last_samp
    self.last_samp = sample

The ideal approach would get me the effect of next2 with the syntax of next0. What I want to do is just mutate the sample, but I would also prefer to call the function this way:

sample = one_pole.next1(sample, 0.99)

# vs

one_pole.next2(sample, 0.99)

next2 does what I want, but the syntax is inconsistent with regular mojo code.

Is there a performance cost in next0 vs next2 or am I overthinking this? Does next1 let me have my cake and eat it too or is it achieving nothing (is the sample returned as a copy no matter if it is mutable)?

Question 2: Inside a function of a struct which is called like above (audio-rate at 48K per second), is it more performant to declare a variable (SIMD values - floats and ints and such) in the fn of the struct or to use a struct variable, so:

fn next(...):
    temp: Float64 = whatever

# vs

fn next(...):
    self.temp = whatever

Thanks,

Sam

sora · September 28, 2025, 4:05pm

Question 1: There should be no significant performance difference among the 3 spellings. I would suggest choosing the one that’s most sound from the API design point of view.

Question 2: Again, there should be no difference, especially for small, register-passable types like float.
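
For reference, all three spellings in the question compute the same one-pole recurrence. Writing it out, with symbol names chosen here purely for illustration ($x[n]$ for sample, $c$ for coef, $y[n-1]$ for self.last_samp):

```latex
y[n] = (1 - |c|)\,x[n] + c\,y[n-1],
\qquad
H(z) = \frac{1 - |c|}{1 - c\,z^{-1}}, \quad |c| < 1 .
```

Whichever spelling is used, the only state carried from one call to the next is $y[n-1]$, a single SIMD value, and the pole at $z = c$ stays inside the unit circle as long as $|c| < 1$.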

spluta · September 29, 2025, 2:13pm

Thank you for the answer. I am really loving this language. It is so elegant and expressive. I just figured out the power of parameters. Awesome stuff!

Sam

martinvuyk · September 29, 2025, 7:47pm

FYI, since you are into signal processing: I’m about to merge (still working through some details) an FFT implementation, written in pure Mojo, into MAX. Here is the repo.

Since you are one of the potential first users, I would welcome any feedback. The implementation forces the user to set compile-time-known tensor sizes for example, would that be a huge limiting factor for your use-case or not? (padding, etc. would also be done at the user-side)

spluta · September 29, 2025, 9:46pm

I have been following your amazing work. Can’t wait to try it. Looks like you got it working on cpu now as well. Love it.

Compile time tensor size makes sense to me. This seems idiomatic to audio processing, where you mostly want to set up the FFT and then run an fft - process - ifft.

I’d want to declare my rfft with rfft[1024]() and then run rfft(audio).

Since you say it is implemented in MAX, does that mean I have to call it from python and not Mojo? For audio, I’d want to call it straight from Mojo. For analysis, probably from python. It would be amazing if both were possible.

It would be great if there were some instructions on how to get it working. Right now I get an error “module ‘int_tuple’ does not contain ‘zip’”. I’m on Mac, so maybe I am missing some dependencies?

Sam

martinvuyk · September 29, 2025, 10:11pm

Compile time tensor size makes sense to me. This seems idiomatic to audio processing, where you mostly want to set up the FFT and then run an fft - process - ifft.

I’d want to declare my rfft with rfft[1024]() and then run rfft(audio).

You’ll be able to use the same function, parametrized on whether it’s an inverse FFT: both directions share one function and differ only in their parameters. I also made rfft use that same fft function; the underlying implementation detects from the tensor shape when the input is real-valued.

Another big limitation is that every input needs the shape (batch_size, sequence_length, 1) for real input or (batch_size, sequence_length, 2) for complex input, and the output is always (batch_size, sequence_length, 2).
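
To make that layout concrete, here is a minimal plain-Python sketch of the shapes only. It is not the library's API: naive_fft_batch is a hypothetical helper name, and the O(n²) DFT inside it merely stands in for the real kernel so that the input and output shapes can be seen end to end.

```python
import cmath


def naive_fft_batch(x):
    """Illustrative stand-in for an FFT kernel (hypothetical helper).

    x is a nested list of shape (batch_size, sequence_length, 1) for real
    input or (batch_size, sequence_length, 2) for complex input.
    Returns a nested list of shape (batch_size, sequence_length, 2),
    where the last axis holds [real, imag].
    """
    out = []
    for seq in x:
        n = len(seq)
        # A trailing dimension of size 1 means real-only samples.
        z = [complex(s[0], s[1] if len(s) == 2 else 0.0) for s in seq]
        spectrum = []
        for k in range(n):
            # Direct DFT sum for bin k (slow, but easy to check by hand).
            acc = sum(
                z[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)
            )
            spectrum.append([acc.real, acc.imag])
        out.append(spectrum)
    return out


# One batch of four real samples: shape (1, 4, 1).
real_input = [[[1.0], [0.0], [0.0], [0.0]]]
result = naive_fft_batch(real_input)  # shape (1, 4, 2)

# One batch of two complex samples: shape (1, 2, 2).
complex_input = [[[0.0, 1.0], [0.0, 0.0]]]
complex_result = naive_fft_batch(complex_input)  # shape (1, 2, 2)
```

Padding a buffer to the compile-time sequence length, as mentioned earlier in the thread, would happen on the caller's side before building the input.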

Since you say it is implemented in MAX, does that mean I have to call it from python and not Mojo? For audio, I’d want to call it straight from Mojo. For analysis, probably from python. It would be amazing if both were possible.

I mean I’ll put it in the MAX Mojo kernels directory. Though now that you mention it I’m not sure if that means it isn’t exposed to Mojo users. I would like it to be usable as a Mojo lib as well.

It would be great if there were some instructions on how to get it working. Right now I get an error “module ‘int_tuple’ does not contain ‘zip’”. I’m on Mac, so maybe I am missing some dependencies?

I get that particular LSP error as well, but it still compiles for me. The command should be pretty much just pixi run mojo package ./fft -o ./build/fft.mojopkg, after which you should be able to take the .mojopkg and use it elsewhere for the same CPU (AFAIK).

martinvuyk · October 1, 2025, 11:16pm

With the latest MAX nightly I was just able to import cuFFT’s irfft, which is wrapped in Mojo, by doing:

from nn.irfft import irfft


fn main():
    irfft()

So once the pull request “[kernels] Add fft implementation” (modular/modular #5378) lands, you’ll be able to use it directly in Mojo.

sletz · October 10, 2025, 6:40am

@spluta: hoping to add a Mojo backend to the Faust language in 2026.

sletz · October 10, 2025, 6:48am

You are using the SIMD type with a recursive filter, right? To compute several filters in parallel?

spluta · October 10, 2025, 2:27pm

Awesome and perfect timing for me, as I am now at the point where I can start to try and fold this in.

Sam

spluta · October 10, 2025, 2:48pm

Hi Stefan. So glad to hear this. Faust outputting Mojo would be hot.

You are using the SIMD type with a recursive filter, right? To compute several filters in parallel?

Yes. This is amazing, right? Below is my OnePole, which accepts an N-“channel” SIMD and processes the channels in parallel. I have also made the Simper SVF, JOS Reson filters, and Zavalishin VALadder (with OverSampling). What approach is Faust going to take to building the “UGens”? Parameters make things very flexible.

Writing in Mojo feels similar to Faust in that you can build complex structures through composition. For example, I made the LP_Comb found in Freeverb with an integrated Zavalishin VAOnePole instead of a standard one-pole for some extra sauce, but could easily pop that out and put the normal one back in. Good stuff.


struct OnePole[N: Int = 1](Representable, Movable, Copyable):
    """
    Simple one-pole IIR filter that can be configured as lowpass or highpass.

    ``OnePole[N]()``

    Parameters:
        N: Number of channels to process in parallel.
    """
    var last_samp: SIMD[DType.float64, N]  # Previous output
    
    fn __init__(out self):
        """Initialize the one-pole filter"""

        self.last_samp = SIMD[DType.float64, N](0.0)
    
    fn __repr__(self) -> String:
        return String("OnePoleFilter")

    fn next(mut self, input: SIMD[DType.float64, N], coef: SIMD[DType.float64, N]) -> SIMD[DType.float64, N]:
        """Process one sample through the filter

        Args:
            input: The input signal to process. Can be a SIMD vector for parallel processing.
            coef: The filter coefficient.

        Returns:
            The filtered output signal. Will be a SIMD vector if input is SIMD, otherwise a Float64.
        """
        coef2 = clip(coef, -0.999999, 0.999999)
        var output = (1 - abs(coef2)) * input + coef2 * self.last_samp
        self.last_samp = output
        return output
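
As a cross-check of the recurrence in next, here is a minimal plain-Python reference model. It is only a sketch: OnePoleRef is a hypothetical name, a list stands in for the N-lane SIMD vector, and it assumes that clip clamps its first argument to the given range. Feeding a constant 1.0 with a positive coefficient should settle at 1.0, which is the lowpass (unity DC gain) behaviour.

```python
class OnePoleRef:
    """Plain-Python reference model of the one-pole recurrence.

    Hypothetical helper for checking values; a list of floats stands in
    for the N-lane SIMD vector used by the Mojo struct.
    """

    def __init__(self, num_channels=1):
        # Previous output, one value per channel.
        self.last_samp = [0.0] * num_channels

    def next(self, sample, coef):
        out = []
        for ch, (x, c) in enumerate(zip(sample, coef)):
            # Assumption: clip() in the Mojo code clamps to this range.
            c = max(-0.999999, min(0.999999, c))
            y = (1.0 - abs(c)) * x + c * self.last_samp[ch]
            out.append(y)
        self.last_samp = out
        return out


# First call from a zeroed state: (1 - 0.5) * 1.0 + 0.5 * 0.0
first = OnePoleRef(num_channels=1).next([1.0], [0.5])

# Two channels with different coefficients, driven by a constant input.
filt = OnePoleRef(num_channels=2)
for _ in range(200):
    y = filt.next([1.0, 1.0], [0.5, 0.9])
```

A model like this can be handy for comparing a few output samples against the Mojo version when changing the filter.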

sletz · October 10, 2025, 3:11pm

The Faust/Mojo backend will be simple as a first step, probably not directly using the SIMD type. Is your Mojo DSP project public and visible somewhere?

spluta · October 10, 2025, 3:13pm

Yes. I was just about to share this:

MMMAudio

Sam
