Call site inlining

Does Mojo plan to support function inlining at the function call level as well as at the function definition level?

For example:

# We don't want to declare this as always_inline as this will bloat our binary
fn large_function(i: Int64):
    # 100+ lines of code
    pass
    

def main():
    for i in range(1_000_000):
        # since this is a hot loop, inlining `large_function` here and here only could provide performance benefits
        @inline_function_call
        large_function(i)

For a more concrete example:

# pseudocode: `@inline_function_call` is hypothetical syntax, the rest should be close to real Mojo 😅
var lst = List[Int](capacity=10_000)

for i in range(10_000):
    @inline_function_call
    lst.append(i)

    # inlining could potentially eliminate the lst.length == lst.capacity check 
    # which would normally run 10,000 times for no reason (as well as eliminate the overhead of 
    # 10,000 function calls)

In practice, the Mojo compiler inlines List.append in this case. Since large_function is called at a single site, it’s reasonable to expect inlining here as well.

In the general case, LLVM is aware of things like the icache size for a particular CPU, and Mojo tells it which CPU it is compiling for. It tries not to overflow the icache in anything it thinks could be a hot loop, since icache misses are not great for performance.

Things like “inline this particular invocation” are useful tools, but they can also cause performance regressions if misused. For example, that “large function” might be constant folded into something tiny.
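For comparison, here is a minimal sketch of the definition-level controls that exist today, assuming the `@always_inline` and `@no_inline` decorators behave as documented. The function names and bodies are made up for illustration and are not from the original post. Note that both decorators apply to every call of the function, which is exactly the limitation the question is about.

```mojo
# Hypothetical helper: small enough that inlining it everywhere is cheap.
@always_inline
fn small_helper(x: Int) -> Int:
    return x * 2 + 1


# Hypothetical stand-in for the "100+ lines" function. @no_inline asks the
# compiler never to inline it, at any call site.
@no_inline
fn large_function(x: Int) -> Int:
    return x * 3


def main():
    var total = 0
    for i in range(1_000_000):
        # Inlined here (and at every other call) because of the decorator
        # on the definition, not because of anything at the call site.
        total += small_helper(i)
        # Always a real call, even in this hot loop.
        total += large_function(i)
    print(total)
```

Without either decorator, the decision is left to the optimizer's heuristics, which is the behavior described above.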

If you care about code size, -Os is available to you. Otherwise, on modern systems, and especially for the ML applications Mojo is typically used for, the stuff you package with your application will often be much larger than the application itself.

I like this idea! It feels inline (pun intended) with the goal Modular has stated of moving magic out of the compiler and into the libraries.