Call site inlining

Does Mojo plan to support function inlining at the call site, in addition to at the function definition?

For example:

# We don't want to declare this as `@always_inline`, since that would bloat our binary
fn large_function(i: Int64):
    # 100+ lines of code
    pass
    

def main():
    for i in range(1_000_000):
        # Since this is a hot loop, inlining `large_function` here, and only here, could improve performance
        @inline_function_call
        large_function(i)

For a more concrete example:

# Pseudocode (my LSP is broken 😅); `List[Int](capacity=...)` preallocates the backing storage
var lst = List[Int](capacity=10_000)

for i in range(10_000):
    @inline_function_call
    lst.append(i)

    # Inlining could potentially eliminate the `lst.length == lst.capacity` check,
    # which would otherwise run 10,000 times for no reason, and also remove the
    # overhead of 10,000 function calls.

In practice, the Mojo compiler inlines List.append in this case. Since large_function is called at a single site, it’s reasonable to expect inlining here as well.

In the general case, LLVM is aware of things like the instruction cache (icache) size for a particular CPU, and Mojo tells it which CPU it is compiling for. It tries not to overflow the icache in anything it thinks could be a hot loop, since icache misses hurt performance.

Things like “inline this particular invocation” are useful tools, but they can also cause performance regressions if misused. For example, that “large function” might be constant-folded into something tiny.
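Until something like a call-site annotation exists, one way to approximate it with decorators Mojo already has is to split the function in two: an `@always_inline` variant that holds the body, and a plain wrapper that serves as the single out-of-line copy for every other caller. This is only a minimal sketch. The name `large_function_inlined` and the one-line body are hypothetical stand-ins, and whether it actually helps should be checked by benchmarking.

```mojo
@always_inline
fn large_function_inlined(i: Int64) -> Int64:
    # Hypothetical stand-in for the 100+ line body from the question.
    return i * 2 + 1


fn large_function(i: Int64) -> Int64:
    # Out-of-line entry point for cold call sites. The body is expanded
    # here once, so ordinary callers share a single copy.
    return large_function_inlined(i)


def main():
    var total: Int64 = 0
    for i in range(1_000_000):
        # Hot loop: call the always-inline variant so the body is expanded here.
        total += large_function_inlined(Int64(i))
    # Cold path: a normal call to the shared out-of-line copy.
    total += large_function(0)
    print(total)
```

The cost is that the body now exists in two places in the binary (the hot loop and the wrapper), which is the same trade-off a call-site annotation would make, just spelled manually.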

If you care about code size, -Os is available to you. Otherwise, on modern systems, and especially for the ML applications Mojo is typically used for, the stuff you package with your application will often be much larger than the application itself.

I like this idea! It feels inline (pun intended) with Modular's stated goal of moving magic out of the compiler and into the libraries.