Reviewing the Iterator/Iterable Abstraction

With the recent work on Iterators in the stdlib, I think it’s a good idea to have another discussion on Mojo’s iterator API.

Explicitly not in Scope

Things which are out of scope for this discussion.

  • The inability to put trait bounds on Iterator.Element; the proposed `requires` clause solves this.

Current API

trait Iterator(Copyable, Movable):
    """The `Iterator` trait describes a type that can be used as an
    iterator, e.g. in a `for` loop.
    """

    alias Element: Copyable & Movable

    fn __has_next__(self) -> Bool:
        ...

    fn __next__(mut self) -> Self.Element:
        ...

    fn bounds(self) -> Tuple[Int, Optional[Int]]:
        """Returns bounds `[lower, upper]` for the remaining iterator length.

        Returns a tuple where the first element is the lower bound and the second
        is an optional upper bound (`None` means unknown or `upper > Int.MAX`).
        This helps collections pre-allocate memory when constructed from iterators.

        The default implementation returns `(0, None)`.

        ### Safety
        If the upper bound is not None, implementations must ensure that `lower <= upper`.
        The bounds are hints only - iterators may not comply with them. Never omit safety
        checks when using `bounds` to build collections.

        Example:
        ```mojo
        fn build_list[I: Iterator & Iterable](iter: I) -> List[I.Element]:
            var lower, _upper = iter.bounds()
            var list = List[I.Element](capacity=lower)
            for element in iter:
                list.append(element^)
            return list
        ```
        """
        return (0, None)

trait Iterable:
    """The `Iterable` trait describes a type that can be turned into an
    iterator.
    """

    alias IteratorType[mut: Bool, //, origin: Origin[mut]]: Iterator

    fn __iter__(ref self) -> Self.IteratorType[__origin_of(self)]:
        ...

There is also `fn __next_ref__(mut self) -> ref [origin] Self.Element:`, which is present on many collections and is somewhat attached to the Iterator API.

Alternatives In Other Systems Languages

Rust

pub trait Iterator {
    type Item;

    fn next(&mut self) -> Option<Self::Item>;

    // 76 optional methods with defaults based on `next`.
    // I have included some notable ones.

   // Used for chunked iteration, often with SIMD
   fn next_chunk<const N: usize>(
        &mut self,
    ) -> Result<[Self::Item; N], IntoIter<Self::Item, N>>
       where Self: Sized { ... }

    // Equivalent to Mojo's `bounds`
    fn size_hint(&self) -> (usize, Option<usize>) { ... }

   // We need to be able to union two origins in order to handle this
   fn chain<U>(self, other: U) -> Chain<Self, <U as IntoIterator>::IntoIter>
       where Self: Sized,
             U: IntoIterator<Item = Self::Item> { ... }
   
   // Do we want the ability to have an iterator over `out` values to handle things like this? 
   fn collect_into<E>(self, collection: &mut E) -> &mut E
       where E: Extend<Self::Item>,
             Self: Sized { ... }
}

C++

The actual interface is too much to include here, so here’s a link: std::iterator_traits - cppreference.com

Included for completeness; I don’t think this is a great idea.

Issues with the current Mojo API

__has_next__

For many kinds of iterators, such as those which pull from linked lists, the work required to answer this and the work required to retrieve the next item are nearly identical. It can also introduce odd side effects for iterators which take multiple items from another iterator they contain; for instance, a SIMD iterator may need __has_n_next__[N: UInt] where N is the SIMD width. I am concerned that this API may lead to a lot of duplicated checking or buffering inside iterators. Additionally, we should consider APIs which pull from *PMC queues, where __has_next__ is a race condition without buffering.

The other major issue is what iterators should do when __has_next__ is false and the user calls __next__. The current behavior for List is to abort the program with debug assertions enabled or have UB in the default case. However, in some circumstances it may be preferable to raise. If we’re already raising, that’s functionally equivalent to returning Tuple[Bool, T], at which point Optional is right there.
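To make the duplicated-work concern concrete, here is a hedged sketch contrasting the two shapes for a linked-list iterator. The `Node` type and `NodeIter` are hypothetical, not stdlib code:

```mojo
# Hypothetical sketch only: `Node` is an assumed singly-linked node type
# with `value` and `next` fields.
struct NodeIter[T: Copyable & Movable](Copyable, Movable):
    var current: UnsafePointer[Node[T]]

    # `__has_next__` form: the null check happens here...
    fn __has_next__(self) -> Bool:
        return Bool(self.current)

    # ...and again here, because `__next__` can't trust that the caller
    # actually checked first.
    fn __next__(mut self) -> T:
        var value = self.current[].value.copy()
        self.current = self.current[].next
        return value

    # `Optional` form: a single check, performed exactly once per element.
    fn next_opt(mut self) -> Optional[T]:
        if not self.current:
            return Optional[T](None)
        var value = self.current[].value.copy()
        self.current = self.current[].next
        return Optional[T](value)
```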

Iteration by reference

Right now, we can do iteration by reference via either the __next_ref__ helper, which isn’t technically part of the iterator API, or by returning Pointer[T, origin]. The community has found the latter a major annoyance. However, owning iterators have no good way to implement this behavior, since they may be returning a reference to a temporary, as would be the case for range.

Three options have been discussed in the past to help handle this. The first is __deref__, used in a similar manner to Rust’s Deref/DerefMut. Another is a __getattr__ overload, but that might introduce a lot of compiler overhead. A third is an OwnedOrigin which can be used to shuffle everything through ref.
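For context, here is a hedged use-site sketch of the two existing mechanisms (variable names hypothetical); the mandatory `[]` on the Pointer-returning form is the annoyance referred to above:

```mojo
# Pointer-returning style: every use site needs an explicit dereference.
var it = my_list.__iter__()
while it.__has_next__():
    var elem_ptr = it.__next__()   # -> Pointer[T, origin]
    print(elem_ptr[])              # the `[]` the community disliked

# `__next_ref__` style: the reference is transparent at the use site,
# but only works for iterators that can hand out a stable reference.
var it2 = my_list.__iter__()
while it2.__has_next__():
    print(it2.__next_ref__())      # -> ref [origin] T
```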

Iterable.__iter__ taking ref self

This means that you can’t transfer ownership of a collection to an iterator. That may cause issues for draining iterators, which are needed with linear types to allow destroying collections which hold them.
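A hedged sketch of what an owning overload could look like if the trait allowed it; the `OwnedIterable` trait and `__drain_iter__` name are hypothetical:

```mojo
# Hypothetical: not expressible today, since `Iterable.__iter__` takes `ref self`.
trait OwnedIterable:
    alias DrainIteratorType: Iterator

    # Taking `var self` lets the iterator own the collection outright,
    # so it can move elements out by value. That is what a draining
    # iterator needs in order to destructure a collection of linear
    # types element by element.
    fn __drain_iter__(var self) -> Self.DrainIteratorType:
        ...
```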

__has_next__ and Generators

Ideally, we want to have Python-style generators at some point, since they are useful both for implementing futures for async and for making the compiler automate many of the currently manual transformations used when writing iterators. We could easily make all generators which return None map to an Optional-style iterator. However, we don’t really have a good way to do __has_next__ without buffering.

Alternatives

Optional

  • Solves many of the issues with __has_next__.
  • Iterable.__iter__ taking ref self is orthogonal.
  • Requires returning Pointer[T, _] for references.
  • Requires that Element can move/copy into Optional, which may require special compiler handling to elide.
  • Problems with batching
// Iterable left the same

trait Iterator(Copyable, Movable):
    """The `Iterator` trait describes a type that can be used as an
    iterator, e.g. in a `for` loop.
    """
    alias Element: Copyable & Movable

    fn __next__(mut self) -> Optional[Self.Element]:
        ...

    fn bounds(self) -> Tuple[Int, Optional[Int]]:
        ...

Just return a boolean

This would require multiple out arguments to be allowed, which may be desirable anyway. It also has the issue of making any users who forget to check is_valid walk into an uninitialized elem, or one which may hold data from the previous iteration.

I personally don’t think this is a good idea outside of very specialized circumstances.

// Iterable left the same

trait Iterator(Copyable, Movable):
    """The `Iterator` trait describes a type that can be used as an
    iterator, e.g. in a `for` loop.
    """
    alias Element: Copyable & Movable

    fn __next__(mut self, out elem: Element, out is_valid: Bool):
        ...

    fn bounds(self) -> Tuple[Int, Optional[Int]]:
        ...

raise StopIteration

The pythonic approach.

// Iterable left the same

trait Iterator(Copyable, Movable):
    """The `Iterator` trait describes a type that can be used as an
    iterator, e.g. in a `for` loop.
    """
    alias Element: Copyable & Movable

    fn __next__(mut self) raises StopIteration -> Element:
        ...

    fn bounds(self) -> Tuple[Int, Optional[Int]]:
        ...

Example Usage

This is what the implementation would need to look like after desugaring.

try:
    while True:
        var item = next(iter)
        ...
except StopIteration:
    pass

Handling Batching

In order to make good use of SIMD, we’ll need an iterator interface that can handle batching items. We need to support non-masked SIMD due to many SIMD ISAs not having masking, which means that we can’t just have a MaskedSIMD type which has a paired mask. I’m going to implement these using the Optional form of the API but most can be translated to __has_next__ or one of the alternatives.

Variant

This can handle SIMD and drain loops, and may have better instruction locality than a normal iterator. However, the interface is much more complex as a result.

// Iterable left the same

trait Iterator(Copyable, Movable):
    """The `Iterator` trait describes a type that can be used as an
    iterator, e.g. in a `for` loop.
    """
    alias Element: Copyable & Movable
    alias FixedBatchElement: Copyable & Movable & Iterator requires FixedBatchElement.Element == Element
    alias DynamicBatchElement: Copyable & Movable & Iterator requires DynamicBatchElement.Element == Element

    fn __next__(mut self) -> Variant[FixedBatchElement, DynamicBatchElement, None]:
        ...

    # What should this return? The number of fixed batches left? 
    fn bounds(self) -> Tuple[Int, Optional[Int]]:
        ...

Multiple Iterators

This could potentially be handled with an iterator that transforms along a progression of return types for more normal loops, but I can imagine circumstances (such as a producer/consumer queue), where having both fixed and dynamic batch sizes intermingled would be helpful.

For example:

trait Iterator(Copyable, Movable):
    """The `Iterator` trait describes a type that can be used as an
    iterator, e.g. in a `for` loop.
    """
    alias Element: Copyable & Movable

    fn __next__(mut self) -> Optional[Element]:
        ...

    fn bounds(self) -> Tuple[Int, Optional[Int]]:
        ...

struct ListSimdIterator[mut: Bool, //, dtype: DType, origin: Origin[mut]](Iterator):
    alias Element = SIMD[dtype, simdwidthof[dtype]()]
    
    var idx: UInt
    var inner: Pointer[List[Scalar[dtype]], origin]

    fn __next__(mut self) -> Optional[Element]:
        ...

    fn bounds(self) -> Tuple[Int, Optional[Int]]:
        ...

extension[dtype: DType] List[Scalar[dtype]]:
    fn simd_iter(ref self) -> (ListSimdIterator[dtype, Origin[self]], ListIter[Scalar[dtype], Origin[self]]):
        ...

Additional misuse-resistance can be added using proof tokens or other techniques. In theory, we could also use type-level programming to produce a drain loop which uses fewer instructions (ex: make the main loop 256-bit, then have an optional 128-bit SIMD step, then a scalar iterator). However, as currently written this may require duplicating the main loop.

Masking

We can also decide that performance on anything that isn’t a recent ARM server chip, AMD Zen 4 or newer, Intel server, or RISC-V Vector CPU doesn’t matter that much. Most GPUs can do masking at the warp/wavefront level, so perhaps some creative abstractions can make it work there. If we do that, then batching becomes much less of an issue.

Other considerations

  • Do we want to handle async iterators (Rust Streams)? Those need a discussion about async cancellation and how you add a timeout to __next__.
  • This entire API needs to be duplicated for raising iterators, as well as any other “effects” that Mojo finds (async, blocking, interrupt safety stuff, potentially warp32 vs warp64 on AMD, etc).
  • Given that Mojo tries to avoid magic compiler optimizations, how do we make iterators inspectable so that we can perform optimizations with @parameter if (ex: merging adjacent Filter/Map iterators, making .count() on an iterator with a known size just grab that size, etc.)?
  • Should we wait until Mojo has Generators and yield to implement this so that the compiler is responsible for all of the state tracking and transformations?

For example here’s what the current ListIter could turn into:

extension[T: Copyable & Movable] List[T](Iterable):
    fn __iter__(ref self):
        for i in range(len(self)):
            # `get_unsafe` used to avoid bounds checks
            yield self.get_unsafe(i).copy() # `ListIter.__next__` copies right now

Although we lose bounds, that might be possible to add back with extensions provided it’s possible to name the generator. This might not be possible for compiler reasons in which case we’d need to revisit the idea.

ex:

extension[T: Copyable & Movable, origin: Origin] List[T].IteratorType[origin]:
    @always_inline
    fn bounds(self) -> Tuple[Int, Optional[Int]]:
        ...

Using generators also means that the compiler can check that a particular type is only returned once. For example, None or the drain loop part of a SIMD iterator.

  • What about bidirectional iterators?
  • Linear types and iterators which buffer things, how do we handle that?
  • Should thread-safe *PMC queues go under the iterator abstraction at all?

cc @duck_tape @martinvuyk @nate


Awesome write up!

I’d suggest one more discussion point: I don’t think that for loops should require Iterable; instead, the item should just be an Iterator, adopting the Rust convention (or something like it) of having collections provide methods like .iter(), .into_iter(), etc. (TBD on how iter_mut would propagate).

This would (I think) solve the owned iterator issue because we wouldn’t have to go through a trait that requires ref self.

Example:

trait RustyIterator(Copyable, Movable):
    alias Element: Copyable & Movable

    fn next(mut self) -> Optional[Self.Element]:
        ...


@fieldwise_init
struct OwningIterator(RustyIterator):
    alias Element = Int

    var inner: MyDatastructure

    fn next(mut self) -> Optional[Self.Element]:
        return self.inner.some_data.pop() if len(
            self.inner.some_data
        ) > 0 else Optional[Self.Element](None)


@fieldwise_init
struct RefIterator[
    inner_mut: Bool, //,
    inner_origin: Origin[inner_mut],
](RustyIterator):
    alias Element = Pointer[Int, inner_origin]

    var inner: Pointer[MyDatastructure, inner_origin]
    var index: UInt

    fn next(mut self) -> Optional[Self.Element]:
        if self.index == 0:
            return Optional[Self.Element](None)
        else:
            self.index -= 1
            return rebind[Self.Element](
                Pointer(to=self.inner[].some_data[self.index])
            )


@fieldwise_init
struct MyDatastructure(Copyable, Movable):
    var some_data: List[Int]

    fn into_iter(var self) -> OwningIterator:
        return OwningIterator(self^)

    fn iter(ref self) -> RefIterator[__origin_of(self)]:
        return RefIterator(Pointer(to=self), len(self.some_data))


fn main():
    var mydata = MyDatastructure(List(1, 2, 3, 4, 5))

    var iter_ref = mydata.iter()
    while datum := iter_ref.next():
        var ptr = datum.take()
        ptr[] += 1
        print(ptr[])

    print("---")

    var iter = mydata^.into_iter()
    while datum2 := iter.next():
        print(datum2.take())

Those loops could then look like the following if for loops only required Iterator:

# Iter by ref
for datum in mydata.iter():
    print(datum)

# Iter over owned values
for datum2 in mydata^.into_iter():
    print(datum2)

Note that this could work with Optional or __has_next__ as Owen outlined in the original post.

IMO Python has already “solved” this problem by having __next__ raise an exception, and I can’t see why Mojo needs to innovate here. Exceptions are often more efficient than optionals, because (for owning iterators) you can eliminate a move operation.

Exceptions would be especially nice if at some point Mojo gains a shorthand for converting an exception into an optional, e.g. x = next?(iter).

If we are going to diverge from Python, we at least need to have a solid justification.


IMO Python has already “solved” this problem by having __next__ raise an exception, and I can’t see why Mojo needs to innovate here. Exceptions are often more efficient than optionals, because (for owning iterators) you can eliminate a move operation.

That is one of the options, but if next(iter) also does that, it would mean a try/except around every call site, or making the enclosing function raising. That would make writing iterator adapters like map very annoying. Allowing __next__ to raise would also mean you could raise other things from it, so we may have a case where iterators can raise any exception until we get checked exceptions and can specify which types may be raised by a function in a trait. I’m not sure I like that.
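To illustrate the adapter pain, here is a hedged sketch of a chain-style adapter under the raising design (the types and the "Stop Iteration" sentinel string are hypothetical); without typed raises it can only string-match to tell exhaustion apart from a real error:

```mojo
# Hypothetical sketch: assumes `A.Element` and `B.Element` are the same type.
struct Chain[A: Iterator, B: Iterator](Copyable, Movable):
    var first: A
    var second: B
    var on_second: Bool

    fn __next__(mut self) raises -> A.Element:
        if not self.on_second:
            try:
                return self.first.__next__()
            except e:
                # With untyped raises, a real error from `first` that
                # happens to stringify to "Stop Iteration" would be
                # silently swallowed here.
                if "Stop Iteration" != e.as_string_slice():
                    raise e
                self.on_second = True
        return rebind[A.Element](self.second.__next__())
```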

I also checked and, at least for a simple case, the current __has_next__ and raises generate the same assembly.

Exceptions would be especially nice if at some point Mojo gains a shorthand for converting an exception into an optional, e.g. x = next?(iter).

I think that converting into a Rust-style Result, so that you can get the error out, would be better.

I think it may be a good idea to do what you suggest and back off to "for loops need an Iterator" until the dust settles on what these APIs are going to look like. We probably want for i in [1, 2, 3] to work at some point, but there aren’t many good defaults for what kind of iteration to do, especially in a way that won’t require people to remember which collections do what. In many cases copies are unwelcome; in others, not doing them may cause a lot of lifetime issues. I lean towards making ref iteration the default, but some people may find things like integers from a list not copying surprising.


Right now “raises” is tied to the Error type which involves malloc/free - using that as a stop condition is not going to be very efficient. We’re aiming for “typed raises” in the next quarter or so, which should solve this.


Based on the godbolt link above in my response to Nick, it looks like the compiler can actually eliminate the malloc/free in Error if it’s a static string that is never accessed, since it has static storage duration. Even the following works fine and only has a free for the list, which handles iterators which need to raise:

@fieldwise_init
struct ListIter:
    var inner: List[UInt64]
    var index: Int

    fn __next__(mut self, out result: UInt64) raises:
        if self.index >= len(self.inner):
            raise "Stop Iteration"
        
        result = self.inner[self.index]

        self.index += 1


@export(ABI="C")
fn sum_list_raising(var l: List[UInt64]) raises -> UInt64:
    var iter = ListIter(l^, 0)

    var sum: UInt64 = 0

    try:
        while True:
            sum += iter.__next__()
    except e:
        if "Stop Iteration" != e.as_string_slice():
            raise e

    return sum

In more complex scenarios it might need an allocation, but I couldn’t make it do an extra one even when adding a raise based on information about the inner list of ListIter.

I know it will malloc/free if you ask for stack traces with your errors, which may cause issues. Is that what you’re concerned about?

If this were the case though, you wouldn’t be able to write the following

var list = [1, 2, 3]
for elem in list:
    print(elem)

Since the List type only implements Iterable and not Iterator.

To make it work with what you’re suggesting we would have to write

for elem in iter(list): ...
for elem in list.__iter__(): ...

And maybe that’s what we want? It just seems a bit annoying from a UX perspective but I could be convinced.

What Mojo is currently lacking is the ability to say that the __iter__ function can also take an “owned” value of self (not just a ref). Not sure if there is a solution in something like OwnedIterable or a different overload of __iter__(var self) which the compiler calls in certain cases.

+1 on at least taking a real look at the raises route.

I do see though how needing to deal with try/except blocks would be annoying and can lead to a bad UX. However, the nice part is that most devs won’t have to deal with a try/except block at all if they use the for <x> in <y> syntax. Writing adaptors may be a bit of a pain, but that’s why the stdlib should provide them so people don’t have to write them.

And as Chris said - a “typed” raises eliminates a lot of other concerns too.
In the meantime if we pursue this route we can just have a standard IteratorError type/function as a placeholder.

If we went with what I’m suggesting, that example would look like:

for elem in [1, 2, 3].iter(): # or into_iter()
    ...

I think that the flexibility of not having to go through the Iterable trait would be worth it in the short term as we explore the Iterator design space. It would be pretty easy to change back to requiring Iterable at a future point with this choice.

If there were a way to overload __iter__(var self), that would solve the same problem and I’d be totally fine with that! I vaguely remember that with implicit conformance / some of the magic dunder methods, you could actually use either owned or ref for __iter__?

As for exceptions vs Optional vs __has_next__ I have no preference outside of whichever has the least overhead and the best performance.

I find it annoying to write iterators that use __has_next__ vs just Optional, but if it saves a move, it’s worth it. I’m skeptical of exceptions for use as control flow, I don’t see why that would be better than Optional.


Would using exceptions back us into a corner on some hardware targets where we may want to disallow raises entirely? Or do typed exceptions sidestep that whole problem? (Specifically thinking about nostd and panics in Rust)

Mojo exceptions are pretty special and don’t behave like Python/C++ exceptions. Mojo exceptions are super lightweight and don’t have some of the nasty stack-unwinding properties of C++ exceptions or Rust’s panic!.
In short, you can think of the following functions as essentially equivalent:

fn foo() raises -> Int:
    ...

# compiler magically transforms the above function into this (kinda...)
fn foo() -> Result[Int, Error]:
    ...

Typed exceptions improve on this even more! Instead of returning Result[T, Error] where Error is the stdlib Error type, you can return a custom error type which will likely be lighter weight (since the stdlib’s Error can potentially allocate).


I think that there is a reasonable argument to be made that until we can figure out a good UX for read ref vs mut ref vs owned, it may be better to leave the syntax sugar out. Right now, specifying that you want a particular kind of iterator is difficult and reference iteration isn’t even technically a part of the iterator interface. I’ve personally found read vs mut inference to be fairly fragile, and to me LayoutTensor bypassing most of this and doing very custom things means that the current interface isn’t good enough.

I agree that .__iter__() is quite annoying, so we could provisionally adopt Rust’s iter(), iter_mut() and into_iter().


If there were a way to overload __iter__(var self), that would solve the same problem and I’d be totally fine with that! I vaguely remember that with implicit conformance / some of the magic dunder methods, you could actually use either owned or ref for __iter__?

I think that we’re finding another motivation for having an OwnedOrigin to help abstract these kinds of things. Many iterator types, like adapters, simply do not care about the calling convention.

I think that this is something that Mojo should address for the sake of unifying abstractions, since I’ve also wanted it for other proposals like my IO engine proposal.

Regarding performance

Just take a look at Rust’s Zip implementation. It defines an optimization for some stdlib iterators, and only for those, i.e. custom iterators don’t benefit from these optimizations.

LLVM do while optimization

__var__, ok = next() # not inlined
if not ok:
	goto @done
@loop:
#<block>#
__var__, ok = next()
if ok:
	goto @loop
@done:

Here, the naive iterator design sacrifices performance.
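For comparison, the __has_next__ design maps directly onto that rotated shape, since the exit test is separate from and cheaper than producing the element. A hedged Mojo-ish sketch of the desugaring:

```mojo
# Sketch of `for elem in it` desugared against `__has_next__`/`__next__`:
# it is the exit test that gets duplicated, not the element-producing
# call, which is exactly the do-while rotation LLVM wants.
if it.__has_next__():
    while True:
        var elem = it.__next__()
        # <loop body>
        if not it.__has_next__():
            break
```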

Solution sketch

Iterators like zip, enumerate and map benefit from the current __has_next__ design.
Thus the ideal solution is to let implementations choose which design they implement.

struct Map(Iterator):
    fn __has_next__(self) -> Bool:
        ...
    fn __next_checked__(mut self) -> Self.Element:
        ...

struct Filter(Iterator):
    fn __next__(mut self) -> Optional[Self.Element]:
        ...

Regarding optimal usage of iterators

In my opinion Mojo should lean more heavily towards generators and other Python sugar.
These are more readable, easier to refactor, and more familiar to a Python programmer.
This means limiting the iterator APIs to builtins and itertools.

@python_compat("flatcase")
fn take_while(iter, pred)
fn filter_false(iter, pred)
fn filter(iter, pred)

While I see potential for an interface of this kind, I don’t think the Iterator interface is the right abstraction. It introduces too much complexity while only benefiting a niche use case.

In my opinion Mojo should lean more heavily towards generators and other Python sugar.
These are more readable, easier to refactor, and more familiar to a Python programmer.
This means limiting the iterator APIs to builtins and itertools.

I don’t think making the iterator API non-extensible is a good idea. There are a lot of things which probably don’t belong in the stdlib, like parallel iterators a-la Rust’s rayon, that I think are valuable to enable.

I’m in favor of generators so long as we can add extensions to them to make the API better (ex: add a bounds method). However, that may mean delaying iterators for a year or more.


Not sure how I feel about this, but maybe this is the Python side of me coming out. This feels super redundant.

for elem in iter(list): ...
for elem in list.__iter__(): ...

I do wonder whether we should have a hybrid approach:

for elem in list: ...             # immutable ref.
for elem in list.mut_iter(): ...  # mutable ref
for elem in list.into_iter(): ... # deinit list, owned iter, whatever it
                                  # wants to do with its values
for elem in list.xor_iter(): ...  # (eventually) exclusive borrow iter  

I will say that there are good arguments for removing the syntax sugar for the immutable scenario.

I think the current iterator ownership / mutability sugar breaks down super quick. For example:

for ref elem in list: ... # Why is this mutable
for elem in list: ...     # but this immutable

I understand that ref takes the mutability of the list, and that’s why it’s mutable, but it just doesn’t feel like ref is doing much for us here otherwise. If we have the following:

for ref elem in list: ...           # mutable
for elem in list: ...               # immutable
for ref elem in read_only_list: ... # immutable (ref does nothing)
for elem in read_only_list: ...     # immutable

It doesn’t feel very parametric for syntax to just say “Hey this mutable element from this mutable list should remain mutable” and have no other purpose.

Is there an example where for ref element in collection does something else? Otherwise, I think requiring explicit iterator calls is probably the way to go for now.

Aren’t functions by definition more extensible than methods? The only difference I see is ergonomics.
If you wanted a parallel iterator, I would rather add a language extension than for_each.

I have no issues with methods like bounds, because those don’t lead to unreadable code. However, I think Mojo should find another way to add extensions for other use cases.
Adding methods to generators requires access to the underlying fields, which can be tricky because of optimizations, i.e. a field can’t be removed. Nevertheless I see no reason why it shouldn’t be possible.
If you are concerned about ergonomics, generator expressions can be implemented in terms of map, filter and flatten and do not require coroutine support.

from itertools import chain

def generator_expression_or_iterator_adaptor(
    X,
    Y_fn,
    expr,
    cond_x=lambda x: True,
    cond_y=lambda x, y: True,
    return_type='iterator'
):
    if return_type == 'generator':
        return (expr(x, y) for x in X if cond_x(x) for y in Y_fn(x) if cond_y(x, y))
    elif return_type == 'iterator':
        return chain.from_iterable(
            map(
                lambda x: map(
                    lambda y: expr(x, y),
                    filter(lambda y: cond_y(x, y), Y_fn(x))
                ),
                filter(cond_x, X)
            )
        )
    else:
        raise ValueError("return_type must be either 'iterator' or 'generator'")

I just don’t see why mojo should make unreadable iterator adapters like these easier to write. Python has already sugar which can be easily refactored into loops if required.

I don’t think making the iterator API non-extensible is a good idea. There are a lot of things which probably don’t belong in the stdlib, like parallel iterators a-la Rust’s rayon, that I think are valuable to enable.

Aren’t functions by definition more extensible than methods? The only difference I see is ergonomics.
If you wanted a parallel iterator, I would rather add a language extension than for_each.

That depends. It’s much harder to inspect the result of a function for extra information like bounds. As for a language extension for for_each, there are a million different ways you could implement it. For instance, do you want to run only on P-cores or E-cores on consumer Intel? Do you want to avoid crossing sockets or SNC/NPS domains? Do you want to use Mojo’s main threadpool, or does this do blocking IO operations with direct IO, which means it should use a separate one? Should the current thread participate, or should this be an async function that awaits the threadpool finishing so this thread can do other stuff? What if I want my for_each to use both the CPU and the GPU? Should the iterator do prefetching for me?

Having an object means that we can attach extra information to it to allow for better generic programming. Making it something controlled by library code means that Mojo doesn’t need the several dozen parameters dealing with all of that to be exposed on every iterator.
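A hedged sketch of the kind of library-controlled knobs meant here; every name (`par`, `pool`, `cross_numa`, `prefetch`, `collect`, `add_one`) is hypothetical:

```mojo
# Hypothetical API: the configuration lives on the parallel adapter
# object, not as parameters on the core Iterator trait.
var results = data.iter()
    .par(pool=io_pool, cross_numa=False, prefetch=True)
    .map[add_one]()
    .collect[List[Int]]()
```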

I’m in favor of generators so long as we can add extensions to them to make the API better (ex: add a bounds method). However, that may mean delaying iterators for a year or more.

I have no issues with methods like bounds, because those don’t lead to unreadable code. However, I think Mojo should find another way to add extensions for other use cases. Adding methods to generators requires access to the underlying fields, which can be tricky because of optimizations, i.e. a field can’t be removed. Nevertheless I see no reason why it shouldn’t be possible.

I mean generators in terms of things like this:

extension[T: Copyable & Movable] List[T](Iterable):
    alias Element = T

    fn __iter__(ref self):
        for i in range(len(self)):
            yield self[i]

We’d need some way to name the return type of a function and some way to poke at the captured variables to add the bounds function, but that shouldn’t be that bad to do.

In this case, we can have a wildcard extension that makes all generators which return None and yield a value implement Iterator.


In addition to all the reasons Owen mentioned, it allows chaining iterators in a readable left-to-right manner without any additional language changes.

I’d want your example to look like the following (I don’t think tuple pattern matching works right now, and I’m assuming implicit return in lambdas):

return X.iter()
     .filter(lambda x: cond_x(x))
     .map(lambda x: (x, Y_fn(x)))
     .filter(lambda (_x, y): cond_y(y))
     .map(lambda (x, y): expr(x, y))

This is as readable as the comprehension, and, IMO, scales better as you want to do more complex things like take(5) or skip or group_by, etc.

Again though, I think an important change is moving away from an Iterable trait so that all iterator chaining, and any builtin syntax for working with iterators, can just work with the Iterator trait. That leaves space for creating those Iterators in whatever way works best, including creating parallel iterators, etc.
