My epoch of working on String is winding down (though one thing I'd still love is for StringLiteral to be non-materializable to String).
That said, I know there are bigger design decisions that others have thought about, e.g. Unicode support. I recently saw this blog post, which is a pretty interesting survey and covers some nice issues.
Is anyone interested in working on Unicode support in String, and does anyone have opinions? We have a design doc for String whose Unicode support section would be great to fill out, and I don't know that anyone is working on it.
I am not sure about giant-grapheme-cluster attacks. The segmentation rules are meant for text as it is reasonably used. Extra processing power could be used to verify that a grapheme cluster is plausibly a single glyph.
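A rough sketch of what such a check could look like, in Python: count the combining marks in a cluster and reject implausible pile-ups. The `looks_like_glyph` name and the threshold of 8 marks are illustrative assumptions, not a real validation rule.

```python
import unicodedata

def looks_like_glyph(cluster: str, max_marks: int = 8) -> bool:
    """Heuristic: reject clusters stuffed with combining marks (category Mn)."""
    marks = sum(1 for ch in cluster if unicodedata.category(ch) == "Mn")
    return marks <= max_marks

print(looks_like_glyph("e\u0301"))            # True: 'e' + combining acute = é
print(looks_like_glyph("a" + "\u0301" * 50))  # False: Zalgo-style pile-up
```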
But the rules are already complex. For codepoint-to-UTF-8 there is a fixed ratio of at most 4 bytes per codepoint, but for graphemes the number of codepoints per cluster is unbounded.
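To make the asymmetry concrete, here is a small Python demo. It assumes the third-party `regex` module, whose `\X` pattern matches extended grapheme clusters (the stdlib `re` has no equivalent):

```python
import regex  # third-party; pip install regex

family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # man + ZWJ + woman + ZWJ + girl
print(len(family))                        # 5 codepoints
print(len(family.encode("utf-8")))        # 18 bytes, never more than 4 per codepoint
print(len(regex.findall(r"\X", family)))  # 1 grapheme cluster

zalgo = "a" + "\u0300" * 1000             # 1001 codepoints, still one cluster
print(len(regex.findall(r"\X", zalgo)))   # 1
```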
What do you limit the string length for?
Not getting DDoSed.
Charge the user for the number of characters typed.
There are only 64 possible continuation bytes (they all have the form 0b10xxxxxx). So an implicit breakpoint after 64 bytes could be inserted, effectively limiting the grapheme length to 64 codepoints. This could be limited even further.
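A minimal Python sketch of that cap, assuming the third-party `regex` module for grapheme segmentation; the 64-codepoint limit and the `capped_graphemes` helper are illustrative, not an existing API:

```python
import regex  # third-party; pip install regex

MAX_CLUSTER_CODEPOINTS = 64  # the proposed cap; illustrative

def capped_graphemes(text: str):
    """Yield grapheme clusters, force-splitting any cluster over the cap."""
    for cluster in regex.findall(r"\X", text):
        for i in range(0, len(cluster), MAX_CLUSTER_CODEPOINTS):
            yield cluster[i:i + MAX_CLUSTER_CODEPOINTS]

zalgo = "a" + "\u0301" * 200               # one 201-codepoint cluster
print(len(list(capped_graphemes(zalgo))))  # 4 pieces, each within the cap
```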