In this particular case I think your original guess is right: the writes to barrier_buffer are being cached and are not visible to other blocks. A while ago I found that, under some circumstances, compute shader writes to global buffers were apparently not committed until the block completed (or some time after). Threads inside the block did see the new values, but an OpenGL shader drawing the results did not.
In C/CUDA programming, atomics are the recommended way to communicate between blocks, but I have no experience using atomics in Mojo, so this is just a suggestion.
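To show what I mean on the CUDA side, here is a rough sketch of the usual pattern: each block writes its result, issues a memory fence so the write is visible device-wide, then bumps an atomic counter, and whichever block arrives last reads everyone's results. The names (blocks_done, sum_blocks, partial, total) are made up for illustration and have nothing to do with your barrier_buffer code, and I can't say whether Mojo exposes equivalents of __threadfence() or atomicInc().

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical counter of blocks that have published their partial result.
__device__ unsigned int blocks_done = 0;

__global__ void sum_blocks(const float* in, volatile float* partial,
                           float* total, int n)
{
    __shared__ bool is_last;

    if (threadIdx.x == 0) {
        // One thread per block does the work, to keep the sketch short.
        float s = 0.0f;
        for (int i = blockIdx.x; i < n; i += gridDim.x)
            s += in[i];
        partial[blockIdx.x] = s;

        // Make the write above visible to all other blocks
        // before signalling through the counter.
        __threadfence();

        // Atomically take a ticket; the block that draws the
        // final ticket knows every other block has published.
        unsigned int ticket = atomicInc(&blocks_done, gridDim.x);
        is_last = (ticket == gridDim.x - 1);
    }
    __syncthreads();

    if (is_last && threadIdx.x == 0) {
        // partial is volatile, so these reads go to memory
        // rather than to a stale cached copy.
        float t = 0.0f;
        for (unsigned int b = 0; b < gridDim.x; ++b)
            t += partial[b];
        *total = t;
        blocks_done = 0;  // reset so the kernel can be launched again
    }
}

int main()
{
    const int n = 1024, blocks = 8, threads = 32;
    std::vector<float> host(n, 1.0f);

    float *d_in, *d_partial, *d_total;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_partial, blocks * sizeof(float));
    cudaMalloc(&d_total, sizeof(float));
    cudaMemcpy(d_in, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    sum_blocks<<<blocks, threads>>>(d_in, d_partial, d_total, n);

    float total = 0.0f;
    cudaMemcpy(&total, d_total, sizeof(float), cudaMemcpyDeviceToHost);
    printf("total = %f\n", total);

    cudaFree(d_in);
    cudaFree(d_partial);
    cudaFree(d_total);
    return 0;
}
```

The point of doing it this way is that no block ever spins waiting for another one, so it cannot deadlock if the blocks are not all resident at the same time. The fence plus the atomic is what guarantees the last block sees the other blocks' writes; without the fence, the counter update could become visible before the data it is supposed to announce.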