by evrimoztamur on 5/17/2025, 12:00:56 PM
by constantcrying on 5/17/2025, 11:22:27 AM
To be honest, an unsurprising result.
But I think the paper fails to answer the most important question. It alleges that this isn't a statistical model: "it is not a statistical model that predicts the most likely next state based on all the examples it has been trained on.
We observe that it learns to use its attention mechanism to compute 3x3 convolutions — 3x3 convolutions are a common way to implement the Game of Life, since it can be used to count the neighbours of a cell, which is used to decide whether the cell lives or dies."
But it is never actually shown that this is the case. Later on it isn't even claimed: the metric they actually use is that the net gives the correct answer often enough, which is a test for convergence, not evidence that the weights have converged to values that implement the correct algorithm.
So there is no guarantee that it actually has learned the game. There are still learned parameters, and the paper doesn't investigate whether these parameters have converged to values for which the net is genuinely just a computation of the algorithm. The most interesting question is left unanswered.
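For reference, the 3x3-convolution implementation of the Game of Life that the quoted passage alludes to fits in a few lines. This is a minimal NumPy sketch (not the paper's code): an all-ones kernel with a zero centre, slid over the grid at stride 1, yields each cell's live-neighbour count, and Conway's rules are then a threshold on that count.

```python
import numpy as np

def life_step(grid):
    """One Game of Life step via a 3x3 neighbour-count convolution."""
    padded = np.pad(grid, 1)  # zero padding, so edges see dead cells
    h, w = grid.shape
    # Sum the eight shifted copies of the grid -- equivalent to a stride-1
    # convolution with the kernel [[1,1,1],[1,0,1],[1,1,1]].
    neighbours = sum(
        padded[1 + di : 1 + di + h, 1 + dj : 1 + dj + w]
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        if (di, dj) != (0, 0)
    )
    # Conway's rules: alive next step iff 3 neighbours, or alive now with 2.
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(grid.dtype)

# A horizontal blinker flips to vertical and back with period 2.
blinker = np.zeros((5, 5), dtype=int)
blinker[2, 1:4] = 1
```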
by Dwedit on 5/17/2025, 1:10:39 PM
RIP John Conway, who died of Covid.
by Nopoint2 on 5/17/2025, 12:47:05 PM
I don't get the point. A simple CNN with stride=1 should be able to solve it perfectly and generalize to any grid size.
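To illustrate the claim, a tiny stride-1 CNN can encode the rule exactly with hand-set weights; this NumPy sketch (hypothetical weights, not a trained network) uses one 3x3 kernel that packs cell state and neighbour count into a single number, followed by a membership "activation". Because the kernel is applied at every position, the same weights work on any grid size.

```python
import numpy as np

# Kernel computes 10*self + live-neighbour count at every position.
KERNEL = np.array([[1,  1, 1],
                   [1, 10, 1],
                   [1,  1, 1]])

def conv_step(grid):
    """One Life step as a stride-1 3x3 convolution plus a pointwise rule."""
    p = np.pad(grid, 1)
    h, w = grid.shape
    out = np.zeros_like(grid)
    for i in range(h):
        for j in range(w):
            v = np.sum(p[i : i + 3, j : j + 3] * KERNEL)
            # v == 3: dead with 3 neighbours (birth); v == 12 or 13:
            # alive with 2 or 3 neighbours (survival).
            out[i, j] = int(v in (3, 12, 13))
    return out
```

The pointwise rule is expressible as a small second conv layer with a step nonlinearity, so the whole map is within reach of a two-layer CNN.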
by wrs on 5/17/2025, 3:45:49 PM
I was hoping for an explanation of, or some insight from, the loss curve. Training makes very little progress for a long time, then suddenly converges. In my (brief) experience with NN training, I typically see more rapid progress at the beginning, then a plateau of diminishing returns, not an S-curve like this.
by bonzini on 5/17/2025, 11:18:26 AM
Do I understand correctly that it's brute-forcing a small grid rather than learning the algorithm?
by eapriv on 5/17/2025, 11:23:00 AM
Great, we can spend a crazy amount of computational resources and hand-holding in order to (maybe) reproduce three lines of code.
by amelius on 5/17/2025, 12:49:07 PM
But can it condense it into a small program?
by xchip on 5/17/2025, 12:53:58 PM
Even a simple regression will do that
I would like to point out a much more exciting modelling process, whereby neural networks extract the underlying boolean logic from simulation outputs: https://google-research.github.io/self-organising-systems/di...
I firmly believe that differentiable logic CA is the winner, in particular because it extracts the logic directly, and thus leads to generalizable programs as opposed to staying stuck in matrix-multiplication land.