by lordnacho on 10/8/2025, 7:28:11 PM
by Neywiny on 10/8/2025, 3:38:56 PM
That's an incredible find and once I saw the assembly I was right along with them on the debug path. Interestingly it doesn't need to be assembly for this to work, it's just that that's where the split was. The IR could've done it, it just doesn't for very good reasons. So another win for being able to read arm assembly.
Unsure if this would be another way to do it, but to save an instruction at the cost of a memory access, maybe you could push and then pop the stack size? Presumably you're doing that pair of moves on function entry and exit. I'm not really sure what the garbage collector is looking for, so maybe that doesn't work, but I'd be interested to hear some takes on it.
by yalok on 10/8/2025, 8:02:52 PM
Classic problem of non-atomic stack pointer modification.
Used to have a lot of fun with those 3 decades ago.
by dreamcompiler on 10/8/2025, 4:12:43 PM
Always adjust your stack pointer atomically, kids.
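To make "atomically" concrete, here is a toy sketch in Go. It is not the Go runtime or the actual fix; `spStates`, `inconsistent`, and the frame numbers are all made up for illustration. It lists every stack pointer value an interrupt could observe while a frame is released, once with the adjustment split into two additions and once with a single addition, and counts the values a naive unwinder could not interpret.

```go
package main

import "fmt"

// spStates lists every stack pointer value an interrupt could observe while
// the frame is released by adding each step in turn. One step models a single
// ADD; two steps model an adjustment split across two instructions.
func spStates(sp uint64, steps ...uint64) []uint64 {
	states := []uint64{sp}
	for _, s := range steps {
		sp += s
		states = append(states, sp)
	}
	return states
}

// inconsistent counts observable states in which sp is neither at the bottom
// of the frame (base) nor fully restored (base+size). A toy unwinder that
// assumes one of those two positions would misread the stack in such a state.
func inconsistent(states []uint64, base, size uint64) int {
	n := 0
	for _, sp := range states {
		if sp != base && sp != base+size {
			n++
		}
	}
	return n
}

func main() {
	// Hypothetical frame: the base address and size are arbitrary.
	const base, size = 0x10000, 0x1230

	split := spStates(base, 0x230, 0x1000) // adjustment done in two steps
	single := spStates(base, size)         // adjustment done in one step

	fmt.Println("split adjustment, bad states: ", inconsistent(split, base, size))
	fmt.Println("single adjustment, bad states:", inconsistent(single, base, size))
}
```

The split version exposes one intermediate value that is neither the old nor the new stack pointer; the single-step version exposes none. Real hardware and a real unwinder are far more involved, so treat this only as a picture of the window being discussed.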
by riobard on 10/8/2025, 4:39:32 PM
What ARM64 machines are you using and what are they used for? Last year you were announcing Gen 12 servers on AMD EPYC (https://blog.cloudflare.com/gen-12-servers/), but IIRC there weren’t any mentions of ARM64. But now it seems you’re running ARM64 in full production.
by Agingcoder on 10/8/2025, 3:51:55 PM
Excellent article as always from the Cloudflare blog - engineering without magic infrastructure and ML. One day I will apply!
Compiler bugs are actually quite common (I used to find several a year in GCC), but as the author says, some of them only appear when you work at a very large scale, and most people never dive that far.
by mperham on 10/8/2025, 6:30:40 PM
Did they ever explain why netlink was involved? Or was that a red herring?
by javierhonduco on 10/8/2025, 3:06:06 PM
Really enjoyed reading this. Thanks for writing it!
by pengaru on 10/8/2025, 4:30:23 PM
For the impatient, here's the fix: https://github.com/golang/go/commit/f7cc61e7d7f77521e073137c...
by renewiltord on 10/8/2025, 4:53:53 PM
Great technical blog. Good narrative pathway, tight examples, and a description so clear it makes me feel smarter than I am, because it's so easy to follow even though the last time I read assembly seriously was x86, years ago.
Also, fulfills the marketing objective because I cannot help but think that this team is a bunch of hotshots who have the skill to do this on demand and the quality discipline to chase down rare issues.
I assume these are Ampere Altra? I was considering some of those for web servers to fill out my rack (more space than power) but ended up just going higher on power and using EPYC.
by gok on 10/8/2025, 4:46:59 PM
The real lesson here should be that doing crazy shit like swizzling the program counter in a signal handler and writing your own assembler is not a good idea.
by brcmthrowaway on 10/8/2025, 6:41:48 PM
I don't get it, how were the machine threads being stopped in the middle of two instructions? This is bare metal, right?
by wat10000 on 10/8/2025, 5:44:47 PM
I would have thought that unwinding would use the frame pointer and this wouldn't be a problem.
by berz01 on 10/8/2025, 6:28:32 PM
solid research, looks like you'd make a great CRUD engineer.
> This was a very fun problem to debug.
I'm sure it was a relief to find a thorough solution that addressed the root cause. But it doesn't seem plausible that it was fun while it was unexplained. When I have this kind of bug it eats my whole attention.
Something this deep is especially frustrating. Nobody suspects the standard library or the compiler. Devs have been taught from a young age that it's always you, not the tools you were given, and that's generally true.
One time, I actually did find a standard library bug. I ended up taking apart absolutely everything on my side, because of course the last hypothesis you test is that the pieces you have from the SDK are broken. So a huge amount of time is spent chasing the wrong lead when it actually is a fundamental problem.
On top of this, the thing is a race condition, so you can't even reliably reproduce it. You think it's gone like they did initially, and then it's back. Like cancer.