Hacker News Clone

Biggest CPU for the Bad System

by opentokix on 5/13/2024, 2:05:13 PM with 32 comments

by h2odragon on 5/13/2024, 2:16:43 PM
https://old.reddit.com/r/sysadmin/comments/1cqn3qa/whats_the...
> The whole thing is about 10k APIs that all share the same cluster of 10 databases on the backend, which was never designed to scale like this. This company did 500 million in revenue in 2010 and 15 billion this year, all running on this fking SQL back end. They have a team of 500 devs writing for these apps; the complexity is unbelievable. No one knows how to untangle it and scale out to microservices.
by rbanffy on 5/13/2024, 2:52:49 PM
It's hard to give meaningful advice with this little information: how much time the CPUs spend waiting on memory, how many cache misses are happening, how many of each core's execution units are busy at any given time, etc.
HPE has single-image machines that can take up to 16 4th-gen Xeons, which gives a top limit of 960 cores. IBM has POWER10 boxes that go up to 240 cores, but those are POWER10 cores that can run, IIRC, up to 8 threads each (which increases cache misses but reduces idle execution units).
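The core and thread counts above can be checked with a little arithmetic. This sketch assumes 60 cores per socket, the top core count in Intel's 4th gen Xeon Scalable line (the comment only gives the 960 total); the POWER10 thread figure uses the comment's own "IIRC" of 8 threads per core.

```python
# Back-of-the-envelope totals for the two scale-up options mentioned above.
# Assumption: 60 cores per socket, the highest core count in Intel's
# 4th gen Xeon Scalable line; 16 sockets x 60 cores matches the 960 quoted.
XEON_SOCKETS = 16
XEON_CORES_PER_SOCKET = 60

# POWER10 figures from the comment: 240 cores, up to 8 threads per core (SMT8).
POWER10_CORES = 240
POWER10_THREADS_PER_CORE = 8

xeon_cores = XEON_SOCKETS * XEON_CORES_PER_SOCKET
power10_threads = POWER10_CORES * POWER10_THREADS_PER_CORE

print(f"16-socket Xeon:  {xeon_cores} cores")
print(f"POWER10 maximum: {POWER10_CORES} cores, {power10_threads} hardware threads")
```

Extra hardware threads only pay off if the workload keeps them busy, which is exactly what the missing cache-miss and execution-unit data would tell you.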
by upon_drumhead on 5/13/2024, 8:49:28 PM
This is fake, just rage bait. Besides the numbers in the post not making any sense, the OP states that the company is in healthcare[1] but then says he's a 43-year-old director[2], which still tracks, but then he says he spent 20 years in "big law"[3], and then worked as an IT director in fintech[4]. He says he's changed jobs twice in the last two years[5]. I gave up looking after just the first page of his post history.
[1] https://old.reddit.com/r/sysadmin/comments/1cqn3qa/whats_the...
[2] https://old.reddit.com/r/ITManagers/comments/1cqa0cp/genai_i...
[3] https://old.reddit.com/r/sysadmin/comments/1cotpdb/how_is_wo...
[4] https://old.reddit.com/r/Ameristralia/comments/1cnyxsh/what_...
[5] https://old.reddit.com/r/Intune/comments/ncj7oa/ios_sso_exte...
by sgt101 on 5/13/2024, 4:28:46 PM
I ran a stressed app some years ago. We only had a wee little backend because our revenue was very low, but we wanted to do stuff like sleep inside and eat, and so were motivated to cut costs to make a profit.
What I did was make a table of all the queries that were being run on my backend, ordered by the number of times they were called and the cost of calling them (I honestly can't remember the measure I used for that, but it was something like cputime*memory). I then did two things for the top queries.
1) Optimised them where I could.
2) Looked for where they were being used and tried to stop it.
(2) was very successful.
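The ranking step described above can be sketched roughly like this. It is a minimal Python sketch, not sgt101's actual tooling: the `QueryStats` fields, the cost measure (average CPU time times average memory, echoing the half-remembered "cputime*memory"), and the sample queries and numbers are all hypothetical. On MS SQL Server, comparable per-query counters are exposed through the `sys.dm_exec_query_stats` dynamic management view.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    """Per-query counters collected over some sampling window (hypothetical fields)."""
    text: str         # normalized query text
    calls: int        # number of executions in the window
    cpu_ms: float     # average CPU time per execution, in milliseconds
    memory_mb: float  # average memory used per execution, in megabytes

    @property
    def cost_per_call(self) -> float:
        # Assumed cost measure, in the spirit of "cputime*memory".
        return self.cpu_ms * self.memory_mb

    @property
    def total_cost(self) -> float:
        # Frequency times per-call cost: the number used to rank queries.
        return self.calls * self.cost_per_call


def top_queries(stats: list[QueryStats], n: int = 10) -> list[QueryStats]:
    """Return the n queries with the highest total cost, most expensive first."""
    return sorted(stats, key=lambda q: q.total_cost, reverse=True)[:n]


# Made-up sample data, for illustration only.
sample = [
    QueryStats("SELECT * FROM orders WHERE customer_id = ?", calls=50_000, cpu_ms=2.0, memory_mb=1.0),
    QueryStats("SELECT region, SUM(total) FROM orders GROUP BY region", calls=10, cpu_ms=5_000.0, memory_mb=500.0),
    QueryStats("UPDATE sessions SET last_seen = ? WHERE id = ?", calls=200_000, cpu_ms=0.5, memory_mb=0.5),
]

ranked = top_queries(sample, n=3)
for q in ranked:
    print(f"{q.total_cost:>14,.0f}  {q.text}")
```

In this made-up data the rarely run report query ranks first despite only 10 calls, which is why weighting by per-call cost as well as frequency matters. The ranked list is then the worklist for both steps: optimise the query, or find its callers and stop them.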
by ktpsns on 5/13/2024, 3:37:48 PM
I find it hard to believe that you wouldn't start buying your own hardware at this scale, particularly when the hyperscalers (at first glance) don't offer anything that matches the need.
by __turbobrew__ on 5/13/2024, 3:18:55 PM
This seems like a good use case for Spanner? The pain would be in migrating the backend to GKE, but if you are hitting the limits of what Azure can do, you are going to have to migrate at some point.
by tristor on 5/13/2024, 4:53:01 PM
Having been through similar situations in my past life, I can confidently say that they don't need more CPU cores, they need to start really looking at their architecture holistically and identifying the critical path that can be rewritten in priority order for performance. At this point, throwing more hardware at the problem is the wrong thing to do /even/ if it temporarily kicks the can down the road. They have a fundamental system design issue that needs to be addressed, likely piecemeal and prioritized. The first step should be adding more performance instrumentation.
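As a starting point for that instrumentation, timing each suspect call site can be as simple as the following. This is a minimal in-process Python sketch under assumed names (`timed`, `report`, and the call-site labels are hypothetical); at this scale you would more likely rely on the database's own query statistics or a dedicated tracing/APM tool, but the idea of recording calls and total time per code path is the same.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-process registry: call-site name -> observed durations in seconds.
_timings: dict[str, list[float]] = defaultdict(list)


@contextmanager
def timed(name: str):
    """Record the wall-clock duration of the enclosed block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed calls are timed too.
        _timings[name].append(time.perf_counter() - start)


def report() -> list[tuple[str, int, float]]:
    """Return (name, calls, total_seconds) rows, most total time first."""
    rows = [(name, len(ds), sum(ds)) for name, ds in _timings.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Usage: wrap the code paths suspected to be on the critical path.
with timed("orders.lookup"):
    time.sleep(0.01)  # stand-in for a database call

for _ in range(3):
    with timed("sessions.touch"):
        time.sleep(0.001)

for name, calls, total in report():
    print(f"{name:<16} calls={calls:<4} total={total * 1000:.1f} ms")
```

Sorting by total time rather than per-call latency surfaces the paths that cost the most overall, which is what a prioritized, piecemeal rewrite needs.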
by coolkil on 5/13/2024, 4:44:52 PM
Shame it's running MS SQL. With anything like Postgres, Oracle, or Db2, it might have been a candidate for running on an IBM LinuxONE, which might even be a valid contender at the cost it's currently running at.