• by sourcefrog on 6/1/2023, 2:01:28 PM

    Hi, distcc's original author here. It's really nice that people are still enjoying and using it 20 years later.

    I have a new project that is in a somewhat similar space of wrapping compilers: https://github.com/sourcefrog/cargo-mutants, a mutation testing tool for Rust.

  • by jchw on 6/1/2023, 11:19:13 AM

    Related:

    https://github.com/icecc/icecream - another option that does what distcc does, but aimed at a somewhat different use case.

    https://ccache.dev/ - a similar idea, but one that caches build outputs instead of distributing builds. You can use it together with distcc for even better performance.
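
    A minimal sketch of combining the two (hostnames are made up; this only applies to builds that honour CC/CXX):

        # ccache checks its cache first; on a miss it prefixes the real
        # compile with distcc, which farms the job out to the listed hosts.
        export CCACHE_PREFIX=distcc
        export DISTCC_HOSTS="localhost buildbox1 buildbox2"
        make -j16 CC="ccache gcc" CXX="ccache g++"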

  • by dspillett on 6/1/2023, 9:21:19 AM

    Back at Uni (two-and-a-half decades ago now), I set up a hacky version of this sort of thing to distribute compilations over several workstations:

    * use make -j to run multiple tasks at once

    * replace¹ gcc with a script that picked a host and ran the task on it² (a rough sketch of such a wrapper is below the footnotes)

    This had many of the limitations that distcc says it doesn't have (the machines had to have the same everything installed, the input and output locations for gcc had to be on a shared filesystem mounted in the same place on each host, it only parallelised the gcc part, not linking or anything else, and it broke if any make steps had side-effects like making environment changes, etc.) and was a little fragile (it was just a quick hack…), but it worked surprisingly well overall. For small projects like our Uni work it didn't make much difference over make -j on its own³, but for building some larger projects, like libraries we were calling that were not part of the department's standard Linux build, it could significantly reduce build times. On the machines we had back then, a 25% improvement⁴ could mean quite a saving in wall-clock time.

    One issue we ran into using it was that some makefiles were not well constructed in terms of dependency ordering, so they would break⁵ with make -j even without my hack layered on top. They worked reliably for sequential processing, which is presumably the only way the original author ever used them.

    ----

    [1] well, not actually replace, but arrange for my script to be ahead of the real thing in the search path

    [2] via rlogin IIRC, this predates SSH (or at least my knowledge of it)

    [3] despite the single-core CPUs we had at the time, task concurrency still helped on a single host as IO bottlenecks meant that single core was less aggressively used without -j2 or -j3

    [4] we sometimes measured noticeably more than that, but it depended on how parallelisable the build actually was, and the shared IO resource was also used by many other tasks over the network, so that was a key bottleneck

    [5] sometimes intermittently, so it could be difficult to diagnose
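
    A rough sketch of the kind of wrapper described above (hostnames are made up, it uses ssh rather than the rsh/rlogin of the era, and the real script was presumably hairier and handled failures):

        #!/bin/sh
        # "gcc" wrapper placed ahead of the real compiler in $PATH.
        # Assumes the same toolchain on every host and the source tree on a
        # shared filesystem mounted at the same path everywhere.
        HOSTS="ws01 ws02 ws03 ws04"
        ARGS="$*"        # fragile: loses quoting, but fine for a sketch
        # pick a host more or less at random
        HOST=$(echo $HOSTS | tr ' ' '\n' | shuf -n 1)
        # run the real compiler remotely from the same working directory
        exec ssh "$HOST" "cd $(pwd) && /usr/bin/gcc $ARGS"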

  • by omerhj on 6/1/2023, 2:41:25 PM

    Nearly twenty years ago I had a little server farm of old PCs. Two or three Pentium-133s, one dual Pentium Pro 200 machine, and my pride and joy, a Pentium 3 running at 600 MHz. I was trying to get familiar with Gentoo and to make recompiling everything all the time more bearable I set up distcc so my P3 could do most of the work. It worked very well!

    But after a few weeks every Gentoo box in the house started crashing regularly. It took me a while to figure out what was going on: one of the slower machines had developed a single-bit memory error and was sharing corrupted .so files with all other machines.

  • by donaldihunter on 6/1/2023, 10:57:09 AM

    25+ years ago, our company used Clearcase for version control, and its clearmake had distributed build capability. Clearcase used a multi-version file system (MVFS) and had build auditing, so clearmake knew exactly what versions of source files were used in each build step. It could distribute build requests to any machine that could render the same "view" of the FS.

    Even without distributed builds, clearmake could re-use .o files built by other people if the input dependencies were identical. On a large multi-person project this meant that you would typically only need to build a very small percentage of the code base and the rest would be "winked in".

    If you wanted to force a full build, you could farm it out across a dozen machines and get circa 10x speedup.

    Clearcase lost the edge with the arrival of cheaper disks and multi-core CPUs. I'd say it set the gold standard for version control and dependency tracking, and nothing today comes close to it.

  • by anybodyz on 6/1/2023, 8:28:20 AM

    Fastbuild https://www.fastbuild.org/docs/home.html is the free distributed compilation system many game companies use. The combination of automatic unity builds (simply appending many .cpp source files together into very large combined files), caching, and distributed compilation gives you extremely fast C++ builds.

    It also supports creating project files compatible with Xcode and Visual Studio, so you can just build from those IDEs, and it has a pretty flexible dependency-based build file format that can accommodate any kind of dependency.
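
    A hand-rolled illustration of the unity-build idea (Fastbuild generates and manages these for you; the paths are made up):

        # Build one big translation unit instead of hundreds of small ones,
        # so the common headers are parsed once rather than per .cpp file.
        for f in src/*.cpp; do echo "#include \"$f\""; done > unity.cpp
        g++ -c unity.cpp -o unity.o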

  • by Erlangen on 6/1/2023, 9:14:37 AM

    Isn't it a misnomer to call it a "compiler"? Even its GitHub README says otherwise:

    > distcc is not itself a compiler, but rather a front-end to the GNU C/C++ compiler (gcc), or another compiler of your choice. All the regular gcc options and features work as normal.

  • by awestroke on 6/1/2023, 8:04:05 AM

    We used it at a previous job. Every dev computer in the building ran distcc; compiles were distributed and finished super fast.

  • by mansilladev on 6/1/2023, 9:10:34 AM

    Good gracious. Just looked at the CHANGELOG. It was 20 years ago this month that I made a modest code contribution to this project.

  • by hiyer on 6/1/2023, 8:52:11 AM

    We used to use this at a previous company. It reduced the build time from ~20 minutes to ~2 minutes, until a bug in the ld linker at the time (linking is not distributed - only compilation is) pushed it back up to 20 minutes. We had a hell of a time finding the issue, but luckily there was a newer version of binutils available where it was fixed, and upgrading to it got the build time back under 2 minutes.

    Fun times!

  • by ggerules on 6/1/2023, 4:38:40 PM

    Another multi-machine approach that has worked well for me, well beyond just compiling C/C++, across multiple machines with multiple GPUs and cores:

    1) set up passwordless ssh

    and

    2) use GNU parallel: https://www.gnu.org/software/parallel/

    GNU parallel is super flexible and very useful.
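
    A minimal sketch of the idea (hostnames are made up; the remote hosts need the same compiler installed): compile a pile of C files across the local machine and two ssh hosts, shipping each input over and the .o file back.

        # --trc = transfer the input file, return the named output, clean up.
        # ":" in the host list means "also run jobs on the local machine".
        parallel -S :,build1,build2 --trc {.}.o gcc -c -o {.}.o {} ::: *.c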

  • by ihaveajob on 6/1/2023, 12:52:21 PM

    So many memories. Back in grad school I would run this on a handful of workstations to speed up compilations. Really neat tool, and a clever setup.

  • by sagarm on 6/1/2023, 8:33:10 AM

    I really prefer icecream to distcc: it handles toolchain distribution, scheduling, and discovery.

  • by iveqy on 6/1/2023, 10:02:51 AM

    I'm curious about the security implications of using distcc. Doesn't this mean that if one computer gets compromised, the attacker can run code on all the other computers using distcc, or secretly inject malicious code into the build result?

    So using distcc means that all computers using it must be trusted. And that means that using it on "all developers' computers to share the load" is good for performance but bad for security.
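
    For what it's worth, distcc does give you some knobs for drawing that trust boundary: the daemon can be restricted to certain client addresses, and jobs can be tunnelled over SSH instead of the plain TCP protocol. A sketch (addresses and hostnames are made up):

        # on each volunteer: only accept jobs from the trusted subnet
        distccd --daemon --allow 192.168.1.0/24
        # or, on the client: reach volunteers over SSH (slower, but authenticated)
        export DISTCC_HOSTS="@buildbox1 @buildbox2"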

  • by choeger on 6/1/2023, 9:11:50 AM

    IIRC, when using distcc you must make sure that the toolchain identified by a given path is ABI-identical on every machine.

    So:

    /opt/ourcompany/dev/bin/cc

    absolutely *needs* to be the same on all machines involved, or you risk very-hard-to-spot issues.

    For that reason, either use the exact same distribution for everyone (and then run /usr/bin/cc, but watch out for alternatives!) or roll out your own toolchain, but make sure to put its version in the path.
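
    One way to make the "version in the path" rule concrete (the paths and hostnames below are made up): install the toolchain under a version-stamped prefix on every machine and always invoke it by that full path, so a host with a different compiler fails instead of quietly producing mismatched objects.

        # same version-stamped path must exist on every volunteer
        export DISTCC_HOSTS="build01 build02 build03"
        distcc /opt/ourcompany/gcc-12.2-abi1/bin/cc -c foo.c -o foo.o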

  • by Cieric on 6/1/2023, 2:54:45 PM

    While its purpose is different, it can be used to do distributed compiling, so I'll leave it here.

    https://github.com/Overv/outrun

    Since I was just going down this rabbit hole recently, I kind of wonder if it's possible to base the filesystem on something more like the BitTorrent protocol, so things like the libraries/compilers/headers that are used during compilation don't all need to come from the main PC. It probably wouldn't be useful until you reached a stupid number of computers and started hitting the limits of the Ethernet wire, but for something stupid that can run on a Pi cluster it would be a fun project.

  • by torarnv on 6/1/2023, 9:22:09 AM

    How do these distributed build tools work across the Internet, i.e. outside of an office LAN setting? Do the increased latency and lower bandwidth become a bottleneck?

  • by squarefoot on 6/1/2023, 9:49:46 AM

    Good memories! I had fun with distcc by compiling kernels across a few local machines back when desktops for mere mortals were dog slow, and it helped a lot. I never used it for cross compiling though, which is something that could help today when starting compilations from small embedded boards. Did anyone have success in a mixed environment, such as a small ARM board with native GCC plus one or more faster x86 machines with cross compiling tools installed?

  • by ironbound on 6/1/2023, 8:39:22 AM

    Is there a reason to use this over something like Bazel?

  • by jdlyga on 6/1/2023, 5:13:48 PM

    It's a shame that distcc and ccache aren't better known. Ccache, in particular, saved me probably years' worth of compile time working with Qt.

  • by lxe on 6/1/2023, 7:51:36 PM

    Ooh this brings back memories. This project made my embedded cross-compilation iteration speed go from like 2-30 minutes to 5. This was 20 years ago. I don't understand why this isn't more of a thing these days. I think Bazel can do this, but I've felt nothing but pain from using Bazel.

  • by anonymousDan on 6/1/2023, 10:06:53 AM

    Can anyone explain the architecture/how it works at a high level? I get that it is distributed. Does it basically copy the complete source tree to every worker and have them compile some independent subset of the object files? Does performance scale linearly with the number of worker nodes?

  • by londons_explore on 6/1/2023, 1:15:11 PM

    > distcc sends the complete preprocessed source code across the network for each job […]

    I assume that for template-heavy C++, that could easily be hundreds of gigabytes for 1 gigabyte of C++ code to compile...?

    If you're working from a laptop, surely the wifi connection will by far be the bottleneck?
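
    If you want a rough sense of the blow-up for your own code, you can compare raw source size against preprocessed size for one translation unit (the filename here is made up); distcc can also compress the stream (the ,lzo host option) when the network is the constraint.

        # size of what distcc would ship for this file vs. the raw source
        g++ -E some_template_heavy.cpp | wc -c
        wc -c some_template_heavy.cpp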

  • by IanCal on 6/1/2023, 10:12:09 AM

    I used this in a uni lab (good lord, 15 years ago) when we needed to compile for a robot. It took maybe an hour or two to compile when run directly on the robot, but I was able to set up distcc to run on all the lab machines in that room and get things done fast.

  • by idatum on 6/1/2023, 4:49:47 PM

    Used distcc back in the mid-2000s to cross-compile the NetBSD kernel/userland for a TS-7200 SBC ARM device. Back then those devices didn't have much processing power; it would have taken days otherwise.

  • by RustyRussell on 6/1/2023, 9:36:01 AM

    Wow, it's still active! I remember when MBP wrote it while we were working at OzLabs together.

    One day I'll get back to that rewrite of ccontrol using modern distcc's features...

  • by linhns on 6/1/2023, 6:01:39 PM

    The title may be a bit confusing, since skimming the project's homepage reveals that it's not a compiler, just a front-end for other C++ compilers.

  • by oarfish on 6/1/2023, 11:42:54 AM

    The biggest problem with distcc is that it fails with architecture-specific instruction sets like AVX.

    A problem not shared by icecc.

  • by Alifatisk on 6/1/2023, 8:48:12 AM

    Had no idea about this tech, very cool!

  • by timetraveller26 on 6/1/2023, 4:20:57 PM

    This is such a nice tool. I use it on my Arch Linux machines, and it even supports cross-compiling!

  • by egwynn on 6/1/2023, 12:07:07 PM

    Xcode used to have native support for this, long ago. And it had mDNS support too!

  • by mkarliner on 6/1/2023, 8:02:19 AM

    I've used it for remote Raspberry Pi compilation. Works very nicely.

  • by Avlin67 on 6/1/2023, 5:27:30 PM

    Storage has become much faster than the network today, so is it still worth it? (Naive question, sorry.) Or maybe the CPU demand is much bigger than the IO demand?

  • by _joel on 6/1/2023, 9:28:41 AM

    Used to use this with Gentoo when doing emerges. Reusing old P3s and stuff, those were the days...

    Insert XKCD "Compiling..." meme :)