by naasking on 12/19/2024, 1:10:47 PM
by chrismorgan on 12/19/2024, 9:14:59 AM
> Curséd
With an acute accent, that should be roughly /ˌkɜːrˈseɪd/ “curse-ay-d”. (Think “café” or “sashayed”.)
The stylised pronunciation being evoked is roughly /ˈkɜːrˌsɛd/, “curse-ed”, and would be written with a grave accent: “cursèd”.
by tialaramex on 12/19/2024, 11:18:18 AM
The list gets very woolly by the end. CHERI exists (though not at volume), Cornucopia Reloaded is a research paper, and "plus some techniques to prevent use-after-free on the stack" is entirely hand-waving.
It is really good as food for thought though.
by willvarfar on 12/19/2024, 9:13:40 AM
Meta comment, but I really like the formatting of the blog post!
It reminds me of the early days of the web, when text and content were king. I particularly like the sidenotes-in-the-margins approach.
(Hope the author sees this comment :) Hats off)
by bitbasher on 12/19/2024, 5:05:26 PM
I'm kinda torn. It seems there are only three approaches.
1. laissez-faire / manual memory management (c, c++, etc)
In this approach, the programmer decides everything.
2. dictatorship / garbage collection (java, go, etc)
In this approach, the runtime decides everything.
3. deterministic / lifetime memory management (rust, c with arenas, etc)
In this approach, the problem determines everything.
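A minimal Rust sketch of the third approach (my own illustration, not from the article): the lifetime of each reference is tied to the data it borrows from, so the structure of the problem decides when memory can go away.

struct Arena {
    bytes: Vec<u8>,
}

fn first_half(a: &Arena) -> &[u8] {
    // The returned slice's lifetime is inferred to be the arena's lifetime.
    &a.bytes[..a.bytes.len() / 2]
}

fn main() {
    let arena = Arena { bytes: vec![1, 2, 3, 4] };
    let half = first_half(&arena);
    println!("{:?}", half);
    // Uncommenting both lines below is a compile error:
    // drop(arena);
    // println!("{:?}", half); // `half` cannot outlive `arena`
}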
by mgaunard on 12/19/2024, 9:32:54 AM
The fact that re-using a slot for a different object of the same type is considered a memory safety technique is ridiculous.
by hawski on 12/19/2024, 10:13:34 AM
That is very informative. Thank you.
I am interested in Vale and it feels very promising, though because of my interest in bootstrapping I don't like that it is written in Scala. I know that is shallow, but it's a thing that limits my enthusiasm.
If you are like me and don't like jumping around between notes and text and you prefer to read the notes anyway, here is a little snippet you can run in Web Inspector's Console:
document.querySelectorAll(".slice-contents a[data-noteid]").forEach(e => {document.querySelectorAll('.slice-notes [data-noteid="' + e.attributes["data-noteid"].nodeValue + '"] p').forEach(p => {p.style.fontSize = 'smaller'; e.parentNode.insertBefore(p, e)}); e.remove() })
It will replace each note link with the note itself, shrinking the note text slightly, since the notes will not always fit in smoothly.
by alexisread on 12/19/2024, 11:46:00 AM
by westurner on 12/19/2024, 5:58:16 PM
From https://news.ycombinator.com/item?id=33560227#33563857 :
- Type safety > Memory management and type safety: https://en.wikipedia.org/wiki/Type_safety#Memory_management_...
- Memory safety > Classification of memory safety errors: https://en.wikipedia.org/wiki/Memory_safety#Classification_o...
- Template:Memory management https://en.wikipedia.org/wiki/Template:Memory_management
- Category:Memory_management https://en.wikipedia.org/wiki/Category:Memory_management
by tliltocatl on 12/21/2024, 12:31:11 PM
Not mentioned: do not do any dynamic allocation at all. Never ever. Everything is either a global variable or goes on the stack. It doesn't work when you need to handle unknown input sizes, but when you need to make sure you never OOM, it's the only way. Stack overflow is still a possibility, unfortunately, because existing languages cannot provide any guarantee here (Zig tried, but didn't get it done, AFAIR).
The only real problem with this approach is code reuse, because library writers will insist on opaque structs and malloc rather than letting the caller allocate.
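A minimal sketch of that style in Rust (illustrative names only, not from any particular library): everything is a fixed-capacity value that lives in a static or on the stack, and running out of room is an explicit result rather than an allocation failure.

struct FixedVec<const N: usize> {
    buf: [u8; N],
    len: usize,
}

impl<const N: usize> FixedVec<N> {
    const fn new() -> Self {
        Self { buf: [0; N], len: 0 }
    }

    // Returns false instead of allocating when capacity is exhausted,
    // so the "out of memory" case is handled explicitly by the caller.
    fn push(&mut self, byte: u8) -> bool {
        if self.len == N {
            return false;
        }
        self.buf[self.len] = byte;
        self.len += 1;
        true
    }
}

// A global instance; a writable global would additionally need a Mutex or similar.
static SCRATCH: FixedVec<1024> = FixedVec::new();

fn main() {
    let mut on_stack: FixedVec<16> = FixedVec::new(); // stack, never heap-allocated
    assert!(on_stack.push(42));
    assert_eq!(SCRATCH.len, 0);
}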
by immibis on 12/20/2024, 2:37:00 AM
MMM++ is a variation of standard malloc/free. You can still UAF, but only to another object of the same type, which may or may not prevent an exploit.
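A rough illustration of that in Rust (my own sketch, not the article's code): freed slots go on a per-type free list, so a stale index can only ever alias another object of the same type.

struct Widget {
    id: u32,
}

struct WidgetPool {
    slots: Vec<Widget>,
    free_list: Vec<usize>, // indices of freed slots, reused for later allocations
}

impl WidgetPool {
    fn new() -> Self {
        Self { slots: Vec::new(), free_list: Vec::new() }
    }

    fn alloc(&mut self, w: Widget) -> usize {
        match self.free_list.pop() {
            // Reuse a freed slot: that memory only ever holds a Widget.
            Some(i) => { self.slots[i] = w; i }
            None => { self.slots.push(w); self.slots.len() - 1 }
        }
    }

    fn dealloc(&mut self, i: usize) {
        // The slot never goes back to a general-purpose allocator, so a stale
        // index "uses after free" by reading whatever Widget lives there now.
        self.free_list.push(i);
    }
}

fn main() {
    let mut pool = WidgetPool::new();
    let a = pool.alloc(Widget { id: 1 });
    pool.dealloc(a);
    let b = pool.alloc(Widget { id: 2 }); // same slot, same type
    assert_eq!(a, b);
    assert_eq!(pool.slots[a].id, 2); // a stale handle sees a different Widget, not garbage
}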
Something that's missing is full-on formal verification where you write unrestricted C code and then mathematically prove it doesn't have any bugs. Nobody does this because proving a C program is correct is harder than mining a bitcoin block by hand, but it's useful to anchor one end of the safety/freedom spectrum. Many other approaches (such as borrow checking) can also be viewed as variants of this where you restrict the allowed program constructs to ones that are easier to prove things about.
by nahuel0x on 12/19/2024, 8:39:31 AM
It's surprising to see an article covering such a broad range of techniques, hybrid techniques, and interactions with the type system, but it's even more surprising that a whole dimension of memory (un)management was left out: memory fragmentation.
by eklitzke on 12/20/2024, 12:13:16 AM
I don't understand why they say that reference counting is "slow". Slow compared to what? Atomic increments/decrements of integers are among the fastest operations you can do on modern x86 and ARM hardware, and except in pathological cases will pretty much always be faster than the pointer chasing done in traditional mark-and-sweep VMs.
This isn't to say reference counting is without problems (there are plenty of them, inability to collect cyclical references being the most well known), but I don't normally think of it as a slow technique, particularly on modern CPUs.
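As a concrete point of reference (my example, not the article's): in Rust, cloning an Arc is a single atomic increment and dropping one is an atomic decrement plus a branch.

use std::sync::Arc;

fn main() {
    let shared = Arc::new(vec![1, 2, 3]);
    let handle = Arc::clone(&shared); // atomic refcount += 1
    assert_eq!(Arc::strong_count(&shared), 2);
    drop(handle); // atomic refcount -= 1; the vec is freed when the count hits 0
    assert_eq!(Arc::strong_count(&shared), 1);
}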
by DanielHB on 12/19/2024, 11:29:35 AM
I am not experienced with Rust and borrow checkers, but my impression is that borrow checkers also statically ensure thread/async safety while most other memory safety systems don't. Is this accurate?
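A minimal sketch of the kind of static check being asked about (my own example): Rust rejects unsynchronized sharing of mutable state across threads at compile time, via the same ownership rules plus the Send/Sync traits.

use std::thread;

fn main() {
    let mut data = vec![1, 2, 3];

    // Accepted: the closure takes ownership of `data`, so no other thread
    // (including this one) can touch it concurrently.
    let handle = thread::spawn(move || data.push(4));
    handle.join().unwrap();

    // Rejected before the program ever runs: spawning two threads that both
    // capture `&mut data`, or sharing a type that is not `Sync`, is a compile error.
}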
by lilyball on 12/20/2024, 6:15:49 AM
I don't see any mention of epoch-based garbage collection (see crossbeam https://docs.rs/crossbeam/latest/crossbeam/epoch/index.html). Generational References sounds like a related concept but it's not the same. I'm also surprised nobody's mentioned that one lance corporal goat yet.
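For reference, the basic pattern from the linked crossbeam docs looks roughly like this (a simplified sketch assuming the crossbeam crate): readers pin an epoch, and retired nodes are only reclaimed once no pinned reader can still observe them.

use crossbeam::epoch::{self, Atomic, Owned};
use std::sync::atomic::Ordering;

fn main() {
    let slot = Atomic::new(1u64);

    let guard = epoch::pin(); // enter the current epoch
    let old = slot.swap(Owned::new(2u64), Ordering::AcqRel, &guard);
    unsafe {
        // Deferred, not immediate: the old value is freed only after every
        // thread that might still hold a reference has left its epoch.
        guard.defer_destroy(old);
    }
    let current = slot.load(Ordering::Acquire, &guard);
    assert_eq!(unsafe { current.deref() }, &2);
    drop(guard);
}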
by pizlonator on 12/19/2024, 2:00:16 PM
The way you make garbage collection deterministic is not by doing regions but by making it concurrent. That’s increasingly common, though fully concurrent GCs are not as common as “sorta concurrent” ones because there is a throughput hit to going fully concurrent (albeit probably a smaller one than if you partitioned your heap as the article suggests).
Also, no point in calling it “tracing garbage collection”. It's just “garbage collection”. If you’re garbage collecting, you’re tracing.
by the__alchemist on 12/19/2024, 3:34:44 PM
After pondering, my single favorite capability of rust is this:
fn modify(val: &mut u8) {
    // ...
}
No other language appears to have this seemingly trivial capability; their canonical alternatives are all, IMO, clumsier. In light of the article, is this due to Rust's memory model, or an unrelated language insight?
by andrewstuart on 12/19/2024, 7:05:48 AM
I’d love to see a language that kept everything as familiar as possible and implemented memory safety as “the hard bit”, instead of the Rust approach of baking in multiple different new sub-languages and concepts.
by xmcqdpt2 on 12/19/2024, 1:26:23 PM
Not a fan of the framing of the article. Firstly, there are millions of Mayans alive today,
https://en.wikipedia.org/wiki/Maya_peoples
and secondly, the reason why the pre-Columbian cultural texts and script are not in use today, even by the people who speak the 28 Mayan languages currently in use, is because of genocide by Columbus and those that followed. The Catholic church destroyed every piece of Mayan script they could get their hands on.
The article reads like the author is not aware of these basic facts of American geography and history.
by 4ad on 12/19/2024, 1:28:47 PM
> Interaction nets are a very fast way to manage purely immutable data without garbage collection or reference counting.[...] HVM starts with affine types (like move-only programming), but then adds an extremely efficient lazy .clone() primitive, so it can strategically clone objects instead of referencing them.
This is wrong. Interaction nets (and combinators) can model any kind of computational system, including ones that use mutation. In fact, ICs are not really about types at all, although they do come from a generalization of Girard's proof nets, which came from work in linear logic.
The interesting thing about ICs is that they are beta-optimal (any encoding of a computation will be done in the minimum number of steps required -- there is no useless work being done), and maximally parallel with only local synchronization (all reduction steps are local, and all work that can be parallelized will be parallelized).
Additionally ICs have the property that any encoding of a different computational system in ICs will preserve the asymptotic behavior of all programs written for the encoded computational system. In fact, ICs are the only computational system with this property.
Interaction nets absolutely require garbage collection in the general sense. However, interaction combinators are linear and all garbage collection is explicit (but it still exists). HVM's innovation is that by restricting the class of programs encoded in the ICs you can get very cheap lambda duplication and eschew the need for complex garbage collection, while also reducing the overhead of implementing ICs on regular CPUs (no croissants or brackets, see Asperti[1] for what that means).
Having a linear language with the above restriction allows for a very efficient implementation with a very simple GC, while maximizing the benefits of ICs. In principle any language can be implemented on top of ICs, but to get most benefits you want a language with these properties. It's not that HVM starts with affine types and an efficient lazy clone operation, it's that a linear language allows extremely efficient lazy cloning (including lambda cloning) to be implemented on top of ICs, and the result of that is HVM.
> The HVM runtime implements this for Haskell.
This is very wrong. HVM has nothing to do with Haskell. HVM3 is written in C[2], HVM2 has three implementations, one in C[3], one in Rust[4], and a CUDA[5] one. HVM1 was just a prototype and was written in Rust[6].
HOC[7], the company behind HVM, provides two languages that compile to HVM: Bend[8] and Kind[9]. Bend is a usual functional language, while Kind is a theorem prover based on self types.
Haskell is not involved in any of these things, except that the HVM compiler (not the runtime) is written in Haskell, but that is irrelevant; before Haskell it used to be written in TypeScript and then in Agda (Twitter discussion, sorry, no reference). It's an implementation detail, not something the user sees.
Please note that HVM adds some stuff on top of ICs that makes it not strictly beta-optimal, but nevertheless the stuff added is useful in practice and the practical downgrade from theoretical behaviour is minimal.
[1] Andrea Asperti, The Optimal Implementation of Functional Programming Languages, ISBN-13: 978-0060815424
[2] https://github.com/HigherOrderCO/HVM3/blob/main/src/HVML/Run...
[3] https://github.com/HigherOrderCO/HVM/blob/main/src/hvm.c
[4] https://github.com/HigherOrderCO/HVM/blob/main/src/hvm.rs
[5] https://github.com/HigherOrderCO/HVM/blob/main/src/hvm.cu
[6] https://github.com/HigherOrderCO/HVM1
by nemetroid on 12/19/2024, 11:25:44 AM
No mention of RCU?
by amelius on 12/19/2024, 12:16:24 PM
I like many of the ideas of Rust, but I still think it is an unsuitable language for most projects.
The problem is that it is very easy to write non-GC'd code in a GC'd language, but the other way around is much, much harder.
Therefore, I think the fundamental choice of Rust to not support a GC is wrong.
by bluGill on 12/19/2024, 2:12:48 PM
Why is garbage collection called memory safety? Garbage collection in whatever form is only memory safe if it doesn't free memory that will still be used (which means that if you actually get all your free calls correct, C is memory safe - most long-lived C code bases have been beaten on enough that they get this right even for the obscure paths).
Use after free is important, but in my experience not common and not too hard to track down when it happens (maybe I'm lucky? - we generally used a reference-counted GC for the cases where ownership is hard to track down in C++).
I'm more worried about other memory safety issues that are not addressed: writes into someone else's buffer, which are generally caused by writing off the end of your own buffer.
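A tiny illustration of that last case (mine, not the article's): with checked indexing, an out-of-range write becomes a detectable error instead of silently scribbling over a neighbouring buffer.

fn main() {
    let mut buf = [0u8; 4];
    let i = 7; // out-of-bounds index, e.g. derived from untrusted input

    match buf.get_mut(i) {
        Some(slot) => *slot = 1, // only reachable when i is in bounds
        None => eprintln!("index {i} rejected instead of writing past the buffer"),
    }
}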
More people need to read up on C#'s refs:
https://em-tg.github.io/csborrow/
These kinda-sorta fall under borrow checking or regions, just without any annotations. Then again, Ada/Spark's strategy also technically falls under Tofte-Talpin regions:
https://www.cs.cornell.edu/people/fluet/research/substruct-r...