• by nequo on 2/22/2023, 12:28:42 AM

    This is the commit in the Rust repository that achieves the speedup by reducing branch misprediction:

    https://github.com/rust-lang/rust/commit/b7089e0dd3e988270f3...

    Do I understand correctly why this works?

    In the `if` condition on line 201, `child + i < v.len()` is easy to predict but `is_less(...)` is hard. Because of this, `child += 1` on line 202 needs to be walked back with a non-negligible probability. This wastes time.

      200   // Choose the greater child.
      201   if child + 1 < v.len() && is_less(&v[child], &v[child + 1]) {
      202       child += 1;
      203   }
      204
      205   // Stop if the invariant holds at `node`.
      206   if !is_less(&v[node], &v[child]) {
    
    After moving the `is_less(...)` call from the `if` condition to the consequent on line 205 below, we still need to fetch both `child += ...` and `is_less(...)` but we are much less likely to need to walk these instructions back. We waste less time and we reach the `if` condition on line 209 sooner.

      200   // Choose the greater child.
      201   if child + 1 < v.len() {
      202       // We need a branch to be sure not to out-of-bounds index,
      203       // but it's highly predictable.  The comparison, however,
      204       // is better done branchless, especially for primitives.
      205       child += is_less(&v[child], &v[child + 1]) as usize;
      206   }
      207
      208   // Stop if the invariant holds at `node`.
      209   if !is_less(&v[node], &v[child]) {