In the `if` condition on line 201, `child + i < v.len()` is easy to predict but `is_less(...)` is hard. Because of this, `child += 1` on line 202 needs to be walked back with a non-negligible probability. This wastes time.
200 // Choose the greater child.
201 if child + 1 < v.len() && is_less(&v[child], &v[child + 1]) {
202 child += 1;
203 }
204
205 // Stop if the invariant holds at `node`.
206 if !is_less(&v[node], &v[child]) {
After moving the `is_less(...)` call from the `if` condition to the consequent on line 205 below, we still need to fetch both `child += ...` and `is_less(...)` but we are much less likely to need to walk these instructions back. We waste less time and we reach the `if` condition on line 209 sooner.
200 // Choose the greater child.
201 if child + 1 < v.len() {
202 // We need a branch to be sure not to out-of-bounds index,
203 // but it's highly predictable. The comparison, however,
204 // is better done branchless, especially for primitives.
205 child += is_less(&v[child], &v[child + 1]) as usize;
206 }
207
208 // Stop if the invariant holds at `node`.
209 if !is_less(&v[node], &v[child]) {
This is the commit in the Rust repository that achieves the speedup by reducing branch misprediction:
https://github.com/rust-lang/rust/commit/b7089e0dd3e988270f3...
Do I understand correctly why this works?
In the `if` condition on line 201, `child + i < v.len()` is easy to predict but `is_less(...)` is hard. Because of this, `child += 1` on line 202 needs to be walked back with a non-negligible probability. This wastes time.
After moving the `is_less(...)` call from the `if` condition to the consequent on line 205 below, we still need to fetch both `child += ...` and `is_less(...)` but we are much less likely to need to walk these instructions back. We waste less time and we reach the `if` condition on line 209 sooner.