Summary
- Adding 3 to GCC’s branch-misprediction scale makes the compiler more wary of branch mispredictions.
- The SPEC CPU 2017 NAB test showed a ~12% speedup on modern AMD and Intel CPUs.
- The change should reach GCC 17 in 2027.
It’s been a strange month for really small code tweaks that yield noticeable performance wins. Just a few days ago, we learned that someone had changed three lines of code in the Linux kernel and increased memory speed by 5%. Now someone has stepped forward and claimed that a single-line change in the GCC compiler delivers a 12% performance boost for modern AMD and Intel chips in one of the SPEC CPU 2017 benchmarks.
Adding 3 to one variable yielded a big gain on new AMD and Intel processors
In all fairness, the 3 was a very impressive addition
As spotted by Phoronix, Intel software engineer Lili Cui found a way to squeeze more performance out of GCC with a minimal change. The process behind that extra performance is a bit complicated, so let’s break it down.
When a CPU runs code, it tries to “cheat” to boost its performance. When it encounters a decision in the code (such as an if/else statement), it would normally have to wait for an earlier computation to finish before knowing which path to take. With a technique called “speculative execution,” the CPU instead predicts which path the program will take and starts processing the subsequent code ahead of time.
It’s a bit like texting a friend to ask whether they want a burger or pizza, then guessing they’ll want a burger and putting a patty on the grill before they reply. If you’re right, the burger is ready sooner and your friend is impressed by your speed. If you’re wrong, you have to stop, clean everything up, and bake a pizza instead. Similarly, when the CPU guesses wrong, it has to throw away the work it did speculatively and start down the other path.
This is called a “branch misprediction,” and Cui noted that on modern CPUs, mispredictions cost a lot more performance than people might think:
Modern CPUs have deeper pipelines, making branch mispredictions more expensive. Increasing this value encourages if-conversion, avoiding pipeline stalls from mispredicted branches.
To address this, Cui changed the single line of code that sets GCC’s branch-misprediction scale, a value the compiler weighs when deciding whether keeping a conditional branch is worth the risk of a misprediction. All Cui did was add 3 to the scale, and now the compiler is more reluctant to emit ordinary branching code. That makes it more likely to optimize the code in some other way, such as with a branchless sequence (for example, a conditional move).
Cui then ran the SPEC CPU 2017 benchmark 544.nab_r (Nucleic Acid Builder, or NAB), which models the physics and chemistry of molecules, on Intel and AMD processors. Cui measured a 12% performance increase on both Intel and AMD chips, because they spent less time backtracking from mispredicted branches and more time running useful code.
It will be a while before this change reaches users, as it is being merged for GCC 17, which is due next year. Still, this is a great story about how one small tweak can make a big difference.






