
Question:
This code...
bool condSet(int cond, int a, int b) {
return cond ? a : b;
}
...generates this for gcc 6.3...
test edx, edx
setne al
test edi, edi
jne .L6
rep ret
.L6:
test esi, esi
setne al
ret
...this for icc 17...
test edi, edi
cmovne edx, esi
mov eax, 1
test edx, edx
cmove eax, edx
ret
...and this for clang 3.9:
test edi, edi
cmove esi, edx
test esi, esi
setne al
ret
<s>Why do we have these differences for a code pattern that I'd expect to be common? They all rely on conditional instructions (setne, cmovne, cmove), but gcc has a branch as well, and they all use a different order of instructions and parameters.</s>
Which pass in the compiler is responsible for this code generation? Is the difference due to how register allocation is done, how the general dataflow analysis is done, or does the compiler pattern-match against this pattern when generating the code?
The code and the asm listings: <a href="https://godbolt.org/g/7heVGz" rel="nofollow">https://godbolt.org/g/7heVGz</a>
Answer1: Changing the return type to int results in branchless code from all three compilers, using the test/cmov strategy.
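For reference, a minimal sketch of the int-returning variant being described. The names condSetInt and checkCondSetInt are hypothetical, chosen here for illustration. Note that this changes what the function returns: the selected value itself rather than 0 or 1.

```cpp
#include <cassert>

// Hypothetical int-returning variant of condSet from the question.
// With an int return type there is no int -> bool conversion left,
// so only the select between a and b remains.
int condSetInt(int cond, int a, int b) {
    return cond ? a : b;
}

// Small sanity check of the selection behaviour.
void checkCondSetInt() {
    assert(condSetInt(1, 5, 7) == 5);  // cond != 0 selects a
    assert(condSetInt(0, 5, 7) == 7);  // cond == 0 selects b
}
```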
I'd guess that gcc decides that booleanizing both sides of the conditional would be too much work, and decides to use a branch. Maybe it doesn't realize that it's the <em>same</em> work, and the expression can actually be done the other way (select the right input and then booleanize that).
The code it makes does booleanize b, and only then tests the condition and booleanizes a. So when cond is true, it actually runs both test / setnz pairs.
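The two evaluation orders can be written out at the source level. This is only a sketch: the function names are hypothetical, and it illustrates why the orders give the same result, not what any particular compiler will emit for them.

```cpp
#include <cassert>

// Order that gcc 6.3's output corresponds to:
// convert each arm to bool, then select between the two bools.
bool booleanizeThenSelect(int cond, int a, int b) {
    return cond ? (a != 0) : (b != 0);
}

// Order that the clang / icc output corresponds to:
// select an int first, then convert that one value to bool.
bool selectThenBooleanize(int cond, int a, int b) {
    int tmp = cond ? a : b;
    return tmp != 0;
}

// Compare both orders over a small set of sample inputs.
void checkSameResult() {
    const int samples[] = {-2, -1, 0, 1, 2, 256};
    for (int cond : samples)
        for (int a : samples)
            for (int b : samples)
                assert(booleanizeThenSelect(cond, a, b) ==
                       selectThenBooleanize(cond, a, b));
}
```

Since both orders always agree, a compiler is free to pick the one that needs only a single test / setnz pair.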
This smells like a missed-optimization bug. (Or an optimization-run-amok bug, where it shoots itself in the foot by applying the return-type conversion to both inputs of the ?: instead of only to the result.)
Reported as <a href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78947" rel="nofollow">GCC Bug 78947</a>.
<hr />Until that's fixed, you can <a href="http://gcc.godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(j:1,options:(colouriseAsm:'0',compileOnChange:'0'),source:'%0Abool+condSet_orig(int+cond,+int+a,+int+b)+%7B%0A++return+cond+%3F+a+:+b%3B%0A%7D%0A%0Abool+condSet_gcc_handhold(int+cond,+int+a,+int+b)+%7B%0A++int+tmp+%3D+cond+%3F+a+:+b%3B%0A++return+tmp%3B%0A%7D%0A'),l:'5',n:'1',o:'C%2B%2B+source+%231',t:'0')),k:23.54577223656297,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:clang390,filters:(b:'0',commentOnly:'0',directives:'0',intel:'0'),options:'-O1+-std%3Dgnu%2B%2B11+-Wall+-Wextra+-fverbose-asm'),l:'5',n:'0',o:'%231+with+x86-64+clang+3.9.0',t:'0')),k:30.49177504634904,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:icc17,filters:(b:'0',commentOnly:'0',directives:'0',intel:'0'),options:'-O3++-Wall+-Wextra+-fno-verbose-asm'),l:'5',n:'0',o:'%231+with+x86-64+icc+17',t:'0')),k:20.96245271708799,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:g63,filters:(b:'0',commentOnly:'0',directives:'0',intel:'0'),options:'-O3+-Wall+-Wextra+-fverbose-asm'),l:'5',n:'0',o:'%231+with+x86-64+gcc+6.3',t:'0')),k:25,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4" rel="nofollow">get gcc to make code like clang / icc</a> by splitting it into two steps:
bool condSet(int cond, int a, int b) {
int tmp = cond ? a : b; // better asm from gcc this way
return tmp;
}