Gentoo Wiki



This article was taken out of the Gentoo on Acer TM803 HOWTO, as it can be useful to a wider audience. Please help to turn it into a more general description of acovea's usage.




For more advanced Gentoo users: I have used a tool called acovea to test my CPU (a first-generation Pentium M) against several CFLAGS. It is based on an evolutionary algorithm and tests different sets of GCC options against different CPU types. Here is the output of the acovea tool:
Layout Pentium-M
optimistic options:
  -fno-delayed-branch (1.056)
       -fcaller-saves (1.577)
     -freorder-blocks (1.198)
  -freorder-functions (1.056)
         -falign-jumps (1.34)
   -finline-functions (1.435)
   -frename-registers (1.198)
                -fweb (2.428)
 -fomit-frame-pointer (1.529)
   -fno-trapping-math (1.198)
pessimistic options:
 -fno-guess-branch-probability (-2.209)
            -fno-if-conversion (-2.304)
           -fno-if-conversion2 (-1.121)
             -fcse-skip-blocks (-1.452)
                     -fregmove (-1.215)
                 -funroll-loops (-1.83)
-fbranch-target-load-optimize2 (-1.405)
                  -mfpmath=387 (-1.499)
                  -mfpmath=sse (-1.688)
              -mfpmath=sse,387 (-1.878)
     -momit-leaf-frame-pointer (-1.972)
Conclusion: The fastest code would be produced by these CFLAGS:
CFLAGS="-march=pentium-m -pipe -fno-delayed-branch -fcaller-saves -freorder-blocks 
        -freorder-functions -falign-jumps -finline-functions -frename-registers 
        -fweb -fomit-frame-pointer -fno-trapping-math -falign-functions=64"
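
Assembling such a line by hand is easy to get wrong, so the step can be sketched in Python. This is only a minimal sketch: it assumes the report looks exactly like the listings on this page (a section header, then one flag and one score in parentheses per line). The function names parse_acovea_report and build_cflags are made up for this illustration and are not part of acovea. It only includes flags listed as optimistic, so any additional flag has to be added by hand through the extra parameter.

```python
import re

# Sample text in the format shown on this page (abbreviated Pentium-M results).
SAMPLE = """\
optimistic options:
  -fno-delayed-branch (1.056)
       -fcaller-saves (1.577)
                -fweb (2.428)
pessimistic options:
 -fno-guess-branch-probability (-2.209)
                  -funroll-loops (-1.83)
               -mfpmath=sse,387 (-1.878)
"""

# One result line: a flag, then a signed score in parentheses.
LINE_RE = re.compile(r"^\s*(-\S+)\s+\((-?\d+(?:\.\d+)?)\)\s*$")


def parse_acovea_report(text):
    """Return {'optimistic': [(flag, score), ...], 'pessimistic': [...]}.

    Hypothetical helper: the input format is assumed from this page.
    """
    sections = {"optimistic": [], "pessimistic": []}
    current = None
    for line in text.splitlines():
        header = line.strip().lower()
        if header == "optimistic options:":
            current = "optimistic"
        elif header == "pessimistic options:":
            current = "pessimistic"
        else:
            match = LINE_RE.match(line)
            if match and current is not None:
                sections[current].append((match.group(1), float(match.group(2))))
    return sections


def build_cflags(report, march, extra=("-pipe",)):
    """Join -march, the extra flags, and all optimistic flags into one string."""
    flags = [f"-march={march}", *extra]
    flags += [flag for flag, _score in report["optimistic"]]
    return " ".join(flags)


report = parse_acovea_report(SAMPLE)
print('CFLAGS="' + build_cflags(report, "pentium-m") + '"')
```

The printed line can then be pasted into make.conf after checking that every flag is still accepted by the installed GCC version.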
Layout Pentium-4
I've also tested acovea against the Pentium-4 layout, so here are the results:
optimistic options:
     -fno-if-conversion2 (1.291)
 -foptimize-sibling-calls (1.08)
      -fcse-follow-jumps (1.417)
                  -fgcse (2.261)
   -frerun-cse-after-loop (1.46)
        -fschedule-insns (1.164)
       -fstrict-aliasing (1.333)
      -freorder-functions (1.08)
      -frename-registers (1.417)
    -mno-align-stringops (1.164)
  -minline-all-stringops (1.544)
pessimistic options:
          -fno-if-conversion (-1.619)
           -fstrength-reduce (-1.071)
                 -fpeephole2 (-1.534)
           -fschedule-insns2 (-1.197)
              -falign-labels (-1.113)
              -funroll-loops (-1.703)
          -funroll-all-loops (-1.703)
                -mfpmath=sse (-1.956)
            -mfpmath=sse,387 (-1.914)
        -fomit-frame-pointer (-1.619)
   -momit-leaf-frame-pointer (-1.534)
 -funsafe-math-optimizations (-1.028)
Conclusion: For this layout the fastest CFLAGS would be:
CFLAGS="-march=pentium4 -pipe -fno-if-conversion2 -foptimize-sibling-calls -fcse-follow-jumps
        -fgcse -frerun-cse-after-loop -fschedule-insns -fstrict-aliasing -freorder-functions
        -frename-registers -mno-align-stringops -minline-all-stringops"
Layout Pentium-3

The test against the Pentium-3 layout produced the following results:

optimistic options:
                  -fforce-mem (2.476)
 -fdelete-null-pointer-checks (1.419)
                     -fnew-ra (2.188)
                    -mieee-fp (1.034)
   -maccumulate-outgoing-args (1.082)
       -minline-all-stringops (1.13)
         -fomit-frame-pointer (2.236)
pessimistic options:
-fno-guess-branch-probability (-1.946)
           -fno-if-conversion (-2.138)
          -fno-if-conversion2 (-1.081)
                       -fgcse (-1.129)
            -fstrength-reduce (-1.321)
               -fcaller-saves (-1.081)
                  -fpeephole2 (-1.706)
             -fschedule-insns (-1.754)
               -funroll-loops (-1.129)
                 -mfpmath=387 (-1.946)
                 -mfpmath=sse (-1.225)
    -momit-leaf-frame-pointer (-1.994)
Based on these results, the fastest CFLAGS for the Pentium-3 would be:
CFLAGS="-march=pentium3 -pipe -fforce-mem -fdelete-null-pointer-checks -fnew-ra -mieee-fp
        -maccumulate-outgoing-args -minline-all-stringops -fomit-frame-pointer"
All original texts are taken from the GCC homepage for demonstrative purposes only.
pessimistic: -funroll-loops
  • Original text: Unroll loops whose number of iterations can be determined at compile time or upon entry to the loop. -funroll-loops implies both -fstrength-reduce and -frerun-cse-after-loop. This option makes code larger, and may or may not make it run faster.
  • Comment: What is really astounding about this analysis is that -funroll-loops is pessimistic in all three layouts. Nearly all sites that cover GCC flags present it as a flag that increases speed. Only the GCC homepage itself says it may slow the code down. This is very interesting: the flag not only seems not to help, it also seems to slow down code built with the other flags.
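
The claim that a flag is pessimistic in all three layouts can be checked mechanically. The following Python sketch copies the pessimistic lists from the three result listings on this page into sets and intersects them. The dictionary name PESSIMISTIC and its keys are chosen for this illustration only.

```python
# Pessimistic flags per layout, copied from the result listings on this page.
PESSIMISTIC = {
    "pentium-m": {
        "-fno-guess-branch-probability", "-fno-if-conversion",
        "-fno-if-conversion2", "-fcse-skip-blocks", "-fregmove",
        "-funroll-loops", "-fbranch-target-load-optimize2",
        "-mfpmath=387", "-mfpmath=sse", "-mfpmath=sse,387",
        "-momit-leaf-frame-pointer",
    },
    "pentium4": {
        "-fno-if-conversion", "-fstrength-reduce", "-fpeephole2",
        "-fschedule-insns2", "-falign-labels", "-funroll-loops",
        "-funroll-all-loops", "-mfpmath=sse", "-mfpmath=sse,387",
        "-fomit-frame-pointer", "-momit-leaf-frame-pointer",
        "-funsafe-math-optimizations",
    },
    "pentium3": {
        "-fno-guess-branch-probability", "-fno-if-conversion",
        "-fno-if-conversion2", "-fgcse", "-fstrength-reduce",
        "-fcaller-saves", "-fpeephole2", "-fschedule-insns",
        "-funroll-loops", "-mfpmath=387", "-mfpmath=sse",
        "-momit-leaf-frame-pointer",
    },
}

# Flags that appear in the pessimistic list of every layout.
common = set.intersection(*PESSIMISTIC.values())

for flag in sorted(common):
    print(flag)
```

Besides -funroll-loops, the intersection also contains -fno-if-conversion, -mfpmath=sse and -momit-leaf-frame-pointer, so -funroll-loops is not the only flag rated pessimistic in every layout tested here.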

Last modified: Sun, 31 Aug 2008 23:47:00 +0000 Hits: 6,470