Peephole optimization
=====================

Optimization opportunities:

Combining:

    OPXXX a -> b
    OPYYY b -> c

to

    OPZZZ a -> c

Loop to SIMD/vector-reg (needs a "loop-start-hint" flag on the EBB, or a
loop-start-hint op):

    OPXXX ... -> address
    OPXXX ... -> counter        (divisible by 2^K)
    OPXXX ... -> b
    start: LOOP_START_HINT
    OPXXX b, [address]
    OPADD address, 1 -> address
    OPSUB counter, 1 -> counter
    OPJNZ counter, start

to

    OPXXX ... -> address
    OPXXX ... -> counter/(2^K)
    OPXXX widen(...) -> b
    start: LOOP_START_HINT
    OPXXX b, [address]
    OPADD address, 2^K -> address
    OPSUB counter, 1 -> counter
    OPJNZ counter, start

Each widened iteration handles 2^K elements. Because the counter is
pre-divided by 2^K, it still decrements by 1 per iteration. (Alternatively,
leave the counter undivided and subtract 2^K each time, but not both.)

This will of course clobber (some) SIMD/vector regs.

- OK if no SIMD/vector regs are used.
- If SIMD/vector regs are used outside of the loop EBB, they need to be
  saved before the loop and restored after it.
- If SIMD/vector regs are used inside the loop EBB, those regs cannot be
  used by the optimization.

Architecture selection / auto-detection
---------------------------------------

Things like SIMD and combined operations can require knowledge of the CPU ISA
version. This could be specified in the multi-arch triple:

    i586-linux-gnu      x86, Pentium or later
    i686-linux-gnu      x86, Pentium Pro or later
    x86_64-linux-gnu    x86_64 v1

...what is the name of x86_64 v2 and v3 ...?

Auto-detection would be cool, but it would require runtime code. It should
definitely be off by default, to avoid unintentional dependencies on a newer
CPU ISA version than expected.
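
A rough sketch of the selection rule described above: the triple alone fixes the baseline ISA, and a detector is consulted only when auto-detection is explicitly switched on. The names `BASELINE_ISA`, `select_isa`, and the `detect` hook are hypothetical and exist only for illustration. The actual runtime detection code is deliberately left out.

```python
from typing import Callable, Optional

# Baseline ISA implied by each triple, copied from the table in the notes.
BASELINE_ISA = {
    "i586-linux-gnu": "x86, Pentium or later",
    "i686-linux-gnu": "x86, Pentium Pro or later",
    "x86_64-linux-gnu": "x86_64 v1",
}


def select_isa(
    triple: str,
    auto_detect: bool = False,
    detect: Optional[Callable[[], str]] = None,
) -> str:
    """Return the ISA level the optimizer may assume for `triple`.

    Auto-detection is opt-in: unless `auto_detect` is True, the result
    depends only on the triple, so the generated code never silently
    requires a newer CPU than the triple promises.
    """
    baseline = BASELINE_ISA[triple]  # raises KeyError for unknown triples
    if auto_detect and detect is not None:
        # `detect` is a hypothetical hook standing in for the runtime code.
        return detect()
    return baseline
```

With the default arguments, `select_isa("x86_64-linux-gnu")` yields the baseline even when a detector is supplied. Only `auto_detect=True` lets the detector's answer override it.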